[Ascend] Loading the QwQ-32b model with MindIE fails: AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
Loading QwQ-32b with MindIE fails with AttributeError: 'ForkAwareLocal' object has no attribute 'connection'. The cause was an incomplete weight-file download, which surfaces as safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
Problem description
Loading the qwq-32b model with MindIE fails with AttributeError: 'ForkAwareLocal' object has no attribute 'connection'. The full error output is:
[2025-03-25 21:50:29,878] [1125] [281464557924704] [llm] [INFO] [dist.py-81] : initialize_distributed has been Set
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
Process ForkServerProcess-8:
Traceback (most recent call last):
File "/usr/lib64/python3.11/multiprocessing/managers.py", line 814, in _callmethod
conn = self._tls.connection
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 71, in wrapper
raise exp
File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 63, in wrapper
func(*args, **kwargs)
File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 264, in task_distribute
resource_proxy[SUB_PROCESS_STATE].append(True)
File "<string>", line 2, in append
File "/usr/lib64/python3.11/multiprocessing/managers.py", line 818, in _callmethod
self._connect()
File "/usr/lib64/python3.11/multiprocessing/managers.py", line 805, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/multiprocessing/connection.py", line 518, in Client
c = SocketClient(address)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/multiprocessing/connection.py", line 646, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
Process ForkServerProcess-6:
Traceback (most recent call last):
File "/usr/lib64/python3.11/multiprocessing/managers.py", line 814, in _callmethod
conn = self._tls.connection
^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 71, in wrapper
raise exp
File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 63, in wrapper
func(*args, **kwargs)
File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 264, in task_distribute
resource_proxy[SUB_PROCESS_STATE].append(True)
File "<string>", line 2, in append
File "/usr/lib64/python3.11/multiprocessing/managers.py", line 818, in _callmethod
self._connect()
File "/usr/lib64/python3.11/multiprocessing/managers.py", line 805, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/multiprocessing/connection.py", line 518, in Client
c = SocketClient(address)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/multiprocessing/connection.py", line 646, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
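My reading of this traceback is that it is a knock-on effect, not the root cause. In route.py, a TBE worker calls resource_proxy[SUB_PROCESS_STATE].append(True) on a multiprocessing manager proxy. The proxy looks for a cached connection in its per-thread ForkAwareLocal; when none exists it gets the AttributeError, tries to reconnect to the manager, and the reconnect fails because the process serving the manager is already gone (consistent with "main process disappeared!"). The sketch below reproduces the same exception chain with a plain standard-library SyncManager; it does not involve TBE, and it assumes a POSIX system because it forces the "fork" start method.

```python
import multiprocessing
from multiprocessing.managers import SyncManager


def call_proxy_after_manager_exit() -> BaseException:
    """Use a manager-backed list proxy after its manager process has exited."""
    # Assumption: POSIX system; "fork" is forced so the sketch behaves the same
    # regardless of the interpreter's default start method.
    manager = SyncManager(ctx=multiprocessing.get_context("fork"))
    manager.start()             # the manager server runs in a separate process
    shared = manager.list()     # proxy; its ForkAwareLocal holds no connection yet
    manager.shutdown()          # stand-in for the process that "disappeared"
    try:
        shared.append(True)     # same kind of call as route.py line 264 in the log
    except OSError as exc:      # ConnectionRefusedError, or FileNotFoundError if the socket file was removed
        return exc
    raise RuntimeError("expected the proxy call to fail")


if __name__ == "__main__":
    err = call_proxy_after_manager_exit()
    print(type(err).__name__, err)
    # The reconnect failed while handling the AttributeError from the cached-connection lookup.
    print("raised while handling:", repr(err.__context__))
```

Because the real failure happened earlier in the process that owned the manager, these worker tracebacks say little about why that process died, which is why the debug log below is needed.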
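The log above does not make the cause clear. To get more detail, edit the mindie-service configuration file, set the log level to debug, restart mindie-service, and check the log again: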
vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
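If you prefer to script the change, here is a minimal sketch. The key path ("LogConfig", "logLevel") and the value "Debug" are assumptions: the layout of config.json and the accepted level names differ between MindIE versions, so open your own file first, find the log-level field, and pass its actual path and a value your version accepts. The helper name set_log_level is hypothetical; it keeps a .bak copy and refuses to create a key that is not already present.

```python
import json
import shutil
from typing import Sequence

# Assumption: key path used by some MindIE Service config.json layouts; verify against your file.
DEFAULT_KEY_PATH = ("LogConfig", "logLevel")


def set_log_level(config_path: str, level: str,
                  key_path: Sequence[str] = DEFAULT_KEY_PATH) -> str:
    """Set a nested log-level field in a JSON config file and return the previous value."""
    shutil.copy2(config_path, config_path + ".bak")   # keep the original for rollback
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    node = config
    for key in key_path[:-1]:
        node = node[key]          # KeyError here means the key path does not match this file
    old = node[key_path[-1]]      # KeyError instead of silently adding a wrong key
    node[key_path[-1]] = level
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4, ensure_ascii=False)
    return old
```

json.dump rewrites the whole file with uniform indentation; the .bak copy preserves the original formatting. Restart mindie-service afterwards, as described above.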
The debug log shows the following error:
[2025-03-26 20:23:13.026+0800] [18045] [281473177874784] [mindie-server] [ERROR] [model.py:40] : [Model] >>> Exception:Error while deserializing header: HeaderTooLarge
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/model_wrapper/model.py", line 38, in initialize
return self.python_model.initialize(config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/model_wrapper/standard_model.py", line 93, in initialize
self.generator = Generator(
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/generator.py", line 85, in __init__
self.generator_backend = get_generator_backend(model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/__init__.py", line 26, in get_generator_backend
return generator_cls(model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_torch.py", line 97, in __init__
super().__init__(model_config)
File "/usr/local/lib/python3.11/site-packages/mindie_llm/text_generator/adapter/generator_backend.py", line 113, in __init__
self.model_wrapper = get_model_wrapper(model_config, backend_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/__init__.py", line 15, in get_model_wrapper
return wrapper_cls(**model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/mindie_llm/modeling/model_wrapper/atb/atb_model_wrapper.py", line 52, in __init__
self.model_runner.load_weights()
File "/usr/local/Ascend/atb-models/atb_llm/runner/model_runner.py", line 149, in load_weights
weights = Weights(
^^^^^^^^
File "/usr/local/Ascend/atb-models/atb_llm/utils/weights.py", line 49, in __init__
routing = self.load_routing(process_group)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/Ascend/atb-models/atb_llm/utils/weights.py", line 77, in load_routing
with safe_open(filename, framework="pytorch") as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge
[2025-03-26 20:23:13.026+0800] [18045] [281473177874784] [mindie-server] [ERROR] [model.py:43] : [MIE04E13030A] [Model] >>> return initialize error result: {'status': 'error', 'npuBlockNum': '0', 'cpuBlockNum': '0'}
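HeaderTooLarge comes from safe_open while it parses the start of a .safetensors file. The format is: 8 bytes holding a little-endian unsigned 64-bit header length N, then N bytes of JSON header, then raw tensor data whose extents are given by each tensor's data_offsets. If the first 8 bytes are not a valid prefix (for example, a placeholder or partially written file saved under the expected name), N decodes to an absurd value; a file that is merely cut short at the end parses but holds less data than its header declares. The sketch below checks both cases with the standard library only. The 100,000,000-byte cap is an assumption that mirrors the header limit in the safetensors Rust crate at the time of writing, and check_safetensors is a hypothetical helper, not part of MindIE.

```python
import json
import os
import struct

MAX_HEADER_SIZE = 100_000_000  # assumption: approximate cap used by the safetensors crate


def check_safetensors(path: str) -> list:
    """Return a list of problems found in one .safetensors file (empty if it looks intact)."""
    file_size = os.path.getsize(path)
    if file_size < 8:
        return [f"file is only {file_size} bytes, too small to hold a header"]
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        if header_len > MAX_HEADER_SIZE or 8 + header_len > file_size:
            return [f"declared header length {header_len} is implausible for a {file_size}-byte file"]
        try:
            header = json.loads(f.read(header_len))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            return [f"header is not valid JSON: {exc}"]
    if not isinstance(header, dict):
        return ["header is JSON but not an object"]
    try:
        # data_offsets are relative to the first byte after the header.
        data_end = max((info["data_offsets"][1] for name, info in header.items()
                        if name != "__metadata__"), default=0)
    except (KeyError, TypeError, IndexError):
        return ["header entries are malformed"]
    expected_size = 8 + header_len + data_end
    if expected_size != file_size:
        return [f"header expects {expected_size} bytes but file has {file_size} (incomplete?)"]
    return []


if __name__ == "__main__":
    import sys
    weight_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in sorted(os.listdir(weight_dir)):
        if name.endswith(".safetensors"):
            issues = check_safetensors(os.path.join(weight_dir, name))
            print(name, "OK" if not issues else "; ".join(issues))
```

This only checks structure: a file of the right length whose tensor bytes are corrupted still passes, so comparing against the source (sizes or checksums) remains necessary.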
This pointed to the weights. On inspection, the weight files had not been downloaded completely:
du -sh *
# run inside the weight directory
Sizes of the original files at the download source:
After re-downloading the weight files, the problem was resolved.
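du -sh reports allocated disk usage rounded to human-readable units, so a shard that is slightly short, or a file the downloader preallocated to full size, can look complete at a glance. A stricter check is sketched below, assuming the weight directory contains a Hugging Face-style model.safetensors.index.json and that you copy exact byte sizes (and SHA-256 values, if the download source publishes them) into expected_sizes yourself. All helper names are hypothetical.

```python
import hashlib
import json
import os
from typing import Dict, List, Optional, Tuple


def missing_shards(weight_dir: str, index_name: str = "model.safetensors.index.json") -> List[str]:
    """List shard files referenced by the index's weight_map that are absent from weight_dir."""
    with open(os.path.join(weight_dir, index_name), encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]          # tensor name -> shard file name
    shards = sorted(set(weight_map.values()))
    return [s for s in shards if not os.path.isfile(os.path.join(weight_dir, s))]


def size_mismatches(weight_dir: str,
                    expected_sizes: Dict[str, int]) -> Dict[str, Tuple[Optional[int], int]]:
    """Map file name -> (local size in bytes or None if missing, expected size) for each mismatch."""
    result = {}
    for name, expected in expected_sizes.items():
        path = os.path.join(weight_dir, name)
        local = os.path.getsize(path) if os.path.isfile(path) else None  # exact bytes, not du blocks
        if local != expected:
            result[name] = (local, expected)
    return result


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so multi-GB shards don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Any file reported by these checks should be downloaded again before restarting mindie-service.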
