The weights are already downloaded to a local file.
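As a hedged first check (the /path/to/chatglm3-6b directory below is a placeholder, not taken from this setup), the local download can be verified directly with transformers, and the same script can confirm whether PyTorch actually sees the GPU, since the log below warns that the driver is too old:

```python
# Hedged sanity check with a placeholder path: load the locally downloaded
# chatglm3-6b weights and tokenizer directly with transformers, and report
# whether PyTorch can use CUDA at all (the warning below suggests it cannot,
# so inference may be falling back to CPU).
import torch
from transformers import AutoModel, AutoTokenizer

local_path = "/path/to/chatglm3-6b"  # placeholder: the real local weight directory

print("cuda available:", torch.cuda.is_available(),
      "| torch built for CUDA:", torch.version.cuda)

tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True, torch_dtype="auto")

# A plain string must tokenize cleanly; this is the same tokenizer call that
# fails inside OpenCompass below when it receives a non-string input.
print(tokenizer("你好，世界")["input_ids"][:8])
```

If both loads succeed, the local files themselves are fine and the failure is in how the prompt reaches the tokenizer.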
The error log in outputs/default/..../infer/chatglm3-6b-hf:
/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /opt/conda/conda-bld/pytorch_1708025847130/work/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
02/28 17:00:12 - OpenCompass - INFO - Task [chatglm3-6b-hf/C3]
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 29%|██▊ | 2/7 [00:00<00:00, 11.41it/s]
Loading checkpoint shards: 57%|█████▋ | 4/7 [00:00<00:00, 11.97it/s]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:00<00:00, 12.19it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 12.05it/s]
02/28 17:00:15 - OpenCompass - INFO - Start inferencing [chatglm3-6b-hf/C3]
[2024-02-28 17:00:15,966] [opencompass.openicl.icl_inferencer.icl_ppl_inferencer] [INFO] Calculating PPL for prompts labeled '0'
0%| | 0/1825 [00:00<?, ?it/s]
0%| | 0/1825 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py", line 153, in
inferencer.run()
File "/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py", line 81, in run
self._inference()
File "/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py", line 126, in _inference
inferencer.inference(retriever,
File "/home/zhaojieying/workspace/opencompass/opencompass/openicl/icl_inferencer/icl_ppl_inferencer.py", line 181, in inference
sub_res = self.model.get_ppl_from_template(
File "/home/zhaojieying/workspace/opencompass/opencompass/models/base.py", line 152, in get_ppl_from_template
return self.get_ppl(inputs, mask_length)
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 472, in get_ppl
return np.concatenate([
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 473, in
self._get_ppl(inputs=[text], mask_length=mask_length)
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 494, in _get_ppl
outputs, inputs = self.get_logits(inputs)
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 440, in get_logits
input_ids = self.tokenizer(
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2829, in call
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2887, in _call_one
raise ValueError(
ValueError: text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
[2024-02-28 17:00:20,603] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 950677) of binary: /home/zhaojieying/anaconda3/envs/opencompass/bin/python
Traceback (most recent call last):
File "/home/zhaojieying/anaconda3/envs/opencompass/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.2.1', 'console_scripts', 'torchrun')())
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 347, in wrapper
return f(*args, **kwargs)
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
run(args)
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-02-28_17:00:20
host : reg.mydomain.com
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 950677)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
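For reference, the ValueError in the middle of the log is raised by the Hugging Face tokenizer's own input validation, not by the model or the checkpoint: a tokenizer call only accepts str, List[str], or List[List[str]], so OpenCompass must have handed it something else (for example an unflattened prompt structure or None). A minimal, hedged reproduction with a placeholder checkpoint path:

```python
# Reproduce the type check in tokenization_utils_base._call_one: only
# str / List[str] / List[List[str]] are accepted; anything else raises the
# same ValueError that aborts the C3 task above. Placeholder path.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/chatglm3-6b", trust_remote_code=True)

tok("a single string")                # OK: single example
tok(["a", "batch", "of", "strings"])  # OK: batch of examples

try:
    tok({"role": "user", "content": "hi"})  # not a str -> rejected
except ValueError as err:
    print(err)  # "text input must be of type str (single example), ..."
```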
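The last line of the summary points at https://pytorch.org/docs/stable/elastic/errors.html. The mechanism described there is wrapping the launched entrypoint with the record decorator so the worker's traceback is written to the error_file that shows up as <N/A> above. A generic sketch, not OpenCompass code (main and the raised error are placeholders):

```python
# Generic torchrun entrypoint sketch: @record captures the worker's exception
# and writes it to the error_file reported in the elastic failure summary.
from torch.distributed.elastic.multiprocessing.errors import record


@record
def main():
    raise ValueError("simulated worker failure")  # stand-in for the tokenizer error


if __name__ == "__main__":
    main()
```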