The weights are already downloaded to a local file.
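As a hedged first check (the /path/to/chatglm3-6b directory below is a placeholder, not taken from this setup), the local download can be verified directly with transformers, and the same script can confirm whether PyTorch actually sees the GPU, since the log below warns that the driver is too old:

```python
# Hedged sanity check with a placeholder path: load the locally downloaded
# chatglm3-6b weights and tokenizer directly with transformers, and report
# whether PyTorch can use CUDA at all (the warning below suggests it cannot,
# so inference may be falling back to CPU).
import torch
from transformers import AutoModel, AutoTokenizer

local_path = "/path/to/chatglm3-6b"  # placeholder: the real local weight directory

print("cuda available:", torch.cuda.is_available(),
      "| torch built for CUDA:", torch.version.cuda)

tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True, torch_dtype="auto")

# A plain string must tokenize cleanly; this is the same tokenizer call that
# fails inside OpenCompass below when it receives a non-string input.
print(tokenizer("你好，世界")["input_ids"][:8])
```

If both loads succeed, the local files themselves are fine and the failure is in how the prompt reaches the tokenizer.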
The error log in outputs/default/..../infer/chatglm3-6b-hf:
/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at /opt/conda/conda-bld/pytorch_1708025847130/work/c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
02/28 17:00:12 - OpenCompass - INFO - Task [chatglm3-6b-hf/C3]
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]
Loading checkpoint shards: 29%|██▊ | 2/7 [00:00<00:00, 11.41it/s]
Loading checkpoint shards: 57%|█████▋ | 4/7 [00:00<00:00, 11.97it/s]
Loading checkpoint shards: 86%|████████▌ | 6/7 [00:00<00:00, 12.19it/s]
Loading checkpoint shards: 100%|██████████| 7/7 [00:00<00:00, 12.05it/s]
02/28 17:00:15 - OpenCompass - INFO - Start inferencing [chatglm3-6b-hf/C3]
[2024-02-28 17:00:15,966] [opencompass.openicl.icl_inferencer.icl_ppl_inferencer] [INFO] Calculating PPL for prompts labeled '0'
0%| | 0/1825 [00:00<?, ?it/s]
0%| | 0/1825 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py", line 153, in
inferencer.run()
File "/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py", line 81, in run
self._inference()
File "/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py", line 126, in _inference
inferencer.inference(retriever,
File "/home/zhaojieying/workspace/opencompass/opencompass/openicl/icl_inferencer/icl_ppl_inferencer.py", line 181, in inference
sub_res = self.model.get_ppl_from_template(
File "/home/zhaojieying/workspace/opencompass/opencompass/models/base.py", line 152, in get_ppl_from_template
return self.get_ppl(inputs, mask_length)
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 472, in get_ppl
return np.concatenate([
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 473, in
self._get_ppl(inputs=[text], mask_length=mask_length)
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 494, in _get_ppl
outputs, inputs = self.get_logits(inputs)
File "/home/zhaojieying/workspace/opencompass/opencompass/models/huggingface.py", line 440, in get_logits
input_ids = self.tokenizer(
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2829, in call
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2887, in _call_one
raise ValueError(
ValueError: text input must be of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples).
[2024-02-28 17:00:20,603] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 950677) of binary: /home/zhaojieying/anaconda3/envs/opencompass/bin/python
Traceback (most recent call last):
File "/home/zhaojieying/anaconda3/envs/opencompass/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.2.1', 'console_scripts', 'torchrun')())
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 347, in wrapper
return f(*args, **kwargs)
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
run(args)
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
elastic_launch(
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/zhaojieying/anaconda3/envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/home/zhaojieying/workspace/opencompass/opencompass/tasks/openicl_infer.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-02-28_17:00:20
host : reg.mydomain.com
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 950677)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
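For reference, the ValueError in the middle of the log is raised by the Hugging Face tokenizer's own input validation, not by the model or the checkpoint: a tokenizer call only accepts str, List[str], or List[List[str]], so OpenCompass must have handed it something else (for example an unflattened prompt structure or None). A minimal, hedged reproduction with a placeholder checkpoint path:

```python
# Reproduce the type check in tokenization_utils_base._call_one: only
# str / List[str] / List[List[str]] are accepted; anything else raises the
# same ValueError that aborts the C3 task above. Placeholder path.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/path/to/chatglm3-6b", trust_remote_code=True)

tok("a single string")                # OK: single example
tok(["a", "batch", "of", "strings"])  # OK: batch of examples

try:
    tok({"role": "user", "content": "hi"})  # not a str -> rejected
except ValueError as err:
    print(err)  # "text input must be of type str (single example), ..."
```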
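The last line of the summary points at https://pytorch.org/docs/stable/elastic/errors.html. The mechanism described there is wrapping the launched entrypoint with the record decorator so the worker's traceback is written to the error_file that shows up as <N/A> above. A generic sketch, not OpenCompass code (main and the raised error are placeholders):

```python
# Generic torchrun entrypoint sketch: @record captures the worker's exception
# and writes it to the error_file reported in the elastic failure summary.
from torch.distributed.elastic.multiprocessing.errors import record


@record
def main():
    raise ValueError("simulated worker failure")  # stand-in for the tokenizer error


if __name__ == "__main__":
    main()
```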