
Running python -m web_demo.server --model_path demo_VITA_ckpt --ip 0.0.0.0 --port 8081 raises an error #96

Open
moonspring233 opened this issue Jan 20, 2025 · 8 comments

Comments

@moonspring233

[2025-01-20 15:44:31.379] reading a config file from VITA_ckpt/vita_tts_ckpt//decoder/model.json
/root/data/VITA/vita/model/vita_tts/decoder/llm2tts.py:50: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
snapshot_dict = torch.load(model_path + "/decoder/final.pt", map_location=lambda storage, loc: storage)
/root/data/VITA/vita/model/vita_tts/decoder/ticodec/vqvae.py:21: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
ckpt = torch.load(ckpt_path)
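(The FutureWarning above refers to torch.load defaulting to weights_only=False. A minimal sketch of the mitigation it recommends, assuming the checkpoint contains only plain tensors; custom classes would additionally need torch.serialization.add_safe_globals, and the path is taken from the log:)

```python
# Hedged sketch: pass weights_only=True so unpickling is restricted to tensors
# and allowlisted types instead of arbitrary Python objects.
import torch

snapshot_dict = torch.load(
    "VITA_ckpt/vita_tts_ckpt/decoder/final.pt",  # path as in the log above
    map_location="cpu",
    weights_only=True,
)
```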
/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
WeightNorm.apply(module, name, dim)
[2025-01-20 15:44:35.338] Removing weight norm...
[2025-01-20 15:44:35.341] Removing weight norm...
[2025-01-20 15:44:35.592] [<EventProxy object, typeid 'Event' at 0x7fcac526c8e0>, <EventProxy object, typeid 'Event' at 0x7fcac526cac0>] wait_workers_readywait_workers_ready
[2025-01-20 15:44:35.592] [<EventProxy object, typeid 'Event' at 0x7f19cc66c8e0>, <EventProxy object, typeid 'Event' at 0x7f19cc66cac0>] wait_workers_readywait_workers_ready
WARNING 01-20 15:44:35 config.py:1563] Casting torch.bfloat16 to torch.float16.
WARNING 01-20 15:44:35 config.py:1563] Casting torch.bfloat16 to torch.float16.
INFO 01-20 15:44:35 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='VITA_ckpt', speculative_config=None, tokenizer='VITA_ckpt', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=VITA_ckpt, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 01-20 15:44:35 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='VITA_ckpt', speculative_config=None, tokenizer='VITA_ckpt', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=VITA_ckpt, use_v2_block_manager=False, enable_prefix_caching=False)
Process Process-4:
[2025-01-20 15:44:35.987] Traceback (most recent call last):
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
self.run()
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
[2025-01-20 15:44:35.987] File "/root/data/VITA/web_demo/server.py", line 180, in load_model
llm = LLM(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 175, in init
self.llm_engine = LLMEngine.from_engine_args(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 473, in from_engine_args
engine = cls(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 270, in init
self.model_executor = executor_class(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 46, in init
self._init_executor()
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 37, in _init_executor
self.driver_worker = self._create_worker()
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 104, in _create_worker
return create_worker(**self._get_create_worker_kwargs(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 23, in create_worker
wrapper.init_worker(**kwargs)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 444, in init_worker
self.worker = worker_class(*args, **kwargs)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/worker/worker.py", line 99, in init
self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 842, in init
self.attn_backend = get_attn_backend(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/attention/selector.py", line 108, in get_attn_backend
backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/attention/selector.py", line 215, in which_attn_to_use
if current_platform.get_device_capability()[0] < 8:
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 97, in get_device_capability
return get_physical_device_capability(physical_device_id)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 39, in wrapper
return fn(*args, **kwargs)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 49, in get_physical_device_capability
handle = pynvml.nvmlDeviceGetHandleByIndex(device_id)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/pynvml.py", line 2437, in nvmlDeviceGetHandleByIndex
_nvmlCheckReturn(ret)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
raise NVMLError(ret)
[2025-01-20 15:44:35.987] pynvml.NVMLError_InvalidArgument: Invalid Argument
[2025-01-20 15:44:36.031] Cleaning up resources...
INFO 01-20 15:44:36 model_runner.py:879] Starting to load model VITA_ckpt...
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:01<00:05, 1.97s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:04<00:04, 2.18s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:07<00:02, 2.50s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 1.95s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 2.07s/it]

[2025-01-20 15:44:44.744] Uninitialized parameters: ['model.audio_encoder.encoder.global_cmvn.istd', 'model.audio_encoder.encoder.global_cmvn.mean']
INFO 01-20 15:44:45 model_runner.py:890] Loading model weights took 15.5767 GB
WARNING 01-20 15:44:45 model_runner.py:1057] Computed max_num_seqs (min(256, 32768 // 182272)) to be less than 1. Setting it to the minimum value of 1.
INFO 01-20 15:44:52 gpu_executor.py:121] # GPU blocks: 41508, # CPU blocks: 4681
INFO 01-20 15:44:55 model_runner.py:1181] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-20 15:44:55 model_runner.py:1185] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
INFO 01-20 15:45:07 model_runner.py:1300] Graph capturing finished in 12 secs.
WARNING 01-20 15:45:08 sampling_params.py:221] temperature 0.001 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.

The basic web_demo.web_ability_demo runs normally. As instructed, I downloaded silero_vad.onnx and silero_vad.jit and placed them in the specified location, and I set max_dynamic_patch to 1 in the model's config file. The GPU is an NVIDIA H800.
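(For reference, a minimal sketch of that config change; the file name demo_VITA_ckpt/config.json and the top-level key placement are assumptions and may differ from the actual VITA checkpoint layout:)

```python
# Hedged sketch: set max_dynamic_patch to 1 in the checkpoint config, as described above.
import json

cfg_path = "demo_VITA_ckpt/config.json"  # assumed location of the model config
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["max_dynamic_patch"] = 1  # limit dynamic image patching for the web demo

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```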

@lxysl
Contributor

lxysl commented Jan 20, 2025

Please verify the number of GPUs available. If you have only one GPU, update the device setting for the second model to cuda:0. By default, the models are deployed on cuda:0 and cuda:1. For reference, see this line in the code: web_demo/server.py#L1013.
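A minimal sketch of that change, assuming each of the two model workers is given a device string (variable names are illustrative; the actual code around web_demo/server.py#L1013 may differ):

```python
# Hedged sketch: keep both model workers on cuda:0 when only one GPU is visible.
# The variable names are illustrative, not the actual ones in web_demo/server.py.
import torch

num_gpus = torch.cuda.device_count()

device_for_model_1 = "cuda:0"
# The demo defaults the second model to cuda:1; fall back to cuda:0 on a single-GPU machine.
device_for_model_2 = "cuda:1" if num_gpus > 1 else "cuda:0"
```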

@rjq123

rjq123 commented Jan 21, 2025

Please verify the number of GPUs available. If you have only one GPU, update the device setting for the second model to cuda:0. By default, the models are deployed on cuda:0 and cuda:1. For reference, see this line in the code: web_demo/server.py#L1013.

[2025-01-21 10:22:26.834] Removing weight norm...
[2025-01-21 10:22:26.837] Removing weight norm...
[2025-01-21 10:22:27.066] [<EventProxy object, typeid 'Event' at 0x7f864538c4f0>, <EventProxy object, typeid 'Event' at 0x7f864538c700>] wait_workers_readywait_workers_ready
[2025-01-21 10:22:27.066] [<EventProxy object, typeid 'Event' at 0x7f287600c640>, <EventProxy object, typeid 'Event' at 0x7f287600c820>] wait_workers_readywait_workers_ready
WARNING 01-21 10:22:27 config.py:1563] Casting torch.bfloat16 to torch.float16.
WARNING 01-21 10:22:27 config.py:1563] Casting torch.bfloat16 to torch.float16.
INFO 01-21 10:22:27 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='demo_VITA_ckpt', speculative_config=None, tokenizer='demo_VITA_ckpt', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=demo_VITA_ckpt, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 01-21 10:22:27 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='demo_VITA_ckpt', speculative_config=None, tokenizer='demo_VITA_ckpt', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=demo_VITA_ckpt, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 01-21 10:22:27 model_runner.py:879] Starting to load model demo_VITA_ckpt...
INFO 01-21 10:22:27 model_runner.py:879] Starting to load model demo_VITA_ckpt...
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:01, 2.63it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:00<00:01, 2.42it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.54it/s]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:01<00:01, 1.49it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.28it/s]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:02<00:00, 1.28it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.19it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.29it/s]

[2025-01-21 10:22:30.953] Uninitialized parameters: ['model.audio_encoder.encoder.global_cmvn.istd', 'model.audio_encoder.encoder.global_cmvn.mean']
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.17it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00, 1.27it/s]

[2025-01-21 10:22:30.991] Uninitialized parameters: ['model.audio_encoder.encoder.global_cmvn.istd', 'model.audio_encoder.encoder.global_cmvn.mean']
INFO 01-21 10:22:31 model_runner.py:890] Loading model weights took 15.5767 GB
WARNING 01-21 10:22:31 model_runner.py:1057] Computed max_num_seqs (min(256, 32768 // 64000)) to be less than 1. Setting it to the minimum value of 1.
INFO 01-21 10:22:31 model_runner.py:890] Loading model weights took 15.5767 GB
WARNING 01-21 10:22:31 model_runner.py:1057] Computed max_num_seqs (min(256, 32768 // 64000)) to be less than 1. Setting it to the minimum value of 1.
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
INFO 01-21 10:22:35 gpu_executor.py:121] # GPU blocks: 58726, # CPU blocks: 4681
INFO 01-21 10:22:35 gpu_executor.py:121] # GPU blocks: 58726, # CPU blocks: 4681
INFO 01-21 10:22:37 model_runner.py:1181] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-21 10:22:37 model_runner.py:1185] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
INFO 01-21 10:22:37 model_runner.py:1181] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-21 10:22:37 model_runner.py:1185] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
INFO 01-21 10:22:49 model_runner.py:1300] Graph capturing finished in 12 secs.
INFO 01-21 10:22:50 model_runner.py:1300] Graph capturing finished in 12 secs.

I am using two GPUs, specifically cuda:6 and cuda:7, but the output is stuck at this point. How can I resolve this?

@moonspring233
Author

Thank you very much, this was the reason. After making the modifications, no errors occurred.

@lxysl
Contributor

lxysl commented Jan 21, 2025

@rjq123 It seems there is no error.

@rjq123

rjq123 commented Jan 21, 2025

@rjq123 It seems there is no error.

yes, but no response, the output is stuck at this point.

@lxysl
Contributor

lxysl commented Jan 21, 2025

@rjq123 It seems there is no error.

yes, but no response, the output is stuck at this point.

If you haven't opened the webpage and started interacting, it's normal to have no output.

@rjq123

rjq123 commented Jan 21, 2025

@rjq123 It seems there is no error.

yes, but no response, the output is stuck at this point.

If you haven't opened the webpage and started interacting, it's normal to have no output.

CUDA_VISIBLE_DEVICES=6,7 python -m web_demo.server --model_path demo_VITA_ckpt --ip 0.0.0.0 --port 9086

Port 9086 is open.
url: http://ip:9086/

The webpage cannot function properly.

@lxysl
Contributor

lxysl commented Jan 21, 2025

@rjq123 It seems there is no error.

yes, but no response, the output is stuck at this point.

If you haven't opened the webpage and started interacting, it's normal to have no output.

CUDA_VISIBLE_DEVICES=6,7 python -m web_demo.server --model_path demo_VITA_ckpt --ip 0.0.0.0 --port 9086

Port 9086 is open. url: http://ip:9086/

The webpage cannot function properly.

Please use https. Browsers require a secure context for microphone and camera access, so the demo page will not work properly over plain http.
