Error when running python -m web_demo.server --model_path demo_VITA_ckpt --ip 0.0.0.0 --port 8081 #96
Comments
Please verify the number of GPUs available. If you have only one GPU, update the device setting for the second model accordingly (see the sketch below).
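In case it helps, here is a minimal sketch of the idea, assuming the demo assigns each of the two models an explicit device string (the actual variable names in web_demo/server.py may differ):

```python
import torch

# Hypothetical illustration: choose devices based on how many GPUs are visible.
# On a single-GPU machine both models have to share cuda:0; with two visible
# GPUs the second model can be placed on cuda:1.
num_gpus = torch.cuda.device_count()
device_model_1 = "cuda:0"
device_model_2 = "cuda:1" if num_gpus >= 2 else "cuda:0"
print(device_model_1, device_model_2)
```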
[2025-01-21 10:22:30.953] Uninitialized parameters: ['model.audio_encoder.encoder.global_cmvn.istd', 'model.audio_encoder.encoder.global_cmvn.mean']
[2025-01-21 10:22:30.991] Uninitialized parameters: ['model.audio_encoder.encoder.global_cmvn.istd', 'model.audio_encoder.encoder.global_cmvn.mean']
I am using two GPUs, specifically cuda:6 and cuda:7, but the output is stuck at this point. How can I resolve this?
Thank you very much, this was the reason. After making the modifications, no errors occurred.
@rjq123 It seems there is no error. |
Yes, but there is no response; the output is stuck at this point.
If you haven't opened the webpage and started interacting, it's normal to have no output. |
CUDA_VISIBLE_DEVICES=6,7 python -m web_demo.server --model_path demo_VITA_ckpt --ip 0.0.0.0 --port 9086
Port 9086 is open, but the webpage cannot function properly.
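Note that once CUDA_VISIBLE_DEVICES=6,7 is set, the two GPUs are renumbered inside the process, so they appear as cuda:0 and cuda:1 rather than cuda:6 and cuda:7. A quick check with plain PyTorch, independent of the demo:

```python
import os
import torch

# With CUDA_VISIBLE_DEVICES=6,7 exported before launch, only two devices are
# visible and they are renumbered as cuda:0 and cuda:1 inside the process.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible GPU count    =", torch.cuda.device_count())  # expected: 2
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))
```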
Please use HTTPS when opening the page.
[2025-01-20 15:44:31.379] reading a config file from VITA_ckpt/vita_tts_ckpt//decoder/model.json
/root/data/VITA/vita/model/vita_tts/decoder/llm2tts.py:50: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  snapshot_dict = torch.load(model_path + "/decoder/final.pt", map_location=lambda storage, loc: storage)
/root/data/VITA/vita/model/vita_tts/decoder/ticodec/vqvae.py:21: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(ckpt_path)
/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
[2025-01-20 15:44:35.338] Removing weight norm...
[2025-01-20 15:44:35.341] Removing weight norm...
[2025-01-20 15:44:35.592] [<EventProxy object, typeid 'Event' at 0x7fcac526c8e0>, <EventProxy object, typeid 'Event' at 0x7fcac526cac0>] wait_workers_readywait_workers_ready
[2025-01-20 15:44:35.592] [<EventProxy object, typeid 'Event' at 0x7f19cc66c8e0>, <EventProxy object, typeid 'Event' at 0x7f19cc66cac0>] wait_workers_readywait_workers_ready
WARNING 01-20 15:44:35 config.py:1563] Casting torch.bfloat16 to torch.float16.
WARNING 01-20 15:44:35 config.py:1563] Casting torch.bfloat16 to torch.float16.
INFO 01-20 15:44:35 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='VITA_ckpt', speculative_config=None, tokenizer='VITA_ckpt', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=VITA_ckpt, use_v2_block_manager=False, enable_prefix_caching=False)
INFO 01-20 15:44:35 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='VITA_ckpt', speculative_config=None, tokenizer='VITA_ckpt', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=VITA_ckpt, use_v2_block_manager=False, enable_prefix_caching=False)
Process Process-4:
[2025-01-20 15:44:35.987] Traceback (most recent call last):
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
self.run()
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
[2025-01-20 15:44:35.987] File "/root/data/VITA/web_demo/server.py", line 180, in load_model
llm = LLM(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 175, in init
self.llm_engine = LLMEngine.from_engine_args(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 473, in from_engine_args
engine = cls(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 270, in init
self.model_executor = executor_class(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 46, in init
self._init_executor()
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 37, in _init_executor
self.driver_worker = self._create_worker()
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 104, in _create_worker
return create_worker(**self._get_create_worker_kwargs(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 23, in create_worker
wrapper.init_worker(**kwargs)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 444, in init_worker
self.worker = worker_class(*args, **kwargs)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/worker/worker.py", line 99, in init
self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 842, in init
self.attn_backend = get_attn_backend(
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/attention/selector.py", line 108, in get_attn_backend
backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/attention/selector.py", line 215, in which_attn_to_use
if current_platform.get_device_capability()[0] < 8:
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 97, in get_device_capability
return get_physical_device_capability(physical_device_id)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 39, in wrapper
return fn(*args, **kwargs)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/vllm/platforms/cuda.py", line 49, in get_physical_device_capability
handle = pynvml.nvmlDeviceGetHandleByIndex(device_id)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/pynvml.py", line 2437, in nvmlDeviceGetHandleByIndex
_nvmlCheckReturn(ret)
[2025-01-20 15:44:35.987] File "/opt/miniconda3/envs/vita_demo/lib/python3.10/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
raise NVMLError(ret)
[2025-01-20 15:44:35.987] pynvml.NVMLError_InvalidArgument: Invalid Argument
[2025-01-20 15:44:36.031] Cleaning up resources...
INFO 01-20 15:44:36 model_runner.py:879] Starting to load model VITA_ckpt...
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:01<00:05, 1.97s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:04<00:04, 2.18s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:07<00:02, 2.50s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 1.95s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:08<00:00, 2.07s/it]
[2025-01-20 15:44:44.744] Uninitialized parameters: ['model.audio_encoder.encoder.global_cmvn.istd', 'model.audio_encoder.encoder.global_cmvn.mean']
INFO 01-20 15:44:45 model_runner.py:890] Loading model weights took 15.5767 GB
WARNING 01-20 15:44:45 model_runner.py:1057] Computed max_num_seqs (min(256, 32768 // 182272)) to be less than 1. Setting it to the minimum value of 1.
INFO 01-20 15:44:52 gpu_executor.py:121] # GPU blocks: 41508, # CPU blocks: 4681
INFO 01-20 15:44:55 model_runner.py:1181] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-20 15:44:55 model_runner.py:1185] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 01-20 15:45:07 model_runner.py:1300] Graph capturing finished in 12 secs.
WARNING 01-20 15:45:08 sampling_params.py:221] temperature 0.001 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
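For reference, the traceback above ends in pynvml.NVMLError_InvalidArgument raised by nvmlDeviceGetHandleByIndex. A quick way to see which device indices NVML itself accepts on this machine is a standalone check like the one below (a diagnostic sketch only, not part of the demo code; it simply enumerates the physical GPUs NVML reports):

```python
import pynvml

# Enumerate the GPUs NVML can see; indices outside this range make
# nvmlDeviceGetHandleByIndex raise NVMLError_InvalidArgument.
pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    print(f"NVML reports {count} GPU(s)")
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        print(i, pynvml.nvmlDeviceGetName(handle))
finally:
    pynvml.nvmlShutdown()
```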
The basic web_demo.web_ability_demo runs normally. I also downloaded silero_vad.onnx and silero_vad.jit as instructed and placed them in the specified location, and set max_dynamic_patch to 1 in the model's config file. The GPU is an NVIDIA H800.
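For reference, the max_dynamic_patch change mentioned above can be applied with a small script like this (assuming the field is a top-level key in the checkpoint's config.json; adjust the path and key to your checkpoint layout):

```python
import json

cfg_path = "demo_VITA_ckpt/config.json"  # assumed location of the model config

with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

cfg["max_dynamic_patch"] = 1  # limit dynamic patching as described above

with open(cfg_path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)
```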