
[Bug] turbomind has not been built with fp32 support #3065

Open · 2 of 3 tasks
syorami opened this issue Jan 21, 2025 · 3 comments

syorami commented Jan 21, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I couldn't configure the model with TurbomindEngineConfig; the error shows RuntimeError: Error: turbomind has not been built with fp32 support. With PytorchEngineConfig the same code runs fine.

Reproduction

Nothing special; I simply run the following code:

from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab__InternVL2_5-4B-MPO'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
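
For reference, the same model runs when I switch the backend config, as mentioned above. A minimal sketch of that working path, assuming the same local model directory:

from lmdeploy import pipeline, PytorchEngineConfig

model = 'OpenGVLab__InternVL2_5-4B-MPO'
# the PyTorch engine does not hit the fp32 check, so this runs on 0.6.1
pipe = pipeline(model, backend_config=PytorchEngineConfig(session_len=8192))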

Environment

- cuda 12.4
- python 3.10.15
- torch 2.5.0+cu124
- lmdeploy 0.6.1
- transformers 4.46.1

Error traceback

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 7
      4 model = 'OpenGVLab__InternVL2_5-4B-MPO'
      5 # model = 'OpenGVLab__InternVL2_5-2B'
----> 7 pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
      8 # response = pipe(('describe this image', image))
      9 # print(response.text)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/api.py:81, in pipeline(model_path, backend_config, chat_template_config, log_level, max_log_len, **kwargs)
     77 backend = 'pytorch' if type(
     78     backend_config) is PytorchEngineConfig else 'turbomind'
     79 logger.info(f'Using {backend} engine')
---> 81 return pipeline_class(model_path,
     82                       backend=backend,
     83                       backend_config=backend_config,
     84                       chat_template_config=chat_template_config,
     85                       max_log_len=max_log_len,
     86                       **kwargs)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/serve/vl_async_engine.py:27, in VLAsyncEngine.__init__(self, model_path, **kwargs)
     23     try_import_deeplink(backend_config.device_type)
     24 self.vl_encoder = ImageEncoder(model_path,
     25                                vision_config,
     26                                backend_config=backend_config)
---> 27 super().__init__(model_path, **kwargs)
     28 if self.model_name == 'base':
     29     raise RuntimeError(
     30         'please specify chat template as guided in https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#set-chat-template'  # noqa: E501
     31     )

File /opt/conda/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py:158, in AsyncEngine.__init__(self, model_path, model_name, backend, backend_config, chat_template_config, max_log_len, **kwargs)
    156 # build backend engine
    157 if backend == 'turbomind':
--> 158     self._build_turbomind(model_path=model_path,
    159                           backend_config=backend_config,
    160                           **kwargs)
    161 elif backend == 'pytorch':
    162     self._build_pytorch(model_path=model_path,
    163                         backend_config=backend_config,
    164                         **kwargs)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py:197, in AsyncEngine._build_turbomind(self, model_path, backend_config, **kwargs)
    195 """Innter build method for turbomind backend."""
    196 from lmdeploy import turbomind as tm
--> 197 self.engine = tm.TurboMind.from_pretrained(
    198     model_path, engine_config=backend_config, **kwargs)
    199 self.backend_config = self.engine.engine_config
    200 self.hf_tm_cfg = self.engine.config

File /opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py:302, in TurboMind.from_pretrained(cls, pretrained_model_name_or_path, model_name, chat_template_name, engine_config, **kwargs)
    300 model_source = get_model_source(pretrained_model_name_or_path)
    301 logger.info(f'model_source: {model_source}')
--> 302 return cls(model_path=pretrained_model_name_or_path,
    303            model_name=model_name,
    304            chat_template_name=chat_template_name,
    305            engine_config=engine_config,
    306            model_source=model_source,
    307            **kwargs)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py:112, in TurboMind.__init__(self, model_path, model_name, chat_template_name, engine_config, model_source, **kwargs)
    109         model_path = get_model(model_path, _engine_config.download_dir,
    110                                _engine_config.revision)
    111     self.tokenizer = Tokenizer(model_path)
--> 112     self.model_comm = self._from_hf(model_source=model_source,
    113                                     model_path=model_path,
    114                                     engine_config=_engine_config)
    116 with ThreadPoolExecutor(max_workers=self.gpu_count) as e:
    117     ranks = [
    118         self.node_id * self.gpu_count + device_id
    119         for device_id in range(self.gpu_count)
    120     ]

File /opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py:219, in TurboMind._from_hf(self, model_source, model_path, engine_config)
    214 tm_model = get_tm_model(model_path, self.model_name,
    215                         self.chat_template_name, engine_config)
    217 self._postprocess_config(tm_model.tm_config, engine_config)
--> 219 model_comm = _tm.AbstractTransformerModel.create_llama_model(
    220     model_dir='',
    221     config=yaml.safe_dump(self.config_dict),
    222     tensor_para_size=self.gpu_count,
    223     data_type=self.config.model_config.weight_type)
    225 # create empty weight
    226 self._create_weight(model_comm)

RuntimeError: Error: turbomind has not been built with fp32 support.
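
In case it helps with triage: the traceback shows data_type being taken from self.config.model_config.weight_type, so the checkpoint's declared dtype may be what triggers the fp32 path. A small check of the checkpoint config, assuming a local HF-style directory containing a config.json:

import json
import os

model = 'OpenGVLab__InternVL2_5-4B-MPO'  # same local path as in the reproduction
with open(os.path.join(model, 'config.json')) as f:
    cfg = json.load(f)
# InternVL-style checkpoints may nest the LLM dtype under llm_config
print(cfg.get('torch_dtype'), cfg.get('llm_config', {}).get('torch_dtype'))
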
lvhan028 (Collaborator)

It is not a bug.
We have no plans to support fp32.
Could you let us know the scenario in which you need fp32?

syorami (Author) commented Jan 22, 2025

> It is not a bug. We have no plans to support fp32. Could you let us know the scenario in which you need fp32?

I don't mean to use fp32; I'm simply running the demo. Even if I set dtype to fp16, I still get the error. With version 0.6.3 it works fine.
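
For completeness, this is what I mean by setting dtype to fp16. A minimal sketch, assuming TurbomindEngineConfig accepts a dtype field in this version:

from lmdeploy import pipeline, TurbomindEngineConfig

model = 'OpenGVLab__InternVL2_5-4B-MPO'
# explicitly requesting fp16 weights; on 0.6.1 this still raises the same RuntimeError
pipe = pipeline(model,
                backend_config=TurbomindEngineConfig(session_len=8192,
                                                     dtype='float16'))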

lvhan028 (Collaborator)

Oh, sorry for the misunderstanding.
I'll check it.
