
[Bug] turbomind has not been built with fp32 support #3065

Open · 2 of 3 tasks
syorami opened this issue Jan 21, 2025 · 3 comments

syorami commented Jan 21, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I couldn't configure the model with TurbomindEngineConfig; the error shows RuntimeError: Error: turbomind has not been built with fp32 support. With PytorchEngineConfig the same code runs fine.

Reproduction

Nothing special; I simply run the following code:

from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab__InternVL2_5-4B-MPO'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
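
For reference, the same model runs when I switch the backend config, as mentioned above. A minimal sketch of that working path, assuming the same local model directory:

from lmdeploy import pipeline, PytorchEngineConfig

model = 'OpenGVLab__InternVL2_5-4B-MPO'
# the PyTorch engine does not hit the fp32 check, so this runs on 0.6.1
pipe = pipeline(model, backend_config=PytorchEngineConfig(session_len=8192))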

Environment

- cuda 12.4
- python 3.10.15
- torch 2.5.0+cu124
- lmdeploy 0.6.1
- transformers 4.46.1

Error traceback

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[1], line 7
      4 model = 'OpenGVLab__InternVL2_5-4B-MPO'
      5 # model = 'OpenGVLab__InternVL2_5-2B'
----> 7 pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
      8 # response = pipe(('describe this image', image))
      9 # print(response.text)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/api.py:81, in pipeline(model_path, backend_config, chat_template_config, log_level, max_log_len, **kwargs)
     77 backend = 'pytorch' if type(
     78     backend_config) is PytorchEngineConfig else 'turbomind'
     79 logger.info(f'Using {backend} engine')
---> 81 return pipeline_class(model_path,
     82                       backend=backend,
     83                       backend_config=backend_config,
     84                       chat_template_config=chat_template_config,
     85                       max_log_len=max_log_len,
     86                       **kwargs)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/serve/vl_async_engine.py:27, in VLAsyncEngine.__init__(self, model_path, **kwargs)
     23     try_import_deeplink(backend_config.device_type)
     24 self.vl_encoder = ImageEncoder(model_path,
     25                                vision_config,
     26                                backend_config=backend_config)
---> 27 super().__init__(model_path, **kwargs)
     28 if self.model_name == 'base':
     29     raise RuntimeError(
     30         'please specify chat template as guided in https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#set-chat-template'  # noqa: E501
     31     )

File /opt/conda/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py:158, in AsyncEngine.__init__(self, model_path, model_name, backend, backend_config, chat_template_config, max_log_len, **kwargs)
    156 # build backend engine
    157 if backend == 'turbomind':
--> 158     self._build_turbomind(model_path=model_path,
    159                           backend_config=backend_config,
    160                           **kwargs)
    161 elif backend == 'pytorch':
    162     self._build_pytorch(model_path=model_path,
    163                         backend_config=backend_config,
    164                         **kwargs)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py:197, in AsyncEngine._build_turbomind(self, model_path, backend_config, **kwargs)
    195 """Innter build method for turbomind backend."""
    196 from lmdeploy import turbomind as tm
--> 197 self.engine = tm.TurboMind.from_pretrained(
    198     model_path, engine_config=backend_config, **kwargs)
    199 self.backend_config = self.engine.engine_config
    200 self.hf_tm_cfg = self.engine.config

File /opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py:302, in TurboMind.from_pretrained(cls, pretrained_model_name_or_path, model_name, chat_template_name, engine_config, **kwargs)
    300 model_source = get_model_source(pretrained_model_name_or_path)
    301 logger.info(f'model_source: {model_source}')
--> 302 return cls(model_path=pretrained_model_name_or_path,
    303            model_name=model_name,
    304            chat_template_name=chat_template_name,
    305            engine_config=engine_config,
    306            model_source=model_source,
    307            **kwargs)

File /opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py:112, in TurboMind.__init__(self, model_path, model_name, chat_template_name, engine_config, model_source, **kwargs)
    109         model_path = get_model(model_path, _engine_config.download_dir,
    110                                _engine_config.revision)
    111     self.tokenizer = Tokenizer(model_path)
--> 112     self.model_comm = self._from_hf(model_source=model_source,
    113                                     model_path=model_path,
    114                                     engine_config=_engine_config)
    116 with ThreadPoolExecutor(max_workers=self.gpu_count) as e:
    117     ranks = [
    118         self.node_id * self.gpu_count + device_id
    119         for device_id in range(self.gpu_count)
    120     ]

File /opt/conda/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py:219, in TurboMind._from_hf(self, model_source, model_path, engine_config)
    214 tm_model = get_tm_model(model_path, self.model_name,
    215                         self.chat_template_name, engine_config)
    217 self._postprocess_config(tm_model.tm_config, engine_config)
--> 219 model_comm = _tm.AbstractTransformerModel.create_llama_model(
    220     model_dir='',
    221     config=yaml.safe_dump(self.config_dict),
    222     tensor_para_size=self.gpu_count,
    223     data_type=self.config.model_config.weight_type)
    225 # create empty weight
    226 self._create_weight(model_comm)

RuntimeError: Error: turbomind has not been built with fp32 support.
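
In case it helps with triage: the traceback shows data_type being taken from self.config.model_config.weight_type, so the checkpoint's declared dtype may be what triggers the fp32 path. A small check of the checkpoint config, assuming a local HF-style directory containing a config.json:

import json
import os

model = 'OpenGVLab__InternVL2_5-4B-MPO'  # same local path as in the reproduction
with open(os.path.join(model, 'config.json')) as f:
    cfg = json.load(f)
# InternVL-style checkpoints may nest the LLM dtype under llm_config
print(cfg.get('torch_dtype'), cfg.get('llm_config', {}).get('torch_dtype'))
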
lvhan028 (Collaborator)

It is not a bug.
We have no plans to support fp32.
Could you let us know the scenario in which you need fp32?

syorami (Author) commented Jan 22, 2025

> It is not a bug. We have no plans to support fp32. Could you let us know the scenario in which you need fp32?

I don't mean to use fp32; I'm simply running the demo. Even if I set dtype to fp16, I still get the error. With version 0.6.3 it works fine.
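
For completeness, this is what I mean by setting dtype to fp16. A minimal sketch, assuming TurbomindEngineConfig accepts a dtype field in this version:

from lmdeploy import pipeline, TurbomindEngineConfig

model = 'OpenGVLab__InternVL2_5-4B-MPO'
# explicitly requesting fp16 weights; on 0.6.1 this still raises the same RuntimeError
pipe = pipeline(model,
                backend_config=TurbomindEngineConfig(session_len=8192,
                                                     dtype='float16'))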

lvhan028 (Collaborator)

Oh, sorry for the misunderstanding.
I'll check it.
