inference-workers exits when trying models other than distilgpt2 (on a non-GPU system) #3358
Replies: 2 comments
-
Hi, did you manage to make it work? I have the same situation.
-
No. I was hoping for feedback on whether it is possible, as other solutions like FastChat do offer an option for that. So I got my hands on an instance from a cloud GPU provider with reasonable prices (like Lambda Labs) and tested there (not Open Assistant yet). Now that I know the memory usage of some models (7B ~8 GB, 13B ~28 GB), I'm thinking of an affordable desktop GPU with 10/12 GB to play with smaller models (while dreaming of an NVIDIA A100 ;-). See https://cloud-gpus.com/ for an overview of providers.
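For reference, once you have a cloud instance, a quick way to check how much VRAM you actually got before picking a model size (assuming the NVIDIA driver and nvidia-smi are installed, as they are on most provider images):

```bash
# Query GPU name and total memory; compare against the rough model
# footprints above (7B ~8 GB, 13B ~28 GB).
nvidia-smi --query-gpu=name,memory.total --format=csv
```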
-
Is it possible to run the worker with models other than distilgpt2 on a non-GPU system?
After successfully launching the services (profiles ci + inference) with the distilgpt2 model, I tried to start it with other models (e.g. OA_SFT_Pythia_12B_4), but the inference-workers container fails after waiting for the inference server to be ready.
The inference-server reports that it has started:
but the inference-worker stops after a minute of waiting:
The system running the containers is an OpenStack instance with 8 vCPUs and 32 GB of RAM, running Ubuntu 22.04. I have a pile of vCPUs and RAM, but sadly no GPU yet to run tests on.
Before running "docker compose up" I just set MODEL_CONFIG_NAME to OA_SFT_Pythia_12B_4 as an environment variable.
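Roughly, the launch sequence looked like the sketch below; the exact --profile spelling is my reading of the "ci + inference" setup mentioned above, so treat it as an assumption:

```bash
# Select the model config before bringing the stack up (variable name
# from above; profile names assumed from the ci + inference setup).
export MODEL_CONFIG_NAME=OA_SFT_Pythia_12B_4
docker compose --profile ci --profile inference up
```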
This message baffles me: "None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used." Shouldn't at least one of them be present? (I assume they are pulled in via the "huggingface/transformers" requirement.)
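In case it helps to reproduce: a quick way to see which backends the transformers install inside the worker can actually import (the service name and python3 entrypoint here are assumptions on my side):

```bash
# Probe the worker container for DL backends; transformers prints the
# warning quoted above whenever none of these modules is importable.
docker compose exec inference-workers python3 -c "
import importlib.util
for mod in ('torch', 'tensorflow', 'flax'):
    print(mod, 'found' if importlib.util.find_spec(mod) else 'missing')
"
```

If torch shows up as missing there, the warning is expected: the image ships transformers without a deep-learning backend, so only tokenizers and utilities would work.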
Thanks in advance!