Questions about OOM when running Qwen2.5-0.5B, 1.5B, and 3B on RTX 4090 graphics cards #64
Comments
Your 24GB GPUs probably can't deal with the memory pressure compared to the 141GB(!) in the original experiments. Consider reducing the "micro batch" values from 8 down to 4 or even 2.
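For example, something along these lines in the launch command (a sketch only; the key names assume the verl-style Hydra overrides used by the training script, so check your script for the exact spelling):

```bash
# Sketch, assuming verl/TinyZero-style override keys -- adjust to your script.
# Halve the PPO micro batch sizes (try 2 if 4 still OOMs):
    actor_rollout_ref.actor.ppo_micro_batch_size=4 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=4 \
    critic.ppo_micro_batch_size=4 \
```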
Also look at enabling gradient checkpointing for the actor/critic models, something like:
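(Again a sketch with assumed verl-style keys:)

```bash
# Sketch: enable gradient (activation) checkpointing for both actor and critic
# to trade extra compute for lower peak memory. Key names assumed, verl-style.
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    critic.model.enable_gradient_checkpointing=True \
```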
Finally, the pressure grows relative to the output length, so you could consider reducing the max response length, though since you're trying to see Long CoT, I'd discourage you from doing this unless you have no other option.
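If it does come to that, the knob would look roughly like this (again an assumed verl-style key; pick a length that still fits your task):

```bash
# Sketch: cap the rollout length, e.g. 512 tokens instead of 1024.
# Shorter responses shrink the KV cache and the log-prob/advantage buffers.
    data.max_response_length=512 \
```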
With N_GPUS=8 it can run; config as follows:
Result: while results are being produced, there are also messages indicating insufficient memory, and the accuracy does not seem very high. Could this also be related to the parameters in the config? Additionally:
1. When running a dataset on 8 GPUs, does every GPU need to run through the entire dataset, or is the dataset split into 8 parts, each run on a separate GPU? Does this have any impact on learning accuracy?
Thanks!
Hello, are you running this on just one 4090? I only have a single 4090 graphics card; can I test this demo with it?
I'm using 8 GPUs. A single GPU doesn't really cut it: the GPU memory blows up, and system RAM blows up along with it.
OK, thanks a lot!!!
For the 0.5B model, on my 8 GPUs each card's memory usage swings back and forth between about 5 and 22 GB during training. The 3B model I trained didn't turn out very well either; you could try SFT instead.
Got it, thanks! I switched over from another field (I used to work on robotics algorithms), so I don't really know how to fine-tune on this project. If you have time, could you briefly explain, or post a link to another open-source SFT project? Thanks a lot!
Impressive, switching fields like that, and robotics algorithms are impressive too. Search for LLaMA, it's quite comfortable to use. Feel free to reach out anytime.
Got it! Thank you so much!!
Basic Information:
CUDA 12.4
Python 3.12.4
System: Debian GNU/Linux 12
8× RTX 4090, 24564 MiB each
With Qwen2.5-0.5B, the configuration is as follows and training runs normally.
The output is as follows:
With Qwen2.5-1.5B and Qwen2.5-3B, the configuration is as follows and training is not normal: it hits OOM or CUDA OOM.

Other questions are as follows:

With Qwen2.5-1.5B or 3B and export N_GPUS=4, the GPUs run for a few minutes, then hit GPU OOM.
With Qwen2.5-1.5B or 3B and export N_GPUS=8, the GPUs run for a few minutes, then the CPU keeps running until system RAM OOMs.
Thanks