The 7B model cannot run on multiple GPUs #303
Comments
Same problem here: 4× RTX 3090. The example only runs on a single GPU, finetuning OOMs on a single GPU, and multi-GPU runs fail with: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 2 (pid: 15250) of binary: /opt/conda/envs/internlm/bin/python
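A note on the exit code above: torchrun/elastic reports a negative exitcode when a worker dies from a signal, so -9 means the process received SIGKILL. On Linux this is most often the kernel OOM killer reclaiming host RAM (not GPU memory), which `dmesg` would confirm. A minimal sketch of decoding the code:

```python
import signal

# torchrun reports exitcode -N when a worker is killed by signal N.
# -9 therefore means SIGKILL, which on Linux is typically the kernel
# OOM killer terminating a process that exhausted host RAM.
exitcode = -9
sig = signal.Signals(-exitcode)
print(sig.name)  # SIGKILL
```

If this is the cause, reducing CPU-side memory pressure (fewer dataloader workers, streaming checkpoint loading) matters more than GPU settings.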
Using the officially released 7B model, it cannot run on a single 24 GB RTX card and reports an OOM error. Specifying a GPU index has no effect; the model still occupies only GPU 0. How should inference be run so that it works?
Error: OOM
I specified all GPU indices in the code (machine: 4 GPUs, 24 GB memory each).
Still the same error; nvidia-smi shows the model is actually running on a single GPU and is not distributed across the others.
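Two common causes fit these symptoms: `CUDA_VISIBLE_DEVICES` being set after `torch` is already imported (in which case it is silently ignored and everything lands on GPU 0), and the model being loaded onto a single device instead of sharded. A minimal sketch, assuming the Hugging Face `transformers` + `accelerate` stack and the `internlm/internlm-chat-7b` checkpoint id (hypothetical for this setup; substitute your actual model path):

```python
import os

# CUDA_VISIBLE_DEVICES must be set BEFORE torch is imported; exporting it
# afterwards has no effect, matching the symptom of everything on GPU 0.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1,2,3")

def load_sharded(model_id: str = "internlm/internlm-chat-7b"):
    """Sketch: shard a 7B checkpoint across all visible GPUs using
    device_map="auto" (requires `pip install accelerate`)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        device_map="auto",   # split layers across GPUs instead of GPU 0 only
        torch_dtype="auto",  # load fp16/bf16 weights, halving memory vs fp32
    )
    return tokenizer, model
```

With fp16 weights a 7B model needs roughly 14 GB plus activation and KV-cache overhead, so it may fit on one 24 GB card once half precision is used, and `device_map="auto"` spreads any overflow across the remaining cards.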