[Bug]: Unable to serve Llama3 using vLLM Docker container #4725
Comments
It works with the container tag v0.3.3. It does not work with either v0.4.1 or v0.4.2.
We see the same symptom. We ran our core functional tests on a single A100 with vLLM 0.3.3, but after upgrading to vLLM 0.4 (and 0.4.2 as well), it won't run at all, failing with the same error message.
@youkaichao Do you have any insights on this? FWIW, 0.4.1 works for me without the custom all-reduce operation, but 0.4.2 exposes some issues as well.
The error trace points to PyTorch distributed, so it's not something I know well; I think the problem is inside PyTorch. It looks strange, because you only have 1 GPU while PyTorch tries to read the P2P status between GPU 0 and GPU 0 (essentially the GPU itself). One educated guess: try upgrading the driver version. I remember several issues being solved by upgrading to driver 550; 535 seems to be buggy.
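For reference, the same-device P2P query described above can be reproduced outside vLLM. This is a minimal sketch (it assumes PyTorch with CUDA support is installed; GPU index 0 is taken from the description above), and on a healthy driver it should print a boolean rather than crash:

```bash
# Ask PyTorch for the P2P status between GPU 0 and itself,
# mirroring the same-device query mentioned in the error trace.
python -c "import torch; print(torch.cuda.can_device_access_peer(0, 0))"
```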
Thank you, @youkaichao. You're right: after upgrading the NVIDIA driver to v550.x, vLLM 0.4.2 worked properly.
Your current environment
🐛 Describe the bug
I'm trying to run Llama3 using Docker this way:
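(The exact command was not preserved in this excerpt. A representative invocation, modeled on the vLLM Docker docs, is sketched below; the image tag, model name, token placeholder, and port are all assumptions:)

```bash
# Hypothetical example; the reporter's actual command was not captured.
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<your-token>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:v0.4.2 \
    --model meta-llama/Meta-Llama-3-8B-Instruct
```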
My GPU is properly configured:
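(The nvidia-smi output was likewise not preserved; a quick way to confirm the GPU is visible and to read the driver version, which matters for the fix discussed above, is:)

```bash
# Report GPU name, driver version, and total memory as CSV
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```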
but I get the following error: