In vLLM, the reserved GPU memory is always around 20 GB regardless of which model we use. For example, OPT-125M is only a 125M-parameter model, yet the reserved memory is still 20 GB. When I ran the same OPT model with a Hugging Face engine instead, it took only 568 MB of GPU memory.
Why does the vLLM framework consume that much GPU memory even for a small model?
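A likely explanation (a sketch, not an excerpt from vLLM's source): vLLM pre-allocates a fixed fraction of total GPU memory up front, using whatever is left after loading the weights as paged KV-cache blocks, so the reserved amount tracks the GPU size rather than the model size. The fraction is controlled by the `gpu_memory_utilization` engine argument, which defaults to 0.9. The 24 GiB GPU below is a hypothetical figure chosen to illustrate the arithmetic:

```python
# Illustrative arithmetic only, not vLLM code: vLLM reserves a fixed
# fraction of total GPU memory (weights + paged KV cache), so small
# models still show a large reserved footprint.

def reserved_gib(total_gib: float, gpu_memory_utilization: float = 0.9) -> float:
    """GiB vLLM reserves up front, independent of model size."""
    return total_gib * gpu_memory_utilization

# Hypothetical 24 GiB GPU: ~21.6 GiB reserved even for OPT-125M,
# because the budget not used by weights becomes KV-cache blocks.
print(round(reserved_gib(24.0), 1))  # → 21.6
```

If this is the cause, passing a smaller value, e.g. `LLM(model="facebook/opt-125m", gpu_memory_utilization=0.3)`, should shrink the reservation at the cost of fewer KV-cache blocks (and thus fewer concurrent sequences).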
Your current environment (if you think it is necessary)
No response