
[Performance]: Why does vllm spend so much memory even using OPT model? #4723

Closed

MitchellX opened this issue May 9, 2024 · 2 comments

Labels: performance (Performance-related issues)

Comments

@MitchellX
Contributor

MitchellX commented May 9, 2024

Proposal to improve performance

No response

Report of performance regression

No response

Misc discussion on performance

In vLLM, the reserved memory is always around 20 GB regardless of which model we use. For example, the OPT model is only 125M parameters, yet the reserved memory is still 20 GB. I tried running the same OPT model with a Hugging Face engine, which took only 568 MB of memory.
Why does the vLLM framework consume that much GPU memory even for a small model?
[Screenshot 2024-05-09 at 7:25:59 PM: GPU memory usage]

Your current environment (if you think it is necessary)

No response

@MitchellX MitchellX added the performance Performance-related issues label May 9, 2024
@Qubitium
Contributor

Qubitium commented May 10, 2024

@MitchellX Check your vLLM config: gpu_memory_utilization should be set to a sane value more suitable for the smaller OPT model, not the default of 0.9 (90%).

@iohub

iohub commented May 11, 2024

Because vLLM uses 90% of GPU memory (the default ratio) to build the KV cache, you can change the gpu_memory_utilization parameter to suit your scenario.
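To make the behavior described above concrete: vLLM claims roughly gpu_memory_utilization × total GPU memory up front and fills whatever the model weights don't occupy with KV-cache blocks, so the reservation scales with the GPU, not the model. The sketch below is a simplified model of that accounting with illustrative assumptions (a hypothetical 24 GB card and ~0.25 GB of fp16 OPT-125M weights); it is not vLLM's exact bookkeeping.

```python
# Simplified model of vLLM's up-front GPU memory reservation.
# The GPU size and weight size below are illustrative assumptions.

def reserved_memory_gb(total_gpu_gb: float, gpu_memory_utilization: float = 0.9) -> float:
    """vLLM claims this fraction of total GPU memory at startup."""
    return total_gpu_gb * gpu_memory_utilization

def kv_cache_budget_gb(total_gpu_gb: float, model_weights_gb: float,
                       gpu_memory_utilization: float = 0.9) -> float:
    """Whatever the weights don't use inside the reservation becomes KV-cache blocks."""
    return reserved_memory_gb(total_gpu_gb, gpu_memory_utilization) - model_weights_gb

# Hypothetical 24 GB card running OPT-125M (~0.25 GB of fp16 weights):
print(reserved_memory_gb(24.0))        # ~21.6 GB reserved, regardless of model size
print(kv_cache_budget_gb(24.0, 0.25))  # ~21.35 GB of that becomes KV cache
print(reserved_memory_gb(24.0, 0.05))  # lowering the ratio shrinks the reservation to ~1.2 GB
```

In practice, passing a smaller ratio to the constructor, e.g. `LLM(model="facebook/opt-125m", gpu_memory_utilization=0.05)`, shrinks the reservation accordingly (the 0.05 here is an example value, not a recommendation).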
