
[Performance]: Why does vllm spend so much memory even using OPT model? #4723

Closed

MitchellX opened this issue May 9, 2024 · 2 comments

Labels: performance (Performance-related issues)

Comments

@MitchellX
Contributor

MitchellX commented May 9, 2024

Proposal to improve performance

No response

Report of performance regression

No response

Misc discussion on performance

In vLLM, the reserved memory is always around 20 GB regardless of which model we use. For example, the OPT model is only 125M parameters, yet the reserved memory is still 20 GB. I tried running the same OPT model with a Hugging Face engine, which took only 568 MB of memory.
Why does the vLLM framework consume that much GPU memory even for a small model?
[Screenshot 2024-05-09 at 7:25:59 PM: GPU memory usage]

Your current environment (if you think it is necessary)

No response

@MitchellX MitchellX added the performance Performance-related issues label May 9, 2024
@Qubitium
Contributor

Qubitium commented May 10, 2024

@MitchellX Check your vLLM config: gpu_memory_utilization should be set to a sane value more suitable for the smaller OPT model, not the default of 0.9 (90%).

@iohub

iohub commented May 11, 2024

Because vLLM uses 90% of GPU memory (the default ratio) to build the KV cache, you can change the gpu_memory_utilization parameter to suit your scenario.
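To make the behavior described above concrete: vLLM claims roughly gpu_memory_utilization × total GPU memory up front and fills whatever the model weights don't occupy with KV-cache blocks, so the reservation scales with the GPU, not the model. The sketch below is a simplified model of that accounting with illustrative assumptions (a hypothetical 24 GB card and ~0.25 GB of fp16 OPT-125M weights); it is not vLLM's exact bookkeeping.

```python
# Simplified model of vLLM's up-front GPU memory reservation.
# The GPU size and weight size below are illustrative assumptions.

def reserved_memory_gb(total_gpu_gb: float, gpu_memory_utilization: float = 0.9) -> float:
    """vLLM claims this fraction of total GPU memory at startup."""
    return total_gpu_gb * gpu_memory_utilization

def kv_cache_budget_gb(total_gpu_gb: float, model_weights_gb: float,
                       gpu_memory_utilization: float = 0.9) -> float:
    """Whatever the weights don't use inside the reservation becomes KV-cache blocks."""
    return reserved_memory_gb(total_gpu_gb, gpu_memory_utilization) - model_weights_gb

# Hypothetical 24 GB card running OPT-125M (~0.25 GB of fp16 weights):
print(reserved_memory_gb(24.0))        # ~21.6 GB reserved, regardless of model size
print(kv_cache_budget_gb(24.0, 0.25))  # ~21.35 GB of that becomes KV cache
print(reserved_memory_gb(24.0, 0.05))  # lowering the ratio shrinks the reservation to ~1.2 GB
```

In practice, passing a smaller ratio to the constructor, e.g. `LLM(model="facebook/opt-125m", gpu_memory_utilization=0.05)`, shrinks the reservation accordingly (the 0.05 here is an example value, not a recommendation).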
