feat: Utilize `free_gpu_memory_fraction` to control max VRAM consumption #25

hiro-v · 2024-03-16T07:48:33Z

Currently, tensorrt_llm tries to allocate as much VRAM as possible. Its VRAM consumption is split into 3 portions: https://nvidia.github.io/TensorRT-LLM/memory.html

Please expose `free_gpu_memory_fraction` (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/model_runner_cpp.py#L169-L172) as a nitro parameter so that we can control it. Of course, the machine still has to have enough VRAM to load the weights, but we can reduce the VRAM used by the other portions.

This would let more people with constrained GPU VRAM use tensorrt_llm.
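To make the request concrete, here is a rough sketch of the arithmetic such a parameter would control. It assumes the fraction applies to the VRAM that is still free after the weights are loaded, and it ignores activation and other runtime buffers. The helper `kv_cache_budget_bytes` and the 24 GiB / 14 GiB figures are hypothetical, chosen only for illustration; they are not part of tensorrt_llm or nitro and are not measurements.

```python
GIB = 1024 ** 3


def kv_cache_budget_bytes(total_bytes: int, weights_bytes: int,
                          free_gpu_memory_fraction: float) -> int:
    """Hypothetical helper: estimate how many bytes the cache may claim.

    Assumption: the fraction is taken of the memory left after weight
    loading; activation and runtime buffers are ignored here.
    """
    if not 0.0 < free_gpu_memory_fraction <= 1.0:
        raise ValueError("free_gpu_memory_fraction must be in (0, 1]")
    free_after_weights = total_bytes - weights_bytes
    if free_after_weights <= 0:
        # Lowering the fraction cannot help if the weights do not fit.
        raise ValueError("not enough VRAM to load the weights")
    return int(free_after_weights * free_gpu_memory_fraction)


if __name__ == "__main__":
    # Illustrative numbers only: a 24 GiB card with 14 GiB of weights.
    for fraction in (0.9, 0.5, 0.2):
        budget = kv_cache_budget_bytes(24 * GIB, 14 * GIB, fraction)
        print(f"fraction={fraction}: ~{budget / GIB:.1f} GiB reserved")
```

Under these assumptions, a lower fraction leaves more VRAM for other processes on the same GPU, at the cost of less room for cached tokens.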

tikikun · 2024-03-21T06:52:00Z

Relevant document

hiro-v added the "type: feature request" label Mar 16, 2024

tikikun self-assigned this Mar 21, 2024

tikikun changed the title ~~feature request: Add free_gpu_memory_fraction to control max VRAM consumption~~ feat: Utilize free_gpu_memory_fraction to control max VRAM consumption Mar 21, 2024
