Can't pass workers_per_resource to the bentoml container #901

Open
hahmad2008 opened this issue Feb 12, 2024 · 2 comments


hahmad2008 commented Feb 12, 2024

Describe the bug

I have a machine with two GPUs. I ran the model with the openllm start command and everything went well:
CUDA_VISIBLE_DEVICES=0,1 TRANSFORMERS_OFFLINE=1 openllm start mistral --model-id mymodel --dtype float16 --gpu-memory-utilization 0.95 --workers-per-resource 0.5

  • In this case two processes appear, one on each of the two GPUs: one for the service and another for the Ray instance.

When I run the start command without --gpu-memory-utilization 0.95 --workers-per-resource 0.5, only one GPU runs the service and a CUDA out-of-memory error occurs.
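(My working assumption, which I have not confirmed in the code, is that --workers-per-resource 0.5 schedules 0.5 workers per GPU, so a single vLLM worker claims 1 / 0.5 = 2 GPUs and the model is sharded across both cards through Ray, while the default of one worker per GPU tries to fit the whole model on a single card, which would explain the out-of-memory error.)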

However, when I build the image and follow the steps below to create the container, running the Docker image raises a CUDA out-of-memory error, just like the second case without these args: --gpu-memory-utilization 0.95 --workers-per-resource 0.5 (see the configuration sketch after the steps below).

Steps:

  • openllm build mymodel --backend vllm --serialization safetensors
  • bentoml containerize mymodel-service:12345 --opt progress=plain
  • docker run --rm --gpus all -p 3000:3000 -it mymodel-service:12345
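For completeness, the following is the kind of configuration I would expect to be able to hand to the container. It assumes that the generated image honours BentoML's BENTOML_CONFIG environment variable and that runners.workers_per_resource is a valid key in that configuration file; I have not verified either assumption, the mount path is arbitrary, and I don't know of an equivalent knob for --gpu-memory-utilization:

    # bentoml_configuration.yaml -- assumed schema, not verified
    runners:
      workers_per_resource: 0.5

    # mount the file and point BENTOML_CONFIG at it when starting the container
    docker run --rm --gpus all -p 3000:3000 \
      -v $(pwd)/bentoml_configuration.yaml:/home/bentoml/configuration.yaml \
      -e BENTOML_CONFIG=/home/bentoml/configuration.yaml \
      -it mymodel-service:12345

If there is a supported way to get workers_per_resource into the containerized service, that is really what I'm asking for here.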

To reproduce

No response

Logs

No response

Environment

$ bentoml -v
bentoml, version 1.1.11

$ openllm -v
openllm, 0.4.45.dev2 (compiled: False)
Python (CPython) 3.11.7

System information (Optional)

No response

hahmad2008 (Author) commented

@aarnphm What is the difference between the previous two cases, such that the first case launches two processes, one for the Ray worker and one for the BentoML service (i.e., when using --gpu-memory-utilization 0.95 --workers-per-resource 0.5)?

jeremyadamsfisher commented

Same issue: #872

Labels: None yet
Projects: None yet
Development: No branches or pull requests
3 participants