Need some help. " You need to decrease --max-batch-prefill-tokens." #390

KrisWongz · 2024-04-05T13:22:33Z

System Info

latest

Information

Docker
The CLI directly

Tasks

An officially supported command
My own modifications

Reproduction

On 4× A100 (40 GB), the server fails to start. This looks like a memory issue. Which parameters should I adjust to run the 72B model successfully?
On 2× A100, I set --max-batch-prefill-tokens (and all four of these parameters) to 1, and it still fails.

docker run --gpus '"device=0,1,2,3"' \
  --shm-size 1g \
  -p 8081:80 \
  -v /home/unionlab001/Model/qwen-72b:/data \
  ghcr.io/predibase/lorax:latest \
  --model-id /data/Qwen1_5-72B-Chat \
  --trust-remote-code \
  --quantize bitsandbytes-nf4 \
  --max-batch-prefill-tokens 300 \
  --max-input-length 200 \
  --max-total-tokens 1024 \
  --num-shard 4

RuntimeError: Not enough memory to handle 300 prefill tokens. You need to decrease --max-batch-prefill-tokens
2024-04-04T14:14:39.297825Z ERROR warmup{max_input_length=200 max_prefill_tokens=300 max_total_tokens=1024}:warmup: lorax_client: router/client/src/lib.rs:34: Server error: Not enough memory to handle 300 prefill tokens. You need to decrease --max-batch-prefill-tokens
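Per the log line above, the error is raised during the server's warmup step, after the weights are already loaded. To get a rough sense of how much of each 40 GB card the quantized weights and KV cache should need under these settings, here is a back-of-envelope sketch in Python. The model shape (80 layers, 64 KV heads of dimension 128), the ~72e9 parameter count, and the ~4.5 effective bits per parameter for NF4 are all assumptions, not values read from the model or from LoRAX; the function names are hypothetical. It ignores the CUDA context, prefill activations, allocator fragmentation, and any layers left unquantized, so real usage will be higher.

```python
"""Rough per-GPU memory estimate for a sharded, NF4-quantized model.

All model dimensions are ASSUMPTIONS (a Qwen1.5-72B-like shape); check them
against the model's config.json. Not counted: CUDA context, prefill
activations, allocator fragmentation, unquantized layers (e.g. embeddings).
"""

GIB = 1024 ** 3


def weight_bytes_per_gpu(n_params: float, bits_per_param: float, num_shard: int) -> float:
    """Quantized weight bytes per GPU, assuming an even tensor-parallel split."""
    return n_params * bits_per_param / 8 / num_shard


def kv_cache_bytes_per_gpu(
    tokens: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    num_shard: int,
    bytes_per_value: int = 2,  # fp16/bf16 cache entries
) -> float:
    """KV-cache bytes per GPU: K and V for every layer, heads split across shards."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return tokens * per_token / num_shard


if __name__ == "__main__":
    # Assumed shape: 80 layers, 64 KV heads of dim 128.
    # ~4.5 bits/param approximates NF4 (4 bits) plus per-block scale overhead.
    for shards in (2, 4):
        w = weight_bytes_per_gpu(72e9, 4.5, shards)
        kv = kv_cache_bytes_per_gpu(1024, 80, 64, 128, shards)  # --max-total-tokens 1024
        print(f"{shards} GPUs: weights ~{w / GIB:.1f} GiB, "
              f"KV cache for 1024 tokens ~{kv / GIB:.2f} GiB per GPU")
```

Under these assumptions, the weights and a single 1024-token sequence fit well within 40 GB per card, which suggests the extra memory is being consumed elsewhere (for example, during weight loading or warmup) rather than by the values of these flags alone.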

Expected behavior

none


magdyksaleh · 2024-04-11T19:48:12Z

Will attempt to repro with these params and fix

KrisWongz changed the title ~~Need some help.~~ Need some help. " You need to decrease --max-batch-prefill-tokens." Apr 5, 2024

magdyksaleh self-assigned this Apr 11, 2024
