Alibaba Cloud DSW environment keeps reporting: The model's max seq len (8192) is larger than the maximum number of tokens that can be stored in KV cache (5392). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine. #38

Open
leeeex opened this issue Feb 16, 2024 · 1 comment

Comments


leeeex commented Feb 16, 2024

KwaiKEG/kagentlms_qwen_7b_mat
Qwen/Qwen-7B-Chat
Both models produce the same error.
Trying the newer qwen1.5 release instead fails with a KeyError: qwen2.
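Context (not from the original thread): vLLM raises this error when max_model_len exceeds the number of tokens its pre-allocated KV cache can hold at the current gpu_memory_utilization, so the two knobs named in the message are the fix. A minimal sketch using the vLLM Python API, assuming vLLM is what backs the failing service; the concrete values below are illustrative, not taken from this issue:

from vllm import LLM, SamplingParams

# Either cap the context length below the reported KV-cache budget (5392 tokens)
# or hand vLLM a larger share of GPU memory -- either change avoids the error.
llm = LLM(
    model="KwaiKEG/kagentlms_qwen_7b_mat",
    trust_remote_code=True,        # Qwen checkpoints ship custom modeling code
    max_model_len=4096,            # illustrative: below 5392 instead of the default 8192
    gpu_memory_utilization=0.95,   # illustrative: default is 0.9
)

out = llm.generate(["你好"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)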


leeeex commented Feb 16, 2024

python3 -m fastchat.serve.model_worker --model-path KwaiKEG/kagentlms_qwen_7b_mat --controller http://localhost:21001 --port 31000 --worker http://localhost:31000
Checked the FastChat docs; launching the second service (the model worker) with the command above resolves the issue.
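Note (not from the original thread): fastchat.serve.model_worker serves the model through plain Hugging Face Transformers rather than the vLLM engine, so the KV-cache size check that produced the error above never runs. If you would rather keep a vLLM-backed worker, the knobs named in the error message should apply; recent FastChat/vLLM versions typically expose them as the --max-model-len and --gpu-memory-utilization engine flags, though the exact flag names depend on the installed versions.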
