[BUG] After enabling vllm acceleration, concurrent client calls to the service fail with Connection broken: InvalidChunkLength #3885
Comments
To address the issues you're experiencing with Langchain-ChatGLM, particularly after enabling vLLM acceleration, consider the following steps:
These steps are derived from analyzing the provided configuration and the issues you're facing. Adjusting these areas should help mitigate the problems with initial call failures and intermittent connection errors.
Same as Problem 1 here. Only the first call has this issue.
This project does not implement concurrency; you will need to handle concurrent access yourself.
问题描述 / Problem Description
Describe the problem in a clear and concise manner.
After starting the service with vllm acceleration enabled:
Problem 1: If the very first call consists of multiple concurrent requests, they fail because the model has not finished loading; if the first call is a single request, it returns a result normally.
Problem 2: Under concurrent calls, the following error occasionally appears: requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
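Problem 1 (the first concurrent batch failing while the model loads) can usually be avoided on the client side by issuing one blocking warm-up request before fanning out. A minimal sketch, assuming `call_api` is whatever function sends one prompt to the service:

```python
from concurrent.futures import ThreadPoolExecutor

def warmed_up_fanout(call_api, prompts):
    """Send the first prompt serially so the model finishes loading,
    then dispatch the remaining prompts concurrently."""
    first = call_api(prompts[0])  # serial warm-up request
    with ThreadPoolExecutor() as ex:
        rest = list(ex.map(call_api, prompts[1:]))
    return [first] + rest
```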
复现问题的步骤 / Steps to Reproduce
1. Enable vllm acceleration:
FSCHAT_MODEL_WORKERS = {
    # Default settings shared by all models; can be overridden per model.
    "default": {
        "host": DEFAULT_BIND_HOST,
        "port": 30002,
        "device": LLM_DEVICE,
        # False or 'vllm': the inference acceleration framework to use.
        # If vllm hits HuggingFace connectivity problems, see doc/FAQ.
        # vllm support for some models is still immature, so it is off by default.
        "infer_turbo": 'vllm',
    },
}
2. Start the service:
python startup.py -a
3. Call the service concurrently from Python code
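Step 3 can be reproduced with a sketch like the one below. The endpoint URL and JSON payload are assumptions to adapt to your deployment; `fan_out` itself is generic and collects either the result or the raised exception per prompt, which makes the intermittent `ChunkedEncodingError` visible:

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: adjust URL and payload to your actual chat endpoint.
API_URL = "http://127.0.0.1:7861/chat/chat"

def ask(session, prompt):
    """Send one prompt using a requests.Session; stream=True matches the
    server's chunked responses, where InvalidChunkLength surfaces while
    iterating the body."""
    resp = session.post(API_URL, json={"query": prompt, "stream": True},
                        stream=True)
    return b"".join(resp.iter_content(chunk_size=None))

def fan_out(worker, prompts, concurrency=8):
    """Fire prompts concurrently; return the result or the exception
    raised for each prompt, in order."""
    out = []
    with ThreadPoolExecutor(max_workers=concurrency) as ex:
        futures = [ex.submit(worker, p) for p in prompts]
        for f in futures:
            try:
                out.append(f.result())
            except Exception as e:  # keep the error instead of aborting the batch
                out.append(e)
    return out
```

With a live server you would call `fan_out(lambda p: ask(session, p), prompts)` and inspect which entries came back as exceptions.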
预期的结果 / Expected Result
The generated answer is returned normally.
实际结果 / Actual Result
Describe the actual result.
With concurrent requests, sometimes all of them complete normally; sometimes some succeed while others fail with:
requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
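Until the server side handles concurrency, a pragmatic client-side mitigation is to retry calls that die with this error. The decorator below is a generic sketch (the exception types are passed in, so in practice you would use `retry_on((requests.exceptions.ChunkedEncodingError,))` around your request function):

```python
import time

def retry_on(exc_types, attempts=3, backoff=0.5):
    """Decorator: retry the wrapped call when one of exc_types is raised,
    sleeping with exponential backoff between attempts and re-raising
    after the final attempt fails."""
    def deco(fn):
        def wrapped(*args, **kwargs):
            for i in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exc_types:
                    if i == attempts - 1:
                        raise
                    time.sleep(backoff * (2 ** i))
        return wrapped
    return deco
```

Retrying only papers over the symptom; the broken chunked stream itself still points at the worker dropping connections under concurrent load.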
环境信息 / Environment Information
附加信息 / Additional Information
Add any other information related to the issue.