
[BUG] Under concurrent calls, requests occasionally fail because the embedding is not loaded #3899

Closed
sweetautumn opened this issue Apr 26, 2024 · 2 comments
Labels: bug (Something isn't working)

sweetautumn commented Apr 26, 2024

Problem Description
After starting the service with vllm acceleration enabled, concurrent calls occasionally fail because the embedding model has not been loaded.

Steps to Reproduce
1. Configure vllm acceleration:
FSCHAT_MODEL_WORKERS = {
    "default": {
        "host": DEFAULT_BIND_HOST,
        "port": 30002,
        "device": LLM_DEVICE,
        "infer_turbo": "vllm",

        "max_parallel_loading_workers": 3,
        "enforce_eager": False,
        "max_context_len_to_capture": 2048,
        "max_model_len": 2048,

        # Parameters needed for multi-GPU loading in model_worker
        # "gpus": None,  # GPUs to use, specified as a str such as "0,1"; if this does not take effect, use CUDA_VISIBLE_DEVICES="0,1" instead
        # "num_gpus": 1,  # number of GPUs to use
        # "max_gpu_memory": "20GiB",  # maximum VRAM to use per GPU

        # Less commonly used model_worker parameters; configure as needed
        # "load_8bit": False,  # enable 8-bit quantization
        # "cpu_offloading": None,
        # "gptq_ckpt": None,
        # "gptq_wbits": 16,
        # "gptq_groupsize": -1,
        # "gptq_act_order": False,
        # "awq_ckpt": None,
        # "awq_wbits": 16,
        # "awq_groupsize": -1,
        # "model_names": LLM_MODELS,
        # "conv_template": None,
        # "limit_worker_concurrency": 5,
        # "stream_interval": 2,
        # "no_register": False,
        # "embed_in_truncate": False,

        # vllm_worker parameters below; vllm requires a GPU and has only been tested on Linux
        # tokenizer = model_path  # add here if the tokenizer differs from model_path
        "tokenizer_mode": "auto",
        "trust_remote_code": True,
        "download_dir": None,
        "load_format": "auto",
        "dtype": "auto",
        "seed": 0,
        "worker_use_ray": False,
        "pipeline_parallel_size": 1,
        "tensor_parallel_size": 1,
        "block_size": 16,
        "swap_space": 4,  # GiB
        "gpu_memory_utilization": 0.80,
        "max_num_batched_tokens": 2560,
        "max_num_seqs": 256,
        "disable_log_stats": False,
        "conv_template": None,
        "limit_worker_concurrency": 3,
        "no_register": False,
        "num_gpus": 1,
        "engine_use_ray": False,
        "disable_log_requests": False,
    },

2. Start the service:
python startup.py -a

3. Call the service concurrently from Python code
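Step 3 can be reproduced with a short script along these lines. This is only a sketch: the endpoint path and payload follow Langchain-Chatchat's 0.2.x HTTP API as I understand it, and the knowledge base name "samples" is a placeholder you would replace with your own.

```python
import concurrent.futures
import json
import urllib.request

def call_kb_chat(query: str, base_url: str = "http://127.0.0.1:7861") -> dict:
    """POST one knowledge-base chat request (endpoint path is an assumption)."""
    payload = json.dumps({
        "query": query,
        "knowledge_base_name": "samples",  # hypothetical KB name
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/knowledge_base_chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def fire_concurrently(fn, args_list, max_workers=8):
    """Run fn over args_list in parallel; collect ('ok', result) or ('error', exc) per call."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, a) for a in args_list]
        out = []
        for f in futures:  # futures are iterated in submission order
            try:
                out.append(("ok", f.result()))
            except Exception as e:
                out.append(("error", e))
        return out

# Example: fire_concurrently(call_kb_chat, ["question A", "question B", "question C"])
```

With enough workers, some calls intermittently come back as `('error', ...)` with the `AttributeError` shown below.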

Expected Result
The generated answer is returned normally.

Actual Result
Under concurrent requests, the service sometimes completes normally; other times some requests succeed while others fail with:
AttributeError: 'NoneType' object has no attribute 'acquire'
Full error output:
2024-04-26 07:13:11,178 - _client.py[line:1758] - INFO: HTTP Request: POST http://127.0.0.1:30000/v1/chat/completions "HTTP/1.1 200 OK"
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/sse_starlette/sse.py", line 269, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/sse_starlette/sse.py", line 258, in wrap
await func()
File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/sse_starlette/sse.py", line 215, in listen_for_disconnect
message = await receive()
^^^^^^^^^^^^^^^
File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
await self.message_event.wait()
File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/asyncio/locks.py", line 213, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f1478689710

During handling of the above exception, another exception occurred:

  • Exception Group Traceback (most recent call last):
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    | result = await app( # type: ignore[func-returns-value]
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    | return await self.app(scope, receive, send)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    | await super().__call__(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/applications.py", line 119, in __call__
    | await self.middleware_stack(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    | raise exc
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    | await self.app(scope, receive, _send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    | await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    | raise exc
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    | await app(scope, receive, sender)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/routing.py", line 762, in __call__
    | await self.middleware_stack(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/routing.py", line 782, in app
    | await route.handle(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    | await self.app(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    | await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    | raise exc
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    | await app(scope, receive, sender)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
    | await response(scope, receive, send)
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/sse_starlette/sse.py", line 255, in __call__
    | async with anyio.create_task_group() as task_group:
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    | raise BaseExceptionGroup(
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/sse_starlette/sse.py", line 258, in wrap
    | await func()
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/sse_starlette/sse.py", line 245, in stream_response
    | async for data in self.body_iterator:
    | File "/home/algo/workproject/Langchain-Chatchat-gen-v1.0.1/server/chat/knowledge_base_chat.py", line 109, in knowledge_base_chat_iterator
    | docs = await run_in_threadpool(search_docs,
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    | return await anyio.to_thread.run_sync(func, *args)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    | return await get_async_backend().run_sync_in_worker_thread(
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    | return await future
    | ^^^^^^^^^^^^
    | File "/home/algo/anaconda3/envs/chatfaqgpu/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    | result = context.run(func, *args)
    | ^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/workproject/Langchain-Chatchat-gen-v1.0.1/server/knowledge_base/kb_doc_api.py", line 38, in search_docs
    | docs = kb.search_docs(query, top_k, score_threshold)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/workproject/Langchain-Chatchat-gen-v1.0.1/server/knowledge_base/kb_service/base.py", line 182, in search_docs
    | docs = self.do_search(query, top_k, score_threshold)
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | File "/home/algo/workproject/Langchain-Chatchat-gen-v1.0.1/server/knowledge_base/kb_service/faiss_kb_service.py", line 67, in do_search
    | with self.load_vector_store().acquire() as vs:
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    | AttributeError: 'NoneType' object has no attribute 'acquire'
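The traceback shows the crash at `self.load_vector_store().acquire()` in faiss_kb_service.py. A defensive wrapper at that call site would turn the transient None into a clear error (or a short wait) instead of an AttributeError. This is a hedged sketch, not the project's actual code, and it assumes `load_vector_store()` may transiently return None while the store is still loading:

```python
import time

def acquire_store(load_vector_store, retries: int = 5, delay: float = 0.5):
    """Retry until load_vector_store() returns a usable object, else raise clearly.

    load_vector_store: zero-argument callable that may transiently return None.
    """
    for _ in range(retries):
        store = load_vector_store()
        if store is not None:
            return store
        time.sleep(delay)  # give the loading thread a chance to finish
    raise RuntimeError("vector store failed to load; embeddings not ready")
```

A caller would then write `with acquire_store(self.load_vector_store).acquire() as vs:` so a slow load surfaces as a descriptive RuntimeError rather than `'NoneType' object has no attribute 'acquire'`.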

Environment Information

  • langchain-ChatGLM version / commit: v0.2.10
  • Deployed with Docker: no
  • LLM model (ChatGLM2-6B / Qwen-7B, etc.): ChatGLM3-6B
  • Embedding model (moka-ai/m3e-base, etc.): bge-m3
  • Vector store type (faiss / milvus / pg_vector, etc.): faiss
  • Operating system and version: ubuntu
  • Python version: 3.11.7
  • Other relevant environment information:

Additional Information

sweetautumn added the bug label Apr 26, 2024
dosubot (bot) commented Apr 26, 2024

The error you're encountering, "AttributeError: 'NoneType' object has no attribute 'acquire'", indicates that the ThreadSafeFaiss object is not properly initialized at the time you're trying to acquire a lock on it. This problem arises when the load_vector_store method returns None instead of a valid ThreadSafeFaiss instance. To address this issue, ensure that the embeddings or vector stores are properly initialized and loaded before they are accessed. Here's a more focused approach to resolve the issue:

  1. Check Initialization and Loading: Confirm that the vector stores are correctly initialized and loaded before any access attempts. This involves ensuring that the load_vector_store method always returns a fully initialized ThreadSafeFaiss instance.

  2. Implement Error Handling and Logging: Add error handling and logging within the initialization and loading processes. This will help identify and troubleshoot issues more effectively.

  3. Ensure Object Readiness with wait_for_loading: Before using the embeddings or vector store, utilize the wait_for_loading method to ensure the object is fully loaded and ready for use.

  4. Review Thread Safety and Concurrency: Given the concurrent nature of your application, it's crucial to ensure that all operations on shared resources are thread-safe to avoid race conditions or deadlocks.

By following these steps, you should be able to mitigate the intermittent failures related to embeddings not being loaded properly when handling multiple concurrent requests with vllm acceleration.
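The race the bot describes is the classic one: a second request reads the cache while the first is still building the store, so `load_vector_store()` returns None. A lock-guarded lazy loader avoids it. The sketch below uses illustrative names (`LazyStoreCache`, the `loader` callable), not the project's actual `ThreadSafeFaiss` API:

```python
import threading

class LazyStoreCache:
    """Return a fully loaded store, building it at most once even under concurrency."""

    def __init__(self, loader):
        self._loader = loader        # zero-argument callable that builds the vector store
        self._store = None
        self._lock = threading.Lock()

    def load_vector_store(self):
        # Double-checked locking: cheap fast path once loaded,
        # serialized first load so no caller ever observes None.
        if self._store is None:
            with self._lock:
                if self._store is None:
                    self._store = self._loader()
        return self._store
```

By the time any caller goes on to acquire the store, the reference is guaranteed non-None, which is exactly the property the failing `load_vector_store().acquire()` call assumes.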


zRzRzRzRzRzRzR (Collaborator) commented

This framework does not implement concurrency.

zRzRzRzRzRzRzR closed this as not planned May 1, 2024