
Unhandled exceptions of restful api lead to server hang #1424

Open
frostyplanet opened this issue May 3, 2024 · 0 comments

frostyplanet (Contributor) commented May 3, 2024

Describe the bug

When the RESTful API is accessed with a wrong model_uid and the resulting error is raised enough times (benchmarked with > 100 concurrency), the server ends up completely deadlocked.
The problem is likely an unhandled exception interacting with locks, either in the RESTful API or in xoscar.
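A minimal, hypothetical illustration of the suspected failure mode (this is not xoscar's actual code): if a lock is acquired manually and an exception fires before the matching release, the lock is leaked and every later caller blocks forever.

```python
import asyncio

async def demo():
    lock = asyncio.Lock()

    async def handler_buggy(model_uid):
        # Acquired manually with no try/finally: if the lookup below
        # raises, lock.release() is never reached.
        await lock.acquire()
        if model_uid != "qwen1.5-7":
            raise ValueError(f"Model not found in the model list, uid: {model_uid}")
        lock.release()

    # One failing request, like a benchmark call with a wrong uid.
    try:
        await handler_buggy("qwen1.5-7-1")
    except ValueError:
        pass

    # The failed call never released the lock, so the next caller hangs;
    # we use a short timeout to detect that instead of hanging the demo.
    try:
        await asyncio.wait_for(lock.acquire(), timeout=0.1)
        return False  # lock was free: no deadlock
    except asyncio.TimeoutError:
        return True   # lock still held: deadlocked

deadlocked = asyncio.run(demo())
print("deadlocked:", deadlocked)  # → deadlocked: True
```

Under high concurrency, hundreds of requests queue up behind the leaked lock, which would match the observed server-wide hang.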

To Reproduce

  1. Python version: 3.10.12
  2. Versions of crucial packages:
    xoscar: 0.3.0
    torch: 2.2.2
    vllm: 0.4.1
    transformers: 4.40.1
  3. The version of xinference:
    xinference: commit 7c974be
  4. hardware environment: reproduced on multiple deployments, 4090x8 and A40x8
  5. Steps to reproduce

a) env XINFERENCE_MODEL_SRC=modelscope xinference-local
b) xinference login --username administrator --password administrator
c) launch the model with model_uid "qwen1.5-7":

 xinference launch -u qwen1.5-7 -n qwen1.5-chat -s 7 --max_model_len 8192 --dtype half -f gptq -q Int4 --n-gpu 1 

d) benchmark the model with the wrong model_uid "qwen1.5-7-1":

 env HF_ENDPOINT=https://hf-mirror.com python benchmark/benchmark_serving.py --dataset ~/dataset/ShareGPT_V3_unfiltered_cleaned_split.json --tokenizer qwen/qwen1.5-7B-chat-gptq-int4 --model-uid qwen1.5-7-1 --num-prompts 400 

This raises many exceptions like:

Traceback (most recent call last):                                                      
  File "/home/clouduser/inference/xinference/api/restful_api.py", line 1322, in create_chat_completion            
    model = await (await self._get_supervisor_ref()).get_model(model_uid)
  File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore                        
  File "xoscar/core.pyx", line 558, in __on_receive__                                   
    raise ex                                                                            
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__         
    async with self._lock: 
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__  
    with debug_async_timeout('actor_lock_timeout', 
    ^^^^^^^^^^^^^^^^^ 
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result 
    ^^^^^^^^^^^^^^^^^                                                                   
  File "/home/clouduser/inference/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs) 
    ^^^^^^^^^^^^^^^^^ 
  File "/home/clouduser/inference/xinference/core/supervisor.py", line 989, in get_model              
    raise ValueError(f"Model not found in the model list, uid: {model_uid}")
    ^^^^^^^^^^^^^^^^^                                                                   
ValueError: [address=127.0.0.1:27897, pid=212329] Model not found in the model list, uid: qwen1.5-7-1

e) All commands hang afterwards, for example:

   xinference terminate --model-uid qwen1.5-7

Expected behavior

The server should not hang after running the benchmark script with an invalid model_uid.
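For contrast, a sketch of the safe pattern, assuming the hang comes from a lock left held by a failing request: acquiring via `async with` (or try/finally) guarantees release even when the handler body raises, so repeated invalid-uid errors cannot leave the lock held. (Note the traceback above already shows `async with self._lock:` in `_BaseActor.__on_receive__`, which does release on exception, so the leaked lock, if that is the cause, would have to be acquired elsewhere without such protection.)

```python
import asyncio

async def demo_fixed():
    lock = asyncio.Lock()

    async def handler_safe(model_uid):
        # async with releases the lock in __aexit__ even if the body raises.
        async with lock:
            if model_uid != "qwen1.5-7":
                raise ValueError(f"Model not found in the model list, uid: {model_uid}")

    # Many failing requests, as in the benchmark with a wrong uid.
    for _ in range(200):
        try:
            await handler_safe("qwen1.5-7-1")
        except ValueError:
            pass

    return lock.locked()

still_locked = asyncio.run(demo_fixed())
print("lock still held:", still_locked)  # → lock still held: False
```

With this pattern, the 400-prompt benchmark against a wrong model_uid would produce 400 clean errors instead of a server-wide hang.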

@XprobeBot XprobeBot added the gpu label May 3, 2024
@XprobeBot XprobeBot added this to the v0.11.0 milestone May 3, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1, v0.11.2 May 11, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.2, v0.11.3 May 24, 2024