You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When access restful api with wrong model_uid, and error raise enough of times (benchmark with > 100 concurrence), will lead to completely dead lock on server.
problem might be in restful api or xoscar of unhandle exceptions with locks.
The version of xinference:
xinference : commit 7c974be
hardware enviroment: reproduce on multiple deployment, 4090x8 and a40x8
Steps to reproduce
a). env XINFERENCE_MODEL_SRC=modelscope xinference-local
b) xinference login --username administrator --password administrator
c) launch the model with model_uid "qwen1.5-7"
Traceback (most recent call last):
File "/home/clouduser/inference/xinference/api/restful_api.py", line 1322, in create_chat_completion
model = await (await self._get_supervisor_ref()).get_model(model_uid)
File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/home/clouduser/anaconda3/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
return await super().__on_receive__(message) # type: ignore
File "xoscar/core.pyx", line 558, in __on_receive__
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
async with self._lock:
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
with debug_async_timeout('actor_lock_timeout',
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
result = await result
^^^^^^^^^^^^^^^^^
File "/home/clouduser/inference/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/home/clouduser/inference/xinference/core/supervisor.py", line 989, in get_model
raise ValueError(f"Model not found in the model list, uid: {model_uid}")
^^^^^^^^^^^^^^^^^
ValueError: [address=127.0.0.1:27897, pid=212329] Model not found in the model list, uid: qwen1.5-7-1
e) all command will hang afterwards:
or
xinference terminate --model-uid qwen1.5-7
Expected behavior
Server should not hang after running benchmark script with invalid model_uid
The text was updated successfully, but these errors were encountered:
Describe the bug
When access restful api with wrong model_uid, and error raise enough of times (benchmark with > 100 concurrence), will lead to completely dead lock on server.
problem might be in restful api or xoscar of unhandle exceptions with locks.
To Reproduce
xoscar: 0.3.0
torch: 2.2.2
vllm: 0.4.1
transformers: 4.40.1
xinference : commit 7c974be
a). env XINFERENCE_MODEL_SRC=modelscope xinference-local
b) xinference login --username administrator --password administrator
c) launch the model with model_uid "qwen1.5-7"
d) benchmark the model with wrong model_uid "qwen1.5.7-1"
Will raise many exception like:
e) all command will hang afterwards:
Expected behavior
Server should not hang after running benchmark script with invalid model_uid
The text was updated successfully, but these errors were encountered: