bug: error when installing vLLM via pip install "openllm[vllm]" #967

Open
Developer-atomic-amardeep opened this issue Apr 25, 2024 · 0 comments

Describe the bug

(codellama) amardeep.yadav@fintricity.com@codellamamachine:~$ pip install "openllm[vllm]"
Requirement already satisfied: openllm[vllm] in ./miniconda3/envs/codellama/lib/python3.12/site-packages (0.4.44)
Requirement already satisfied: accelerate in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.29.3)
Requirement already satisfied: bentoml<1.2,>=1.1.11 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from bentoml[io]<1.2,>=1.1.11->openllm[vllm]) (1.1.11)
Requirement already satisfied: bitsandbytes<0.42 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.41.3.post2)
Requirement already satisfied: build<1 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from build[virtualenv]<1->openllm[vllm]) (0.10.0)
Requirement already satisfied: click>=8.1.3 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (8.1.7)
Requirement already satisfied: cuda-python in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (12.4.0)
Requirement already satisfied: einops in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.7.0)
Requirement already satisfied: ghapi in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.0.5)
Requirement already satisfied: openllm-client>=0.4.44 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.44)
Requirement already satisfied: openllm-core>=0.4.44 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.44)
Requirement already satisfied: optimum>=1.12.0 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.19.1)
Requirement already satisfied: safetensors in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.3)
Requirement already satisfied: scipy in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.13.0)
Requirement already satisfied: sentencepiece in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.2.0)
Requirement already satisfied: transformers>=4.36.0 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from transformers[tokenizers,torch]>=4.36.0->openllm[vllm]) (4.40.1)
INFO: pip is looking at multiple versions of openllm[vllm] to determine which version is compatible with other requirements. This could take a while.
Collecting openllm[vllm]
Using cached openllm-0.4.43-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.42-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.41-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.40-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.39-py3-none-any.whl.metadata (62 kB)
Collecting megablocks (from openllm[vllm])
Using cached megablocks-0.5.1.tar.gz (49 kB)
Preparing metadata (setup.py) ... done
Collecting openllm[vllm]
Using cached openllm-0.4.38-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.37-py3-none-any.whl.metadata (62 kB)
INFO: pip is still looking at multiple versions of openllm[vllm] to determine which version is compatible with other requirements. This could take a while.
Using cached openllm-0.4.36-py3-none-any.whl.metadata (60 kB)
Using cached openllm-0.4.35-py3-none-any.whl.metadata (60 kB)
Collecting vllm>=0.2.2 (from openllm[vllm])
Using cached vllm-0.3.3.tar.gz (315 kB)
Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
Collecting ninja
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Collecting packaging
Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting setuptools>=49.4.0
Using cached setuptools-69.5.1-py3-none-any.whl.metadata (6.2 kB)
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0)
ERROR: No matching distribution found for torch==2.1.2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
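For context: the failure happens while pip installs the build dependencies of vllm 0.3.3 from source, whose build requirements pin torch==2.1.2. torch 2.1.2 ships no wheels for Python 3.12 (the versions pip lists as available, 2.2.0 through 2.3.0, are the first with cp312 wheels), so resolution cannot succeed in this Python 3.12 environment. A possible workaround, sketched under the assumption of a conda setup like the one above (the env name is illustrative), is to retry on Python 3.11:

# Workaround sketch (assumption): torch==2.1.2 has no Python 3.12 wheels,
# so recreate the environment on Python 3.11 and retry the install.
conda create -n codellama-py311 python=3.11 -y
conda activate codellama-py311
pip install "openllm[vllm]"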

To reproduce

Step 1: Create a standard OpenLLM setup in a conda environment (a sketch follows these steps).
Step 2: Run TRUST_REMOTE_CODE=True openllm start codellama/CodeLlama-34b-Instruct-hf --backend vllm
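A minimal sketch of step 1, assuming a fresh conda environment (the env name and Python version mirror the environment report below; both are illustrative):

# Hypothetical setup for step 1; this matches the Python 3.12 env that reproduces the bug
conda create -n codellama python=3.12 -y
conda activate codellama
pip install openllm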
The following error should then be visible:
(codellama) amardeep.yadav@fintricity.com@codellamamachine:~$ TRUST_REMOTE_CODE=True openllm start codellama/CodeLlama-34b-Instruct-hf --backend vllm
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 588/588 [00:00<00:00, 7.36MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.59k/1.59k [00:00<00:00, 20.3MB/s]
tokenizer.model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 91.4MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 60.7MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 5.58MB/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 1.54MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 37.6k/37.6k [00:00<00:00, 116MB/s]
pytorch_model.bin.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 35.8k/35.8k [00:00<00:00, 207MB/s]
model-00007-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.19G/9.19G [00:50<00:00, 180MB/s]
model-00001-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.85G/9.85G [00:52<00:00, 188MB/s]
model-00002-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:52<00:00, 183MB/s]
model-00003-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:52<00:00, 183MB/s]
model-00006-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:53<00:00, 180MB/s]
model-00005-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:54<00:00, 179MB/s]
model-00004-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:54<00:00, 179MB/s]
Fetching 15 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:54<00:00, 3.63s/it]
🚀Tip: run 'openllm build codellama/CodeLlama-34b-Instruct-hf --backend vllm --serialization safetensors' to create a BentoLLM for 'codellama/CodeLlama-34b-Instruct-hf'
2024-04-25T18:34:00+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3000/metrics.
2024-04-25T18:34:01+0000 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] An exception occurred while instantiating runner 'llm-llama-runner', see details below:
2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
self._set_handle(LocalRunnerRef)
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
runner_handle = handle_class(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in init
self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/openllm/_runners.py", line 121, in init
raise openllm.exceptions.OpenLLMException('vLLM is not installed. Do pip install "openllm[vllm]".')
openllm_core.exceptions.OpenLLMException: vLLM is not installed. Do pip install "openllm[vllm]".

2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/starlette/routing.py", line 732, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
on_startup()
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
raise e
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
self._set_handle(LocalRunnerRef)
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
runner_handle = handle_class(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in init
self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/openllm/_runners.py", line 121, in init
raise openllm.exceptions.OpenLLMException('vLLM is not installed. Do pip install "openllm[vllm]".')
openllm_core.exceptions.OpenLLMException: vLLM is not installed. Do pip install "openllm[vllm]".

2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Application startup failed. Exiting.
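
The traceback shows that OpenLLM's vLLM runner raises as soon as vLLM cannot be imported, so the server starts without the backend actually being present. A quick sanity check before launching the server (assumes vllm exposes __version__, which recent releases do):

# Confirm vllm is importable in the active environment before 'openllm start'
python -c "import vllm; print(vllm.__version__)"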

Logs

All relevant logs are included above.

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.11
python: 3.12.2
platform: Linux-5.15.0-1050-azure-x86_64-with-glibc2.31
uid_gid: 14830125:14830125
conda: 24.3.0
in_conda_env: True

conda_packages
name: codellama
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h5eee18b_5
  - ca-certificates=2024.3.11=h06a4308_0
  - expat=2.6.2=h6a678d5_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.13=h7f8727e_0
  - pip=23.3.1=py312h06a4308_0
  - python=3.12.2=h996f2a0_0
  - readline=8.2=h5eee18b_0
  - setuptools=68.2.2=py312h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.41.2=py312h06a4308_0
  - xz=5.4.6=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
      - accelerate==0.29.3
      - aiohttp==3.9.5
      - aiosignal==1.3.1
      - anyio==4.3.0
      - appdirs==1.4.4
      - asgiref==3.8.1
      - attrs==23.2.0
      - bentoml==1.1.11
      - bitsandbytes==0.41.3.post2
      - build==0.10.0
      - cattrs==23.1.2
      - certifi==2024.2.2
      - charset-normalizer==3.3.2
      - circus==0.18.0
      - click==8.1.7
      - click-option-group==0.5.6
      - cloudpickle==3.0.0
      - coloredlogs==15.0.1
      - contextlib2==21.6.0
      - cuda-python==12.4.0
      - datasets==2.19.0
      - deepmerge==1.1.1
      - deprecated==1.2.14
      - dill==0.3.8
      - distlib==0.3.8
      - distro==1.9.0
      - einops==0.7.0
      - fastcore==1.5.29
      - filelock==3.13.4
      - filetype==1.2.0
      - frozenlist==1.4.1
      - fs==2.4.16
      - fsspec==2024.3.1
      - ghapi==1.0.5
      - h11==0.14.0
      - httpcore==1.0.5
      - httpx==0.27.0
      - huggingface-hub==0.22.2
      - humanfriendly==10.0
      - idna==3.7
      - importlib-metadata==6.11.0
      - inflection==0.5.1
      - jinja2==3.1.3
      - markdown-it-py==3.0.0
      - markupsafe==2.1.5
      - mdurl==0.1.2
      - mpmath==1.3.0
      - multidict==6.0.5
      - multiprocess==0.70.16
      - mypy-extensions==1.0.0
      - networkx==3.3
      - numpy==1.26.4
      - nvidia-cublas-cu12==12.1.3.1
      - nvidia-cuda-cupti-cu12==12.1.105
      - nvidia-cuda-nvrtc-cu12==12.1.105
      - nvidia-cuda-runtime-cu12==12.1.105
      - nvidia-cudnn-cu12==8.9.2.26
      - nvidia-cufft-cu12==11.0.2.54
      - nvidia-curand-cu12==10.3.2.106
      - nvidia-cusolver-cu12==11.4.5.107
      - nvidia-cusparse-cu12==12.1.0.106
      - nvidia-ml-py==11.525.150
      - nvidia-nccl-cu12==2.20.5
      - nvidia-nvjitlink-cu12==12.4.127
      - nvidia-nvtx-cu12==12.1.105
      - openllm==0.4.44
      - openllm-client==0.4.44
      - openllm-core==0.4.44
      - opentelemetry-api==1.20.0
      - opentelemetry-instrumentation==0.41b0
      - opentelemetry-instrumentation-aiohttp-client==0.41b0
      - opentelemetry-instrumentation-asgi==0.41b0
      - opentelemetry-sdk==1.20.0
      - opentelemetry-semantic-conventions==0.41b0
      - opentelemetry-util-http==0.41b0
      - optimum==1.19.1
      - orjson==3.10.1
      - packaging==24.0
      - pandas==2.2.2
      - pathspec==0.12.1
      - pillow==10.3.0
      - pip-requirements-parser==32.0.1
      - pip-tools==7.3.0
      - platformdirs==4.2.1
      - prometheus-client==0.20.0
      - protobuf==5.26.1
      - psutil==5.9.8
      - pyarrow==16.0.0
      - pyarrow-hotfix==0.6
      - pydantic==1.10.15
      - pygments==2.17.2
      - pyparsing==3.1.2
      - pyproject-hooks==1.0.0
      - python-dateutil==2.9.0.post0
      - python-json-logger==2.0.7
      - python-multipart==0.0.9
      - pytz==2024.1
      - pyyaml==6.0.1
      - pyzmq==26.0.2
      - regex==2024.4.16
      - requests==2.31.0
      - rich==13.7.1
      - safetensors==0.4.3
      - schema==0.7.5
      - scipy==1.13.0
      - sentencepiece==0.2.0
      - simple-di==0.1.5
      - six==1.16.0
      - sniffio==1.3.1
      - starlette==0.37.2
      - sympy==1.12
      - tokenizers==0.19.1
      - torch==2.3.0
      - tornado==6.4
      - tqdm==4.66.2
      - transformers==4.40.1
      - typing-extensions==4.11.0
      - tzdata==2024.1
      - urllib3==2.2.1
      - uvicorn==0.29.0
      - virtualenv==20.26.0
      - watchfiles==0.21.0
      - wrapt==1.16.0
      - xxhash==3.4.1
      - yarl==1.9.4
      - zipp==3.18.1
prefix: /home/amardeep.yadav/miniconda3/envs/codellama
pip_packages
accelerate==0.29.3
aiohttp==3.9.5
aiosignal==1.3.1
anyio==4.3.0
appdirs==1.4.4
asgiref==3.8.1
attrs==23.2.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.4.0
datasets==2.19.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.8
distlib==0.3.8
distro==1.9.0
einops==0.7.0
fastcore==1.5.29
filelock==3.13.4
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2024.3.1
ghapi==1.0.5
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
idna==3.7
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
openllm==0.4.44
openllm-client==0.4.44
openllm-core==0.4.44
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.19.1
orjson==3.10.1
packaging==24.0
pandas==2.2.2
pathspec==0.12.1
pillow==10.3.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.2.1
prometheus_client==0.20.0
protobuf==5.26.1
psutil==5.9.8
pyarrow==16.0.0
pyarrow-hotfix==0.6
pydantic==1.10.15
Pygments==2.17.2
pyparsing==3.1.2
pyproject_hooks==1.0.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.2
regex==2024.4.16
requests==2.31.0
rich==13.7.1
safetensors==0.4.3
schema==0.7.5
scipy==1.13.0
sentencepiece==0.2.0
setuptools==68.2.2
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
starlette==0.37.2
sympy==1.12
tokenizers==0.19.1
torch==2.3.0
tornado==6.4
tqdm==4.66.2
transformers==4.40.1
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
uvicorn==0.29.0
virtualenv==20.26.0
watchfiles==0.21.0
wheel==0.41.2
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.18.1

System information (Optional)

No response
