bug: error when installing vLLM via pip install "openllm[vllm]" #967

Open
Developer-atomic-amardeep opened this issue Apr 25, 2024 · 0 comments

Describe the bug

(codellama) amardeep.yadav@fintricity.com@codellamamachine:~$ pip install "openllm[vllm]"
Requirement already satisfied: openllm[vllm] in ./miniconda3/envs/codellama/lib/python3.12/site-packages (0.4.44)
Requirement already satisfied: accelerate in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.29.3)
Requirement already satisfied: bentoml<1.2,>=1.1.11 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from bentoml[io]<1.2,>=1.1.11->openllm[vllm]) (1.1.11)
Requirement already satisfied: bitsandbytes<0.42 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.41.3.post2)
Requirement already satisfied: build<1 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from build[virtualenv]<1->openllm[vllm]) (0.10.0)
Requirement already satisfied: click>=8.1.3 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (8.1.7)
Requirement already satisfied: cuda-python in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (12.4.0)
Requirement already satisfied: einops in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.7.0)
Requirement already satisfied: ghapi in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.0.5)
Requirement already satisfied: openllm-client>=0.4.44 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.44)
Requirement already satisfied: openllm-core>=0.4.44 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.44)
Requirement already satisfied: optimum>=1.12.0 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.19.1)
Requirement already satisfied: safetensors in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.4.3)
Requirement already satisfied: scipy in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (1.13.0)
Requirement already satisfied: sentencepiece in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from openllm[vllm]) (0.2.0)
Requirement already satisfied: transformers>=4.36.0 in ./miniconda3/envs/codellama/lib/python3.12/site-packages (from transformers[tokenizers,torch]>=4.36.0->openllm[vllm]) (4.40.1)
INFO: pip is looking at multiple versions of openllm[vllm] to determine which version is compatible with other requirements. This could take a while.
Collecting openllm[vllm]
Using cached openllm-0.4.43-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.42-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.41-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.40-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.39-py3-none-any.whl.metadata (62 kB)
Collecting megablocks (from openllm[vllm])
Using cached megablocks-0.5.1.tar.gz (49 kB)
Preparing metadata (setup.py) ... done
Collecting openllm[vllm]
Using cached openllm-0.4.38-py3-none-any.whl.metadata (62 kB)
Using cached openllm-0.4.37-py3-none-any.whl.metadata (62 kB)
INFO: pip is still looking at multiple versions of openllm[vllm] to determine which version is compatible with other requirements. This could take a while.
Using cached openllm-0.4.36-py3-none-any.whl.metadata (60 kB)
Using cached openllm-0.4.35-py3-none-any.whl.metadata (60 kB)
Collecting vllm>=0.2.2 (from openllm[vllm])
Using cached vllm-0.3.3.tar.gz (315 kB)
Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
Collecting ninja
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Collecting packaging
Using cached packaging-24.0-py3-none-any.whl.metadata (3.2 kB)
Collecting setuptools>=49.4.0
Using cached setuptools-69.5.1-py3-none-any.whl.metadata (6.2 kB)
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0)
ERROR: No matching distribution found for torch==2.1.2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
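For context: the failure happens while pip installs the build dependencies of vllm 0.3.3 from source, whose build requirements pin torch==2.1.2. torch 2.1.2 ships no wheels for Python 3.12 (the versions pip lists as available, 2.2.0 through 2.3.0, are the first with cp312 wheels), so resolution cannot succeed in this Python 3.12 environment. A possible workaround, sketched under the assumption of a conda setup like the one above (the env name is illustrative), is to retry on Python 3.11:

# Workaround sketch (assumption): torch==2.1.2 has no Python 3.12 wheels,
# so recreate the environment on Python 3.11 and retry the install.
conda create -n codellama-py311 python=3.11 -y
conda activate codellama-py311
pip install "openllm[vllm]"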

To reproduce

Step 1: Create a standard OpenLLM setup in a conda environment (a sketch follows these steps).
Step 2: Run TRUST_REMOTE_CODE=True openllm start codellama/CodeLlama-34b-Instruct-hf --backend vllm
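A minimal sketch of step 1, assuming a fresh conda environment (the env name and Python version mirror the environment report below; both are illustrative):

# Hypothetical setup for step 1; this matches the Python 3.12 env that reproduces the bug
conda create -n codellama python=3.12 -y
conda activate codellama
pip install openllm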
The following error should then be visible:
(codellama) amardeep.yadav@fintricity.com@codellamamachine:~$ TRUST_REMOTE_CODE=True openllm start codellama/CodeLlama-34b-Instruct-hf --backend vllm
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 588/588 [00:00<00:00, 7.36MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.59k/1.59k [00:00<00:00, 20.3MB/s]
tokenizer.model: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 91.4MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 60.7MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 5.58MB/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 1.54MB/s]
model.safetensors.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 37.6k/37.6k [00:00<00:00, 116MB/s]
pytorch_model.bin.index.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 35.8k/35.8k [00:00<00:00, 207MB/s]
model-00007-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.19G/9.19G [00:50<00:00, 180MB/s]
model-00001-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.85G/9.85G [00:52<00:00, 188MB/s]
model-00002-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:52<00:00, 183MB/s]
model-00003-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:52<00:00, 183MB/s]
model-00006-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:53<00:00, 180MB/s]
model-00005-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:54<00:00, 179MB/s]
model-00004-of-00007.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9.69G/9.69G [00:54<00:00, 179MB/s]
Fetching 15 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:54<00:00, 3.63s/it]
🚀Tip: run 'openllm build codellama/CodeLlama-34b-Instruct-hf --backend vllm --serialization safetensors' to create a BentoLLM for 'codellama/CodeLlama-34b-Instruct-hf'
2024-04-25T18:34:00+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3000/metrics.
2024-04-25T18:34:01+0000 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] An exception occurred while instantiating runner 'llm-llama-runner', see details below:
2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
self._set_handle(LocalRunnerRef)
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
runner_handle = handle_class(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in init
self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/openllm/_runners.py", line 121, in init
raise openllm.exceptions.OpenLLMException('vLLM is not installed. Do pip install "openllm[vllm]".')
openllm_core.exceptions.OpenLLMException: vLLM is not installed. Do pip install "openllm[vllm]".

2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Traceback (most recent call last):
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/starlette/routing.py", line 732, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
on_startup()
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
raise e
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
self._set_handle(LocalRunnerRef)
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
runner_handle = handle_class(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in init
self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amardeep.yadav/miniconda3/envs/codellama/lib/python3.12/site-packages/openllm/_runners.py", line 121, in init
raise openllm.exceptions.OpenLLMException('vLLM is not installed. Do pip install "openllm[vllm]".')
openllm_core.exceptions.OpenLLMException: vLLM is not installed. Do pip install "openllm[vllm]".

2024-04-25T18:34:04+0000 [ERROR] [runner:llm-llama-runner:1] Application startup failed. Exiting.
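
The traceback shows that OpenLLM's vLLM runner raises as soon as vLLM cannot be imported, so the server starts without the backend actually being present. A quick sanity check before launching the server (assumes vllm exposes __version__, which recent releases do):

# Confirm vllm is importable in the active environment before 'openllm start'
python -c "import vllm; print(vllm.__version__)"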

Logs

All relevant logs are included above.

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.11
python: 3.12.2
platform: Linux-5.15.0-1050-azure-x86_64-with-glibc2.31
uid_gid: 14830125:14830125
conda: 24.3.0
in_conda_env: True

conda_packages
name: codellama
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bzip2=1.0.8=h5eee18b_5
  - ca-certificates=2024.3.11=h06a4308_0
  - expat=2.6.2=h6a678d5_0
  - ld_impl_linux-64=2.38=h1181459_1
  - libffi=3.4.4=h6a678d5_0
  - libgcc-ng=11.2.0=h1234567_1
  - libgomp=11.2.0=h1234567_1
  - libstdcxx-ng=11.2.0=h1234567_1
  - libuuid=1.41.5=h5eee18b_0
  - ncurses=6.4=h6a678d5_0
  - openssl=3.0.13=h7f8727e_0
  - pip=23.3.1=py312h06a4308_0
  - python=3.12.2=h996f2a0_0
  - readline=8.2=h5eee18b_0
  - setuptools=68.2.2=py312h06a4308_0
  - sqlite=3.41.2=h5eee18b_0
  - tk=8.6.12=h1ccaba5_0
  - wheel=0.41.2=py312h06a4308_0
  - xz=5.4.6=h5eee18b_0
  - zlib=1.2.13=h5eee18b_0
  - pip:
      - accelerate==0.29.3
      - aiohttp==3.9.5
      - aiosignal==1.3.1
      - anyio==4.3.0
      - appdirs==1.4.4
      - asgiref==3.8.1
      - attrs==23.2.0
      - bentoml==1.1.11
      - bitsandbytes==0.41.3.post2
      - build==0.10.0
      - cattrs==23.1.2
      - certifi==2024.2.2
      - charset-normalizer==3.3.2
      - circus==0.18.0
      - click==8.1.7
      - click-option-group==0.5.6
      - cloudpickle==3.0.0
      - coloredlogs==15.0.1
      - contextlib2==21.6.0
      - cuda-python==12.4.0
      - datasets==2.19.0
      - deepmerge==1.1.1
      - deprecated==1.2.14
      - dill==0.3.8
      - distlib==0.3.8
      - distro==1.9.0
      - einops==0.7.0
      - fastcore==1.5.29
      - filelock==3.13.4
      - filetype==1.2.0
      - frozenlist==1.4.1
      - fs==2.4.16
      - fsspec==2024.3.1
      - ghapi==1.0.5
      - h11==0.14.0
      - httpcore==1.0.5
      - httpx==0.27.0
      - huggingface-hub==0.22.2
      - humanfriendly==10.0
      - idna==3.7
      - importlib-metadata==6.11.0
      - inflection==0.5.1
      - jinja2==3.1.3
      - markdown-it-py==3.0.0
      - markupsafe==2.1.5
      - mdurl==0.1.2
      - mpmath==1.3.0
      - multidict==6.0.5
      - multiprocess==0.70.16
      - mypy-extensions==1.0.0
      - networkx==3.3
      - numpy==1.26.4
      - nvidia-cublas-cu12==12.1.3.1
      - nvidia-cuda-cupti-cu12==12.1.105
      - nvidia-cuda-nvrtc-cu12==12.1.105
      - nvidia-cuda-runtime-cu12==12.1.105
      - nvidia-cudnn-cu12==8.9.2.26
      - nvidia-cufft-cu12==11.0.2.54
      - nvidia-curand-cu12==10.3.2.106
      - nvidia-cusolver-cu12==11.4.5.107
      - nvidia-cusparse-cu12==12.1.0.106
      - nvidia-ml-py==11.525.150
      - nvidia-nccl-cu12==2.20.5
      - nvidia-nvjitlink-cu12==12.4.127
      - nvidia-nvtx-cu12==12.1.105
      - openllm==0.4.44
      - openllm-client==0.4.44
      - openllm-core==0.4.44
      - opentelemetry-api==1.20.0
      - opentelemetry-instrumentation==0.41b0
      - opentelemetry-instrumentation-aiohttp-client==0.41b0
      - opentelemetry-instrumentation-asgi==0.41b0
      - opentelemetry-sdk==1.20.0
      - opentelemetry-semantic-conventions==0.41b0
      - opentelemetry-util-http==0.41b0
      - optimum==1.19.1
      - orjson==3.10.1
      - packaging==24.0
      - pandas==2.2.2
      - pathspec==0.12.1
      - pillow==10.3.0
      - pip-requirements-parser==32.0.1
      - pip-tools==7.3.0
      - platformdirs==4.2.1
      - prometheus-client==0.20.0
      - protobuf==5.26.1
      - psutil==5.9.8
      - pyarrow==16.0.0
      - pyarrow-hotfix==0.6
      - pydantic==1.10.15
      - pygments==2.17.2
      - pyparsing==3.1.2
      - pyproject-hooks==1.0.0
      - python-dateutil==2.9.0.post0
      - python-json-logger==2.0.7
      - python-multipart==0.0.9
      - pytz==2024.1
      - pyyaml==6.0.1
      - pyzmq==26.0.2
      - regex==2024.4.16
      - requests==2.31.0
      - rich==13.7.1
      - safetensors==0.4.3
      - schema==0.7.5
      - scipy==1.13.0
      - sentencepiece==0.2.0
      - simple-di==0.1.5
      - six==1.16.0
      - sniffio==1.3.1
      - starlette==0.37.2
      - sympy==1.12
      - tokenizers==0.19.1
      - torch==2.3.0
      - tornado==6.4
      - tqdm==4.66.2
      - transformers==4.40.1
      - typing-extensions==4.11.0
      - tzdata==2024.1
      - urllib3==2.2.1
      - uvicorn==0.29.0
      - virtualenv==20.26.0
      - watchfiles==0.21.0
      - wrapt==1.16.0
      - xxhash==3.4.1
      - yarl==1.9.4
      - zipp==3.18.1
prefix: /home/amardeep.yadav/miniconda3/envs/codellama
pip_packages
accelerate==0.29.3
aiohttp==3.9.5
aiosignal==1.3.1
anyio==4.3.0
appdirs==1.4.4
asgiref==3.8.1
attrs==23.2.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.4.0
datasets==2.19.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.8
distlib==0.3.8
distro==1.9.0
einops==0.7.0
fastcore==1.5.29
filelock==3.13.4
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2024.3.1
ghapi==1.0.5
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
idna==3.7
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.3
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
openllm==0.4.44
openllm-client==0.4.44
openllm-core==0.4.44
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.19.1
orjson==3.10.1
packaging==24.0
pandas==2.2.2
pathspec==0.12.1
pillow==10.3.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.2.1
prometheus_client==0.20.0
protobuf==5.26.1
psutil==5.9.8
pyarrow==16.0.0
pyarrow-hotfix==0.6
pydantic==1.10.15
Pygments==2.17.2
pyparsing==3.1.2
pyproject_hooks==1.0.0
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==26.0.2
regex==2024.4.16
requests==2.31.0
rich==13.7.1
safetensors==0.4.3
schema==0.7.5
scipy==1.13.0
sentencepiece==0.2.0
setuptools==68.2.2
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
starlette==0.37.2
sympy==1.12
tokenizers==0.19.1
torch==2.3.0
tornado==6.4
tqdm==4.66.2
transformers==4.40.1
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
uvicorn==0.29.0
virtualenv==20.26.0
watchfiles==0.21.0
wheel==0.41.2
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.18.1

System information (Optional)

No response
