
Releases: bentoml/OpenLLM

v0.5.0-alpha

15 Mar 09:28
Pre-release
Release 0.5.0-alpha [generated by GitHub Actions]

v0.4.44

06 Feb 03:17

Installation

pip install openllm==0.4.44

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.44

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.44 start HuggingFaceH4/zephyr-7b-beta
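Once the server is running (for example via the commands above), it can be queried over HTTP. A minimal sketch, assuming the default port 3000 and the `/v1/generate` endpoint of OpenLLM 0.4.x; the exact payload fields (`llm_config` keys in particular) may differ for your model and deployment:

```python
import json
from urllib.request import Request, urlopen

def build_generate_request(prompt, host="http://localhost:3000"):
    # Assumed request shape for an OpenLLM 0.4.x server; adjust the
    # llm_config keys to your model's supported generation options.
    payload = json.dumps(
        {"prompt": prompt, "llm_config": {"max_new_tokens": 64}}
    ).encode()
    return Request(
        f"{host}/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("What does OpenLLM do?")
# With a running server: print(json.load(urlopen(req)))
```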

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix: remove vllm dependency for pytorch bento by @larme in #893

Full Changelog: v0.4.43...v0.4.44

v0.4.43

05 Feb 10:58

Installation

pip install openllm==0.4.43

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.43

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.43 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix: limit BentoML version range by @larme in #881
  • chore: bump up bentoml version to 1.1.11 by @larme in #883
  • Bump BentoML version in tools by @larme in #884

Full Changelog: v0.4.42...v0.4.43

v0.4.42

02 Feb 12:31

Installation

pip install openllm==0.4.42

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.42

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.42 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.4.41...v0.4.42

v0.4.41

18 Dec 18:18

GPTQ Support

The vLLM backend now supports GPTQ via upstream vLLM:

openllm start TheBloke/Mistral-7B-Instruct-v0.2-GPTQ --backend vllm --quantise gptq

Installation

pip install openllm==0.4.41

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.41

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.41 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • docs: add notes about dtypes usage. by @aarnphm in #786
  • chore(deps): bump taiki-e/install-action from 2.22.0 to 2.22.5 by @dependabot in #790
  • chore(deps): bump github/codeql-action from 2.22.9 to 3.22.11 by @dependabot in #794
  • chore(deps): bump sigstore/cosign-installer from 3.2.0 to 3.3.0 by @dependabot in #793
  • chore(deps): bump actions/download-artifact from 3.0.2 to 4.0.0 by @dependabot in #791
  • chore(deps): bump actions/upload-artifact from 3.1.3 to 4.0.0 by @dependabot in #792
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #796
  • fix(cli): avoid runtime __origin__ check for older Python by @aarnphm in #798
  • feat(vllm): support GPTQ with 0.2.6 by @aarnphm in #797
  • fix(ci): lock to v3 iteration of actions/artifacts workflow by @aarnphm in #799

Full Changelog: v0.4.40...v0.4.41

v0.4.40

15 Dec 16:57

Installation

pip install openllm==0.4.40

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.40

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.40 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix(infra): conform ruff to 150 LL by @aarnphm in #781
  • infra: update blame ignore to formatter hash by @aarnphm in #782
  • perf: upgrade mixtral to use expert parallelism by @aarnphm in #783

Full Changelog: v0.4.39...v0.4.40

v0.4.39

14 Dec 19:30

Installation

pip install openllm==0.4.39

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.39

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.39 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

Full Changelog: v0.4.38...v0.4.39

v0.4.38

13 Dec 23:36

Installation

pip install openllm==0.4.38

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.38

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.38 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix(mixtral): correct chat templates to remove additional spacing by @aarnphm in #774
  • fix(cli): correct set arguments for openllm import and openllm build by @aarnphm in #775
  • fix(mixtral): setup hack atm to load weights from pt specifically instead of safetensors by @aarnphm in #776

Full Changelog: v0.4.37...v0.4.38

v0.4.37

13 Dec 14:22

Installation

pip install openllm==0.4.37

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.37

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.37 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • feat(mixtral): correct support for mixtral by @aarnphm in #772
  • chore: running all script when installation by @aarnphm in #773

Full Changelog: v0.4.36...v0.4.37

v0.4.36

12 Dec 06:44

Mixtral Support

Adds support for Mixtral on BentoCloud with vLLM and all required dependencies.

Bentos built with OpenLLM now default to Python 3.11 for this change to work.

Installation

pip install openllm==0.4.36

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.36

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.36 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • feat(openai): supports echo by @aarnphm in #760
  • fix(openai): logprobs when echo is enabled by @aarnphm in #761
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #767
  • chore(deps): bump docker/metadata-action from 5.2.0 to 5.3.0 by @dependabot in #766
  • chore(deps): bump actions/setup-python from 4.7.1 to 5.0.0 by @dependabot in #765
  • chore(deps): bump taiki-e/install-action from 2.21.26 to 2.22.0 by @dependabot in #764
  • chore(deps): bump aquasecurity/trivy-action from 0.14.0 to 0.16.0 by @dependabot in #763
  • chore(deps): bump github/codeql-action from 2.22.8 to 2.22.9 by @dependabot in #762
  • feat: mixtral support by @aarnphm in #770

Full Changelog: v0.4.35...v0.4.36