
Releases: bentoml/OpenLLM

v0.5.0-alpha.1 (Pre-release)

21 Mar 01:46 · 12ac998
Release 0.5.0-alpha.1 [generated by GitHub Actions]

v0.5.0-alpha (Pre-release)

15 Mar 09:28 · 58c741c
Release 0.5.0-alpha [generated by GitHub Actions]

v0.4.44

06 Feb 03:17 · 1b54d64

Installation

pip install openllm==0.4.44

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.44

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.44 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md
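Once a model is started with the commands above, the server exposes an HTTP API. As a minimal sketch (the default port 3000, the `/v1/generate` route, and the payload shape are assumptions about this release line; check your server's `/docs` page for the exact schema), a request can be built and posted like this:

```python
import json
import urllib.request


def build_request(prompt, url="http://localhost:3000/v1/generate", max_new_tokens=128):
    """Build an HTTP request for a running OpenLLM server.

    The URL and payload shape are assumptions based on the default
    port (3000) and the /v1/generate route; verify against the
    schema your server version actually serves.
    """
    payload = {"prompt": prompt, "llm_config": {"max_new_tokens": max_new_tokens}}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )


if __name__ == "__main__":
    req = build_request("What is the capital of France?")
    # Requires a server started with one of the commands above.
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```

The request construction is separated from the network call so the payload can be inspected or reused without a running server.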

What's Changed

  • fix: remove vllm dependency for pytorch bento by @larme in #893

Full Changelog: v0.4.43...v0.4.44

v0.4.43

05 Feb 10:58 · fe44c84

Installation

pip install openllm==0.4.43

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.43

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.43 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix: limit BentoML version range by @larme in #881
  • chore: bump up bentoml version to 1.1.11 by @larme in #883
  • Bump BentoML version in tools by @larme in #884

Full Changelog: v0.4.42...v0.4.43

v0.4.42

02 Feb 12:31 · d1583cc

Installation

pip install openllm==0.4.42

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.42

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.42 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

New Contributors

Full Changelog: v0.4.41...v0.4.42

v0.4.41

18 Dec 18:18 · b09bd20

GPTQ Support

The vLLM backend now supports GPTQ via upstream vLLM 0.2.6:

openllm start TheBloke/Mistral-7B-Instruct-v0.2-GPTQ --backend vllm --quantise gptq
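Start invocations like this one differ only in their flags. A small helper (hypothetical, for illustration only; `build_start_cmd` is not part of OpenLLM) can compose the argument list programmatically, e.g. when scripting quantised vs. unquantised runs:

```python
import shlex


def build_start_cmd(model_id, backend=None, quantise=None):
    """Compose an `openllm start` argument list.

    Hypothetical helper for illustration; only the flags shown in
    these release notes (--backend, --quantise) are modelled.
    """
    cmd = ["openllm", "start", model_id]
    if backend is not None:
        cmd += ["--backend", backend]
    if quantise is not None:
        cmd += ["--quantise", quantise]
    return cmd


if __name__ == "__main__":
    cmd = build_start_cmd(
        "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ", backend="vllm", quantise="gptq"
    )
    print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell-quoting issues with model IDs.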

Installation

pip install openllm==0.4.41

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.41

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.41 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • docs: add notes about dtypes usage. by @aarnphm in #786
  • chore(deps): bump taiki-e/install-action from 2.22.0 to 2.22.5 by @dependabot in #790
  • chore(deps): bump github/codeql-action from 2.22.9 to 3.22.11 by @dependabot in #794
  • chore(deps): bump sigstore/cosign-installer from 3.2.0 to 3.3.0 by @dependabot in #793
  • chore(deps): bump actions/download-artifact from 3.0.2 to 4.0.0 by @dependabot in #791
  • chore(deps): bump actions/upload-artifact from 3.1.3 to 4.0.0 by @dependabot in #792
  • ci: pre-commit autoupdate [pre-commit.ci] by @pre-commit-ci in #796
  • fix(cli): avoid runtime __origin__ check for older Python by @aarnphm in #798
  • feat(vllm): support GPTQ with 0.2.6 by @aarnphm in #797
  • fix(ci): lock to v3 iteration of actions/artifacts workflow by @aarnphm in #799

Full Changelog: v0.4.40...v0.4.41

v0.4.40

15 Dec 16:57 · 2e8fc28

Installation

pip install openllm==0.4.40

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.40

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.40 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix(infra): conform ruff to 150 LL by @aarnphm in #781
  • infra: update blame ignore to formatter hash by @aarnphm in #782
  • perf: upgrade mixtral to use expert parallelism by @aarnphm in #783

Full Changelog: v0.4.39...v0.4.40

v0.4.39

14 Dec 19:30 · d4fbbce

Installation

pip install openllm==0.4.39

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.39

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.39 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

Full Changelog: v0.4.38...v0.4.39

v0.4.38

13 Dec 23:36 · 1dbae67

Installation

pip install openllm==0.4.38

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.38

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.38 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • fix(mixtral): correct chat templates to remove additional spacing by @aarnphm in #774
  • fix(cli): correct set arguments for openllm import and openllm build by @aarnphm in #775
  • fix(mixtral): setup hack atm to load weights from pt specifically instead of safetensors by @aarnphm in #776

Full Changelog: v0.4.37...v0.4.38

v0.4.37

13 Dec 14:22 · 8d9d212

Installation

pip install openllm==0.4.37

To upgrade from a previous version, use the following command:

pip install --upgrade openllm==0.4.37

Usage

All available models: openllm models

To start an LLM: python -m openllm start HuggingFaceH4/zephyr-7b-beta

To run OpenLLM within a container environment (requires GPUs): docker run --gpus all -it -P -v $PWD/data:$HOME/.cache/huggingface/ ghcr.io/bentoml/openllm:0.4.37 start HuggingFaceH4/zephyr-7b-beta

Find more information about this release in the CHANGELOG.md

What's Changed

  • feat(mixtral): correct support for mixtral by @aarnphm in #772
  • chore: running all script when installation by @aarnphm in #773

Full Changelog: v0.4.36...v0.4.37