v0.1.0

github-actions released this 17 May 16:26
· 55 commits to main since this release

Major changes:

  • Added a Python wrapper and published the scalellm package to PyPI.
  • Added an OpenAI-compatible REST API server: 'python3 -m scalellm.serve.api_server'
  • Install scalellm with pip: 'pip install scalellm'
  • Added examples for offline inference and async streaming.
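The install and server commands above combine into a short quick start. The server port, endpoint path, and request payload below are not stated in these notes; they are assumptions based on how OpenAI-compatible servers are typically queried.

```shell
# Install the scalellm wheel from PyPI (command from the notes above)
pip install scalellm

# Start the OpenAI-compatible REST API server (command from the notes above)
python3 -m scalellm.serve.api_server

# Query it with a standard OpenAI-style completion request.
# NOTE: the port (8080) and model name are illustrative assumptions.
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt2", "prompt": "Hello", "max_tokens": 16}'
```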

What's Changed

  • [fix] use the pybind11 from libtorch and fix model download issue. by @guocuimi in #167
  • [misc] upgrade torch to 2.3 and use gcc-12 by @guocuimi in #168
  • [feat] added python rest api server skeleton by @guocuimi in #169
  • [refactor] combine sequence and request outputs by @guocuimi in #170
  • [feat] added python LLMEngine skeleton by @guocuimi in #171
  • [refactor] move proto definitions into proto namespace by @guocuimi in #173
  • [feat] implement async llm engine for python wrapper by @guocuimi in #172
  • [refactor] consolidate handlers to share llm_handler between python rest api server and grpc server by @guocuimi in #174
  • [python] move request handling logic into separate file from api server by @guocuimi in #175
  • [python] added model check for rest api by @guocuimi in #176
  • [feat] added status handling for grpc server by @guocuimi in #177
  • [misc] some changes to cmake file by @guocuimi in #180
  • [kernel] change head_dim list to reduce binary size by @guocuimi in #181
  • [CI] added base docker image for python wheel build by @guocuimi in #182
  • [ci] build python wheels by @guocuimi in #183
  • [CI] fix docker image issues and build wheel for different python, pytorch versions by @guocuimi in #184
  • [fix] added manylinux support by @guocuimi in #185
  • [fix] added cuda 11.8 support for manylinux by @guocuimi in #186
  • [feat] added version suffix to include cuda and torch version by @guocuimi in #187
  • [CI] Upload wheels to release as assets by @guocuimi in #188
  • [fix] fix extension typo for wheel publish workflow by @guocuimi in #189
  • [python] added LLM for offline inference and stream examples for chat and complete by @guocuimi in #190
  • [python] added requirements into package by @guocuimi in #191
  • [Release] prepare 0.1.0 release by @guocuimi in #192
  • [Release] added workflow to publish wheels to PyPI by @guocuimi in #193

Full Changelog: v0.0.9...v0.1.0