v0.1.0
Major changes:
- Added python wrapper and published scalellm package to PyPI.
- Supported openai-compatible rest api server. 'python3 -m scalellm.serve.api_server'
- Install scalellm with pip: 'pip install scalellm'
- Added examples for offline inference and async stream.
What's Changed
- [fix] use the pybind11 from libtorch and fix model download issue. by @guocuimi in #167
- [misc] upgrade torch to 2.3 and use gcc-12 by @guocuimi in #168
- [feat] added python rest api server skeleton by @guocuimi in #169
- [refactor] combine sequence and request outputs by @guocuimi in #170
- [feat] added python LLMEngine skeleton by @guocuimi in #171
- [refactor] move proto definitions into proto namespace by @guocuimi in #173
- [feat] implement async llm engine for python wrapper by @guocuimi in #172
- [refactor] consolidate handlers to share llm_handler between python rest api server and grpc server by @guocuimi in #174
- [python] move request handling logic into seperate file from api server by @guocuimi in #175
- [python] added model check for rest api by @guocuimi in #176
- [feat] added status handling for grpc server by @guocuimi in #177
- [misc] some changes to cmake file by @guocuimi in #180
- [kernle] change head_dim list to reduce binary size by @guocuimi in #181
- [CI] added base docker image for python wheel build by @guocuimi in #182
- [ci] build python wheels by @guocuimi in #183
- [CI] fix docker image issues and build wheel for different python, pytorch versions by @guocuimi in #184
- [fix] added manylinux support by @guocuimi in #185
- [fix] added cuda 11.8 support for manylinux by @guocuimi in #186
- [feat] added version suffix to include cuda and torch version by @guocuimi in #187
- [CI] Upload wheels to release as asserts by @guocuimi in #188
- [fix] fix extension typo for wheel publish workflow by @guocuimi in #189
- [python] added LLM for offline inference and stream examples for chat and complete by @guocuimi in #190
- [python] added requirements into package by @guocuimi in #191
- [Release] prepare 0.1.0 release by @guocuimi in #192
- [Release] added workflow to publish whls to PyPI by @guocuimi in #193
Full Changelog: v0.0.9...v0.1.0