feat: Add rocm builds and documentation (#1012)
* Added rocm builds and documentation

* Pulled build improvements from #902

* Fixed build container for rocm build

* Install git in rocm container

* Fixed github step

* Try to fix if statement

* Added more generic dependency installation

* upgraded rustup action

* Update sccache

* Try pytorch manylinux image

* Switched location for toolchain parameter

* Downgraded to deprecated action again

* Readded set default step

* Install minimal rocm on the fly

* fixed typo in binary name

* Downgraded checkout action

* Use curl to download

* Add -y flag to yum

* Also install rocblas

* Update release.yml

* Update release.yml

* Update prepare_build_environment.sh

* Update prepare_build_environment.sh

* Update build.rs

* Update build.rs

* Update README.md

* Update website/docs/faq.mdx

* Update index.md

* Update and rename docker-cuda.yml to docker.yml

* Delete .github/workflows/docker-rocm.yml

* Delete rocm.Dockerfile

* Rename cuda.Dockerfile to Dockerfile

* Update docker.yml

* Update website/docs/installation/docker.mdx

* Update website/docs/installation/docker-compose.mdx

* Update docker-compose.mdx

* Update docker-compose.mdx

* Update docker.mdx

* Update docker.mdx

* Update website/docs/faq.mdx

---------

Co-authored-by: Meng Zhang <[email protected]>
cromefire and wsxiaoys committed Dec 13, 2023
1 parent ca80413 commit 6ef9040
Showing 6 changed files with 35 additions and 12 deletions.
6 changes: 6 additions & 0 deletions .dockerignore
@@ -1,2 +1,8 @@
.idea
ci
clients
.github
python
**/target
**/node_modules
website
11 changes: 8 additions & 3 deletions .github/workflows/release.yml
@@ -26,8 +26,7 @@ jobs:
container: ${{ matrix.container }}
strategy:
matrix:
binary: [aarch64-apple-darwin, x86_64-manylinux2014, x86_64-manylinux2014-cuda117,
x86_64-windows-msvc-cuda117, x86_64-windows-msvc-cuda122]
binary: [aarch64-apple-darwin, x86_64-manylinux2014, x86_64-manylinux2014-cuda117, x86_64-windows-msvc-cuda117, x86_64-windows-msvc-cuda122, x86_64-manylinux2014-rocm57]
include:
- os: macos-latest
target: aarch64-apple-darwin
@@ -53,6 +52,11 @@
ext: .exe
build_args: --features cuda
windows_cuda: '12.2.0'
- os: ubuntu-latest
target: x86_64-unknown-linux-gnu
binary: x86_64-manylinux2014-rocm57
container: ghcr.io/cromefire/hipblas-manylinux/2014/5.7:latest
build_args: --features rocm

env:
SCCACHE_GHA_ENABLED: true
@@ -72,7 +76,8 @@
target: ${{ matrix.target }}
components: clippy

- run: rustup default ${{ env.RUST_TOOLCHAIN }}
- name: Set default rust version
run: rustup default ${{ env.RUST_TOOLCHAIN }}

- name: Sccache cache
uses: mozilla-actions/[email protected]
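The new matrix entry builds Tabby with the `rocm` feature inside the hipBLAS manylinux container referenced above. Outside CI, the equivalent local build would presumably look like the sketch below (assuming a ROCm toolchain with hipBLAS/rocBLAS is already installed; the model name is only an example):

```bash
# Build the tabby binary with ROCm support, mirroring the CI matrix entry above.
# Assumes ROCm (hipBLAS/rocBLAS) is installed system-wide.
cargo build --features rocm --release --package tabby

# Serve a model on the AMD GPU; --device rocm matches the flag documented below.
./target/release/tabby serve --model TabbyML/StarCoder-1B --device rocm
```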
3 changes: 2 additions & 1 deletion Dockerfile
@@ -29,12 +29,13 @@ RUN curl https://sh.rustup.rs -sSf | bash -s -- --default-toolchain ${RUST_TOOLC
ENV PATH="/root/.cargo/bin:${PATH}"

WORKDIR /root/workspace
COPY . .

RUN mkdir -p /opt/tabby/bin
RUN mkdir -p /opt/tabby/lib
RUN mkdir -p target

COPY . .

RUN --mount=type=cache,target=/usr/local/cargo/registry \
--mount=type=cache,target=/root/workspace/target \
cargo build --features cuda --release --package tabby && \
6 changes: 3 additions & 3 deletions website/docs/extensions/troubleshooting.md
@@ -112,9 +112,9 @@ for the current code context.
If your completion requests are timing out, Tabby may display a warning message.
This could be due to network issues or poor server performance, especially when
running a large model on a CPU. To improve performance, consider running the model
on a GPU with CUDA support or on Apple M1/M2 with Metal support. When running
the server, make sure to specify the device in the arguments using `--device cuda`
or `--device metal`. You can also try using a smaller model from the available [models](https://tabby.tabbyml.com/docs/models/).
on a GPU with CUDA or ROCm support or on Apple M1/M2 with Metal support. When running
the server, make sure to specify the device in the arguments using `--device cuda`, `--device rocm` or
`--device metal`. You can also try using a smaller model from the available [models](https://tabby.tabbyml.com/docs/models/).

By default, the timeout for automatically triggered completion requests is set to 4 seconds.
You can adjust this timeout value in the `~/.tabby-client/agent/config.toml` configuration file.
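As a concrete illustration of the device flags added above, starting the server on an AMD GPU would look roughly like this (a sketch only; the model name is just an example):

```bash
# Start Tabby on a ROCm-capable AMD GPU; swap --device for cuda or metal as appropriate.
tabby serve --device rocm --model TabbyML/StarCoder-1B
```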
19 changes: 15 additions & 4 deletions website/docs/faq.mdx
@@ -1,10 +1,11 @@
import CodeBlock from '@theme/CodeBlock';

# ⁉️ Frequently Asked Questions

<details>
<summary>How much VRAM a LLM model consumes?</summary>
<div>By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.</div>
<div>
<p>By default, Tabby operates in int8 mode with CUDA, requiring approximately 8GB of VRAM for CodeLlama-7B.</p>
<p>For ROCm the actual limits are currently largely untested, but the same CodeLlama-7B appears to use about 8GB of VRAM on an AMD Radeon™ RX 7900 XTX, according to the ROCm monitoring tools.</p>
</div>
</details>

<details>
@@ -24,7 +25,17 @@ import CodeBlock from '@theme/CodeBlock';
<details>
<summary>How to utilize multiple NVIDIA GPUs?</summary>
<div>
<p>Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can initiate multiple Tabby instances and set CUDA_VISIBLE_DEVICES accordingly.</p>
<p>Tabby only supports the use of a single GPU. To utilize multiple GPUs, you can start multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for CUDA) or HIP_VISIBLE_DEVICES (for ROCm) accordingly.</p>
</div>
</details>
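A sketch of the multi-instance setup described above, assuming two GPUs; the `--port` flag is used here only to keep the instances apart and is an assumption, so adjust it to your setup:

```bash
# Run one Tabby instance per GPU by restricting the visible devices.
# --port is assumed here to separate the two instances; adjust to your setup.
CUDA_VISIBLE_DEVICES=0 tabby serve --device cuda --model TabbyML/StarCoder-1B --port 8080 &
CUDA_VISIBLE_DEVICES=1 tabby serve --device cuda --model TabbyML/StarCoder-1B --port 8081 &

# On ROCm, use HIP_VISIBLE_DEVICES instead:
# HIP_VISIBLE_DEVICES=0 tabby serve --device rocm --model TabbyML/StarCoder-1B --port 8080 &
```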

<details>
<summary>My AMD GPU isn't supported by ROCm</summary>
<div>
<p>
If ROCm does not officially support your GPU, you can set the HSA_OVERRIDE_GFX_VERSION environment variable to the version of a similar GPU that is supported. For example, set it to 10.3.0 for RDNA2 cards and to 11.0.0 for RDNA3 cards.
</p>
</div>
</details>
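For example, on an RDNA2 card the override described above could be applied like this (a sketch; verify the correct version for your specific GPU):

```bash
# Pretend to be a supported RDNA2 GPU; use 11.0.0 instead for RDNA3 cards.
HSA_OVERRIDE_GFX_VERSION=10.3.0 tabby serve --device rocm --model TabbyML/StarCoder-1B
```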

2 changes: 1 addition & 1 deletion website/docs/installation/apple.md
@@ -14,4 +14,4 @@ brew install tabbyml/tabby/tabby
tabby serve --device metal --model TabbyML/StarCoder-1B
```

The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA. You can find more information about Docker [here](./docker).
The compute power of M1/M2 is limited and is likely to be sufficient only for individual usage. If you require a shared instance for a team, we recommend considering Docker hosting with CUDA or ROCm. You can find more information about Docker [here](./docker).
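For reference, a Docker-hosted instance with CUDA as suggested above might be started roughly as follows (a sketch based on the Tabby Docker docs; verify the exact image tag and flags against the linked documentation):

```bash
# Run the Tabby container with GPU access; adjust paths, model, and device to your setup.
docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda
```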
