Use pre-built FA2, vllm, quantization kernels in the dockerfiles #1867

fxmarty · 2024-05-07T15:57:28Z

Feature request

We could just use

RUN pip install whatever_package.whl

which would tremendously speed up the build, given that the target is always Hopper or Ampere GPUs.

This would involve, for example, hosting our own wheel index, handling the wheel builds whenever we want to upgrade a kernel, and installing those .whl files in the TGI Dockerfiles.
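A minimal sketch of what such an install step could look like. It assumes a self-hosted PEP 503 index whose URL is a hypothetical placeholder (`KERNELS_INDEX_URL`), and a hypothetical build argument (`FLASH_ATTN_VERSION`) used to pin the kernel version. `--only-binary=:all:` makes pip fail loudly instead of quietly falling back to compiling the source distribution, which is exactly the slow path this issue wants to avoid.

```dockerfile
# Sketch only: KERNELS_INDEX_URL is a hypothetical placeholder for a
# self-hosted PEP 503 index holding wheels pre-built for Ampere/Hopper.
ARG KERNELS_INDEX_URL=https://example.com/tgi-kernels/simple
# Hypothetical build argument; must be passed with --build-arg.
ARG FLASH_ATTN_VERSION

# --only-binary=:all: makes pip error out instead of compiling the sdist
# when no wheel matches this image's Python/CUDA/torch combination.
RUN pip install --no-cache-dir \
        --extra-index-url ${KERNELS_INDEX_URL} \
        --only-binary=:all: \
        "flash-attn==${FLASH_ATTN_VERSION}"
```

Pinning the version matters here. With `--extra-index-url`, pip considers PyPI and the private index together and picks the best match, so an unpinned requirement could resolve to a release we did not build. The hosted wheels also have to be rebuilt whenever the image's Python, torch, or CUDA version changes, because compiled extensions are tied to those versions.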

Motivation

Building TGI is prohibitively slow for developers.

Your contribution

I could work on this at some point.


fxmarty changed the title from "Use pre-built FA2, vllm, quantization kernels in the docker image" to "Use pre-built FA2, vllm, quantization kernels in the dockerfiles" on May 7, 2024.
