
Fallback to Flash Attention v1 for pre-Ampere GPUs #440

Open
tgaddair opened this issue Apr 26, 2024 · 1 comment · May be fixed by #480
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

tgaddair (Contributor) commented Apr 26, 2024
We can add back the FA1 implementation from huggingface/text-generation-inference#624 when a compute capability of Volta or Turing is detected. Supporting both may bloat the Docker image somewhat, but this is a common user pain point that we should definitely address.
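A minimal sketch of how the fallback could be gated on compute capability, assuming a PyTorch runtime is available; the FA1/FA2 import paths below are illustrative placeholders, not the project's actual module layout:

```python
import torch


def supports_flash_attention_v2(device_index: int = 0) -> bool:
    """Return True if the GPU is Ampere (SM 8.0) or newer.

    Flash Attention v2 requires compute capability >= 8.0; Volta (7.0)
    and Turing (7.5) GPUs can only run the v1 kernels.
    """
    major, _minor = torch.cuda.get_device_capability(device_index)
    return major >= 8


# Hypothetical dispatch -- module names are illustrative only.
if supports_flash_attention_v2():
    from flash_attn import flash_attn_varlen_func as attention  # FA2 kernels
else:
    # Fall back to the FA1 implementation restored from
    # huggingface/text-generation-inference#624.
    from flash_attn_v1 import attention  # assumed FA1 module name
```

With a check like this done once at server startup, the rest of the attention code can stay agnostic to which kernel was selected.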

tgaddair added the enhancement and good first issue labels on Apr 26, 2024
N1RM4L13 commented May 7, 2024

@tgaddair I would like to contribute to this.

flozi00 linked a pull request on May 21, 2024 that will close this issue