Quantization for FSDPA #976

Merged: 7 commits into huggingface:main on Jun 6, 2024
Conversation

@dudilester (Contributor) commented on May 13, 2024:

- Added use_flash_attention, flash_attention_causal_mask and flash_attention_recompute to run_lm_eval (see the sketch after this list)
- Enforce the recompute flag on FSDPA quantization
- Allow quantization using HQT
- Document FusedScaledDotProductAttention quantization
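
A minimal sketch of how these pieces might fit together. The three flag names come from the PR description; the helper names (setup_lm_eval_parser, enforce_fsdpa_quant_constraints) and the QUANT_CONFIG environment-variable check are illustrative assumptions, not the exact code added by this PR.

```python
# Illustrative sketch only; helper names are hypothetical, not the PR's actual code.
import argparse
import os


def setup_lm_eval_parser() -> argparse.ArgumentParser:
    # The three flags added to run_lm_eval by this PR (names from the PR description).
    parser = argparse.ArgumentParser(description="run_lm_eval (excerpt)")
    parser.add_argument("--use_flash_attention", action="store_true",
                        help="Use the fused scaled dot-product attention (FSDPA) kernel.")
    parser.add_argument("--flash_attention_causal_mask", action="store_true",
                        help="Let the fused kernel apply the causal mask internally.")
    parser.add_argument("--flash_attention_recompute", action="store_true",
                        help="Use the memory-saving recompute mode of FSDPA.")
    return parser


def enforce_fsdpa_quant_constraints(args: argparse.Namespace) -> argparse.Namespace:
    # Assumption: HQT quantization is considered active when a QUANT_CONFIG file
    # is provided via the environment, as in other optimum-habana examples.
    hqt_enabled = os.getenv("QUANT_CONFIG") is not None
    if hqt_enabled and args.use_flash_attention:
        # "Enforce recompute flag on FSDPA quantization": quantized FSDPA is
        # assumed to require the recompute path, so the flag is forced on.
        args.flash_attention_recompute = True
    return args


if __name__ == "__main__":
    args = enforce_fsdpa_quant_constraints(setup_lm_eval_parser().parse_args())
    print(args)
```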

@dudilester (Contributor, Author) commented:
Added a commit documenting the FSDPA quantization changes.
This PR includes the commits from the PR below plus the doc commit:
#967
@libinta - this PR should be labeled synapse_1.16_dependency

@dudilester changed the title from "Document FusedScaledDotProductAttention quantization" to "Quantization for FSDPA" on May 15, 2024
@libinta added the synapse 1.16_dependency label on May 15, 2024
@ssarkar2 mentioned this pull request on May 16, 2024
hsubramony added 5 commits that referenced this pull request on May 29, 2024
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss (Collaborator) left a comment:

Please run make style.

Should the regression tests used for Llama fp8 be updated? Like here and there for instance?

@MrGeva commented on May 30, 2024:

> Should the regression tests used for Llama fp8 be updated? Like here and there for instance?

@regisss I see that SDPA is not tested in bf16 either; it can be added. Can you or @libinta take care of it?

hsubramony added a commit that referenced this pull request May 31, 2024
@regisss merged commit 3c6e508 into huggingface:main on Jun 6, 2024
6 of 7 checks passed
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024
Co-authored-by: Yeonsil Yoon <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
Labels: synapse 1.16_dependency
Projects: None yet
Issues that may be closed by merging this pull request: None yet
6 participants