Quantization for FSDPA #976
Merged
Conversation
…at matches its scale method (#92)
* Done to allow quantization using HQT
* Added use_flash_attention and flash_attention_recompute to run_lm_eval
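A minimal sketch of how these options could be exposed in run_lm_eval, mirroring the equivalent flags already used by optimum-habana's text-generation examples. The flag names come from the commit messages (flash_attention_causal_mask appears in a later commit); the parser wiring and the keyword-argument forwarding below are illustrative assumptions, not this PR's actual diff.

```python
# Hedged sketch: wiring the new flags into an argument parser and forwarding
# them to the model call. Only the flag names are taken from the PR.
import argparse

parser = argparse.ArgumentParser(description="lm_eval on Gaudi (illustrative)")
parser.add_argument("--use_flash_attention", action="store_true",
                    help="Run attention through Habana's fused SDPA kernel.")
parser.add_argument("--flash_attention_recompute", action="store_true",
                    help="Enable recompute mode in fused SDPA to reduce memory use.")
parser.add_argument("--flash_attention_causal_mask", action="store_true",
                    help="Use the fused causal mask (added in a later commit).")
args = parser.parse_args()

# The flags are then forwarded to the model/generation call as keyword args:
model_kwargs = {
    "use_flash_attention": args.use_flash_attention,
    "flash_attention_recompute": args.flash_attention_recompute,
    "flash_attention_causal_mask": args.flash_attention_causal_mask,
}
```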
dudilester requested review from mandy-li, libinta, dvarshney-habana and regisss as code owners on May 13, 2024 10:01
dudilester changed the title from "Document FusedScaledDotProductAttention quantization" to "Quantization for FSDPA" on May 15, 2024
hsubramony added a commit that referenced this pull request on May 29, 2024
hsubramony added a commit that referenced this pull request on May 29, 2024
hsubramony added a commit that referenced this pull request on May 29, 2024
This reverts commit 7ec9185.
hsubramony added a commit that referenced this pull request on May 29, 2024
This reverts commit 87386b7.
hsubramony added a commit that referenced this pull request on May 29, 2024
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
regisss reviewed on May 30, 2024
hsubramony added a commit that referenced this pull request on May 31, 2024
regisss approved these changes on Jun 6, 2024
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request on Jun 13, 2024
Co-authored-by: Yeonsil Yoon <[email protected]>
Co-authored-by: Libin Tang <[email protected]>
This was referenced Jun 13, 2024
Added use_flash_attention, flash_attention_causal_mask and flash_attention_recompute to run_lm_eval
Enforce recompute flag on fsdpa quantization
Allow quantization using HQT
Document FusedScaledDotProductAttention quantization
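As a rough illustration of the flow described above (FSDPA quantization through HQT, with the recompute flag enforced), the sketch below shows how a model might be prepared for HQT measurement or quantization. The habana_quantization_toolkit calls and the QUANT_CONFIG environment variable reflect the HQT workflow used around optimum-habana at the time; treat the exact names and the enforcement logic as assumptions rather than this PR's code.

```python
# Hedged sketch, not this PR's implementation: preparing a model with HQT
# (Habana Quantization Toolkit) so that fused SDPA can run quantized.
import os


def setup_hqt_quantization(model, args):
    # HQT reads its mode (measurement vs. quantization) and scale settings
    # from the JSON file pointed to by QUANT_CONFIG.
    assert os.environ.get("QUANT_CONFIG"), "Point QUANT_CONFIG at an HQT JSON config"

    # "Enforce recompute flag on fsdpa quantization": when fused SDPA is used
    # under quantization, recompute mode is required, so force it on here.
    if getattr(args, "use_flash_attention", False):
        args.flash_attention_recompute = True

    import habana_quantization_toolkit  # available in the Habana software stack

    habana_quantization_toolkit.prep_model(model)  # patch modules for measure/quant
    return model


def finalize_hqt_quantization(model):
    # In measurement mode this dumps the statistics used to derive scales.
    import habana_quantization_toolkit

    habana_quantization_toolkit.finish_measurements(model)
```

In a measurement run, QUANT_CONFIG would point at a measurement config and finalize_hqt_quantization would be called after evaluation; a subsequent run with a quantization config then uses the dumped scales.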