Fast softmax #972

wszczurekhabana · 2024-05-10T12:03:51Z

Changes from: HabanaAI#159
This change depends on #967, which must be merged first.

Original description:

Adds support for setting fast softmax mode in the FusedSDPA operator. This mode trades a small amount of accuracy for higher throughput.

Data on performance:

| Ratio | Max input tokens | Max new tokens | Batch size | Throughput without fast softmax [tokens/s] | Throughput with fast softmax [tokens/s] | Improvement % |
|---|---|---|---|---|---|---|
| 97% | 31744 | 1042 | 12 | 139.08 | 147.97 | 6.4% |
| 75% | 24576 | 8192 | 16 | 431.09 | 437.95 | 1.6% |
| 50% | 16384 | 16384 | 24 | 653.39 | 656.38 | 0.5% |
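For anyone who wants to re-derive the Improvement % column, here is a minimal sketch. It assumes the column is the relative throughput gain, `(with / without - 1) * 100`, rounded to one decimal. The names `ROWS` and `improvement_percent` are illustrative only and are not part of this PR.

```python
# Throughput figures copied from the performance table above (tokens/s).
# Each row: (ratio label, throughput without fast softmax, throughput with fast softmax)
ROWS = [
    ("97%", 139.08, 147.97),
    ("75%", 431.09, 437.95),
    ("50%", 653.39, 656.38),
]


def improvement_percent(baseline: float, fast: float) -> float:
    """Relative throughput gain of `fast` over `baseline`, in percent."""
    return (fast / baseline - 1.0) * 100.0


# Assumption: the reported column is this ratio rounded to one decimal place.
improvements = {
    label: round(improvement_percent(baseline, fast), 1)
    for label, baseline, fast in ROWS
}

for label, pct in improvements.items():
    print(f"ratio {label}: {pct:.1f}% improvement")
```

Under that assumption the computed values match the reported column, and the gain is largest where the input tokens dominate the sequence.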

Data on accuracy (using mlperf test from: https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/llama_greedy.py
and https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/evaluation.py):

| Configuration | rouge1 | rouge2 | rougeL | rougeLsum | accuracy |
|---|---|---|---|---|---|
| without fast softmax | 44.4279 | 22.0536 | 28.6362 | 42.0044 | 99.99 |
| with fast softmax | 44.4065 | 22.0229 | 28.6156 | 41.9858 | 99.94 |
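To put the accuracy cost in perspective, the per-metric differences can be computed directly from the two rows above. This is only a sketch over the reported numbers; the variable names are illustrative and not taken from the mlperf scripts.

```python
# Scores copied from the accuracy table above.
METRICS = ["rouge1", "rouge2", "rougeL", "rougeLsum", "accuracy"]
WITHOUT_FAST = [44.4279, 22.0536, 28.6362, 42.0044, 99.99]
WITH_FAST = [44.4065, 22.0229, 28.6156, 41.9858, 99.94]

# Absolute change per metric (with fast softmax minus without).
deltas = {
    name: round(fast - base, 4)
    for name, base, fast in zip(METRICS, WITHOUT_FAST, WITH_FAST)
}

# Relative change per metric, in percent of the baseline score.
relative = {
    name: round(100.0 * (fast - base) / base, 3)
    for name, base, fast in zip(METRICS, WITHOUT_FAST, WITH_FAST)
}

for name in METRICS:
    print(f"{name}: {deltas[name]:+.4f} ({relative[name]:+.3f}%)")
```

Every metric drops slightly with fast softmax enabled, and the largest absolute change among the reported values is 0.05 (the accuracy column).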

dvarshney-habana

@wszczurekhabana Please confirm we can merge this now that #967 is merged.

regisss

LGTM!
Please run `make style`.

HuggingFaceDocBuilderDev · 2024-05-30T12:55:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Co-authored-by: Dudi Lester
Co-authored-by: Sayantan Sarkar

dudilester and others added 5 commits May 8, 2024 16:10

added text-generation quantization_config example file with a name that matches its scale method (#92)

83b3605

Encapsulate FSDPA in GaudiLlamaAttention (#129)

e231aa5

* Done to allow quantization using HQT
* Added use_flash_attention and flash_attention_recompute to run_lm_eval

enforce recompute flag on fsdpa quantization (#133)

86fa5b6

add flash_attention_causal_mask to run_lm_eval.py (#142)

659b2d1

Enable fast softmax mode in FusedSDPA (#159)

791b42a

* Enable fast softmax mode in FusedSDPA
* Add fast_softmax parameter to _gradient_checkpointing_func

wszczurekhabana requested review from ssarkar2, bhargaveede, vivekgoe, mandy-li, libinta, dvarshney-habana and regisss as code owners May 10, 2024 12:03

libinta added the synapse 1.16_dependency label May 10, 2024

dvarshney-habana approved these changes May 22, 2024

hsubramony added a commit that referenced this pull request May 29, 2024

Fast softmax #972

570cfa1

regisss reviewed May 30, 2024

hsubramony added a commit that referenced this pull request May 31, 2024

Fast softmax #972

2b1cdf1

ssarkar2 added 2 commits June 6, 2024 22:38

Merge remote-tracking branch 'oh_origin/main' into fast_softmax

39ed9ff

Style

d0c4d7e

regisss approved these changes Jun 6, 2024

ssarkar2 approved these changes Jun 6, 2024

regisss merged commit adcec3d into huggingface:main Jun 6, 2024
6 of 7 checks passed

wszczurekhabana mentioned this pull request Jun 11, 2024

Enable fast softmax mode in FusedSDPA HabanaAI/optimum-habana-fork#159

Merged

imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024

Fast softmax (huggingface#972)

d26b9ae

Co-authored-by: Dudi Lester
Co-authored-by: Sayantan Sarkar
