Fast softmax #972

Merged (7 commits) on Jun 6, 2024

wszczurekhabana
Contributor

Changes from: HabanaAI#159
This change depends on #967, which must be merged first.

Original description:

Support for setting fast softmax mode in FusedSDPA operator. This is a tradeoff: performance vs accuracy.

Data on performance:

| Ratio | Max input tokens | Max new tokens | Batch size | Throughput without fast softmax [tokens/s] | Throughput with fast softmax [tokens/s] | Improvement % |
|---|---|---|---|---|---|---|
| 97% | 31744 | 1042 | 12 | 139.08 | 147.97 | 6.4% |
| 75% | 24576 | 8192 | 16 | 431.09 | 437.95 | 1.6% |
| 50% | 16384 | 16384 | 24 | 653.39 | 656.38 | 0.5% |
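
As a quick sanity check on the throughput figures, the "Improvement %" column can be recomputed from the two throughput columns. The sketch below is illustrative only and is not part of the PR. The function names are hypothetical, and the reading of "Ratio" as input tokens divided by total tokens is an assumption that happens to match the reported values.

```python
# Rows copied from the performance data in the PR description:
# (max input tokens, max new tokens, batch size,
#  tokens/s without fast softmax, tokens/s with fast softmax)
ROWS = [
    (31744, 1042, 12, 139.08, 147.97),
    (24576, 8192, 16, 431.09, 437.95),
    (16384, 16384, 24, 653.39, 656.38),
]


def improvement_pct(baseline: float, fast: float) -> float:
    """Relative throughput gain of `fast` over `baseline`, in percent."""
    return (fast - baseline) / baseline * 100.0


def input_ratio_pct(max_input: int, max_new: int) -> float:
    """Assumed meaning of "Ratio": input tokens as a share of all tokens."""
    return max_input / (max_input + max_new) * 100.0


if __name__ == "__main__":
    for max_input, max_new, batch, base, fast in ROWS:
        print(
            f"input={max_input} new={max_new} batch={batch}: "
            f"ratio={input_ratio_pct(max_input, max_new):.0f}% "
            f"improvement={improvement_pct(base, fast):.1f}%"
        )
```

The gain is largest for the input-heavy configuration and shrinks as the share of generated tokens grows.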

Data on accuracy (using mlperf test from: https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/llama_greedy.py
and https://gerrit.habana-labs.com/plugins/gitiles/mlperf_inference/+/refs/heads/master_next/code/llama/evaluation.py):

| | rouge1 | rouge2 | rougeL | rougeLsum | accuracy |
|---|---|---|---|---|---|
| without fast softmax | 44.4279 | 22.0536 | 28.6362 | 42.0044 | 99.99 |
| with fast softmax | 44.4065 | 22.0229 | 28.6156 | 41.9858 | 99.94 |
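
To put the accuracy side of the tradeoff in the same terms as the throughput gain, the per-metric change can be computed from the two rows of scores. This is a minimal sketch using only the numbers reported above. The helper name `metric_deltas` is hypothetical and is not taken from the mlperf evaluation scripts.

```python
# Scores copied from the accuracy data in the PR description.
WITHOUT_FAST = {
    "rouge1": 44.4279, "rouge2": 22.0536, "rougeL": 28.6362,
    "rougeLsum": 42.0044, "accuracy": 99.99,
}
WITH_FAST = {
    "rouge1": 44.4065, "rouge2": 22.0229, "rougeL": 28.6156,
    "rougeLsum": 41.9858, "accuracy": 99.94,
}


def metric_deltas(baseline: dict, fast: dict) -> dict:
    """Map each metric to (absolute change, relative change in percent)."""
    return {
        name: (fast[name] - baseline[name],
               (fast[name] - baseline[name]) / baseline[name] * 100.0)
        for name in baseline
    }


if __name__ == "__main__":
    for name, (abs_delta, rel_delta) in metric_deltas(WITHOUT_FAST, WITH_FAST).items():
        print(f"{name}: {abs_delta:+.4f} ({rel_delta:+.3f}%)")
```

Every metric decreases with fast softmax, and each relative decrease is below 0.15%.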

dudilester and others added 5 commits May 8, 2024 16:10
* Done to allow quantization using HQT

* Added use_flash_attention and flash_attention_recompute to run_lm_eval
* Enable fast softmax mode in FusedSDPA

* Add fast_softmax parameter to _gradient_checkpointing_func
Collaborator

dvarshney-habana left a comment

@wszczurekhabana Please confirm we can merge this now that #967 is merged.

hsubramony added a commit that referenced this pull request May 29, 2024
Collaborator

regisss left a comment

LGTM!
Please run `make style`.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

hsubramony added a commit that referenced this pull request May 31, 2024
Labels
synapse 1.16_dependency (synapse 1.16 dependency)
7 participants