
Ensure KV cache is not returned as output tensor during decode phase for Falcon #993

Merged
regisss merged 1 commit into main from harsh/falcon180b on Jun 6, 2024


schoi-habana
Collaborator

This PR applies to Falcon the change made for Llama in https://github.com/HabanaAI/optimum-habana-fork/pull/154/files.
It increases the maximum batch size for BF16 Falcon-180B inference from 250 to 316.

Updated command (without --reuse_cache):

python ../gaudi_spawn.py \
  --use_deepspeed --world_size 8 run_generation.py \
  --model_name_or_path /root/data/falcon/falcon-180b/snapshots/d2ea5531862d4fe907280234990e6380d2befd97/ \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16 \
  --batch_size 316 \
  --max_new_tokens 128 \
  --max_input_tokens 128 \
  --limit_hpu_graphs \
  --n_iterations 3 \
  --trim_logits \
  --bucket_internal \
  --bucket_size 128 \
  --prompt "I've always managed to dodge the bullet and avoid the addictive pull of Pokemon. Leave it to a button-mashing brawler with plastic figurine accessories to finally get me hooked. At first glance, Pokemon Rumble U isn't much to look at. With its simplistic controls and repetitive gameplay, you might feel inclined to dismiss it as yet another cash-in of the popular Nintendo franchise. But despite its faults, there's actually much more to Rumble U than meets the eye, making this a satisfying and"
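To give a sense of scale for the batch sizes above, the sketch below estimates how much memory one full copy of the KV cache occupies at batch 250 and 316. It is a back-of-the-envelope calculation, not a measurement from this PR. The model dimensions (80 layers, 8 KV heads, head dimension 64) are assumptions taken from the public Falcon-180B config and should be checked against the checkpoint's config.json; the sequence length comes from --max_input_tokens plus --max_new_tokens in the command, and the per-card figure assumes the KV heads are split evenly across the 8 cards. It ignores weights, activations, and any padding introduced by bucketing. If an extra copy of the cache were kept alive as an output tensor during decode, it would cost roughly this much memory again.

```python
# Rough KV-cache size estimate (illustrative only).
# ASSUMPTION: config values from the public Falcon-180B config.json;
# verify them against the checkpoint you actually load.
NUM_LAYERS = 80      # num_hidden_layers
NUM_KV_HEADS = 8     # num_kv_heads (grouped-query attention)
HEAD_DIM = 64        # hidden_size 14848 / num_attention_heads 232
BYTES_PER_ELEM = 2   # BF16


def kv_cache_bytes(batch_size: int, seq_len: int) -> int:
    """Bytes needed to hold K and V for every layer, token and sequence."""
    # Factor 2 accounts for the separate K and V tensors.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return batch_size * seq_len * per_token


def mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / 2**20


if __name__ == "__main__":
    seq_len = 128 + 128  # --max_input_tokens + --max_new_tokens
    world_size = 8       # ASSUMPTION: KV heads split evenly across 8 cards
    for batch in (250, 316):
        total = kv_cache_bytes(batch, seq_len)
        print(f"batch {batch}: {mib(total):.0f} MiB total, "
              f"{mib(total) / world_size:.0f} MiB per card")
```

Under these assumptions the script reports 10000 MiB total (1250 MiB per card) at batch 250 and 12640 MiB total (1580 MiB per card) at batch 316.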

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

mandy-li requested a review from regisss on May 18, 2024 03:44
mandy-li added the run-test (Run CI for PRs from external contributors) and synapse1.16 labels on May 18, 2024
libinta added the synapse 1.16_dependency (synapse 1.16 dependency) label and removed the synapse1.16 label on May 20, 2024
libinta added and removed the synapse 1.16_dependency (synapse 1.16 dependency) label on May 31, 2024
regisss merged commit e778ccb into main on Jun 6, 2024
25 of 29 checks passed
regisss deleted the harsh/falcon180b branch on June 6, 2024 22:23
imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024
Labels: run-test (Run CI for PRs from external contributors), synapse 1.16_dependency (synapse 1.16 dependency)

5 participants