Add support for torch.compile dynamic shapes #30560
Merged
This PR adds support for compiling models with dynamic shapes (`dynamic=True`) to almost all models with SDPA attention implementations that currently do not support them. #30442 added support for Llama, Gemma, OLMo, & Cohere. The only model not modified is DBRX, which needs the changes from both #30070 and #30442 to add support for SDPA's Flash Attention kernel and for dynamic shapes, as I believe it suffers from the same training memory issues detailed in #30010.
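For context, a minimal sketch of what enabling dynamic shapes looks like from the caller's side (the toy module and the `backend="eager"` debug backend are my own assumptions for illustration, not part of this PR):

```python
import torch


class ToyModel(torch.nn.Module):
    def forward(self, x):
        return x.relu() + 1


# dynamic=True asks torch.compile to trace with symbolic shapes, so
# calls with different sequence lengths can reuse one compiled graph
# instead of triggering a recompile per shape.
compiled = torch.compile(ToyModel(), dynamic=True, backend="eager")

out_short = compiled(torch.randn(2, 8))
out_long = compiled(torch.randn(2, 32))
```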
As mentioned in #30442, moving the `is_causal` dispatch logic from inline to an if statement is required to support both `fullgraph=True` and `dynamic=True`.

I kept the `qlen>1` comments but could remove them if we want to match Llama, which doesn't have them.

cc @ArthurZucker and @fxmarty