Investigate a new SDPA IR node #2278

Open
Priya2698 opened this issue May 21, 2024 · 5 comments

@Priya2698 (Collaborator)

Add a new IR node for SDPA (scaled dot product attention), which is currently not supported within nvFuser.
CC: @cowanmeg @kevinstephano

@Priya2698 (Collaborator, Author) commented May 28, 2024

Summarizing discussion from today's meeting and an offline discussion with @jjsjann123:

For training, we need two nodes, `SdpaOpFwd` and `SdpaOpBwd`. PR #2294 currently uses at::scaled_dot_product_attention, which does not return any intermediate values to be stored for backward, so it is effectively an inference-only node. We can merge this with `SdpaOpFwd` and potentially expose a different API if we don't want to return all the outputs.
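For reference, a rough sketch (in Python, via the ATen bindings) of the difference between the two entry points; the exact output arity and names of `_scaled_dot_product_flash_attention` vary across PyTorch versions, so treat this as illustrative rather than authoritative:

```python
import torch
import torch.nn.functional as F

# Flash attention requires half-precision inputs on a supported GPU.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Inference-style entry point: returns only the attention output,
# nothing is stashed for backward.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Flash-attention forward (ATen internal op): additionally returns
# intermediates such as logsumexp and the RNG state, which the
# backward kernel consumes instead of recomputing them.
(out, logsumexp, cum_seq_q, cum_seq_k, max_q, max_k,
 philox_seed, philox_offset, debug_mask) = torch.ops.aten._scaled_dot_product_flash_attention(
    q, k, v, dropout_p=0.0, is_causal=True)
```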

There are different variants of SDPA in use (flash attention, memory-efficient) with slightly different function signatures. We will initially start with one variant (likely flash attention, after verifying that it is indeed the one used in models like nanoGPT).

CC: @IvanYashchuk

@IvanYashchuk (Collaborator)

Which signature is the right one to target here? PyTorch itself has many different variants, each with a different signature:
https://github.com/pytorch/pytorch/blob/a6b994ed5467d4df8320cbae51cba6a98ffb139c/aten/src/ATen/native/transformers/attention.cpp#L665-L706
https://github.com/pytorch/pytorch/blob/a6b994ed5467d4df8320cbae51cba6a98ffb139c/tools/autograd/derivatives.yaml#L2806-L2829

There's no right or wrong signature; it depends on how you want to do the backward computation, and that dictates the output signature of the forward function. You need to decide for yourself what needs to be stashed for backward and what can be recomputed. If you want fallbacks to ATen, there's no choice other than directly mimicking ATen's function signatures. It's important to remember that flash attention doesn't work for all input cases, and the "memory efficient" variant doesn't cover all input cases either.
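To make the "what to stash" question concrete, here is a purely illustrative sketch (not nvFuser's actual node interface) of the kind of saved-for-backward contract a flash-attention-style forward node implies:

```python
from dataclasses import dataclass
import torch

# Purely illustrative: one possible "stash" contract between a forward
# and a backward SDPA node, mirroring the kind of values ATen's
# flash-attention backward consumes (attention output, logsumexp, and
# the RNG state for dropout). This is NOT nvFuser's actual
# SdpaFwdOp/SdpaBwdOp interface.
@dataclass
class SdpaFwdSaved:
    query: torch.Tensor
    key: torch.Tensor
    value: torch.Tensor
    out: torch.Tensor            # attention output
    logsumexp: torch.Tensor      # softmax statistics, avoids recompute in backward
    philox_seed: torch.Tensor    # RNG state so the dropout mask can be regenerated
    philox_offset: torch.Tensor
    dropout_p: float
    is_causal: bool
```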

Is the flash attention kernel representable in nvFuser primitives?

@jjsjann123 (Collaborator)

> There's no right or wrong signature; it depends on how you want to do the backward computation, and that dictates the output signature of the forward function. You need to decide for yourself what needs to be stashed for backward and what can be recomputed.

Yes. The question here is mostly for @cowanmeg: which implementation we are targeting in codegen determines what signature we would want to have.

@Priya2698 (Collaborator, Author)

[Update from the Jun 4 meeting]
At the moment we only plan to support flash attention, in order to unblock multi-GPU development. Once flash attention is supported, we can revisit whether memory-efficient attention needs to be added as well. There are a few possible approaches:

  1. Plumb the backend info down from Thunder and use it within our nodes: while the two implementations have different function signatures, there is overlap, so one possibility is to use a superset of the inputs and outputs. The alternative design would be distinct nodes for each implementation.
  2. Make the decision about the backend within nvFuser using the same logic as ATen/Thunder (a rough sketch of such selection logic is below). See: https://github.com/Lightning-AI/lightning-thunder/blob/9f0c50cc6df187cf5fd2e31240690fe2b5e9ccc1/thunder/executors/sdpaex.py#L618-L680
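For illustration only, a simplified sketch of the kind of backend selection option 2 implies; the actual conditions live in the linked Thunder sdpaex.py and in ATen's SDP dispatch, and the checks and thresholds below are placeholders, not the real rules:

```python
import torch

# Illustrative only: simplified backend selection in the spirit of
# ATen/Thunder. Real implementations also consider GPU architecture,
# dropout, masks, sequence lengths, etc.
def pick_sdpa_backend(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                      attn_mask=None) -> str:
    head_dim = q.shape[-1]
    if (q.dtype in (torch.float16, torch.bfloat16)
            and attn_mask is None
            and head_dim <= 128):          # placeholder limit
        return "flash_attention"
    if q.dtype in (torch.float16, torch.bfloat16, torch.float32):
        return "memory_efficient"
    return "math"  # reference fallback
```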

Priya2698 added a commit that referenced this issue Jun 10, 2024
Based on the PR discussions, this PR is repurposed to introduce a new IR
node `SdpaFwdOp` for scaled dot product flash attention forward (see
#2278 for details).
This PR does not include changes to the scheduler.