
Why is Mamba2 much slower than Transformer (FlashAttention)? #378

Closed
TimothyChen225 opened this issue Jun 9, 2024 · 3 comments

Comments

@TimothyChen225

No description provided.

@Dexterp37

See #355 and #367

@AlwaysFHao

I encountered the same problem. Per the author's explanation in #355, the model needs to be compiled ahead of time; see https://discuss.pytorch.org/t/how-to-use-torch-compile-with-cuda-graphs-when-using-gradient-activation-checkpointing/179466 for details.
You can wrap your model with the following code. The first forward pass will be very slow, but subsequent passes will be much faster:
model_compile = torch.compile(model, mode="reduce-overhead")
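A minimal sketch of that compile-and-warm-up pattern, assuming a generic nn.Module stands in for your own Mamba2 model (the placeholder model, input shapes, and warm-up loop are illustrative; only torch.compile with mode="reduce-overhead" comes from the comment above):

import torch
import torch.nn as nn

# Placeholder network; substitute your actual Mamba2 model here.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512)).cuda()

# "reduce-overhead" captures CUDA graphs to cut per-step kernel-launch overhead.
model_compile = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 512, device="cuda")

# The first call(s) are slow because the model is traced and compiled;
# run a few warm-up iterations before benchmarking.
for _ in range(3):
    _ = model_compile(x)
torch.cuda.synchronize()

# Subsequent calls reuse the compiled graph and should be considerably faster.
out = model_compile(x)

Note that recompilation is triggered if input shapes change between calls, so keep batch and sequence dimensions fixed (or pad to fixed sizes) when benchmarking.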

@TimothyChen225
Author


Thanks for the help, but it is still slow.
