Hi, author, thank you for sharing your fantastic work.
When I was reading about mamba, I found that mamba-mini says GLA is a special case of mamba: when the dimension of w is reduced from [batch, seqlen, dstate, dim] to [batch, seqlen, dstate], the two are equivalent.
The author of VMamba also suggests this in the arXiv paper:
[image: excerpt from the VMamba arXiv paper]
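For concreteness, here is a minimal sketch of the elementwise gated linear recurrence that claim refers to (my own toy code with made-up names, not code from mamba-mini or the GLA repo). The only difference between the two settings is whether w carries a separate decay per (dstate, dim) entry (mamba-style) or one per dstate channel that is broadcast over dim (GLA-style):

```python
import torch

def gated_recurrence(w, x, B, C):
    # w: decay gate -- mamba-style: [batch, seqlen, dstate, dim]
    #                  GLA-style:   [batch, seqlen, dstate] (broadcast over dim)
    # x: input             [batch, seqlen, dim]
    # B: input projection  [batch, seqlen, dstate]
    # C: output projection [batch, seqlen, dstate]
    batch, seqlen, dim = x.shape
    dstate = B.shape[-1]
    h = x.new_zeros(batch, dstate, dim)   # recurrent state
    ys = []
    for t in range(seqlen):
        w_t = w[:, t]
        if w_t.dim() == 2:                # GLA case: one decay per state channel
            w_t = w_t.unsqueeze(-1)       # [batch, dstate, 1], broadcasts over dim
        # h_t = w_t * h_{t-1} + B_t x_t^T  (outer product over dstate x dim)
        h = w_t * h + B[:, t].unsqueeze(-1) * x[:, t].unsqueeze(1)
        ys.append(torch.einsum('bn,bnd->bd', C[:, t], h))  # y_t = C_t^T h_t
    return torch.stack(ys, dim=1)         # [batch, seqlen, dim]
```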
Then I found that in SSD, the core component of mamba2, the dimension of the matrix $A \odot dt$ is further reduced to [..., nheads], which may suggest that the matrix w has been reduced to [batch, seqlen] per head. So my question is: is mamba2 a special case of GLA?
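To illustrate that shape collapse (an assumed sketch of the SSD convention with hypothetical sizes, not code from mamba2): A is one scalar per head and dt is [batch, seqlen, nheads], so the decay $\exp(A \odot dt)$ is a single scalar per head per step, broadcast over the entire (dstate, headdim) state:

```python
import torch

batch, seqlen, nheads, dstate, headdim = 2, 16, 4, 8, 8

A  = -torch.rand(nheads)               # one scalar per head (kept negative for stability)
dt = torch.rand(batch, seqlen, nheads)
w  = torch.exp(A * dt)                 # [batch, seqlen, nheads]: one scalar decay per head per step

x = torch.randn(batch, seqlen, nheads, headdim)
B = torch.randn(batch, seqlen, nheads, dstate)
h = torch.zeros(batch, nheads, dstate, headdim)
for t in range(seqlen):
    # the same scalar w[b, t, head] multiplies every (dstate, headdim) entry of that head's state
    h = w[:, t, :, None, None] * h + B[:, t].unsqueeze(-1) * x[:, t].unsqueeze(-2)
```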
Moreover, I ran an experiment comparing mamba2 and GLA, and found that they produce almost the same result, with only numerical differences (on the order of 1e-5).
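As a rough illustration of why such a match is expected (my own toy check, not the actual experiment): feeding the generic `gated_recurrence` sketch above a GLA-style decay that is constant across dstate should agree with the fully broadcast mamba-style decay up to floating-point error:

```python
import torch

batch, seqlen, dstate, dim = 2, 16, 8, 4
x = torch.randn(batch, seqlen, dim)
B = torch.randn(batch, seqlen, dstate)
C = torch.randn(batch, seqlen, dstate)

w_scalar = torch.rand(batch, seqlen)                               # mamba2-style: one scalar per step
w_gla    = w_scalar[:, :, None].expand(-1, -1, dstate)             # GLA-style, constant over dstate
w_mamba  = w_scalar[:, :, None, None].expand(-1, -1, dstate, dim)  # mamba-style, fully broadcast

y_gla = gated_recurrence(w_gla, x, B, C)    # uses the sketch above
y_m2  = gated_recurrence(w_mamba, x, B, C)
torch.testing.assert_close(y_gla, y_m2)     # matches up to float error
```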
So, how should we understand the relationship between mamba1, mamba2, and GLA?
alpacaduby changed the title from "What is the difference between mamba2 and GLA?" to "How to understand the relationship between mamba2 and GLA?" on Jun 10, 2024.