Flash Attention was added recently, but I can't find any documentation on it. I'm running CPU-only. Does it work for CPU inference? I can't measure any difference. Does it perhaps only help with large contexts? Is it supposed to speed up text generation, or does it only work for large batches of input text? Is there any downside to enabling it? Why is it off by default?