Deepspeed Ulysses #5492
Ring Attention should work with DeepSpeed Ulysses, correct? Are there any notable issues in combining DeepSpeed's efficient sequence parallelism with such an attention mechanism? I understand that FlashAttention works.

https://github.com/zhuzilin/ring-flash-attention

Ulysses is, in principle, attention-type agnostic. Although we haven't specifically tested Ulysses with Ring Attention, as long as the qkv tensors can be split or sharded along the sequence and head dimensions, it should work. Contributions are welcome!

Hi @samadejacobs, I appreciate the insight. I will have to test both of them in conjunction and let you know. Thank you, Enrico
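To illustrate why Ulysses is attention-type agnostic, here is a minimal single-process sketch of its sequence-to-head resharding, simulated with numpy. All variable names are illustrative; the real implementation performs this exchange with `torch.distributed` all-to-all collectives across ranks.

```python
import numpy as np

# Hypothetical single-process simulation of Ulysses' sequence<->head reshard.
P = 4                        # sequence-parallel degree (number of ranks)
seq, heads, dim = 8, 4, 2    # heads must be divisible by P

rng = np.random.default_rng(0)
q = rng.standard_normal((seq, heads, dim))

# Before attention: rank r holds a sequence shard containing ALL heads.
seq_shards = np.split(q, P, axis=0)      # P arrays of shape [seq/P, heads, dim]

# All-to-all exchange: afterwards rank r holds the FULL sequence for heads/P heads.
hp = heads // P
head_shards = [
    np.concatenate([s[:, r * hp:(r + 1) * hp, :] for s in seq_shards], axis=0)
    for r in range(P)
]                                        # P arrays of shape [seq, heads/P, dim]

# Each head shard now sees the whole sequence, so any per-head attention
# kernel (dense, FlashAttention, or Ring Attention) can run locally on it.
assert head_shards[0].shape == (seq, hp, dim)

# Concatenating the head shards back along the head axis recovers the original.
assert np.allclose(np.concatenate(head_shards, axis=1), q)
```

Because the attention kernel only ever sees a full-sequence, partial-head tensor, combining Ulysses with Ring Attention would amount to further splitting that full sequence inside the kernel, which is consistent with the "split along sequence and head dimensions" condition stated above.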