Question about RoPE code #253
I found that the RoPE implementation here differs from the paper, mainly in how the dimensions are permuted. Doesn't this difference affect the final result? I'm not quite sure my understanding is right, so I'd sincerely appreciate your advice :)
Paper version should be:
Version in this repo:
Comments
The ordering is different, so it won't affect training from scratch, but you can't load a model that was trained with a different ordering.
Thanks for your answer :) Is there some reason the latter implementation is more widely used in code than the former one?
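A minimal sketch of why the ordering only matters for checkpoints, assuming the paper version rotates adjacent pairs (x[2i], x[2i+1]) and the repo version rotates (x[i], x[i + d/2]) in the "rotate half" style. The function names are hypothetical and plain lists stand in for tensors. The half-split rotation of a permuted vector equals the permuted interleaved rotation, so q·k scores are identical once q and k share one channel permutation, and a learned projection can absorb that permutation when training from scratch.

```python
import math
from typing import List

Vector = List[float]


def frequencies(dim: int, base: float = 10000.0) -> List[float]:
    """theta_i = base ** (-2i / dim) for i in [0, dim / 2): one angle step per rotated pair."""
    if dim % 2:
        raise ValueError("RoPE needs an even head dimension")
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rope_interleaved(x: Vector, pos: int, base: float = 10000.0) -> Vector:
    """Paper-style layout: rotate adjacent pairs (x[2i], x[2i+1]) by pos * theta_i."""
    out = list(x)
    for i, theta in enumerate(frequencies(len(x), base)):
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_half(x: Vector, pos: int, base: float = 10000.0) -> Vector:
    """Rotate-half layout: rotate pairs (x[i], x[i + d/2]) by pos * theta_i."""
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(frequencies(len(x), base)):
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def interleaved_to_half(x: Vector) -> Vector:
    """The fixed permutation P: even-indexed channels first, then odd-indexed channels."""
    return x[0::2] + x[1::2]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5, -0.7, 1.1, 0.2, -0.4]
    k = [1.0, 0.4, -0.6, 0.9, 0.3, -0.2, 0.7, 0.1]
    m, n = 5, 2  # query and key positions
    score_paper = dot(rope_interleaved(q, m), rope_interleaved(k, n))
    # Same score from the half-split rotation, once q and k share one channel permutation.
    score_repo = dot(rope_half(interleaved_to_half(q), m), rope_half(interleaved_to_half(k), n))
    print(abs(score_paper - score_repo) < 1e-9)  # True
```

Running interleaved-trained weights through the half-split rotation without permuting still executes, but it pairs different channels than the ones the model was trained with, which is why a checkpoint does not transfer as-is.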
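On the checkpoint caveat: because q = W_q h, permuting q's channels is the same as permuting the rows of W_q (likewise W_k, plus any biases), so weights trained with the interleaved ordering can in principle be converted for rotate-half code. A minimal sketch, assuming each projection is stored as a list of output rows with each head's channels contiguous; the function names are hypothetical, and applying the inverse permutation converts in the other direction.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def head_permutation(head_dim: int) -> List[int]:
    """Row order turning one head's interleaved layout into the rotate-half layout.

    Even channels (first member of each adjacent pair) come first,
    followed by odd channels (second member of each pair).
    """
    return list(range(0, head_dim, 2)) + list(range(1, head_dim, 2))


def permute_qk_rows(weight: Matrix, n_heads: int) -> Matrix:
    """Reorder the output rows of a q or k projection matrix, head by head."""
    if len(weight) % n_heads:
        raise ValueError("number of rows must be divisible by n_heads")
    head_dim = len(weight) // n_heads
    if head_dim % 2:
        raise ValueError("head_dim must be even for RoPE")
    perm = head_permutation(head_dim)
    out: Matrix = []
    for h in range(n_heads):
        start = h * head_dim
        out.extend(list(weight[start + j]) for j in perm)
    return out


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product, q = W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


if __name__ == "__main__":
    n_heads, head_dim = 2, 4
    # Toy projection with distinguishable rows: row r is [r, r + 0.5, -r].
    w_q = [[float(r), r + 0.5, -float(r)] for r in range(n_heads * head_dim)]
    h = [0.2, -1.0, 0.7]
    q_interleaved = matvec(w_q, h)
    q_half = matvec(permute_qk_rows(w_q, n_heads), h)
    # Per head, q_half should be the even channels, then the odd channels, of q_interleaved.
    expected: Vector = []
    for i in range(n_heads):
        block = q_interleaved[i * head_dim:(i + 1) * head_dim]
        expected.extend(block[0::2] + block[1::2])
    print(q_half == expected)  # True
```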