triton generates unnecessary shared memory stores/loads #3491
Comments
Based on a suggestion from @peterbell10 I removed |
Is there a case where removing |
I don't think it will result in incorrect code, but I may be wrong. It can affect performance, so it will likely need to go through the benchmark suites to verify the performance impact.
I'm using pytorch |
Given that |
You are right. I thought I had built it after the source pull.
I looked at this, but I'm not sure what the best solution is :]
cc @ThomasRaoux @Jokeren for visibility.
For the following Triton kernels generated by PyTorch, Triton generates shared memory stores and loads in the LLVM IR and PTX just before the atomic add operation.
The shared memory loads/stores are unnecessary in this case. cc @peterbell10
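The kernels and the generated PTX from the report are not reproduced here. As a rough way to check for the pattern described, here is a minimal sketch that scans PTX text for `st.shared`/`ld.shared` instructions shortly before an `atom`/`red` instruction. The sample PTX, the function name `shared_accesses_before_atomics`, and the `window` parameter are all made up for illustration and do not come from the issue.

```python
import re

# Hypothetical PTX fragment written for illustration only;
# it is NOT the output attached to the issue.
SAMPLE_PTX = """\
    st.shared.b32 [%r10], %r11;
    bar.sync 0;
    ld.shared.b32 %r12, [%r13];
    atom.global.add.f32 %f1, [%rd4], %f2;
    ret;
"""

# PTX shared-memory stores/loads and atomic/reduction instructions.
SHARED_ACCESS = re.compile(r"\b(st|ld)\.shared\b")
ATOMIC_OP = re.compile(r"\b(atom|red)\.")


def shared_accesses_before_atomics(ptx, window=8):
    """For each atomic instruction, list shared-memory accesses found
    within `window` lines before it.

    Returns a list of (atomic_line_no, [shared_line_nos]) tuples,
    using 1-based line numbers. Atomics with no nearby shared
    accesses are omitted.
    """
    lines = ptx.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not ATOMIC_OP.search(line):
            continue
        start = max(0, i - window)
        shared = [j + 1 for j in range(start, i)
                  if SHARED_ACCESS.search(lines[j])]
        if shared:
            hits.append((i + 1, shared))
    return hits


if __name__ == "__main__":
    for atomic_line, shared_lines in shared_accesses_before_atomics(SAMPLE_PTX):
        print(f"atomic at line {atomic_line}, shared accesses at {shared_lines}")
```

This is only a textual heuristic: it cannot tell whether a given shared memory access is actually redundant, so any hit still needs to be read in context.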