[NDTensors] Fix contracting dense with diag on GPU #1453
Conversation
I would prefer not going that route since that is a much more involved change that would likely require rewriting a lot of the |
Tr on GPU
Co-authored-by: Matt Fishman <[email protected]>
@mtfishman So I am pushing an idea on how to fix the issue with
Since the dense for
Update: I just checked, and the code I pushed fixes both of the errors in the bug report on Metal. Testing the other backends now.
What about using |
@mtfishman For the dispatch of the |
Yes. |
Codecov Report
All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main    #1453      +/-   ##
==========================================
+ Coverage   43.65%   44.14%   +0.49%
==========================================
  Files         136      144       +8
  Lines        8806     9374     +568
==========================================
+ Hits         3844     4138     +294
- Misses       4962     5236     +274
@kmp5VT this looks good to me. The only remaining thing I can see to do is fix tensor contractions involving block sparse tensors with diagonal blocks. What's the status of that? I think we can fix that in a follow-up PR. I guess that can be fixed in a similar way (say by calling
@mtfishman That's a good point. I hadn't checked the block sparse implementation. I ran this code
and it currently fails because of scalar indexing (because it's fed into the CPU dense * diag code). I found that in
Great, glad to see it was a simple fix, simpler than I was picturing. It seems like the only thing left is to add a test for the block sparse case. After that, is this good to go?
Besides the final comment, this looks good, thanks.
Description
In reference to this bug report
This bug has two parts. First, I am working to make it possible to call tr on GPU-based Tensors. The issue here is a scalar indexing problem where the code tries to getdiagindex of the tensor. For now I have tried to use @allowscalar and expose to solve the issue. Timing-wise, I have found that CUDA-based tr with @allowscalar is roughly 2x faster than the CPU-based trace.

The second bug is that delta functions are not being properly constructed on GPU. This is, in part, due to delta being a UniformDiag, which does not carry information about where the diag will be allocated and assumes CPU. I am not sure yet how to fix this problem. Potentially we could replace the datatype of UniformDiag from <:Number to <:UnallocatedFill{<:Number}.
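
For illustration only, the scalar-indexing issue behind the first part can be sketched in plain Julia. The helper name tr_noscalar below is hypothetical and is not part of NDTensors or of this PR, which uses @allowscalar and expose instead. The sketch assumes only LinearAlgebra's diagind and range indexing, which GPU array packages generally run as one bulk operation rather than element by element. It is shown here on a CPU Matrix.

```julia
using LinearAlgebra

# Hypothetical helper, not part of NDTensors: compute the trace without
# reading diagonal entries one at a time.
function tr_noscalar(A::AbstractMatrix)
    size(A, 1) == size(A, 2) || throw(DimensionMismatch("matrix is not square"))
    # diagind(A) is a range of linear indices of the diagonal. Indexing with
    # the whole range gathers the diagonal in a single call, and sum reduces it.
    return sum(A[diagind(A)])
end

A = [1.0 2.0; 3.0 4.0]
@assert tr_noscalar(A) == tr(A)
```

Whether this form is faster than the @allowscalar path on a given backend would need to be measured. It is shown only to make the scalar-indexing distinction concrete.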