[FEA]: Investigate if NVTX ranges in CUB algorithms support graph capture #1674
Comments
Testing with … You can see the …

@gevtushenko Would you like to have NVTX ranges disabled when stream capture is active? That would require us to check the stream state on each invocation of a CUB device API.
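Here is a minimal sketch of the option raised above. It assumes a hypothetical `capture_aware_range` RAII helper and a hypothetical `Backend` policy, and it is not how CUB implements its NVTX ranges. With CUDA, `Backend::is_capturing` could call `cudaStreamIsCapturing(stream, &status)` and compare `status` with `cudaStreamCaptureStatusActive`, while `push`/`pop` could call `nvtxRangePushA`/`nvtxRangePop`. A recording `fake_backend` stands in for both so the sketch runs without a GPU. In that fake, the "stream" is modelled as a `bool` that is `true` while capturing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII range: emits a range only if the stream is NOT being
// captured into a CUDA graph. `Backend` is an assumed policy providing:
//   static bool is_capturing(StreamT s); // real code: cudaStreamIsCapturing
//   static void push(const char* name);  // real code: nvtxRangePushA
//   static void pop();                   // real code: nvtxRangePop
template <class Backend, class StreamT>
class capture_aware_range {
public:
  capture_aware_range(const char* name, StreamT stream)
      : active_(!Backend::is_capturing(stream)) {  // one query per call
    if (active_) {
      Backend::push(name);
    }
  }
  ~capture_aware_range() {
    if (active_) {
      Backend::pop();  // only pop what we pushed
    }
  }
  capture_aware_range(const capture_aware_range&) = delete;
  capture_aware_range& operator=(const capture_aware_range&) = delete;

private:
  bool active_;  // decided once, at construction
};

// Hypothetical fake backend: records events instead of calling CUDA/NVTX.
// A "stream" is modelled as a bool meaning "currently capturing".
struct fake_backend {
  static std::vector<std::string>& log() {
    static std::vector<std::string> events;
    return events;
  }
  static bool is_capturing(bool capturing) { return capturing; }
  static void push(const char* name) { log().push_back(std::string("push:") + name); }
  static void pop() { log().push_back("pop"); }
};

// Hypothetical stand-in for a CUB device API entry point.
inline void device_algorithm(bool stream_is_capturing) {
  // Illustrative label only.
  capture_aware_range<fake_backend, bool> range("cub::DeviceSelect::If",
                                                stream_is_capturing);
  // ... kernel launches would go here ...
}
```

The capture query runs once per device API call. That is the per-invocation cost mentioned above, and whether it matters in practice would need to be measured.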
@bernhardmgruber Thank you for taking a look! The results seem to match our intuition. Regarding the action item, investigating how much overhead is caused by checking whether the stream is in capture mode is a non-trivial amount of work. I'd just update the NVTX section of the developer overview to clarify this behavior.
I will add it after #1753 is merged.
Is this a duplicate?
Area
CUB
Is your feature request related to a problem? Please describe.
As of #719, we have NVTX ranges in CUB device algorithms. Most CUB device algorithms support graph capture. For now, it's not clear whether NVTX works correctly in the presence of graph capture.
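One way to make "works correctly" concrete is interval containment: every kernel launched by a device algorithm should start and end inside the NVTX range that annotates the call. During stream capture, kernel launches are recorded into a graph instead of being executed, so the kernels run only when the instantiated graph is launched. The sketch below assumes that range and kernel timestamps have been exported on a common clock, for example from a profiler timeline. The `interval` type and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record type; timestamps assumed in nanoseconds on one clock.
struct interval {
  std::uint64_t start_ns;
  std::uint64_t end_ns;
};

// True if the kernel's execution lies entirely inside the range.
inline bool contains(const interval& range, const interval& kernel) {
  return range.start_ns <= kernel.start_ns && kernel.end_ns <= range.end_ns;
}

// True if the range surrounds every kernel attributed to it.
inline bool range_covers_kernels(const interval& range,
                                 const std::vector<interval>& kernels) {
  for (const interval& k : kernels) {
    if (!contains(range, k)) {
      return false;  // a kernel ran (at least partly) outside the range
    }
  }
  return true;
}
```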
Describe the solution you'd like
We need to understand whether NVTX ranges work correctly when CUB is in graph capture mode. Since all of our `*_.lid_2` tests run CUB algorithms in graph capture mode, one of these tests, say `cub.cpp17.test.device_select_if.lid_2`, can be used as an example. If NVTX ranges do not contain the kernels they surround, I'd prefer no NVTX ranges to be reported.
Describe alternatives you've considered
No response
Additional context
No response