joint_exclusive_scan does not work in-place #1440
Comments
One reason is that we overwrite

Upd 1: And NVIDIA Compute Sanitizer reports a potential race on a scratch buffer in

Both fixes above are in d41f958 for anyone interested.

Upd 2: And the host implementation seems to be pretty broken in the "in-place" case too, but apparently for independent reasons since the implementation is quite different.

Upd 3: Ah, yes, we pass input and output to
Bug summary
The current implementation of `joint_exclusive_scan` (at least HIP-like; SSCP does not support them yet) does not seem to support in-place operations, even though the standard requires it ("Note that `first` may be equal to `result`.")

Also, it allocates a `__shared__` scratch storage for the operation (inside of `__hipsycl_inclusive_scan_over_group`) even if the output is `__shared__` too and can safely be used for scratch. Not a bug, just inefficiency.

To Reproduce
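The original reproducer is not preserved here. As a stand-in, the following host-only C++ sketch illustrates in general terms why an exclusive scan can break when `first == result`. It is not the library's code and not the failure reported in this issue: both helper functions are hypothetical and serial, and use no SYCL group at all.

```cpp
#include <cstddef>

// Hypothetical helper, not taken from any library.
// It reads in[i - 1] after out[i - 1] has already been written, so it is
// only correct when `in` and `out` refer to different buffers.
inline void naive_exclusive_scan(const int* in, int* out,
                                 std::size_t n, int init) {
    if (n == 0) return;
    out[0] = init;
    for (std::size_t i = 1; i < n; ++i) {
        out[i] = out[i - 1] + in[i - 1];
    }
}

// Hypothetical helper, not taken from any library.
// Each input element is copied into a local before its slot is overwritten,
// so calling it with in == out is safe.
inline void inplace_safe_exclusive_scan(const int* in, int* out,
                                        std::size_t n, int init) {
    int carry = init;
    for (std::size_t i = 0; i < n; ++i) {
        const int value = in[i];  // read first
        out[i] = carry;           // then overwrite
        carry += value;
    }
}
```

With the input `{1, 2, 3, 4}` and `init = 0`, both helpers produce `{0, 1, 3, 6}` when the output is a separate buffer. Called in-place, the second helper still produces `{0, 1, 3, 6}`, while the first produces `{0, 0, 0, 0}` because every input element is clobbered before it is read.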
Expected behavior
A clear and concise description of what you expected to happen.
Describe your setup