[SYCL 2020] Add support for ridiculous SYCL 2020 reduction API #1453
This PR adds support for the ridiculous SYCL 2020 reduction API. We did have some earlier reduction support, but it was incomplete and, most importantly, implemented at the backend-specific level of the kernel launchers. This means that the existing support did not scale to the backends that were added more recently (OpenCL or Level Zero), nor to our current main compilation flow, the generic SSCP compiler.
Overall, the SYCL 2020 reduction API is quite ridiculous because it integrates a high-level feature (reductions) directly into the low-level kernel launch API, which creates all sorts of massive software engineering challenges purely by choice, not by necessity. Also, the unconstrained generality of this feature requires massive effort to cover all of the different cases. Nor does it give users who actually want or need it any control over critical behavior, such as how scratch memory is allocated and deallocated. So you're going to have to trust that your implementation does something reasonable.
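To make the scratch-memory point concrete, here is a minimal host-side sketch in plain C++ (no SYCL headers) of one common way to reduce across work groups: each group writes a partial result into a scratch buffer, and a second pass combines the partials. The names `two_stage_reduce` and `group_size` are hypothetical, and this is not necessarily the strategy this PR uses. It only illustrates where an implementation ends up allocating scratch storage that the SYCL 2020 API gives the user no handle on.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side emulation of a two-stage reduction.
// Stage 1: every "work group" reduces its slice into one scratch slot.
// Stage 2: the scratch slots are combined into the final result.
template <class T, class BinaryOp>
T two_stage_reduce(const std::vector<T>& input, std::size_t group_size,
                   T identity, BinaryOp op) {
  if (input.empty()) return identity;
  if (group_size == 0) group_size = 1;  // guard against a degenerate group size

  const std::size_t num_groups = (input.size() + group_size - 1) / group_size;

  // The scratch buffer that an implementation has to allocate behind the
  // scenes; the user never sees it and cannot control its lifetime.
  std::vector<T> scratch(num_groups, identity);

  // Stage 1: per-group partial results.
  for (std::size_t g = 0; g < num_groups; ++g) {
    const std::size_t begin = g * group_size;
    const std::size_t end = std::min(begin + group_size, input.size());
    for (std::size_t i = begin; i < end; ++i)
      scratch[g] = op(scratch[g], input[i]);
  }

  // Stage 2: combine the partials.
  T result = identity;
  for (const T& partial : scratch) result = op(result, partial);
  return result;
}
```

For example, `two_stage_reduce(std::vector<int>{1, 2, 3, 4, 5, 6, 7}, 3, 0, std::plus<int>())` uses three scratch slots and returns 28.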
This PR reimplements reductions at a higher level, and maps them to the reduction engine that was introduced for stdpar support.
In more detail:
- Supports the `initialize_to_identity` property and aligns default behavior with the SYCL 2020 specification by reducing on top of existing output values by default.
- Supports the `generic` target on all backends.

Limitations:
- `span` reductions remain unimplemented. Implementing those might be possible via `marray`, but it seems like a pretty pointless feature to me.
- The `omp.library-only` target will face a substantial performance regression. This is because implementing reductions turned out to only be feasible with the simplifying assumption that we always have the ndrange kernel execution model. This model is however very inefficient on `omp.library-only` by design.

Draft because this needs way more testing for all the different cases. It's only very lightly tested at the moment.
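
The default behavior mentioned above (reducing on top of the existing output value unless `initialize_to_identity` is set) can be sketched in plain C++ without any SYCL dependency. The function `finalize_reduction` and its parameters are hypothetical names used for illustration only; they are not part of SYCL or of this PR.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical emulation of how the final value reaches the output variable.
// Default (SYCL 2020): the existing output value takes part in the reduction.
// With initialize_to_identity: the existing output value is discarded first.
template <class T, class BinaryOp>
void finalize_reduction(T& output, const std::vector<T>& contributions,
                        T identity, BinaryOp op, bool initialize_to_identity) {
  const T start = initialize_to_identity ? identity : output;
  output = std::accumulate(contributions.begin(), contributions.end(), start, op);
}
```

With an output that already holds 100 and contributions `{1, 2, 3}` under `std::plus<int>`, the default path leaves 106 in the output, while the `initialize_to_identity` path leaves 6.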