Investigate std::execution for alpaka #2235

Open
mehmetyusufoglu opened this issue Jan 30, 2024 · 6 comments

Comments

@mehmetyusufoglu
Contributor

mehmetyusufoglu commented Jan 30, 2024

  1. The first part of std::execution is the C++17 STL execution policies. Can STL algorithms such as std::transform be used by alpaka so that they run on different backends?

  2. The new std::execution includes a sender/receiver graph structure.

std::execution, also known as the sender/receiver framework, is a proposal (expected to be included in C++26) that provides an abstraction mainly for asynchronous task scheduling: the user builds a task graph, and a scheduler runs the tasks. A scheduler is similar to the alpaka accelerator concept. The framework does not provide a mechanism to access the thread index or block index.

Each task is called a sender. Senders have different types (created by specific sender adaptors) and form the nodes of a directed acyclic graph. Some examples:

The "then" sender adaptor returns a sender that waits for the result of the previous sender in the task graph and passes that result as an argument to a given function.

The "bulk" sender adaptor returns a sender describing the task of invoking the provided function with every index in the provided shape, along with the values sent by the input sender.

The "when_all" sender adaptor returns a sender that completes once all of the input senders have completed.

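To make the three adaptors above concrete, here is a toy model in plain C++17. It is not the P2300 API and not stdexec: the `toy` namespace, `just`, `sync_wait` and `doubled_sum` are stand-ins invented for this sketch, a "sender" is modeled as a callable that produces its value when invoked, and there is no receiver, scheduler, or error/stopped channel. It only illustrates that the graph is built first and runs later.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Toy model only (hypothetical names): a "sender" is any callable that
// produces its value when invoked. Real std::execution senders are connected
// to receivers and started on a scheduler; none of that is modeled here.
namespace toy {

// just(v): a sender that produces v.
template <class T>
auto just(T value) {
    return [value = std::move(value)]() { return value; };
}

// then(s, f): runs s, then passes its result to f.
template <class Sender, class F>
auto then(Sender s, F f) {
    return [s = std::move(s), f = std::move(f)]() { return f(s()); };
}

// bulk(s, shape, f): runs s, then calls f(i, value) for every i in [0, shape)
// and forwards the value. A real scheduler may run the calls in parallel;
// this toy runs them in order on the calling thread.
template <class Sender, class F>
auto bulk(Sender s, std::size_t shape, F f) {
    return [s = std::move(s), shape, f = std::move(f)]() {
        auto value = s();
        for (std::size_t i = 0; i < shape; ++i) {
            f(i, value);
        }
        return value;
    };
}

// when_all(a, b): completes once both inputs have completed.
template <class A, class B>
auto when_all(A a, B b) {
    return [a = std::move(a), b = std::move(b)]() {
        return std::make_tuple(a(), b());
    };
}

// sync_wait(s): starts the graph and returns its result.
template <class Sender>
auto sync_wait(Sender s) {
    return s();
}

} // namespace toy

// Example graph: double every element with bulk, then sum with then.
int doubled_sum(std::vector<int> data) {
    const std::size_t n = data.size();
    auto graph = toy::then(
        toy::bulk(toy::just(std::move(data)), n,
                  [](std::size_t i, std::vector<int>& v) { v[i] *= 2; }),
        [](std::vector<int> v) {
            int sum = 0;
            for (int x : v) sum += x;
            return sum;
        });
    // Nothing has run yet; the work happens inside sync_wait.
    return toy::sync_wait(graph);
}
```

Note that the function given to `bulk` only receives an index `i` and the value. This matches the point above: the framework exposes a shape, not thread or block indices.
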
As a preliminary result: std::execution is another abstraction similar to alpaka. In my opinion it is an alternative to alpaka, because it creates an abstraction over different execution resources. Namely, through the scheduler concept it provides adaptability to different backends. On the other hand, it does not provide direct access to thread or block indices; it leaves thread management to the backend-specific implementations of the schedulers.

By defining task types that consume the previous task's results in a structured way, it helps create structured parallelism. NVIDIA's implementation of std::execution, stdexec, includes a GPU backend, which shows that NVIDIA GPUs are supported. After many discussions, the proposal was not included in C++23; it is expected to be included in C++26.

Sources

A detailed, helpful video:

https://youtu.be/QSaUCzL7nCU?si=g4kl_DXrAa4Sd_ZD

A recent version of the proposal:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2300r7.html

Previous paper:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html

NVIDIA's implementation, stdexec, which includes a GPU backend (scheduler):

https://github.com/NVIDIA/stdexec?tab=readme-ov-file

@mehmetyusufoglu
Contributor Author

mehmetyusufoglu commented Feb 1, 2024

Meeting notes

  1. Regarding the standard execution policies of C++17: @psychocoderHPC made a good suggestion, but after discussion it turned out not to be feasible enough.
  2. Regarding the std::execution proposal for C++26: the task graph concept, in which nodes are connected by pipes so that the result of one node is passed to the next (without copying the result back to the CPU), is important and promising. But since it is still a proposal for C++26 and there is currently no urgent need to support it in alpaka, nothing is planned for now.

@bernhardmgruber
Member

The first part of std::execution is the C++17 STL execution policies. Can STL algorithms such as std::transform be used by alpaka so that they run on different backends?

Do you have a source that says that the C++17 parallel STL (PSTL) == std::execution? In my mind, they are completely separate things. The PSTL is a parallelisation of the STL algorithms, whereas std::execution refers to senders and receivers (and previously to executors), which are a lower-level framework for building task graphs.

  1. Regarding the standard execution policies of C++17: @psychocoderHPC made a good suggestion, but after discussion it turned out not to be feasible enough.

Oh, I think it's perfectly feasible. This is what vikunja is, or should be. I imagine it like:

transform(with_alpaka(... params ...), begin(data), end(data), op);

Where with_alpaka(... params ...) is the execution policy.
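A minimal sketch of that idea, with invented names: `sketch::chunked_policy` stands in for `with_alpaka(... params ...)`, its `chunk_size` field is a made-up example of a launch parameter, and the loop runs on the host where a real backend would launch a kernel. Unlike the one-liner above, this version takes an output iterator like std::transform does. Nothing here is alpaka or vikunja API.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical policy type standing in for with_alpaka(... params ...).
// chunk_size is an invented example of a parameter carried by the policy.
struct chunked_policy {
    std::size_t chunk_size;
};

// transform overload selected by the policy type, mirroring how the C++17
// parallel algorithms take an execution policy as their first argument.
// This stand-in walks the range chunk by chunk on the host; a real backend
// would dispatch each chunk (or the whole range) to an accelerator.
template <class InIt, class OutIt, class Op>
OutIt transform(const chunked_policy& policy, InIt first, InIt last, OutIt out, Op op) {
    const std::size_t step = policy.chunk_size == 0 ? 1 : policy.chunk_size;
    while (first != last) {
        for (std::size_t i = 0; i < step && first != last; ++i) {
            *out++ = op(*first++);
        }
    }
    return out;
}

} // namespace sketch

// Usage: the call site looks like a standard algorithm call plus a policy.
std::vector<int> squares(const std::vector<int>& data) {
    std::vector<int> result(data.size());
    sketch::transform(sketch::chunked_policy{2}, data.begin(), data.end(),
                      result.begin(), [](int x) { return x * x; });
    return result;
}
```

The design point is that the backend choice and its parameters live entirely in the policy object, so the algorithm call itself stays a single line.
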

@mehmetyusufoglu
Contributor Author

mehmetyusufoglu commented Feb 2, 2024

The problem turned out to be that, after each transform call, the result would be copied back to the CPU so as not to break the C++ standard. It would not be a "task graph"-like solution (as @SimeonEhrig said). Parameters can be passed by beginFilter() instead of begin (as @psychocoderHPC proposed) or, in my opinion, perhaps by a lambda produced by a lambda generator.

@mehmetyusufoglu
Contributor Author

mehmetyusufoglu commented Feb 2, 2024

The C++17 parallel STL is in the header called <execution>, hence the C++17 execution policies are in the std::execution namespace. https://en.cppreference.com/w/cpp/header/execution

@bernhardmgruber
Member

The problem turned out to be that, after each transform call, the result would be copied back to the CPU so as not to break the C++ standard. It would not be a "task graph"-like solution (as @SimeonEhrig said).

That's why we have C++20 ranges, so we can build lazy graphs to then schedule them on a PSTL algorithm:

auto graph = data | transform(...) | filter(...) | transform(...);
for_each(with_alpaka(... params ...), begin(graph), end(graph));

However, there are certain problems with this design as well. E.g. some implementations struggle to accelerate non-random access ranges, like filter ranges.

Still, sometimes a simple transform or reduce is all you need, and you would be happy to write that in a single line :)

The C++17 parallel STL is in the header called <execution>, hence the C++17 execution policies are in the std::execution namespace. https://en.cppreference.com/w/cpp/header/execution

That is correct. However, the C++ proposal std::execution is something different than the PSTL or execution policies.

@mehmetyusufoglu
Contributor Author

mehmetyusufoglu commented Feb 2, 2024

Yes, using ranges would simplify the alpaka pipeline a lot. I think that is needed, but it requires C++20, as you said.
