Deadlock issue in OpenBLAS with TBB #1336
Hi @goplanid, to guarantee parallelism in the inner loop, you could use TBB in the outer loop only. In the inner loop, you could launch [...]. You can prevent oversubscription by throttling down the oneTBB concurrency (e.g., to [...]).
Brief Description: I am trying out this OpenBLAS PR [https://github.com/OpenMathLib/OpenBLAS/pull/4577] with TBB. I first register a callback in my code to dynamically change the threading backend. Instead of creating its own threads, OpenBLAS passes the work to the registered callback. I use TBB to run gemm, and I want to use TBB again to execute the callback.
Issue: I am hitting a deadlock in OpenBLAS: multiple threads get stuck in the inner_threads function. The deadlock appears to occur when OpenBLAS is used with fewer threads than the number of available threads.
Below is my test code and steps to reproduce it.
Build command: g++ -std=c++11 -o tbb_nested tbb_nested.cpp -ltbb -lpthread -I/home/openblas/include -L/home/openblas/lib -lopenblas -Wl,-rpath,/home/openblas/lib
Help needed: As you can see, I have the following case of nested parallelism:
outer loop: tbb::parallel_for(tbb::blocked_range(0, 2), MatrixMultiplicationTask(A,B,C));
inner loop: tbb::parallel_for(tbb::blocked_range(0, numjobs), innerLoopTask);
In the code above, the outer loop runs for 2 iterations, and each outer iteration runs numjobs iterations of the inner loop. My code has a dependency such that innerLoopTask only works when exactly numjobs threads are used. What is the best nested-parallelism solution that TBB provides for this problem? Kindly advise.
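For illustration, here is a minimal sketch of the "TBB in the outer loop only" idea from the comment above, written against the C++11 standard library so it does not depend on TBB or OpenBLAS. tbb::parallel_for does not promise that all of its iterations run at the same time, so inner jobs that wait on each other can stall when fewer workers than jobs are available. The names run_on_dedicated_threads, SimpleBarrier, and demo_inner_loop are hypothetical helpers invented for this example, and the barrier only stands in for whatever cross-job synchronization the real inner loop performs. It is an assumption, not something stated in this thread, that dedicated std::thread workers are what the comment had in mind.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs job(i) for i in [0, numjobs) on exactly numjobs
// dedicated threads and joins them all before returning. Unlike a nested
// tbb::parallel_for, every job gets its own thread, so the jobs can safely
// wait on one another.
void run_on_dedicated_threads(int numjobs, const std::function<void(int)>& job) {
    assert(numjobs >= 0);
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(numjobs));
    for (int i = 0; i < numjobs; ++i) {
        workers.emplace_back([&job, i] { job(i); });
    }
    for (auto& t : workers) {
        t.join();
    }
}

// Single-use C++11 barrier, used here only to model jobs that depend on
// each other. It releases its waiters once `count` threads have arrived.
class SimpleBarrier {
public:
    explicit SimpleBarrier(int count) : remaining_(count) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (--remaining_ == 0) {
            cv_.notify_all();
        } else {
            cv_.wait(lock, [this] { return remaining_ == 0; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int remaining_;
};

// Demo: each job blocks until all numjobs jobs have started, then counts
// itself. Returns how many jobs got past the barrier.
int demo_inner_loop(int numjobs) {
    SimpleBarrier barrier(numjobs);
    std::atomic<int> passed(0);
    run_on_dedicated_threads(numjobs, [&](int) {
        barrier.arrive_and_wait();  // would hang if fewer than numjobs ran at once
        ++passed;
    });
    return passed.load();
}
```

In the setup described here, the body of the outer tbb::parallel_for would call something like run_on_dedicated_threads(numjobs, ...) instead of starting a nested tbb::parallel_for. With 2 outer iterations this creates up to 2 * numjobs extra threads, which is the oversubscription the comment suggests offsetting by throttling oneTBB (for example with tbb::global_control and its max_allowed_parallelism parameter).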