Convert OpenMP parallelization to OneAPI::TBB #6626

dbs4261 · 2024-01-27T03:35:43Z

OpenMP acceleration has been migrated to use oneapi::TBB.

Type

- Bug fix (non-breaking change which fixes an issue): Fixes #
- New feature (non-breaking change which adds functionality). Resolves #
- Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #N/A

Motivation and Context

Many components of Open3D imply an eventual shift away from OpenMP to TBB. This includes some sections where TBB is only used on one platform because 2D loop unrolling isn't supported on Win32. Lastly, mixing multiple parallelization paradigms makes nested parallelism problematic: when some Open3D methods are called from a TBB context, an OpenMP thread pool is created for each TBB thread.

Checklist:

- I have run python util/check_style.py --apply to apply Open3D code style to my code.
- This PR changes Open3D behavior or adds new functionality.
  - Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is updated accordingly.
  - I have added or updated C++ and / or Python unit tests OR included test results (e.g. screenshots or numbers) here.
- I will follow up and update the code if CI fails.
- For fork PRs, I have selected Allow edits from maintainers.

Description

Updated parallel for sections to use tbb::parallel_for. Adapted most loops that performed reductions, whether with omp reduction clauses or with critical sections, to tbb::parallel_reduce implementations. Some of these required custom reduction objects instead of lambdas. Added an atomic version of the ProgressBar for use with TBB.

There is still work to be done in documentation. This will break any user code that directly uses ParallelForCPU, as OpenMP critical sections will no longer work. Additionally, TBB has no equivalent of OpenMP's OMP_NUM_THREADS for setting the maximum number of threads. In C++ code a tbb::global_control object could be used, but it is unclear how to provide that sort of functionality for Python users.

…he progress bar into its own function and a bulk inplace add function operator+=. Also added TBBProgressBar. It does not inherit from ProgressBar as it uses an atomic for counting and has slightly different internals to use that atomicity.
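The idea of counting progress with an atomic, including a bulk operator+=, can be sketched as follows. This is an assumption-laden illustration, not the TBBProgressBar from this PR: AtomicProgressCounter and its members are hypothetical names, and the sketch omits any printing.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch of an atomically counted progress tracker.
class AtomicProgressCounter {
public:
    explicit AtomicProgressCounter(std::size_t expected_count)
        : expected_count_(expected_count) {}

    // Single-step increment, safe to call from several threads.
    AtomicProgressCounter& operator++() {
        current_.fetch_add(1, std::memory_order_relaxed);
        return *this;
    }

    // Bulk in-place add: one atomic operation for n finished items.
    AtomicProgressCounter& operator+=(std::size_t n) {
        current_.fetch_add(n, std::memory_order_relaxed);
        return *this;
    }

    std::size_t Count() const {
        return current_.load(std::memory_order_relaxed);
    }

    // Fraction of expected work completed, in [0, 1] for valid counts.
    double Fraction() const {
        if (expected_count_ == 0) return 1.0;
        return static_cast<double>(Count()) /
               static_cast<double>(expected_count_);
    }

private:
    const std::size_t expected_count_;
    std::atomic<std::size_t> current_{0};
};
```

A worker that processes a chunk of items can call operator+= once per chunk instead of operator++ once per item, which reduces contention on the shared counter.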

update-docs · 2024-01-27T03:35:47Z

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

dbs4261 · 2024-01-27T18:18:30Z

Ok, I ran my tests in my development environment. Guess I should use the docker containers to replicate the CI environment and figure out those tests.

ssheorey · 2024-01-30T05:01:06Z

Hi @dbs4261 thanks for picking this up!

Possibly fixes #6544

errissa · 2024-02-06T19:23:55Z

@dbs4261 Thanks for working on this! I just tested this PR on my Mac and got numerous TBB-related compilation errors. I tried using the Homebrew version of TBB as well as the "build from source" configuration. This PR appears to use functions that are missing from both the Homebrew and "build from source" versions of TBB on Mac.

I know this PR is still draft but wanted to report what I had found. Please let me know if you need any help testing/diagnosing issues on Mac.

dbs4261 · 2024-02-06T20:57:13Z

Hi @ssheorey, this PR likely won't fix that issue, as I haven't yet changed how the TBB dependency is being accessed. This is likely also why @errissa is facing issues building on Mac.

@errissa is homebrew pulling the OneAPI version of TBB? If you can provide me with the version of TBB you tried and the compiler errors I can take a look and figure out which version is required and work that into the PR.

ssheorey · 2024-02-06T23:58:21Z

@dbs4261 yes, you are right about not fixing #6544. We should update to the latest oneTBB as part of this PR to fix that though.

This is the latest version of oneTBB and is available for all platforms on github:

https://github.com/oneapi-src/oneTBB/releases/tag/v2021.11.0

The naming is off - this was released in Nov 2023.

I think this should also resolve @errissa 's issues on macOS.

dbs4261 · 2024-02-07T01:07:31Z

I agree that setting the version requirement for TBB should be part of this PR. Based on the Ubuntu failure in CI, it's the collaborative_call_once header that is missing. The TBB repo says the header hasn't been modified in 3 years, so I would think that any version that reports 2021+ should be fine. What does Open3D CI currently use for TBB?

errissa · 2024-02-07T02:16:50Z

@dbs4261 @ssheorey is correct about the oneTBB version. Homebrew's most recent version is 2021.11.0, so if this PR builds successfully against it, it would solve the macOS issue I experienced.

dbs4261 · 2024-02-07T18:48:21Z

Looks like the minimum version requirement for collaborative_call_once.h is v2021.4.0. It also looks like we aren't putting version requirements in the find package scripts in 3rdparty/find_dependencies.cmake, which means that errors like @errissa's can still happen when using the system library. This raises the question of whether I should set the system version requirement to the same version that I am providing in the ExternalProject_add call, or add the newest version but set the system requirement to the minimal version.

ssheorey · 2024-02-10T00:02:32Z

Hi @dbs4261, our usual policy is to upgrade to the latest version available, but set the minimum version to what is required to make everything work. This helps to "future-proof" the updated code as much as possible by incorporating the latest bugfixes. Official binaries will be built with the latest version, while users can still build the library against older versions.

ssheorey

[Initial look]

ssheorey · 2024-02-24T04:21:37Z

3rdparty/mkl/tbb.cmake

@@ -26,13 +26,10 @@ find_package(Git QUIET REQUIRED)
 ExternalProject_Add(
 ext_tbb
 PREFIX tbb
- URL https://github.com/wjakob/tbb/archive/141b0e310e1fb552bdca887542c9c1a8544d6503.tar.gz # Sept 2020
- URL_HASH SHA256=bb29b76eabf7549660e3dba2feb86ab501469432a15fb0bf2c21e24d6fbc4c72
+ URL https://github.com/oneapi-src/oneTBB/archive/refs/tags/v2021.4.0.tar.gz


Can we upgrade to the latest? v2021.11.0

No reason why not. I just put in the older version that had all the features I used.

ssheorey · 2024-02-24T04:22:42Z

cpp/open3d/geometry/PointCloudSegmentation.cpp

There's a merge conflict here. The CI can run only after it's fixed.

ssheorey · 2024-02-24T04:25:34Z

cpp/open3d/core/ParallelFor.h

- func(i);
- }
+ tbb::parallel_for(tbb::blocked_range<int64_t>(0, n, 32),
+ [&func](const tbb::blocked_range<int64_t>& range) {


How many threads will be used here? Currently, it's estimated with utility::EstimateMaxThreads() which gives us one thread per core (excluding hyperthreading).

Also, avoid using "magic numbers" (32). I think you have a GetDefaultChunkSize() function.

It will use up to the number of threads in the task arena that called it. As for the chunk size, see my other comment.

ssheorey · 2024-02-27T15:51:05Z

cpp/open3d/utility/Parallel.cpp

- return "";
- }
-}
+int EstimateMaxThreads() { return tbb::this_task_arena::max_concurrency(); }


Can we use the number of cores (not number of HW threads)?

No, the number of tasks is determined by the caller. A caller could be using a small task arena to deal with IO, while a larger arena deals with processing something else. This actually brings up an issue that I don't yet know how to solve. TBB sets the maximum concurrency with a C++ variable that follows scope rules but doesn't need to be passed to functions, so I don't yet know how a Python user would set the concurrency limit. I think it might need to be done with some sort of context manager. But I guess this changes behavior in an environment where the number of threads was limited with the OpenMP environment variable.

ssheorey · 2024-02-27T15:58:59Z

cpp/open3d/utility/Parallel.cpp

- return 1;
-#endif
+std::size_t& DefaultGrainSizeTBB() noexcept {
+ static std::size_t GrainSize = 256;


Can you comment on how this value was selected? Did you see any performance differences for this value versus other values?

Honestly, I was guessing at grain size from this, but it really should be picked based on profiling. My understanding is that the grain size provides loose guidance to TBB's automatic chunking mechanism. It works similarly to omp schedule(guided). Overall, the goal is to give each thread enough work that the overhead of chunking is minimized, while keeping chunks small enough that the scheduler can go back in and steal some if one of the threads gets held up. It might be worth taking another pass through the grain sizes that I put in and setting them as a magic number times the DefaultGrainSizeTBB (which is mutable). That way the chunk size could be higher for doing a single operation with tensors, and smaller when looping through complex sections like in RANSAC.
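The "multiplier times a mutable default" idea could look roughly like the sketch below. The signature of DefaultGrainSizeTBB follows the hunk above, but its return statement is filled in here as an assumption, and ScaledGrainSize is a hypothetical helper that does not appear in this PR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Mutable process-wide default. The signature and initial value follow
// the diff; the return statement is an assumption.
std::size_t& DefaultGrainSizeTBB() noexcept {
    static std::size_t GrainSize = 256;
    return GrainSize;
}

// Hypothetical helper: scales the default by a per-call-site factor and
// clamps to 1, since a blocked_range grain size must be positive.
std::size_t ScaledGrainSize(double multiplier) {
    const double scaled = std::round(
            multiplier * static_cast<double>(DefaultGrainSizeTBB()));
    return static_cast<std::size_t>(std::max(1.0, scaled));
}
```

A cheap element-wise tensor loop might pass something like ScaledGrainSize(4.0) as the third argument of tbb::blocked_range, while an expensive loop body might use ScaledGrainSize(0.25). The specific factors here are placeholders and would need profiling.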

ssheorey · 2024-03-07T22:00:34Z

[Notes about linking and binary distribution]

For linking TBB, the recommendation is to link dynamically. For C++ binaries and applications, we will distribute the TBB DLL along with the Open3D DLL.
oneapi-src/oneTBB#646

For Python, TBB libraries are available through PyPI, so we can add these as dependencies to requirements.txt
https://community.intel.com/t5/Intel-oneAPI-Threading-Building/How-to-ship-a-package-using-TBB-on-PyPI-manylinux/m-p/1227574

benjaminum · 2024-03-15T18:18:10Z

cpp/open3d/utility/ProgressBar.h

@@ -15,30 +18,57 @@ namespace utility {
 class ProgressBar {
 public:
 ProgressBar(size_t expected_count,
- const std::string &progress_info,
+ std::string progress_info,


Why has the const been removed here?

It has to be copied into the object, so it is passed by value into the constructor and then moved into the member variable.
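The pass-by-value-then-move idiom from this reply, shown on a small stand-in class. LabeledCounter is hypothetical and is not the ProgressBar from the diff.

```cpp
#include <string>
#include <utility>

// Hypothetical class illustrating a "sink" constructor parameter.
class LabeledCounter {
public:
    // The argument is copied (lvalue) or moved (rvalue) into `label`
    // at the call site, then moved into the member without a second copy.
    explicit LabeledCounter(std::string label) : label_(std::move(label)) {}

    const std::string& Label() const { return label_; }

private:
    std::string label_;
};
```

With a const std::string& parameter, the constructor would always copy into the member, even when the caller passes a temporary. With the by-value form, a temporary is moved all the way through, and an lvalue argument is still copied exactly once and left unchanged.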

dbs4261 added 9 commits January 26, 2024 18:13

Switched from using OpenMP for parallelism to using oneAPI::TBB (thou…

1406872

…gh not using the oneapi scope). Untested but building.

Swichted usage of std::mutex to tbb::mutex for consistency when used …

cc96769

…in conjunction with tbb parallel constructs.

Fixed bug in CPU reduction

532f434

Shift atomic to outside of RW mutex in PointCloudSegmentation.cpp

ad6e2b2

Switch from using a mutex to a concurrent vector for parallelization …

9579664

…of self intersecting triangles.

Get maximum threads from TBB instead of OpenMP

0342274

Updated ClusterDBSCAN in PointCloudCluster.cpp to bulk update the pro…

f50505c

…gress bar to limit spinning on the mutex.

Applied Open3D style

d1a461d

dbs4261 changed the title ~~Omp2tbb~~ concert Jan 27, 2024

dbs4261 changed the title ~~concert~~ Convert OpenMP parallelization to OneAPI::TBB Jan 27, 2024

Updated tbb version

039b1a8

ssheorey requested review from errissa and ssheorey February 10, 2024 00:03

dbs4261 added 2 commits February 12, 2024 12:51

Marked single argument constructors for reductions as explicit to ple…

482af78

…ase codacy

Explicitly load atomics in calls to utility::Log*(...) because newer …

068bdcf

…versions of format wont automatically convert it to its underlying type.

ssheorey requested a review from benjaminum February 20, 2024 15:45

ssheorey reviewed Feb 27, 2024

ssheorey mentioned this pull request Mar 15, 2024

Error when installing open3d for conda environment, missing libomp, seg fault when installed #6196

Open

3 tasks

ssheorey linked an issue Mar 15, 2024 that may be closed by this pull request

Error when installing open3d for conda environment, missing libomp, seg fault when installed #6196

Open

3 tasks

benjaminum reviewed Mar 15, 2024

ssheorey added 2 commits April 17, 2024 12:50

Merge branch 'main' of github.com:intel-isl/Open3D into omp2tbb

0c583fb

style fix

c897a5a

ssheorey added this to the v0.20 milestone Apr 29, 2024

ssheorey added the build/install Build or installation issue label Apr 30, 2024
