Added parallel implementation of PrincipalCurvaturesEstimation #6048

alexnavtt · 2024-05-20T16:00:31Z

I was using PrincipalCurvaturesEstimation for a project and noticed that it didn't have an OMP variant like a lot of the other features, and the runtime on this algorithm is fairly slow. I took inspiration from the NormalEstimation -> NormalEstimationOMP comparison. I've tested my implementation on a sample pointcloud, and verified that all curvatures were exactly equal to those calculated using the single-threaded version, and it was calculated in parallel as expected.

I do have a couple questions to just double check with any reviewer to make sure I did this right:

The original PrincipalCurvaturesEstimation stored most of their calculation variables (covariance matrix, projected normals, etc.) as private class members, when they could have easily been local in scope. I changed this to local variables in my implementation to prevent data races. I just want to make sure there wasn't some subtle reason that I'm missing that those variables would be stored as class members.
For the reason described above, it would also be very easy to modify the existing PrincipalCurvatuesEstimation class to support parallel execution. Would that be preferable to creating a new class as I've done?
In my implementation I corrected one or two minor things from the original such as const-correctness. I also switched to using the more expressive getNormalVector3fMap() where applicable. I just want to verify that any class with a normal is guaranteed to have that method available.

larshg · 2024-05-21T09:36:20Z

Hey @alexnavtt

I just notized this comment at the documentation:
Note
The code is stateful as we do not expect this class to be multicore parallelized. Please look at NormalEstimationOMP for an example on how to extend this to parallel implementations.

I wonder if it returns the same information? I haven't looked deeply into this.

Edit:
I guess you made the OMP version non-stateful and it does provides additional information 👍 I missread the original note.

larshg · 2024-05-21T09:47:11Z

As per having both PrincipalCurvaturesEstimation and a PrincipalCurvaturesEstimationOMP - I think the latest direction was moving the algorithm to be only in the base class and then in the OMP subclass only the methods for setting number of threads etc.

So could the PrincipalCurvaturesEstimation be made non-statefull as well and then only keep a single implementation.

Whats your opinion @mvieth ?

mvieth · 2024-05-21T13:48:04Z

I think it could make sense to just parallelize the existing class, and not add another class for the parallel implementation. The only concern I would have is whether changing computePointPrincipalCurvatures to use local temporary variables instead of the private class members is slower because it needs to allocate/construct and free the variables repeatedly. A small benchmark might be interesting. An alternative would be to add an overload of computePointPrincipalCurvatures that also receives the temporary variables (or at least some of them, like projected_normals_, maybe covariance_matrix_), which are then local to each thread.

alexnavtt · 2024-05-21T15:36:10Z

Alright, this all sounds good. I'll run a quick benchmark today or tomorrow to see if it makes any difference (and personally I'm expecting it won't since statically sized Eigen matrices are allocated on the stack) and then I'll move the edits to the base class.

The only remaining question now is whether the parallelized PrincipalCurvaturesEstimation should have default thread number of 0 or 1. I'm leaning towards leaving it at 1 to maintain API stability and people can manually change that if they want parallelization.

mvieth · 2024-05-22T14:33:33Z

Alright, this all sounds good. I'll run a quick benchmark today or tomorrow to see if it makes any difference (and personally I'm expecting it won't since statically sized Eigen matrices are allocated on the stack)

That is true, but I could imagine that it might make a difference due to the projected_normals_ vector

The only remaining question now is whether the parallelized PrincipalCurvaturesEstimation should have default thread number of 0 or 1. I'm leaning towards leaving it at 1 to maintain API stability and people can manually change that if they want parallelization.

I am fine with either since the output is the same, and I would not consider parallelization a breaking change. But if you want 1 to be the default, that is also okay.

alexnavtt · 2024-05-22T17:27:21Z

I've migrated the changes to the base class. Benchmarking showed about a 1% slowdown using local variables compared to storing them as class members, but that seems like a fair trade to me with the option for parallelization. I kept the default thread number at 1.

mvieth

Looks good to me, thank you!

larshg · 2024-05-23T10:38:26Z

The above is only suggestions as we try to move towards https://clang.llvm.org/extra/clang-tidy/checks/modernize/use-default-member-init.html

mvieth · 2024-05-23T12:26:10Z

features/include/pcl/features/principal_curvatures.h

+ * low will lead to more parallelization overhead. Setting it too high 
+ * will lead to a worse balancing between the threads.
+ */
+ PrincipalCurvaturesEstimation (unsigned int nr_threads = 1, int chunk_size = 256) : 


This would mean that it is no longer possible to set nr_threads in the constructor, and that it is no longer possible to change chunk_size at all? Btw the clang-tidy CI already checks use-default-member-init https://github.com/PointCloudLibrary/pcl/blob/master/.clang-tidy#L18

Added parallel implementation of PrincipalCurvatureEstimation

9b1e455

alexnavtt changed the title ~~Added parallel implementation of PrincipalCurvatureEstimation~~ Added parallel implementation of PrincipalCurvaturesEstimation May 20, 2024

Fixed openMP integral type error when building on MSVC

77ae5a2

mvieth added changelog: enhancement Meta-information for changelog generation module: features labels May 21, 2024

Migrated changes to base class

8233b53

Fixed signed error for OpenMP loop variable on MSVC

39cabf2

mvieth approved these changes May 23, 2024

View reviewed changes

mvieth reviewed May 23, 2024

View reviewed changes

larshg approved these changes May 23, 2024

View reviewed changes

mvieth merged commit f7fde26 into PointCloudLibrary:master May 24, 2024
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added parallel implementation of PrincipalCurvaturesEstimation #6048

Added parallel implementation of PrincipalCurvaturesEstimation #6048

alexnavtt commented May 20, 2024 •

edited

larshg commented May 21, 2024 •

edited

larshg commented May 21, 2024

mvieth commented May 21, 2024

alexnavtt commented May 21, 2024

mvieth commented May 22, 2024

alexnavtt commented May 22, 2024

mvieth left a comment

larshg commented May 23, 2024

mvieth May 23, 2024

Added parallel implementation of PrincipalCurvaturesEstimation #6048

Added parallel implementation of PrincipalCurvaturesEstimation #6048

Conversation

alexnavtt commented May 20, 2024 • edited

larshg commented May 21, 2024 • edited

larshg commented May 21, 2024

mvieth commented May 21, 2024

alexnavtt commented May 21, 2024

mvieth commented May 22, 2024

alexnavtt commented May 22, 2024

mvieth left a comment

Choose a reason for hiding this comment

larshg commented May 23, 2024

mvieth May 23, 2024

Choose a reason for hiding this comment

alexnavtt commented May 20, 2024 •

edited

larshg commented May 21, 2024 •

edited