Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added parallel implementation of PrincipalCurvaturesEstimation #6048

Merged
merged 4 commits into from
May 24, 2024

Conversation

alexnavtt
Copy link
Contributor

@alexnavtt alexnavtt commented May 20, 2024

I was using PrincipalCurvaturesEstimation for a project and noticed that it didn't have an OMP variant like a lot of the other features, and the runtime on this algorithm is fairly slow. I took inspiration from the NormalEstimation -> NormalEstimationOMP comparison. I've tested my implementation on a sample pointcloud, and verified that all curvatures were exactly equal to those calculated using the single-threaded version, and it was calculated in parallel as expected.

I do have a couple questions to just double check with any reviewer to make sure I did this right:

  • The original PrincipalCurvaturesEstimation stored most of their calculation variables (covariance matrix, projected normals, etc.) as private class members, when they could have easily been local in scope. I changed this to local variables in my implementation to prevent data races. I just want to make sure there wasn't some subtle reason that I'm missing that those variables would be stored as class members.
  • For the reason described above, it would also be very easy to modify the existing PrincipalCurvatuesEstimation class to support parallel execution. Would that be preferable to creating a new class as I've done?
  • In my implementation I corrected one or two minor things from the original such as const-correctness. I also switched to using the more expressive getNormalVector3fMap() where applicable. I just want to verify that any class with a normal is guaranteed to have that method available.

@alexnavtt alexnavtt changed the title Added parallel implementation of PrincipalCurvatureEstimation Added parallel implementation of PrincipalCurvaturesEstimation May 20, 2024
@larshg
Copy link
Contributor

larshg commented May 21, 2024

Hey @alexnavtt

I just notized this comment at the documentation:
Note
The code is stateful as we do not expect this class to be multicore parallelized. Please look at NormalEstimationOMP for an example on how to extend this to parallel implementations.

I wonder if it returns the same information? I haven't looked deeply into this.

Edit:
I guess you made the OMP version non-stateful and it does provides additional information 👍 I missread the original note.

@larshg
Copy link
Contributor

larshg commented May 21, 2024

As per having both PrincipalCurvaturesEstimation and a PrincipalCurvaturesEstimationOMP - I think the latest direction was moving the algorithm to be only in the base class and then in the OMP subclass only the methods for setting number of threads etc.

So could the PrincipalCurvaturesEstimation be made non-statefull as well and then only keep a single implementation.

Whats your opinion @mvieth ?

@mvieth
Copy link
Member

mvieth commented May 21, 2024

I think it could make sense to just parallelize the existing class, and not add another class for the parallel implementation. The only concern I would have is whether changing computePointPrincipalCurvatures to use local temporary variables instead of the private class members is slower because it needs to allocate/construct and free the variables repeatedly. A small benchmark might be interesting. An alternative would be to add an overload of computePointPrincipalCurvatures that also receives the temporary variables (or at least some of them, like projected_normals_, maybe covariance_matrix_), which are then local to each thread.

@mvieth mvieth added changelog: enhancement Meta-information for changelog generation module: features labels May 21, 2024
@alexnavtt
Copy link
Contributor Author

Alright, this all sounds good. I'll run a quick benchmark today or tomorrow to see if it makes any difference (and personally I'm expecting it won't since statically sized Eigen matrices are allocated on the stack) and then I'll move the edits to the base class.

The only remaining question now is whether the parallelized PrincipalCurvaturesEstimation should have default thread number of 0 or 1. I'm leaning towards leaving it at 1 to maintain API stability and people can manually change that if they want parallelization.

@mvieth
Copy link
Member

mvieth commented May 22, 2024

Alright, this all sounds good. I'll run a quick benchmark today or tomorrow to see if it makes any difference (and personally I'm expecting it won't since statically sized Eigen matrices are allocated on the stack)

That is true, but I could imagine that it might make a difference due to the projected_normals_ vector

The only remaining question now is whether the parallelized PrincipalCurvaturesEstimation should have default thread number of 0 or 1. I'm leaning towards leaving it at 1 to maintain API stability and people can manually change that if they want parallelization.

I am fine with either since the output is the same, and I would not consider parallelization a breaking change. But if you want 1 to be the default, that is also okay.

@alexnavtt
Copy link
Contributor Author

I've migrated the changes to the base class. Benchmarking showed about a 1% slowdown using local variables compared to storing them as class members, but that seems like a fair trade to me with the option for parallelization. I kept the default thread number at 1.

Copy link
Member

@mvieth mvieth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you!

@larshg
Copy link
Contributor

larshg commented May 23, 2024

The above is only suggestions as we try to move towards https://clang.llvm.org/extra/clang-tidy/checks/modernize/use-default-member-init.html

* low will lead to more parallelization overhead. Setting it too high
* will lead to a worse balancing between the threads.
*/
PrincipalCurvaturesEstimation (unsigned int nr_threads = 1, int chunk_size = 256) :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean that it is no longer possible to set nr_threads in the constructor, and that it is no longer possible to change chunk_size at all? Btw the clang-tidy CI already checks use-default-member-init https://github.com/PointCloudLibrary/pcl/blob/master/.clang-tidy#L18

@mvieth mvieth merged commit f7fde26 into PointCloudLibrary:master May 24, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: enhancement Meta-information for changelog generation module: features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants