Added parallel implementation of PrincipalCurvaturesEstimation #6048
Conversation
Hey @alexnavtt, I just noticed this comment in the documentation: I wonder if it returns the same information? I haven't looked deeply into this.
As for having both PrincipalCurvaturesEstimation and a PrincipalCurvaturesEstimationOMP: I think the latest direction was to move the algorithm into the base class alone, and keep only the methods for setting the number of threads etc. in the OMP subclass. So could PrincipalCurvaturesEstimation be made non-stateful as well, so that only a single implementation is kept? What's your opinion, @mvieth?
I think it could make sense to just parallelize the existing class, and not add another class for the parallel implementation. The only concern I would have is whether changing …
Alright, this all sounds good. I'll run a quick benchmark today or tomorrow to see if it makes any difference (personally I'm expecting it won't, since statically sized Eigen matrices are allocated on the stack) and then I'll move the edits to the base class. The only remaining question is whether the parallelized PrincipalCurvaturesEstimation should have a default thread number of 0 or 1. I'm leaning towards leaving it at 1 to maintain API stability; people can manually change it if they want parallelization.
That is true, but I could imagine that it might make a difference due to the …
I am fine with either since the output is the same, and I would not consider parallelization a breaking change. But if you want 1 to be the default, that is also okay.
I've migrated the changes to the base class. Benchmarking showed about a 1% slowdown using local variables compared to storing them as class members, but that seems like a fair trade-off to me given the option for parallelization. I kept the default thread number at 1.
Looks good to me, thank you!
The above are only suggestions, as we try to move towards https://clang.llvm.org/extra/clang-tidy/checks/modernize/use-default-member-init.html
```cpp
 * low will lead to more parallelization overhead. Setting it too high
 * will lead to a worse balancing between the threads.
 */
PrincipalCurvaturesEstimation (unsigned int nr_threads = 1, int chunk_size = 256) :
```
This would mean that it is no longer possible to set `nr_threads` in the constructor, and that it is no longer possible to change `chunk_size` at all? Btw, the clang-tidy CI already checks use-default-member-init: https://github.com/PointCloudLibrary/pcl/blob/master/.clang-tidy#L18
I was using PrincipalCurvaturesEstimation for a project and noticed that it didn't have an OMP variant like a lot of the other features, and the runtime of this algorithm is fairly slow. I took inspiration from the NormalEstimation -> NormalEstimationOMP comparison. I've tested my implementation on a sample point cloud and verified that all curvatures were exactly equal to those calculated using the single-threaded version, and that they were calculated in parallel as expected.
I do have a couple of questions to double-check with any reviewer, to make sure I did this right: I used `getNormalVector3fMap()` where applicable, and I just want to verify that any class with a normal is guaranteed to have that method available.