Performance improvement #14

Open
DonaldTrump88 opened this issue May 19, 2021 · 4 comments

Comments

@DonaldTrump88

I am clustering about 50K locations, and each cluster should contain about 20 locations or fewer. Unfortunately, the algorithm takes about 1 hour to finish. My initial guess is that the repeated distance calculations make it slow; if I switch to a correct distance formula based on latitude and longitude, it will be even slower.
If you agree, adding a distance matrix would help to optimize it. Here is a similar example in DBSCAN:
https://github.com/bhavikm/DBSCAN-clustering/blob/master/index.php
The matrix could be computed when the user calls solve.
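
To make the suggestion concrete, here is a minimal sketch of a precomputed distance matrix using the haversine great-circle formula. It is written in Python for brevity and is not this library's code; the names `haversine_km` and `build_distance_matrix` and the `(lat, lon)` tuple layout are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def build_distance_matrix(points):
    """Compute every pairwise distance once and mirror it across the diagonal."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(points[i], points[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

After `matrix = build_distance_matrix(points)`, a lookup such as `matrix[i][j]` replaces a repeated call to the distance function. One caveat: a full matrix for 50K locations has 2.5 billion entries, so it may not fit in memory at that scale.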

@DonaldTrump88
Author

DonaldTrump88 commented May 19, 2021

@bdelespierre
Owner

Thanks @Ninja-007, I'll take a look. If you have suggestions for the implementation, feel free to open a PR.

@halfhope

Hi!

You can store the distance matrix, which is symmetric, as a triangular matrix in a one-dimensional array. That could make it about two times faster.

https://gist.github.com/halfhope/8589f5f97f76e066480dcfc7c0ac88da
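
As an illustration of the one-dimensional idea, the sketch below stores only the upper triangle of a symmetric distance matrix in a flat list. It is not the code from the gist above, and it is written in Python rather than the library's PHP; the class name `CondensedDistances` and its methods are hypothetical.

```python
import math


class CondensedDistances:
    """Upper triangle (i < j) of a symmetric distance matrix in a flat list."""

    def __init__(self, points, distance):
        self.n = len(points)
        n = self.n
        # n * (n - 1) / 2 entries: one per unordered pair, no diagonal.
        self.values = [0.0] * (n * (n - 1) // 2)
        k = 0
        for i in range(n):
            for j in range(i + 1, n):
                self.values[k] = distance(points[i], points[j])
                k += 1

    def _index(self, i, j):
        # Row-major position of pair (i, j) with i < j in the flat list.
        return self.n * i + j - ((i + 1) * (i + 2)) // 2

    def get(self, i, j):
        if i == j:
            return 0.0  # distance from a point to itself
        if i > j:
            i, j = j, i  # symmetry: d(i, j) == d(j, i)
        return self.values[self._index(i, j)]
```

For example, `CondensedDistances(points, math.dist).get(2, 0)` returns the same value as `get(0, 2)`. Each pair is computed and stored once, which is roughly half the work and memory of a full square matrix.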

@bdelespierre
Owner

Hi @halfhope, thanks for your comment. Could you propose an implementation and open a pull request?
