Run summary shows number of edges instead of number of images #74
Comments
Hi @amirmk89, I'm not sure how to fix this. The percentage is computed over edges, so a run with k=3 and a run with k=100 will produce different numbers.
It's bizarre seeing counts that are higher than the number of images.
What I would expect is a hierarchy of similarity types, so that each image gets binned as fully identical, nearly identical, similar, or outlier. If an image is fully identical to any other image, it is classed as fully identical, even if it is also nearly identical or similar to other images.
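A minimal sketch of that kind of binning, to make the idea concrete. This is not fastdup's implementation: the threshold values, the `bin_images` helper, the extra `other` bin, and the `(source, target, similarity)` edge format are all assumptions made for illustration. Each image is labelled once, from its strongest match, so the per-category counts always sum to the number of images regardless of how many edges there are.

```python
from collections import Counter

# Hypothetical similarity thresholds, chosen only for illustration.
IDENTICAL = 1.0
NEAR_IDENTICAL = 0.98
SIMILAR = 0.90
OUTLIER = 0.50


def bin_images(edges, image_ids):
    """Assign each image exactly one label, based on its strongest edge."""
    best = {img: 0.0 for img in image_ids}
    for src, dst, sim in edges:
        # An edge counts for both endpoints; keep only the strongest match.
        best[src] = max(best[src], sim)
        best[dst] = max(best[dst], sim)

    labels = {}
    for img, sim in best.items():
        if sim >= IDENTICAL:
            labels[img] = "identical"
        elif sim >= NEAR_IDENTICAL:
            labels[img] = "near-identical"
        elif sim >= SIMILAR:
            labels[img] = "similar"
        elif sim < OUTLIER:
            labels[img] = "outlier"
        else:
            labels[img] = "other"  # assumed catch-all bin
    return labels


# Toy data: four images, four edges.
edges = [
    ("a.jpg", "b.jpg", 1.0),
    ("a.jpg", "c.jpg", 0.95),
    ("b.jpg", "c.jpg", 0.95),
    ("d.jpg", "a.jpg", 0.30),
]
images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]

labels = bin_images(edges, images)
counts = Counter(labels.values())
print(labels)
print(counts)
```

Here `a.jpg` has both a fully identical match and a merely similar one, and it is counted once, in the stricter bin.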
Following a fastdup run with a lower threshold, the summary screen lists counts and percentages that are inconsistent with the number of images, and refer to the number of edges. Also, counts and percentages don't align.
Here, for outliers: if the 1,339 outliers were all images, they would be ~10% of the data. If 3.33% of the images are outliers, the count should be 442 images.
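The arithmetic behind those two figures can be checked in a few lines. The variable names are made up for this sketch, and the only inputs are the numbers quoted above; the dataset size is derived from them, not read from the run. The last value is the total that 1,339 would be 3.33% of, which is an inference about the denominator and not something the summary states.

```python
outlier_count = 1339      # count shown in the summary
outlier_pct = 3.33        # percentage shown in the summary
expected_images = 442     # count expected if 3.33% refers to images

# Dataset size implied by "442 images are 3.33% of the data".
total_images = expected_images / (outlier_pct / 100)

# Share of the dataset if all 1,339 outliers were distinct images.
implied_pct = 100 * outlier_count / total_images

# Total that 1,339 would be 3.33% of (an inferred denominator).
implied_denominator = outlier_count / (outlier_pct / 100)

print(round(total_images), round(implied_pct, 1), round(implied_denominator))
```

The implied denominator comes out at roughly three times the implied number of images, which is what one would expect if the percentage is taken over edges and not over images.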
Thanks!