Incorrect results and a few further observations #79

Open
audiomuze opened this issue Sep 27, 2023 · 4 comments · Fixed by #93
Labels
bug : critical A bug was found in difPy, and it is critical to the functioning of the package. status : implemented Feature was implemented in difPy.

Comments

@audiomuze

audiomuze commented Sep 27, 2023

@elisemercury, I've just pulled and tested your latest commit and have encountered what I assume are bugs:

Running python /home/x/git/Duplicate-Image-Finder/difPy/dif.py --directory /mnt/sdc/2tag/ --output_directory /tmp --recursive True --limit_extensions True --show_progress True:

  • In one instance, a 3000x3000 file was marked as lower quality than a much smaller image with otherwise identical properties.

Edited extract from /tmp/difPy_20230927222221_lower_quality.json:

{"lower_quality": ["/pathtofile/xfolder.jpg"]}
  • If there are several identical files of lower quality (i.e. their md5sums match) in the same folder, along with one file of superior quality, difPy flags only one of the lower-quality files rather than all of them.

  • As an observation: stats.json contains many instances of "ImageFilterWarning: invalid image extension.", which suggests that non-image files are still being assessed instead of being skipped as the --limit_extensions True switch shown above should ensure. Ignoring non-image extensions up front therefore looks like a further opportunity to improve performance.
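For reference, the behaviour the first two points expect can be sketched as follows. This is a minimal illustration, not difPy's code: it assumes that "quality" is judged by pixel count (width * height) and that a group of matching images has already been found; the function name `flag_lower_quality` and the example paths are hypothetical. Every image in the group except the single best one is flagged, so byte-identical lower-quality copies are all reported rather than just one.

```python
from typing import Dict, List, Tuple


def flag_lower_quality(group: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return every path in a group of matching images except the best one.

    Hypothetical helper. `group` maps a file path to its (width, height).
    "Best" is assumed here to mean the largest pixel count; ties (e.g.
    byte-identical copies) are broken by path so that exactly one file is
    kept and all the others are flagged.
    """
    if not group:
        return []
    # Highest pixel count wins; among equal sizes the alphabetically first path is kept.
    best = min(group, key=lambda path: (-group[path][0] * group[path][1], path))
    # Everything else, including identical copies of a smaller image, is lower quality.
    return sorted(path for path in group if path != best)


if __name__ == "__main__":
    group = {
        "album/cover.jpg": (3000, 3000),
        "album/folder.jpg": (500, 500),
        "album/folder_copy.jpg": (500, 500),  # byte-identical to folder.jpg
    }
    print(flag_lower_quality(group))
    # ['album/folder.jpg', 'album/folder_copy.jpg']
```

Under this rule a 3000x3000 image can never be flagged in favour of a smaller one, and identical copies cannot hide each other because only one file per group survives.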
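The extension check from the last point could run before any file is opened, so non-image files never reach the decoder. A minimal sketch, assuming a hypothetical allow-list `IMAGE_EXTENSIONS` (difPy's actual list of supported formats may differ) and hypothetical helpers `filter_image_paths` and `iter_image_files`; matching is case-insensitive so `.JPG` and `.jpg` are treated alike:

```python
import os
from typing import Iterable, Iterator, List

# Hypothetical allow-list for illustration only; difPy's supported formats may differ.
IMAGE_EXTENSIONS = frozenset({".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"})


def filter_image_paths(paths: Iterable[str], allowed: frozenset = IMAGE_EXTENSIONS) -> List[str]:
    """Keep only paths whose extension (compared case-insensitively) is in `allowed`."""
    return [p for p in paths if os.path.splitext(p)[1].lower() in allowed]


def iter_image_files(root: str) -> Iterator[str]:
    """Walk `root` recursively and yield only files that pass the extension filter,
    so non-image files are never opened or decoded."""
    for dirpath, _dirnames, filenames in os.walk(root):
        full_paths = (os.path.join(dirpath, name) for name in filenames)
        yield from filter_image_paths(full_paths)


if __name__ == "__main__":
    sample = ["a/cover.JPG", "a/notes.txt", "a/track.flac", "b/scan.png", "README"]
    print(filter_image_paths(sample))  # ['a/cover.JPG', 'b/scan.png']
```

Filtering on the file name alone is cheap; it trusts extensions, so a mislabelled file would still be caught later by whatever image-decoding step follows.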

@audiomuze
Author

I've just run it against another group of files, and this time difPy reported no lower-quality images, even though there were many cases of a smaller image alongside a larger, higher-resolution version of the same picture.

Perhaps the easiest way to illustrate this would be for me to send you the image files so you can run them and compare the results locally?

@audiomuze
Author

@elisemercury, just flagging this in case you missed it.

@elisemercury
Owner

Hi @audiomuze

Thanks so much for flagging these issues! They will be investigated and considered with the next difPy release.

Thanks again!
Best
Elise

@elisemercury elisemercury self-assigned this Dec 6, 2023
@elisemercury elisemercury added status : in progress Feature is currently being implemented in difPy. bug : critical A bug was found in difPy, and it is critical to the functioning of the package. labels Dec 6, 2023
@elisemercury
Owner

Hi @audiomuze,

difPy v4.1.0 has been released, and I would recommend testing it on your dataset to see whether it improves the results. The new version comes with an improved comparison algorithm.

Feel free to reach out if the issue should still persist.

Thanks,
Elise

@elisemercury elisemercury added status : implemented Feature was implemented in difPy. and removed status : in progress Feature is currently being implemented in difPy. labels Feb 21, 2024
@elisemercury elisemercury linked a pull request Feb 21, 2024 that will close this issue

2 participants