Thanks for creating this wonderful tool. I'm using it to deduplicate photo albums going back to my days with 35mm film. Simple file matching tools won't find photos that were accidentally scanned more than once.
I've found a probable bug, and I'm reporting it here along with a workaround.
In the file dif.py, the function _build_image_dictionaries() has this code at about line 182:
file_nums = [(i, valid_files[i]) for i in range(len(valid_files))]
Just after that, there is logic that checks each file: invalid files are recorded separately, and valid files are added to the dictionaries. Invalid files are never added to the dictionaries, but the file count is still incremented for them.
The result of this is that there can be gaps in the file numbers. The build process works fine, but during the union search phase there will be a key error. When I first encountered this, I thought it must be a duplicate key, but it's actually a missing key.
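To illustrate the gap, here is a minimal sketch of the behavior as I understand it. It is not the actual dif.py code: `build_filename_dict`, `missing_keys`, and the `is_valid` callback are hypothetical stand-ins, and the file names are made up.

```python
def build_filename_dict(files, is_valid):
    """Hypothetical stand-in: number every file, but store only the valid ones."""
    file_nums = [(i, files[i]) for i in range(len(files))]
    filenames, invalid = {}, []
    for num, name in file_nums:
        if is_valid(name):
            filenames[num] = name
        else:
            # The number `num` has been used up, but nothing is stored under it.
            invalid.append(name)
    return filenames, invalid


def missing_keys(filenames, count):
    """Return the file numbers in range(count) that have no dictionary entry."""
    return [i for i in range(count) if i not in filenames]


files = ["a.jpg", "broken.jpg", "c.jpg"]
filenames, invalid = build_filename_dict(files, lambda name: name != "broken.jpg")
gaps = missing_keys(filenames, len(files))
```

Any later step that assumes the keys run from 0 to the file count without holes would raise a KeyError when it reaches one of the numbers in `gaps`.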
I added some scaffold code to dump out the filename dictionary and the list of invalid files to an extra scratch log, and I found the numbering gaps.
It's not clear to me whether the correct solution would be to not increment the file count for invalid files, or to put dummy items into the dictionaries in place of invalid files (but keep their filenames in that dictionary for logging?). My workaround has been to clean up or delete the faulty image files, after which re-running the same operation will succeed.
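For the first option, a rough sketch of what I have in mind is to assign a number only when a file is actually stored, so the keys stay contiguous. Again, `build_contiguous` and `is_valid` are hypothetical names for illustration, not functions from dif.py.

```python
def build_contiguous(files, is_valid):
    """Hypothetical sketch: assign numbers only to files that are stored."""
    filenames, invalid = {}, []
    for name in files:
        if is_valid(name):
            # The next free number is the current dictionary size, so no gaps appear.
            filenames[len(filenames)] = name
        else:
            # Invalid files are still recorded for logging, but consume no number.
            invalid.append(name)
    return filenames, invalid


files = ["a.jpg", "broken.jpg", "c.jpg"]
filenames, invalid = build_contiguous(files, lambda name: name != "broken.jpg")
```

With this approach the invalid file names are still available for the log, and any later loop over the file count would need to use the number of stored files rather than the number of files scanned.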