
Key error during union search if any invalid files #96

Open
sna-scourtney opened this issue May 6, 2024 · 1 comment

Thanks for creating this wonderful tool. I'm using it to deduplicate photo albums going back to my days with 35mm film. Simple file matching tools won't find photos that were accidentally scanned more than once.

I've found a probable bug, and I'm reporting the bug and a workaround.

In the file dif.py, the function _build_image_dictionaries() has this code at about line 182:

file_nums = [(i, valid_files[i]) for i in range(len(valid_files))]

Just after that, there is logic that checks each file: invalid files are recorded separately, and valid files are added to the dictionaries. Invalid files are never added to the dictionaries, yet each one still consumes a file number.

The result of this is that there can be gaps in the file numbers. The build process works fine, but during the union search phase there will be a key error. When I first encountered this, I thought it must be a duplicate key, but it's actually a missing key.
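To make the failure mode concrete, here is a minimal sketch of the numbering problem. It is a simplified model, not the actual code in dif.py: the names `build_dictionaries` and `is_valid` are hypothetical, and it assumes the later phase looks keys up over a contiguous range of file numbers.

```python
def build_dictionaries(files, is_valid):
    """Simplified model of the numbering described above (not difPy's real code)."""
    filename_dict = {}
    invalid = []
    for num, name in enumerate(files):  # every file consumes a number
        if is_valid(name):
            filename_dict[num] = name
        else:
            invalid.append(name)  # the number is used up but never stored
    return filename_dict, invalid


files = ["a.jpg", "broken.jpg", "c.jpg"]
filename_dict, invalid = build_dictionaries(files, lambda n: n != "broken.jpg")

# Assumption: a later phase expects keys 0..len(files)-1 to all exist.
missing_key = None
try:
    for num in range(len(files)):
        filename_dict[num]
except KeyError as err:
    missing_key = err.args[0]  # the gap left by the invalid file
```

In this model the build step finishes without complaint, and the lookup only fails later, which matches the "missing key, not duplicate key" behavior described above.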

I added some scaffold code to dump out the filename dictionary and the list of invalid files to an extra scratch log, and I found the numbering gaps.
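For anyone who wants to run a similar check, a rough sketch of such a gap finder might look like the following. The function name `find_numbering_gaps` and the `total_files` argument are hypothetical; it assumes the filename dictionary is keyed by integer file numbers starting at 0.

```python
def find_numbering_gaps(filename_dict, total_files):
    """Return the file numbers in 0..total_files-1 that have no dictionary entry."""
    return [num for num in range(total_files) if num not in filename_dict]


# Example: six files were numbered, but only three made it into the dictionary.
gaps = find_numbering_gaps({0: "a.jpg", 2: "c.jpg", 5: "f.jpg"}, 6)
```

Writing the returned list to a scratch log next to the list of invalid files should show one gap per invalid file, if the diagnosis above is right.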

It's not clear to me whether the correct solution would be to not increment the file count for invalid files, or to put dummy items into the dictionaries in place of invalid files (but keep their filenames in that dictionary for logging?). My workaround has been to clean up or delete the faulty image files, after which re-running the same operation will succeed.
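The two options can be sketched roughly as follows. Both functions are hypothetical illustrations under the same simplified model (an `is_valid` callback standing in for the real validity check), not proposed patches to dif.py.

```python
def build_contiguous(files, is_valid):
    """Option 1: number only the valid files, so the keys have no gaps."""
    valid = [name for name in files if is_valid(name)]
    invalid = [name for name in files if not is_valid(name)]
    return dict(enumerate(valid)), invalid


def build_with_placeholders(files, is_valid):
    """Option 2: keep every filename under its number and track invalid numbers."""
    filename_dict = dict(enumerate(files))
    invalid_nums = {num for num, name in filename_dict.items() if not is_valid(name)}
    return filename_dict, invalid_nums


files = ["a.jpg", "broken.jpg", "c.jpg"]
check = lambda name: name != "broken.jpg"

contiguous, skipped = build_contiguous(files, check)
with_placeholders, invalid_nums = build_with_placeholders(files, check)
```

Option 1 keeps the later phases unchanged but renumbers files; option 2 preserves the numbering and filenames for logging, at the cost of every later phase having to skip the numbers in `invalid_nums`.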


sideshot commented May 30, 2024

I had a KeyError because of a few bad files. I moved them out of the folder, and it is working now. They were all under 1 KB.
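If the bad files tend to be tiny, a quick pre-scan can flag candidates to move out before running the search. This is only a heuristic sketch: the helper name `find_small_files` and the 1024-byte threshold are assumptions, and a small file is not necessarily an invalid image.

```python
from pathlib import Path


def find_small_files(folder, max_bytes=1024):
    """Return files under `folder` (recursively) smaller than `max_bytes`, sorted."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.stat().st_size < max_bytes
    )
```

Calling `find_small_files("path/to/album")` returns a list of `Path` objects to inspect by hand; nothing is moved or deleted.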
