Key error during union search if any invalid files #96

sna-scourtney · 2024-05-06T16:28:37Z

Thanks for creating this wonderful tool. I'm using it to deduplicate photo albums going back to my days with 35mm film. Simple file matching tools won't find photos that were accidentally scanned more than once.

I've found a probable bug, and I'm reporting the bug and a workaround.

In the file dif.py, the function _build_image_dictionaries() has this code at about line 182:

file_nums = [(i, valid_files[i]) for i in range(len(valid_files))]

Just after that, there is logic that checks for invalid files and records them, and that adds valid files to the dictionaries. Invalid files are never added to the dictionaries, but the file count is still incremented for them.

The result of this is that there can be gaps in the file numbers. The build process works fine, but during the union search phase there will be a key error. When I first encountered this, I thought it must be a duplicate key, but it's actually a missing key.
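To illustrate the failure mode, here is a minimal sketch that does not use the actual difPy code. The names `build_dictionaries`, `lookup_all`, `filename_by_id`, and `is_valid` are hypothetical stand-ins; the only assumption carried over from dif.py is that every file receives a number while only valid files are stored.

```python
def build_dictionaries(files, is_valid):
    """Number every file, but store only the valid ones (hypothetical stand-in)."""
    filename_by_id = {}
    invalid = []
    for i, path in enumerate(files):
        if is_valid(path):
            filename_by_id[i] = path
        else:
            # The number i is consumed here, but no dictionary entry is created.
            invalid.append(path)
    return filename_by_id, invalid


def lookup_all(filename_by_id, count):
    """Mimic a later phase that assumes the ids 0..count-1 all exist."""
    return [filename_by_id[i] for i in range(count)]


files = ["a.jpg", "broken.jpg", "c.jpg"]
filename_by_id, invalid = build_dictionaries(files, lambda p: p != "broken.jpg")

try:
    lookup_all(filename_by_id, len(files))
    error = None
except KeyError as exc:
    # The missing key is the number that was assigned to the invalid file.
    error = exc
```

The build step finishes without complaint; the error only appears once a later step walks the full range of ids.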

I added some scaffold code to dump out the filename dictionary and the list of invalid files to an extra scratch log, and I found the numbering gaps.
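The check itself can be sketched roughly as follows. This is not the exact scaffold code I used; `find_missing_ids` and `filename_by_id` are hypothetical names, and the sketch assumes the filename dictionary is keyed by integer file numbers starting at 0.

```python
def find_missing_ids(filename_by_id):
    """Return the file numbers that are absent below the highest key."""
    if not filename_by_id:
        return []
    expected = range(max(filename_by_id) + 1)
    return [i for i in expected if i not in filename_by_id]


# Example: numbers 1, 3 and 4 were assigned but never stored.
gaps = find_missing_ids({0: "a.jpg", 2: "c.jpg", 5: "f.jpg"})
```

Note that a gap at the very end of the numbering would not be detected this way, since the highest stored key is used as the upper bound.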

It's not clear to me whether the correct solution would be to not increment the file count for invalid files, or to put dummy items into the dictionaries in place of invalid files (but keep their filenames in that dictionary for logging?). My workaround has been to clean up or delete the faulty image files, after which re-running the same operation will succeed.
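The two candidate fixes could look roughly like this. Both functions are sketches under assumptions, not patches against dif.py: `build_contiguous`, `build_with_placeholders`, `is_valid`, and `load` are hypothetical names, and the real dictionaries hold more than what is shown here.

```python
def build_contiguous(files, is_valid):
    """Option 1: number only the valid files, so the ids have no gaps."""
    valid = [p for p in files if is_valid(p)]
    invalid = [p for p in files if not is_valid(p)]
    return dict(enumerate(valid)), invalid


def build_with_placeholders(files, is_valid, load):
    """Option 2: keep every id; invalid files get a None placeholder."""
    filename_by_id = dict(enumerate(files))
    data_by_id = {
        i: (load(path) if is_valid(path) else None)
        for i, path in filename_by_id.items()
    }
    return filename_by_id, data_by_id


files = ["a.jpg", "broken.jpg", "c.jpg"]


def is_valid(path):
    return path != "broken.jpg"


contiguous, invalid = build_contiguous(files, is_valid)
# len stands in for a real image loader in this example.
names, data = build_with_placeholders(files, is_valid, load=len)
```

With option 1, later phases need no changes, but the numbering no longer matches the original file list. With option 2, filenames stay available for logging, but every consumer of the data dictionary would have to skip the `None` entries.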


sideshot · 2024-05-30T13:17:40Z

I had a KeyError because of a few bad files. I moved them out of the folder, and it is working now. They were all under 1 KB.
