Move to client side hashing #1553

Open
jrasm91 opened this issue Feb 5, 2023 · 13 comments · May be fixed by #9306

@jrasm91
Contributor

jrasm91 commented Feb 5, 2023

Feature detail

Before upload, compute a client side hash in the mobile app and use that (eventually in combination with #731) to determine if an asset should be uploaded.
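A minimal sketch of the idea, written in Python purely for illustration (it is not the mobile app's code). The choice of SHA-1, the helper names `file_digest` and `should_upload`, and the `known_digests` set are all assumptions made for the example.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large videos never sit fully in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def should_upload(path, known_digests):
    """Return True only if this exact content is not already known.

    known_digests is a hypothetical set of hex digests the server reported.
    """
    return file_digest(path) not in known_digests
```

Only the digest has to travel to the server to answer the "is this already backed up?" question; the file itself is sent only when `should_upload` returns `True`.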

Platform

Mobile App

@mike-lloyd03

As an alternative to a simple SHA hash of the file, I suggest using an algorithm that can detect near-duplicate photos: photos that are visually identical but produce different hashes because of compression, resizing, or file type conversion. I've used this library written in Go to build a duplicate image finder that could pick up duplicates between the originals on my iPhone and copies that had been through Google Photos' compression algo. Photoprism also uses this library to detect duplicates.

However, I'm not totally sure how something like this would be used to prevent the client from uploading an existing duplicate to the server as it doesn't generate a unique artifact like hashing does. But I figured it was worth mentioning while this is being considered.
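To make the difference concrete, here is a rough sketch of one simple perceptual hash (an "average hash"). This is an assumption chosen for illustration and not necessarily the algorithm used by the Go library or Photoprism mentioned above. The input is assumed to be a small grayscale pixel grid that has already been downscaled; all function names are hypothetical.

```python
def average_hash(pixels):
    """Build a bit string from a 2D list of grayscale values.

    Each bit is 1 when the pixel is brighter than the mean, else 0.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(hash_a, hash_b, max_distance=5):
    """Treat two images as near duplicates when few bits differ.

    max_distance is a tunable threshold, not a fixed rule.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance
```

Because matches are decided by a distance threshold rather than by equality, two near duplicates usually have different hash values. That is why this kind of hash cannot act as a unique lookup key the way a cryptographic digest can.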

@bo0tzz
Member

bo0tzz commented Feb 5, 2023

@mike-lloyd03 if we implement similarity detection that will be server side only. The current implementation is hash only. You can track #644 if interested in the fuzzy deduplication.

@ikaruswill

ikaruswill commented Mar 9, 2023

As mentioned by @bo0tzz, I believe we should scope this issue to only duplicate detection rather than similar photo detection as it is a more foundational functionality of backup and sync.

Scenario
The main scenario: when the phone is reinitialized, the Immich app loses its sync state and treats every photo as new.

  • This incurs a huge and superfluous data transfer during deduplication, as the app has to upload every asset to the remote server for hash computation.
  • On the server side, a large amount of CPU cycles and memory is also spent unnecessarily on receiving the entire duplicated photo library.
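The scenario above could be avoided if the app rebuilt its sync state from digests alone. The following is only a sketch under stated assumptions: `find_existing` stands in for a hypothetical server endpoint, and `plan_resync`, `local_assets`, and `server_index` are illustrative names, not Immich APIs.

```python
def find_existing(server_index, digests):
    """Stand-in for a hypothetical server call: which digests are already stored?"""
    return {d for d in digests if d in server_index}


def plan_resync(local_assets, server_index, batch_size=1000):
    """Split local assets into those already backed up and those still to upload.

    local_assets maps a local asset id to its content digest.
    Digests are sent in batches so no single request grows too large.
    """
    ids = list(local_assets)
    existing = set()
    for start in range(0, len(ids), batch_size):
        batch = [local_assets[i] for i in ids[start:start + batch_size]]
        existing |= find_existing(server_index, batch)
    backed_up = [i for i in ids if local_assets[i] in existing]
    to_upload = [i for i in ids if local_assets[i] not in existing]
    return backed_up, to_upload
```

After a device reset, only the assets in `to_upload` would need to be transferred; everything in `backed_up` can be marked as synced without sending its contents again.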

Why this deserves priority

  • Reinitialization of a mobile device is not a frequent event, but being able to synchronize state effectively is arguably the most important part of a backup/sync application.
  • Uploading the library is, without a doubt, much more battery intensive than computing the hash of all images locally on the device.
  • Since Immich is self-hosted, we're not just expending battery on the mobile device, but also server compute power, so the impact is more significant to users than a hosted service.

Own context

  • I have 56GB of photos on my mobile and recently reinitialized the phone, and now the Immich app has to upload all 56GB of photos for its state to be in sync with the Immich server

@nijhawank

+1 for client side hashing + deduplicating similar (not identical) photos. Similar looking photos could be collapsed into a single one (similar to how a burst photo is shown) on iOS

@smnhdy

smnhdy commented Oct 12, 2023

Is there any update on this FR? This is a blocking point for my iOS device users, as they have 50k+ photos in their iCloud accounts, and Immich tries to upload everything every time it's installed. This is after I manually imported all photos via the CLI.

@athornfam2

athornfam2 commented Oct 13, 2023

Growing library of multiple family members with 10K photos at least combined. Hoping this comes out soon for iPhone users.

@sgloutnikov

Noticed today on a fresh iOS install that the application properly detected photos that were already uploaded to the server, and the cloud checkmark appeared in the corner of the photos. That however didn't change the files to be sent to the server and the mobile application wanted to upload the full library to the server.

@DX37

DX37 commented Nov 18, 2023

Noticed today on a fresh iOS install that the application properly detected photos that were already uploaded to the server, and the cloud checkmark appeared in the corner of the photos. That however didn't change the files to be sent to the server and the mobile application wanted to upload the full library to the server.

Same thing on Android.

@smnhdy

smnhdy commented Nov 18, 2023

Strange... not the experience I get.

I reset my iPhone this week, and did a fresh install of 1.86 and it's now trying to upload all 50k photos...

@DX37

DX37 commented Nov 18, 2023

Strange... not the experience I get.

I reset my iPhone this week, and did a fresh install of 1.86 and it's now trying to upload all 50k photos...

Check for Duplicated Assets in Immich settings (local storage, I guess). The number of assets may be growing while uploading...

@smnhdy

smnhdy commented Dec 29, 2023

I don't see any comments saying that this feature is on the roadmap at all. Is there any official word on this?

@bo0tzz
Member

bo0tzz commented Dec 29, 2023

This is definitely planned, we just don't have that many people working on the mobile app.

@p7996619

I'd like to add that it would probably also be better performance-wise to switch to a more modern hash algorithm that can run in parallel, e.g. BLAKE3 or XXH3
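As a rough illustration of the performance angle: BLAKE3 and XXH3 are not in Python's standard library, so this sketch substitutes BLAKE2b from `hashlib`, and it parallelizes across files rather than within one file (which is what BLAKE3's tree design allows). CPython's `hashlib` documentation notes that the GIL is released while hashing buffers larger than 2047 bytes, so threads can overlap here. The helper names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest_bytes(data, algorithm="blake2b"):
    """Hash one in-memory blob with the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def digest_many(blobs, algorithm="blake2b", workers=4):
    """Hash several blobs concurrently, returning digests in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: digest_bytes(b, algorithm), blobs))
```

Whether this actually beats a sequential loop depends on the device, storage speed, and file sizes, so it would need measuring before drawing conclusions.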
