De-duplicate shared bytes as git does for texty files #293
Comments
@sebastianrath I was expecting SnowFS to already have this.
SnowFS supports copy-on-write on certain file systems such as APFS, but it does not yet implement deduplication in the application layer. The main reason is performance: fragmentation in binaries can have a higher impact on CPU and I/O. For the first implementation of SnowFS, speed had a higher priority than disk space. However, we are considering adding this as an opt-in feature, as these impacts may not be relevant for every project.
I'm here cheering for this to become an opt-in feature (personally ASAP, but for y'all no pressure).
Could you share some background info? What type of projects would that be beneficial to? How many files, and what are the overall file sizes? Thanks!
To give you an idea, I have many GB of screenshots on both mobile and desktop. Imagine a screenshot of a notepad, where most of the pixels are white: all of that could be deduplicated (for example, the Windows start menu icon in these screenshots wouldn't be stored repeatedly).
BTW, I'm working on a new symlink daemon that will support forming a single file from shared objects.
@sebastianrath do you know of any libraries that find duplicate bytes in files and move these duplicates into separate files? I would love it if git natively had more than one object per file, so there wouldn't be "foo", "bar" and "foobar" objects but only "foo" and "bar".
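To make the "foo" / "bar" / "foobar" idea concrete, here is a minimal sketch of chunk-level deduplication in Python. It is not SnowFS or git code: the `ChunkStore` class, its method names, and the tiny fixed chunk size of 3 bytes are all hypothetical and chosen only so the example lines up with the three strings above. A real tool would more likely use content-defined chunking with much larger chunks, which this sketch does not attempt.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store (hypothetical): each unique chunk is kept once."""

    def __init__(self, chunk_size=3):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store only chunks not seen before."""
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if the chunk already exists
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file by concatenating its chunks in order."""
        return b"".join(self.chunks[d] for d in self.files[name])


store = ChunkStore(chunk_size=3)
store.put("a", b"foo")
store.put("b", b"bar")
store.put("c", b"foobar")  # reuses the chunks already stored for "a" and "b"

print(len(store.chunks))   # number of unique chunks across all three files
print(store.get("c"))      # the reassembled content of "c"
```

With fixed-size chunks, a single inserted byte shifts every later chunk boundary, so duplicates after the insertion point are no longer detected. That is the main reason content-defined chunking is usually preferred in practice, at the CPU and I/O cost mentioned earlier in the thread.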