
Is there a sensible way of only processing modified files? #18

Open
Technifocal opened this issue May 31, 2017 · 1 comment

Comments

@Technifocal

Since this is a stream-based backup solution, is there any sensible way of only processing modified files, i.e. files whose size or timestamp has changed?

The issue at the moment is that everything is based on deduplication, and computationally expensive deduplication at that (before a block can be checksummed to see whether it's a duplicate, it has to be read off disk, compressed, split, paritied, etc.), so it can take quite a long time just to process a large backup set locally and discover that nothing has changed. A 20 GB backup on HDDs can take ~3.5 minutes; scaled to TB-sized backups that becomes hours, which is getting impractical for continuous, hell, even daily backups.

One could obviously do some trickery with scripting to store the timestamp of the last backup, then only back up files with a modification date newer than that, but then restoration would be an absolute pain in the backside (you'd basically have to restore every backup in chronological order to get a full restoration). A rough sketch of that trickery is below.
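
For concreteness, this is roughly the kind of wrapper I mean (just a sketch; the marker file name and root directory are placeholders): it prints the paths of files changed since the last run, which could then be fed to `tar` and on to scat.

```go
// newerthan: prints paths of regular files modified after the mtime of a
// marker file, so the list can be fed to e.g. `tar -c -T -`.
// The marker file name and root directory are made up for this sketch.
package main

import (
	"fmt"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"time"
)

func main() {
	const marker = ".last-backup" // hypothetical: touched after each successful backup
	root := "/data"               // hypothetical: directory tree to back up

	// If the marker doesn't exist yet, "last" stays at the zero time and
	// every file is considered changed (i.e. a full first backup).
	var last time.Time
	if fi, err := os.Stat(marker); err == nil {
		last = fi.ModTime()
	}

	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		// Only emit files changed since the last backup marker.
		if info.ModTime().After(last) {
			fmt.Println(path)
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```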

I'm simply wondering:

  1. What was the reasoning behind making this software stream-based?
  2. What is the best way to avoid having to read, process, compress, checksum, split, and parity every chunk just to discard it because it's not new?

Thanks.

@Roman2K
Owner

Roman2K commented May 31, 2017

Hi Nicholas! I see what you mean. Indeed, scat isn't optimized for file-based access (initial read, final write). It's very generic by nature, on purpose: I wanted it that way so it could back up anything, from any source, to anywhere, independent of the concept of files (but compatible with them via file-packing programs like tar).

It wouldn't be impossible to make it file-specific, but it would be very contrived. It would take the form of a proc that decodes a tar stream and skips files based on their mtimes and a previous index of paths<=>mtimes.
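
Roughly, such a proc could look like the following standalone filter (just a sketch, not scat's actual proc interface; the index file name and JSON format are made up): it reads a tar stream on stdin and re-emits only entries that are new or whose mtime changed since the saved index.

```go
// tarskip: reads a tar stream on stdin and writes to stdout only the entries
// that are new or whose mtime changed since the previous run, as recorded in
// a path<=>mtime index. The index file name and JSON format are made up.
package main

import (
	"archive/tar"
	"encoding/json"
	"io"
	"log"
	"os"
	"time"
)

func loadIndex(path string) map[string]time.Time {
	idx := map[string]time.Time{}
	if data, err := os.ReadFile(path); err == nil {
		_ = json.Unmarshal(data, &idx)
	}
	return idx
}

func main() {
	const indexPath = "mtimes.json" // hypothetical location of the index
	oldIdx := loadIndex(indexPath)
	newIdx := map[string]time.Time{}

	tr := tar.NewReader(os.Stdin)
	tw := tar.NewWriter(os.Stdout)

	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		newIdx[hdr.Name] = hdr.ModTime

		// Skip regular files whose mtime is unchanged since the last run.
		if prev, ok := oldIdx[hdr.Name]; ok && hdr.Typeflag == tar.TypeReg && prev.Equal(hdr.ModTime) {
			if _, err := io.Copy(io.Discard, tr); err != nil {
				log.Fatal(err)
			}
			continue
		}

		// New or changed entry: pass it through untouched.
		if err := tw.WriteHeader(hdr); err != nil {
			log.Fatal(err)
		}
		if _, err := io.Copy(tw, tr); err != nil {
			log.Fatal(err)
		}
	}
	if err := tw.Close(); err != nil {
		log.Fatal(err)
	}

	// Persist the fresh index for the next backup run.
	if data, err := json.MarshalIndent(newIdx, "", "  "); err == nil {
		_ = os.WriteFile(indexPath, data, 0o644)
	}
}
```

A restore would still need all the incremental runs combined, which is part of why I think a file-based tool is the better fit here.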

I think scat is just not suited to your use case. You would be better off with restic or the like.
