Data loss with file Hardlinks on NTFS when overwriting files #715

Open
fitdev opened this issue Jul 19, 2023 · 6 comments
@fitdev

fitdev commented Jul 19, 2023

Far Manager version

3.0.6074.0 x64

OS version

10.0.22621

Other software

No response

Steps to reproduce

  1. Create a few hardlinks to the same non-zero-length file in different directories on the same drive.
  2. Overwrite one of the hardlinked files by extracting an identically named zero-size file from a 7z archive in Far.

Expected behavior

The overwritten file should now be of zero size (and contain the metadata of that file from the archive). Most importantly, it should not have any hardlinks any more. All the other hardlinks to the original file should be unaffected.

Actual behavior

The file does get overwritten - kind of. That is, the file is now zero length, yet it still has hardlinks. Moreover, for some time (until you try opening the file) the other hardlinks report a non-zero size; eventually the new zero-size content propagates and soon all hardlinks report zero size.

This is very serious behavior, as it can easily lead to data loss due to the inconsistent result.

The data loss happens once such a zero-length file is deleted: the other hardlinks (to which it was still mistakenly "linked") "get synchronized" and all report zero size - leading to data loss!

I have confirmed that the same behavior occurs even with a regular File Copy with Overwrite operation, so it does not have to be an extraction from an archive.
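
For reference, this matches the underlying filesystem semantics of truncating a hardlinked file in place. A minimal sketch outside Far (Python, illustrative file names, assuming an NTFS or any other hardlink-capable volume):

```python
import os

# Create a non-zero-length file and a second hardlink to it.
with open("original.dat", "wb") as f:
    f.write(b"some non-zero content")
os.link("original.dat", "hardlink.dat")

# "Overwrite" the first name by truncating the existing file in place,
# which is roughly what an in-place rewrite does.
with open("original.dat", "wb"):
    pass

print(os.stat("original.dat").st_nlink)   # 2 - the link was not broken
print(os.stat("hardlink.dat").st_size)    # 0 - the other name sees the new (empty) data
```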

Remarks

  • Not sure whether overwriting with a zero-size file, as opposed to a non-zero-size one, matters.
  • Also not sure whether it matters that the zero-size file is extracted from an archive with the overwrite option (as opposed to, say, just being copied from another directory).
@fitdev fitdev added the bug label Jul 19, 2023
@fitdev fitdev changed the title Inconsistent results with file Hardlinks on NTFS when overwriting files from archive Inconsistent results with file Hardlinks on NTFS when overwriting files from archive possibly leading to data loss Jul 19, 2023
@fitdev fitdev changed the title Inconsistent results with file Hardlinks on NTFS when overwriting files from archive possibly leading to data loss Inconsistent results with file Hardlinks on NTFS when overwriting files with zero-length files leading to data loss Jul 19, 2023
@fitdev fitdev changed the title Inconsistent results with file Hardlinks on NTFS when overwriting files with zero-length files leading to data loss Data loss with file Hardlinks on NTFS when overwriting files with zero-length files Jul 19, 2023
@alabuzhev
Contributor

Most importantly, it should not have any hardlinks any more. All the other hardlinks to the original file should be unaffected.

This is a somewhat grey area.
We do not know why the user created a hardlink to the target file.
Maybe it was just a space/time optimization and the files are logically independent.
Maybe it is a part of an elaborate setup and they expect the data to be updated everywhere.

If we think of the "overwrite" action as "remove the existing file and create a new one from scratch" then yes, the connection should be broken.
If we think of it as "empty the existing file and write a new data stream into it", then the connection should stay.
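
For illustration, the two interpretations correspond to two different write strategies. A hedged sketch (Python, hypothetical helper names - not Far's actual code):

```python
import os
import tempfile

def overwrite_by_replace(dst: str, data: bytes) -> None:
    # "Remove the existing file and create a new one from scratch":
    # write to a temp file in the same directory, then replace the destination.
    # Other hardlinks keep the old data; the connection to them is broken.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dst)))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, dst)

def overwrite_in_place(dst: str, data: bytes) -> None:
    # "Empty the existing file and write a new data stream into it":
    # truncate and rewrite the same file record. All hardlinks see the new data.
    with open(dst, "wb") as f:
        f.write(data)
```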

@fitdev
Author

fitdev commented Jul 19, 2023

I agree. However, I have always thought that for the purposes of file ops it is "remove the existing file and create a new one from scratch", whereas for opening / editing a file with a tool it would be "empty the existing file and write a new data stream into it". Far's copy/extract routine clearly targets the first case, so it seemed natural to me that it should behave accordingly.

Maybe it was just a space/time optimization and the files are logically independent.

This was precisely my case.

Perhaps, therefore, an additional option (or a set of options) related to link processing in general could be added to the copy, move, and extract dialogs to account for such cases, since one can unwittingly cause data loss by assuming one behavior when the actual behavior is different.

A similar argument, I think, also extends to symbolic links (I am not sure of Far's current behavior in this regard), and there are quite a few extra cases possible:

  • Replace one symlink with the source symlink
  • Replace one symlink with the source symlink's target / physical file
  • Replace symlink's immediate target with the source symlink
  • Replace symlink's immediate target with the source symlink's target / physical file
  • Replace symlink's final target with the source symlink
  • Replace symlink's final target with the source symlink's target / physical file

Personally, I think only the first two options make practical sense without gotchas: the first option being the default, and the second applying when the "Copy symlink's contents" flag is selected.
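
For illustration only, a rough sketch of those first two cases (Python, hypothetical helper names; the flag name follows the wording above):

```python
import os
import shutil

def replace_link_with_link(dst_link: str, src_link: str) -> None:
    # Case 1: recreate the destination as a symlink pointing wherever the
    # source symlink points; no target file is touched.
    target = os.readlink(src_link)
    if os.path.lexists(dst_link):
        os.unlink(dst_link)
    os.symlink(target, dst_link)

def replace_link_with_target_copy(dst_link: str, src_link: str) -> None:
    # Case 2 (the "Copy symlink's contents" flag): the destination symlink is
    # replaced by a regular-file copy of the source link's final target.
    src_file = os.path.realpath(src_link)
    if os.path.lexists(dst_link):
        os.unlink(dst_link)
    shutil.copy2(src_file, dst_link)
```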

@fitdev
Author

fitdev commented Jul 20, 2023

In fact, perhaps several link-related options could be consolidated under either a new button (accessible similarly to the filter, for example) or placed directly on the Copy/Move/Extract dialogs.

@alabuzhev
Contributor

Alternatively, we probably can show a confirmation like "file exists and has hardlinks, what to do? break link / update all".

@fitdev
Author

fitdev commented Jul 20, 2023

Alternatively, we probably can show a confirmation like "file exists and has hardlinks, what to do? break link / update all".

Yes, probably. But the issue arises with multiple files (when you need to overwrite multiple files, some of which have hardlinks) and when you also want to remember the answer, so that, for example, files with hardlinks are all overwritten while files without hardlinks are skipped. With a single confirmation dialog that may be difficult to implement.

So, perhaps a separate, additional confirmation dialog should be displayed when overwriting hardlinked files (independent of the regular file-overwrite dialog). If both kinds of files are encountered - with and without hardlinks - then two overwrite prompts would be shown: the regular one for normal files (as happens currently) and a new one for hardlinks.
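
As a hedged sketch of that flow (Python, hypothetical names - not a proposal for Far's actual code), the existing destination files could be split into two groups up front:

```python
import os

def partition_by_hardlinks(existing_destinations):
    # Split the files that already exist at the destination into those with
    # extra hardlinks and those without, so each group can get its own prompt
    # and its own remembered answer.
    plain, linked = [], []
    for path in existing_destinations:
        (linked if os.stat(path).st_nlink > 1 else plain).append(path)
    return plain, linked
```

The copy/extract routine would then show the regular overwrite prompt once for `plain` and a separate hardlink-aware prompt ("break link / update all") once for `linked`, remembering each answer independently.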

@fitdev fitdev changed the title Data loss with file Hardlinks on NTFS when overwriting files with zero-length files Data loss with file Hardlinks on NTFS when overwriting files Jul 21, 2023
@xparq
Contributor

xparq commented Nov 7, 2023

... I have always thought that for the purposes of file ops it is "remove the existing file and create a new one from scratch", whereas for opening / editing a file with a tool, it would be "empty the existing file ... "

Sounds like a sound principle: file extraction should probably do a generic replace when overwriting, and shouldn't try to be clever.

If someone sets up space-saving links manually and wants the setup to be robust, symlinks should be used instead (despite the nuisances on Windows): they are much better suited for human use. If those links were generated by a tool (I tend to think hardlinks generally should be, as an internal implementation detail), then they are likely easy to regenerate too, or shouldn't have been tampered with in the first place. (The risk of breaking such a setup by manually unpacking files over it - in other words, tampering with the guts of a tool-managed setup - already falls into the "warranty void" category.)

Alternatively, we probably can show a confirmation like "file exists and has hardlinks, what to do?

I'm not sure if telling whether a file has hardlinks is actually always cheap/free. On some filesystems it isn't even really viable.
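
For what it's worth, on NTFS the link count is part of the file's standard metadata, so a check along these lines should cost a single query per file (a hedged sketch in Python; whether it stays cheap on network or more exotic filesystems is exactly the open question):

```python
import os

def has_other_hardlinks(path: str) -> bool:
    # On Windows, Python's os.stat fills st_nlink from the file information
    # (nNumberOfLinks), i.e. one metadata query per file. Filesystems that
    # don't support hardlinks typically just report a count of 1.
    return os.stat(path).st_nlink > 1
```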
