Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cp: keep handling the rest of the paths if one does not exists #6208

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

granquet
Copy link
Contributor

@granquet granquet commented Apr 9, 2024

Quick attempt at fixing issue #6177

I'm implementing a custom type for the value parser so that I have the option to not fail early if the path does not exists and I let the operations continue "as if" everything is fine.

Not really happy with the solution as the "bad" path is still in the array of paths and is checked a second time... though I don't have a better idea yet.

@tertsdiepraam
Copy link
Member

Interesting! I think you're on the right track by looking into the value parser, but I think the problematic part in the current code is the PathBuf value parser, which requires a non-empty value. We can use the OsString value parser instead to fix that. Then we can filter out the files later if we're still not compatible.

Copy link

github-actions bot commented Apr 9, 2024

GNU testsuite comparison:

GNU test failed: tests/cp/abuse. tests/cp/abuse is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cp/fail-perm. tests/cp/fail-perm is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cp/thru-dangling. tests/cp/thru-dangling is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/mv/trailing-slash. tests/mv/trailing-slash is passing on 'main'. Maybe you have to rebase?

@granquet
Copy link
Contributor Author

granquet commented Apr 9, 2024

@tertsdiepraam : makes more sense to do it the way you describe it, will make the change ASAP.

btw, gnu cp accepts both empty strings and non existant files. both produces the same error and other files are handled.

@granquet
Copy link
Contributor Author

I've updated to use an OsString instead of a custom struct in the parsing and then filter out the non existant paths in the parse_path_args.

A quick test here with empty filepath and non existant files.
Compared to cp (GNU coreutils) 9.4

➜  coreutils git:(d000b88d3) ✗ mkdir test_cp
➜  coreutils git:(d000b88d3) ✗ touch exists
➜  coreutils git:(d000b88d3) ✗ ls test_cp
exists
➜  coreutils git:(d000b88d3) ✗ cp "" noent exists noent2 test_cp
cp: cannot stat '': No such file or directory
cp: cannot stat 'noent': No such file or directory
cp: cannot stat 'noent2': No such file or directory
➜  coreutils git:(d000b88d3) ✗ echo $?
1


➜  coreutils git:(d000b88d3) ✗ rm test_cp/exists
➜  coreutils git:(d000b88d3) ✗ target/debug/coreutils cp "" noent exists noent2 test_cp
cp: cannot stat '': No such file or directory
cp: cannot stat 'noent': No such file or directory
cp: cannot stat 'noent2': No such file or directory
➜  coreutils git:(d000b88d3) ✗ echo $?
1
➜  coreutils git:(d000b88d3) ✗ ls test_cp
exists

Copy link

GNU testsuite comparison:

GNU test failed: tests/cp/abuse. tests/cp/abuse is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cp/link-no-deref. tests/cp/link-no-deref is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cp/r-vs-symlink. tests/cp/r-vs-symlink is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cp/slink-2-slink. tests/cp/slink-2-slink is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?
GNU test error: tests/cp/preserve-slink-time. tests/cp/preserve-slink-time is passing on 'main'. Maybe you have to rebase?

@BenWiederhake
Copy link
Collaborator

Just a few points, maybe you're already aware, but just in case:

  • The goal is compatibility with GNU coreutils 9.5 (typically "the newest version", although I'm sure there will be exceptions to that made-up rule). Can you compile GNU coreutils 9.5 and sanity-check that the behavior is as intended?
  • Are the GNU test errors related? Especially tests/cp/abuse sounds like a regression.

@granquet
Copy link
Contributor Author

@BenWiederhake : thanks for your feedback, I've updated to coreutils 9.5 and I'm seeing the same behaviour.
For the tests/cp/abuse issue, yes this is a regression. I need to look into it.

Seems I'm catching some of the test cases in parse_path_arg instead of clap or later in the copy code which produces a different.
i.e:

[2024-04-10 14:50:49] <cp: cannot stat 'nonexistent_file.txt': No such file or directory
[2024-04-10 14:50:49] >cp: cannot stat 'nonexistent_file.txt': No such file or directory (os error 2)

I'll double check and fix the regressions before removing the draft status.

@granquet granquet force-pushed the cp_fix_empty_str branch 2 times, most recently from 8a2a1c3 to 87facf6 Compare April 19, 2024 12:43
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@granquet granquet force-pushed the cp_fix_empty_str branch 2 times, most recently from f604765 to 5d85b34 Compare April 19, 2024 13:22
@granquet granquet marked this pull request as ready for review April 19, 2024 14:28
Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't try to predict whether an error is about to happen; this approach is always trouble. Instead, let's handle the error if and when it happens. This has the added advantage that we will automatically have the correct "continue"-behavior for other types of errors.

Ok((paths, target))
// If the path exists and is valid keep it in the argument list.
// Exception for symlinks as they are a bit more tricky to handle
// and there's logic further down the execution to handle them.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's logic further down

So this code doesn't actually handle the error as it happens, but rather tries to predict whether an error is about to happen? I don't like it; this type of approach is inevitably fragile: What if we check a file now, but later it appears? When copying several gigabytes on a slow disk, this might be a window of minutes! There are several options that change behavior around symlinks: Either this code results in the wrong error message for those, or wrong behavior, or both. If there are any other problems when copying, I get the impression that GNU cp would continue with the next file, and uutils would still abort; i.e. the same issue that this PR is trying to fix, but in a slightly different flavor.

No. Instead, let's handle the error if and when it happens. This should also have the advantage of reducing the TOCTOU-timespan, and guaranteed robustness: There cannot be false-positive or false-negatives when there is no "prediction" involved.

tests/by-util/test_cp.rs Show resolved Hide resolved
@granquet
Copy link
Contributor Author

@BenWiederhake : Thanks for the feedback.
I've pushed a new version where I'm checking the existence of the path closer to its actual usage.
It should reduce the TOCTOU-timespan significantly. what do you think?

Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a test!

  1. However, the basic approach is still wrong: Don't try to predict whether an error is about to happen, because this will always be inaccurate. Instead, try to execute the action, and then handle the error if and when it happens.

  2. Your PR is based on a somewhat old version of uutils, and cannot be easily rebased. I suggest fixing this first, because it might solve some of your troubles, and cannot cause any new work that you wouldn't have otherwise.

  3. For extra bonus points, you might even add a test to demonstrate that a certain error type (or scenario that prevents cp from trying to copy at all) has "priority". Note that I just discovered three bugs while doing so, so this might be hard or impossible, but should be done eventually. It's up to you to decide whether this is in-scope or out-of-scope for this PR. If you do, please run all your tests also against GNU 9.5.

@@ -1200,6 +1201,12 @@ pub fn copy(sources: &[PathBuf], target: &Path, options: &Options) -> CopyResult
if seen_sources.contains(source) {
// FIXME: compare sources by the actual file they point to, not their path. (e.g. dir/file == dir/../dir/file in most cases)
show_warning!("source file {} specified more than once", source.quote());
} else if !source.exists() && !source.is_symlink() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still checks a particular condition in advance, instead of handling the error if and when it actually happens.

This type of thinking ("oh I'll just detect this particular scenario beforehand!") can lead to weird inconsistencies that are hard or even impossible to fix. For example, there might be a hypothetical flag where copy attempts are avoided if a particular condition is met (i.e. the filename contains a number that is divisible by 257 or whatever), and suddenly this check would break cp in really weird ways.

In fact, pretty much this silly thing is happening here! Your PR is based on a very old version with a particular bug that has already been fixed. With that bug, cp -i -u is a no-op, and will not attempt to copy anything, so a missing source file is no problem. Because it doesn't even try to copy the source file, there should be no stat, and no error message – but your code introduces an error message that claims something very particular that never happened, misleading the user!

I can't give a more realistic example, because it turns out that most of the paths are buggy; I'll create some issues shortly.

I'm not absolutely sure when we stat the source, but I think it's this chunk in fn copyfile:

    let source_metadata = {
        let result = if options.dereference(source_in_command_line) {
            fs::metadata(source)
        } else {
            fs::symlink_metadata(source)
        };
        result.context(context)?
    };

If that assumption is true (and please check this assumption, I might be totally wrong!), then this is where we should raise the error. Not before, not after.

@sylvestre
Copy link
Sponsor Contributor

ping ? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants