--exclude doesn't work with absolute paths #851

Open
morganmay opened this issue Sep 12, 2021 · 10 comments · May be fixed by #1272

@morganmay

Describe the bug you encountered:

The examples for the --exclude or -E option imply that it should work with absolute paths (/mnt/external-drive is given as an example). However, it only seems to work with relative paths. For example, if I'm trying to exclude the directory /home/user/Library/:

fd -E Library pattern /home

works, as does

fd -E "*/Library/*" pattern /home

However,

fd -E /home/user/Library/ pattern /home

doesn't work (i.e. /home/user/Library/pattern.txt would still show up in the search results). Adding other options, such as -p or -a, doesn't seem to affect this behavior.

The only way I've found to exclude absolute paths is to add them to ~/.config/fd/ignore, which is somewhat inconvenient.

What version of fd are you using?

fd 8.2.1

Which operating system / distribution are you on?

Linux 5.14.2-arch1-2 x86_64
LSB Version:    1.4
Distributor ID: Arch
Description:    Arch Linux
Release:        rolling
Codename:       n/a
morganmay added the bug label Sep 12, 2021
@alessandroasm

Hi! I would like to work on this. :)

I'm gonna try to fix it and open a PR.

@alessandroasm

alessandroasm commented Oct 5, 2021

The issue here is as follows: the exclude option works the same way .gitignore patterns work. This means that a pattern with a leading slash is not treated as an absolute filesystem path; it is anchored to the root of the git repo (which is the first search path in our case).

To fix this, we can check which exclude options are absolute and filter the results after the ignore crate finds them. What do you think about this approach, @sharkdp?

@andrejp88

Just ran into this problem. Thanks for the tip, @alessandroasm. In my case I made the excluded paths relative to the root folder, and it worked. Perhaps the man page could be updated to note that the flag follows the same rules as ignore entries.
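
For anyone applying the same workaround, here is a minimal sketch of turning an absolute path into a pattern anchored to the search root. The helper name abs_to_anchored is hypothetical (it is not part of fd), and the sketch assumes a single search path and that a leading slash anchors the pattern to that path, as described in the comments above.

```shell
# Hypothetical helper, not part of fd.
# Usage: abs_to_anchored SEARCH_ROOT ABSOLUTE_PATH
# Prints ABSOLUTE_PATH rewritten relative to SEARCH_ROOT with a leading slash,
# so that a gitignore-style matcher anchors it to the search root.
abs_to_anchored() {
  root=${1%/}   # drop one trailing slash from the search root, if any
  abs=${2%/}    # drop one trailing slash from the path to exclude, if any
  case "$abs" in
    "$root"/*)
      # Strip the search root prefix and keep the remainder anchored with "/".
      printf '/%s\n' "${abs#"$root"/}"
      ;;
    *)
      echo "abs_to_anchored: $2 is not under $1" >&2
      return 1
      ;;
  esac
}
```

With the paths from the original report, the call would look like `fd -E "$(abs_to_anchored /home /home/user/Library/)" pattern /home`, which passes /user/Library as the exclude pattern. Whether this matches as expected with several search paths is not something the sketch addresses.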

@d12bb

d12bb commented Jul 22, 2022

Especially bad in combination with --follow, as ~/Library/Containers (Mac) contains thousands of symlinks to directories like ~/Pictures or ~/Music, which themselves can contain tens of thousands of files. This blows up search results a lot, with >8x the time and >14x the result count for me:

~
❯ fd --exclude Containers --follow |wc -l
 2657140

~ took 8s 
❯ fd --follow |wc -l
 38664223

~ took 1m10s 
❯ 

I don't really want to add plain Containers into my global ignore, as it's a name that may be used outside ~/Library (for, well, containers for example), which should not be excluded.

My current approach, for everyone wanting something similar, is this rather granular global ignore, which allows me to find files living in these containers (sandboxed apps' documents) while not blowing up completely:

# Source:
~/Library/Containers 
❯ fd --type symlink |cut -d '/' -f 4 |sort |uniq

# $XDG_CONFIG_HOME/fd/ignore
Library/Containers/*/Data/Desktop
Library/Containers/*/Data/Downloads
Library/Containers/*/Data/Library
Library/Containers/*/Data/Movies
Library/Containers/*/Data/Music
Library/Containers/*/Data/Pictures

# Result:
~ 
❯ fd --follow |wc -l
 2702849

~ took 9s
❯ 
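
The ignore entries above all follow one template, so they could be generated from a list of directory names rather than typed by hand. A small sketch, assuming the names arrive one per line on stdin; the function name to_ignore_lines is hypothetical:

```shell
# Hypothetical helper: reads directory names (one per line) on stdin and
# prints one ignore pattern per name, following the template used above.
to_ignore_lines() {
  while IFS= read -r name; do
    # Skip empty lines so they do not produce a pattern ending in "Data/".
    if [ -n "$name" ]; then
      printf 'Library/Containers/*/Data/%s\n' "$name"
    fi
  done
}
```

Piping the list of symlink names into to_ignore_lines and appending the result to $XDG_CONFIG_HOME/fd/ignore would reproduce the file shown above. Review the output before appending, since every name on stdin becomes an ignore rule.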

@cyqsimon
Contributor

cyqsimon commented Aug 5, 2022

@alessandroasm any progress on this? If you've encountered any difficulty or cannot spare the time, I am willing and able to help.

@alessandroasm

alessandroasm commented Oct 11, 2022 via email

@SoftwareApe

I would have liked this feature, too. If it's a performance or compatibility concern, we could have an --exclude-abs option that would then check whether it's a file in the current search directory.

@tmccombs
Collaborator

It's more of a "the library we use for this doesn't really support this". So we would have to find a way to work around that, or stop using that library. See BurntSushi/ripgrep#2366

@cyqsimon
Contributor

cyqsimon commented Feb 27, 2023

I finally have some time to come back to this issue.

From reading https://github.com/sharkdp/fd/blob/master/src/walk.rs, it seems like there is no good way to implement an "ignore by absolute path" mechanism within the confines of the ignore crate. And I think BurntSushi does make some good points in BurntSushi/ripgrep#2366 (comment) on why it's a "wont-fix", in particular the non-trivial performance impact such a feature will incur.

So considering the performance impact, would it make some sense to split "absolute ignore" into its own flag, and implement it independently of what's offered by ignore? Something like --exclude-absolute maybe (and the corresponding global config file ~/.config/fd/ignore-absolute)? And then in documentation we can inform the user very explicitly about the performance impact it entails.

As for the specific implementation, I imagine it won't be too difficult (if some performance penalty is acceptable). In fd::walk::spawn_senders, simply canonicalise the current path (which is where most of the penalty is going to come from), and then use globset to match. I'll make sure the canonicalisation doesn't happen if the user hasn't specified anything via --exclude-absolute, so that there's no performance regression for users who don't use this new functionality. Further optimisations are going to be much more difficult I think, but at least the option to use it is there.

I'll quickly put together a prototype to test. Any ideas/suggestions are welcomed!
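
Until something like this exists in fd, the same idea can be approximated outside fd as a filter on its output. The sketch below is not the proposed Rust implementation: exclude_absolute is a hypothetical shell function, it uses shell glob matching (where * also matches /) instead of globset, and it only resolves symlinks in the parent directory of each path. It also cannot prune the traversal, so fd still walks the excluded directories.

```shell
# Hypothetical filter, not part of fd.
# Usage: <paths on stdin> | exclude_absolute GLOB...
# Prints every input path whose canonicalised form matches none of the globs.
exclude_absolute() {
  # No patterns given: pass the input through without canonicalising anything,
  # mirroring the "no cost unless the option is used" goal described above.
  if [ "$#" -eq 0 ]; then
    cat
    return
  fi
  while IFS= read -r path; do
    # Canonicalise the parent directory (resolves symlinks; the costly step).
    # Paths whose parent cannot be entered are skipped.
    dir=$(cd -P -- "$(dirname -- "$path")" 2>/dev/null && pwd -P) || continue
    canon=${dir%/}/$(basename -- "$path")
    keep=1
    for pat in "$@"; do
      # An unquoted expansion in a case pattern is interpreted as a glob.
      case "$canon" in
        $pat) keep=0; break ;;
      esac
    done
    if [ "$keep" -eq 1 ]; then
      printf '%s\n' "$path"
    fi
  done
}
```

A possible invocation is `fd -a pattern /home | exclude_absolute '/home/user/Library/*'`, which drops results that resolve to somewhere under /home/user/Library even when they were reached through a symlink.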

cyqsimon linked a pull request Mar 2, 2023 that will close this issue
@musjj

musjj commented Apr 17, 2023

Another problem related to this is that --exclude seems to use some kind of fuzzy matching.
Given a directory like this:

.
├── directory
│   ├── exclude-me
│   └── just-some-file
└── exclude-me

There's no way to exclude just the exclude-me that is in the root directory:

❯ fd --exclude exclude-me
directory/
directory/just-some-file

EDIT: Never mind, my use case does not require any additional features. Prepending a slash to the pattern anchors it to the root directory:

❯ fd --exclude /exclude-me
directory/
directory/exclude-me
directory/just-some-file
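
To reproduce the comparison above, a throwaway tree with the same layout can be set up as follows. This is a sketch that assumes exclude-me is a regular file in both places (the listings above suggest so, since only directory/ carries a trailing slash) and that the binary is installed under the name fd; the fd calls are skipped otherwise.

```shell
# Build the example layout in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/directory"
touch "$demo/exclude-me" "$demo/directory/exclude-me" "$demo/directory/just-some-file"

# Run both variants only if an fd binary is available on PATH.
if command -v fd >/dev/null 2>&1; then
  echo "unanchored pattern:"
  (cd "$demo" && fd --exclude exclude-me)
  echo "anchored pattern:"
  (cd "$demo" && fd --exclude /exclude-me)
fi
```

According to the report above, the unanchored pattern hides both exclude-me entries, while the anchored one hides only the entry at the top level.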
