Add -printf formatting like GNU find #533

Open
avielsh opened this issue Feb 8, 2020 · 10 comments

@avielsh

avielsh commented Feb 8, 2020

Hi
Please add a -printf like feature to fd.
Currently, to get the file information I'm running:
fd --changed-within=10days -x ls -dl --time-style=long-iso "{}"
Which spawns ls for every result.

This is actually much slower than running:
find . -mtime 10 -printf "%TY-%Tm-%Td %TH:%TM %p\n"
(For example)

Test dataset is around 20,000 files. fd is running for ~21 seconds, find for ~10.
Of course, the more results are found, the slower the fd approach becomes.
Thanks!

avielsh changed the title from "add -printf like gnu find #feature" to "add -printf like gnu find" Feb 8, 2020
avielsh changed the title from "add -printf like gnu find" to "Add -printf formatting like GNU find" Feb 8, 2020
@sharkdp
Owner

sharkdp commented Feb 10, 2020

Currently, to get the file information I'm running:
fd --changed-within=10days -x ls -dl --time-style=long-iso "{}"
Which spawns ls for every result.

That's what -X/--exec-batch is for, which would spawn just a single ls process. This has the added advantage that the output columns are aligned (minor comment: the "{}" argument is superfluous and can be omitted). Would this work for you?

fd --changed-within=10days -X ls -dl --time-style=long-iso

This is actually much slower than running:
find . -mtime 10 -printf "%TY-%Tm-%Td %TH:%TM %p\n"

The equivalent command would use -mtime -10, right?

Test dataset is around 20,000 files. fd is running for ~21 seconds, find for ~10.

How do you perform your benchmarks? Do you account for disk caching effects?

@avielsh
Author

avielsh commented Feb 10, 2020

That's what -X/--exec-batch is for, which would spawn just a single ls process.

My result set is too large for that (~20,000 files). I'm getting:
[fd error]: Problem while executing command: Argument list too long (os error 7)
Maybe add a batch count argument to limit the arguments per batch? 🤔

I actually tried to pipe to xargs with -P4 -n1500 for batch processing while maintaining the argument list for ls short enough, but the parallel execution in xargs causes all sorts of string issues in the output.
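
For reference, when -n is omitted xargs sizes each batch on its own so that it fits the system's argument-length limit, which avoids the "Argument list too long" error without a manual batch count. A minimal sketch, assuming a POSIX shell with seq and awk available; the 100,000 generated names are stand-ins for real paths, and `sh -c 'echo "$#"'` is only a hypothetical placeholder for the real ls command:

```shell
#!/bin/sh
set -eu

# Upper bound (in bytes) on the arguments plus environment passed to one exec call.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max"

# Feed 100,000 NUL-separated dummy names to xargs without -n.
# Each batch runs `sh -c 'echo "$#"'`, which prints how many arguments it received.
batch_sizes=$(seq 1 100000 | tr '\n' '\0' | xargs -0 sh -c 'echo "$#"' sh)

# Number of batches xargs chose, and the total number of arguments across them.
batch_count=$(printf '%s\n' "$batch_sizes" | wc -l | tr -d ' ')
total_args=$(printf '%s\n' "$batch_sizes" | awk '{ sum += $1 } END { print sum }')

echo "batches: $batch_count, total arguments: $total_args"
```

With real data, the generator stage would be something like fd --changed-within=10days -0, piped into xargs -0 ls -dl --time-style=long-iso. The batch count varies between xargs implementations, so it is printed rather than assumed.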

The equivalent command would use -mtime -10, right?

Yes, my mistake :)

How do you perform your benchmarks? Do you account for disk caching effects?

Hyperfine is cool. I was using time.

Actually, my example was a simplified version of what I actually use, which is listing the whole file system (for instant searching, like locate).

Here's my hyperfine result (gfind and gls are GNU versions of the commands)
(~150k files)

hyperfine --warmup 3 '\fd -E ''/Volumes/*'' -E ''/dev/*'' -E ''.git'' -IH --color=never . . -x gls -dl --time-style=long-iso' 'gfind . -type d \( -path /dev -o -path /Volumes \) -prune -o -printf "%TY-%Tm-%Td %TH:%TM %p\n"'

[Screenshot: hyperfine benchmark results (image not preserved)]

Sorry for the superfluous arguments (-E|-prune), I was copying the command from another search.

I actually tried to run this command on the whole file system (~2 million files) but after 55 minutes, I gave up.

@sharkdp
Owner

sharkdp commented Feb 11, 2020

My result set is too large for that (~20,000 files). I'm getting:
[fd error]: Problem while executing command: Argument list too long (os error 7)
Maybe add a batch count argument to limit the arguments per batch? 🤔

Unfortunately, this is a known issue which should be fixed, see #410.

I actually tried to pipe to xargs with -P4 -n1500 for batch processing while maintaining the argument list for ls short enough, but the parallel execution in xargs causes all sorts of string issues in the output.

Are you sure that this is caused by the parallel execution? Or could it be related to file names with spaces? In the latter case, please try to use the -0 (zero) option for both fd and xargs:

fd --changed-within 10day -0 | xargs -P4 -n1500 -0 ls -dl --time-style=long-iso

This seems to work just fine for me (even with smaller -n arguments).
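
To illustrate why the -0 pairing matters, here is a small sketch that uses find -print0 in place of fd -0 (an assumption made so that it runs without fd installed). The two file names are made up for the demonstration:

```shell
#!/bin/sh
set -eu

dir=$(mktemp -d)
: > "$dir/a b.txt"   # name containing a space
: > "$dir/c.txt"

# Newline-separated input: xargs also splits on the space,
# so "./a b.txt" is passed on as two separate arguments.
plain_count=$(cd "$dir" && find . -type f | xargs -n1 echo | wc -l | tr -d ' ')

# NUL-separated input: every path arrives as exactly one argument.
nul_count=$(cd "$dir" && find . -type f -print0 | xargs -0 -n1 echo | wc -l | tr -d ' ')

echo "without -0: $plain_count arguments, with -0: $nul_count arguments"
rm -rf "$dir"
```

With two files on disk, the first pipeline hands three arguments to echo while the NUL-separated one hands over two.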

@avielsh
Author

avielsh commented Feb 11, 2020

Are you sure that this is caused by the parallel execution? Or could it be related to file names with spaces? In the latter case, please try to use the -0 (zero) option for both fd and xargs:

Yes, this is a known issue. Here. Here also.

I am using -0 on fd and xargs

This seems to work just fine for me (even with smaller -n arguments).

-n isn't the issue; it's the combination with -P that causes the output to get garbled. I'm guessing it has something to do with the write buffer, but I haven't completely understood it.
Try writing a large result set to a file and filtering out the well-formed lines (grep -v "^.\{10,11\} "); you'll see the issue.

This is not related to fd; I get the same effect when piping the file list with cat into xargs -P0.
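
If the garbling does come from several ls processes writing to the same stream at once, one possible workaround is to give every batch its own output file and concatenate the files afterwards. A minimal sketch under that assumption, using find -print0 instead of fd -0 so that it runs without fd; PARTS_DIR and the part.XXXXXX template are hypothetical names chosen for illustration:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/files" "$workdir/parts"

# Create 50 sample files with spaces in their names.
i=1
while [ "$i" -le 50 ]; do
    : > "$workdir/files/sample file $i.txt"
    i=$((i + 1))
done

PARTS_DIR="$workdir/parts"
export PARTS_DIR

# Each batch of 10 names runs in its own sh, which writes the ls output
# to a unique part file, so no two processes share an output stream.
find "$workdir/files" -type f -print0 |
    xargs -0 -P4 -n10 sh -c 'ls -dl "$@" > "$(mktemp "$PARTS_DIR/part.XXXXXX")"' sh

# Concatenate the parts once all jobs have finished.
cat "$workdir/parts"/part.* > "$workdir/listing.txt"
line_count=$(wc -l < "$workdir/listing.txt" | tr -d ' ')
echo "lines: $line_count"
rm -rf "$workdir"
```

The order of batches in the combined listing is not preserved, so a final sort is needed if ordering matters.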

Anyway, when using fd with xargs -n1500, find is still getting better results:

hyperfine -i --warmup 3 '\fd -0 -IH . / | gxargs -0 -n1500 gls -dl --time-style=long-iso' 'gfind / -printf "%TY-%Tm-%Td %TH:%TM %p\n"'

[Screenshot: hyperfine benchmark results (image not preserved)]

When I run xargs with -P, fd/xargs takes about half the time of find, but about 1000 files out of 2 million are garbled, so I cannot use it.
Note: I don't know why hyperfine is reporting only a 1.15x difference; I ran it a few times with time and it took about half as long as find. Maybe something was running in the background...

hyperfine -i --warmup 3 '\fd -0 -IH . / | gxargs -P4 -0 -n1500 gls -dl --time-style=long-iso' 'gfind / -printf "%TY-%Tm-%Td %TH:%TM %p\n"'

[Screenshot: hyperfine benchmark results (image not preserved)]

I would think that adding a -printf-like option would avoid all this piping: my guess is that fd is already stat-ing every file, and piping out to another command would never be as fast as an internal function (for data it already has).

@priyadarshan

priyadarshan commented Feb 12, 2020

I would also find printf functionality quite useful and pertaining to fd's direct domain.

A use-case: Sometimes we need to sanitize filenames to port them from one platform to another. We find extreme cases, such as (not kidding),

''$'\n\n''    * courierblog'$'\n\n''Our history'$'\n''07-01-2009'$'\n''"The cat?" by Charles Perrault.webloc

mv is picky and we must deal with cases where -0 or xargs -n2 are not enough, by properly wrapping filenames.

sharkdp added idea and removed question labels May 25, 2020
sharkdp added question and removed idea labels Dec 1, 2020
@matu3ba

matu3ba commented Mar 21, 2023

To my knowledge, there is currently no (convenient) way to print the list of files sorted by modification date as shown here https://stackoverflow.com/a/1405664: find . -type f -printf "%-.22T+ %M %n %-8u %-8g %8s %Tx %.8TX %p\n" | sort | cut -f 2- -d ' '.
One could default to ISO 8601 UTC (with a trailing Z for zero offset), which would need less formatting gibberish.

ADDENDUM: fd -t x --changed-before now and DATE=$(date) && fd -t x --changed-before "$DATE" also do not work, so there is no convenient way to "just print stuff sorted by date", which is a bad user experience.

[fd error]: 'Tue 21 Mar 2023 11:06:00 AM CET' is not a valid date or duration. See 'fd --help'.
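
The error above appears to come from the locale-dependent default output of date. fd's help text documents the YYYY-MM-DD HH:MM:SS form for --changed-before, so formatting the timestamp explicitly may be enough. A minimal sketch under that assumption; the variable name stamp is arbitrary, and the fd call is skipped when fd is not installed:

```shell
#!/bin/sh
set -eu

# Format the current time as YYYY-MM-DD HH:MM:SS instead of the locale default.
stamp=$(date '+%Y-%m-%d %H:%M:%S')
echo "timestamp: $stamp"

# Only run fd if it is installed; otherwise just show the command that would be run.
if command -v fd >/dev/null 2>&1; then
    fd -t x --changed-before "$stamp" || true
else
    echo "would run: fd -t x --changed-before \"$stamp\""
fi
```

This only addresses the date-parsing error; it does not sort the results by modification time.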

@cohml

cohml commented Dec 5, 2023

Throwing another vote into the ring for -printf equivalent. It's a very general tool that can be used for so much more than -X ls can.

@tmccombs
Collaborator

tmccombs commented Dec 5, 2023

I have made a PR that might help with some uses of this: #1043 but I haven't gotten around to benchmarking it yet.

@cohml

cohml commented Dec 5, 2023

I have made a PR that might help with some uses of this: #1043 but I haven't gotten around to benchmarking it yet.

Unfortunately I don't know Rust so can't understand any of your changes. A little example of the core functionality you're adding would make a nice addition to the PR description. That way people like me can follow.

Regardless, if your changes bring about some -printf-iness, I'm all for it and hope you can get it merged soon!

@sergeevabc

sergeevabc commented Jan 7, 2024

Windows 7 x64, fd 9.0.0.

I want to get a list of folders containing a .startup file, in alphabetical order.

Bypassing fd's birth traumas of having to specify the -H key to find .filenames and still not having a --sort key to sort the output, I tried -printf "%h\n" to get paths without filenames, but this option is not implemented either. Aggrrhh!

Workaround using coreutils or busybox:

$ fd -H -g ".startup" -X coreutils dirname {} | coreutils sort
.\01 Portable\Autohotkey
.\01 Portable\DNSCrypt
.\01 Portable\Keepass
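
On a POSIX-style shell, the same result can be approximated without starting dirname at all, by stripping the last path component with sed. A minimal sketch, assuming find, sed and sort are available; it uses find instead of fd -H -g ".startup" so that it runs without fd, and the directory names are made up:

```shell
#!/bin/sh
set -eu

root=$(mktemp -d)
mkdir -p "$root/b tools" "$root/a tools"
: > "$root/b tools/.startup"
: > "$root/a tools/.startup"

# Strip the final path component (what %h prints in GNU find) and sort the result.
dirs=$(cd "$root" && find . -type f -name .startup | sed 's|/[^/]*$||' | sort)
printf '%s\n' "$dirs"
rm -rf "$root"
```

This line-based approach breaks on file names that contain newlines, which is one more argument for a built-in format option.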
