Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/shfmt: Files with non-shell extension excluded #944

Open
katexochen opened this issue Nov 17, 2022 · 7 comments
Open

cmd/shfmt: Files with non-shell extension excluded #944

katexochen opened this issue Nov 17, 2022 · 7 comments

Comments

@katexochen
Copy link

Hi, I'm using shfmt -f . to list files I want to format. However, many files aren't found as they don't have a .sh/.bash extension, but are named something like mkosi.finalize and I can't change the name as it is expected by some consumer. The files have, however, a valid shebang.

The files are excluded because of the following line:

case strings.IndexByte(name, '.') > 0:

Would be great to just omit it, and do the shebang check in any case.

@mvdan
Copy link
Owner

mvdan commented Nov 17, 2022

Interesting edge case. I did it this way for two reasons:

  • Being conservative. If there's a foo.go file which happens to have a valid shell shebang, should we format it? My intuition is that we should not.
  • Avoiding overhead. Determining if a file has a valid shebang requires opening it and reading a chunk of it. We currently only read shebangs for regular files with no extension, which aren't that common - most files have extensions. However, if we start reading shebangs for all regular files, no matter the extension, I imagine shfmt . (when walking directories) could easily get twice as slow due to the extra I/O.

I think your case is a bit special because .finalize is not really an extension from the looks of it. But I also can't imagine what a good heuristic would be. If the extension is longer than four characters, it's not really an extension?

@katexochen
Copy link
Author

If there's a foo.go file which happens to have a valid shell shebang

I don't think a valid Go file can have a valid shebang at the beginning, can it? And your shebang regex checks for the shebang at the beginning of a file, so this shouldn't be a problem.

Avoiding overhead.

Yeah, I see that point. There are situations where performance isn't that relevant, like in CI, where it would be okay to run a minute instead of a second. You could add a flag, but I understand that isn't a good solution either.

@mvdan
Copy link
Owner

mvdan commented Nov 17, 2022

I don't think a valid Go file can have a valid shebang at the beginning, can it? And your shebang regex checks for the shebang at the beginning of a file, so this shouldn't be a problem.

Right, in practice it's not really a Go file. But I would argue the file doesn't look like a shell file, either. The Go toolchain and IDEs would pick it up as a Go file, for example. That's what I mean by being conservative - we should likely not format files which don't appear to be shell files.

After all, it's always possible to format extra files if you need to (e.g. shfmt -w **/*.finalize), but it's harder to exclude formatting files from shfmt -w .. You would need to use shfmt -f, filter the list, and then feed it back.

Yeah, I see that point. There are situations where performance isn't that relevant, like in CI, where it would be okay to run a minute instead of a second. You could add a flag, but I understand that isn't a good solution either.

I'm not opposed to a new feature (like a new flag) as long as it's reasonable and commonly needed - I think this scenario is a bit too niche to warrant a flag. If I were you, assuming that these shell files with odd extensions are easy to find (either via **/*.finalize or finalizers/*), I would probably include them explicitly like: shfmt -w . **/*.finalize.

@mvdan
Copy link
Owner

mvdan commented Nov 17, 2022

Another option in terms of new flags would be to turn -f into a flag which also accepts "exhaustive" modes, like -f=anyextension. Though again it feels pretty niche, so I'm not a big fan unless there's good demand for it :)

@sellout
Copy link

sellout commented Jan 2, 2023

FWIW, my IDE (Emacs) does recognize a file named foo.go as a Bash file if it has a Bash shebang.

It seems, strictly speaking, that if a file has any executable bit set, then only the shebang should be checked, and if it has no executable bit set, then only the extension should be checked (since no shebang will have an effect). This matches my own usage, where I use .bash on files that are meant to be sourced, and no extension on files that are executable (and thus have a shebang). However, shfmt -f . already seems to check the shebang for files without an extension, so It Works For Me™️ as-is.

@mvdan
Copy link
Owner

mvdan commented Jan 2, 2023

I actually thought about using executable bits; regardless what a file's extension is, if it has an executable bit set, we could look for a shebang inside it. Since executable files are relatively rare in software development scenarios, it could be a nice trade-off conceptually.

However, it would slow down formatting directory trees, especially when there are many files. We use filepath.WalkDir, which only uses readdir to read the contents of directories - that gives us the name of each file and its type (regular file, directory, symlink, etc). It does not give us the permission bits; an extra lstat call per file is needed for that on Linux and Mac.

For more information, see golang/go#42027.

I'm still happy to entertain a flag to tell shfmt to spend extra CPU time statting, opening, and reading many more files than it usually would. But I definitely don't want to make it the default behavior, because it will be much slower when there are many non-shell files, which is common. Walking one directory with a thousand Go files would suddenly go from one syscall (readdir) to over a thousand (readdir + 1000*lstat), for example.

@sellout
Copy link

sellout commented Jan 2, 2023

Yeah, agreed on the default. Maybe a flag like --precise-identification with the description, “Check shebang lines in executable files to identify shell scripts more precisely. This can have a performance impact on larger file trees and is not helpful if all of your shell scripts have standard extensions.”

manu0x0 pushed a commit to isc-projects/bind9 that referenced this issue Oct 26, 2023
All changes in this commit were automated using the command:

  shfmt -w -i 2 -ci -bn . $(find . -name "*.sh.in")

By default, only *.sh and files without extension are checked, so
*.sh.in files have to be added additionally. (See mvdan/sh#944)
manu0x0 pushed a commit to isc-projects/bind9 that referenced this issue Oct 26, 2023
All changes in this commit were automated using the command:

shfmt -w -i 2 -ci -bn . $(find . -name "*.sh.in")

By default, only *.sh and files without extension are checked, so
*.sh.in files have to be added additionally. (See mvdan/sh#944)

(manually replayed commit 4cb8b13)
manu0x0 pushed a commit to isc-projects/bind9 that referenced this issue Oct 26, 2023
All changes in this commit were automated using the command:

  shfmt -w -i 2 -ci -bn bin/tests/system/ util/ $(find bin/tests/system/ -name "*.sh.in")

By default, only *.sh and files without extension are checked, so
*.sh.in files have to be added additionally. (See mvdan/sh#944)

(manually replayed commit 4cb8b13)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants