Operations on capture groups #172

Open
alecandido opened this issue Nov 16, 2022 · 5 comments
Labels
C-enhancement Category: New feature or request

Comments


alecandido commented Nov 16, 2022

I'd like to have some flexibility about how to use the captured text.
My specific use case is about letter case: I'd like to lowercase the matched text.

Possible with sed:
https://stackoverflow.com/a/1814396/8653979
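
For comparison, here is a minimal Python sketch of the requested behaviour: lowercasing only the captured group while leaving the rest of the match untouched. The function name `lowercase_group`, the pattern, and the sample text are made up for illustration; none of this is existing sd functionality.

```python
import re


def lowercase_group(pattern: str, text: str, group: int = 1) -> str:
    """Rewrite every match so that one capture group is lowercased."""

    def repl(match: re.Match) -> str:
        whole = match.group(0)
        start, end = match.span(group)
        if start == -1:
            # The group did not take part in this match: leave it as-is.
            return whole
        # Offsets of the group relative to the start of the whole match.
        rel_start = start - match.start()
        rel_end = end - match.start()
        return whole[:rel_start] + whole[rel_start:rel_end].lower() + whole[rel_end:]

    return re.sub(pattern, repl, text)


if __name__ == "__main__":
    # Hypothetical input, only to show the effect.
    print(lowercase_group(r"name=(\w+)", "name=ALICE id=7 name=Bob"))
```

The replacement is a function rather than a string, which is the part a plain `$1` template cannot express.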


Nate-Wilkins commented Jan 1, 2023

Looking through the issues specifically for something like this.

But my case is a bit different. When I think "operation", I think of any shell command that takes input and generates output, whereas sed seems to support only a fixed set of built-in operations.

In my project I'm probably going to implement this with ripgrep and current sd functionality though.


alecandido commented Jan 1, 2023

> In my project I'm probably going to implement this with ripgrep and current sd functionality though.

The how would also be interesting, but I expect that it will involve matching, piping, transforming, and finally replacing. While this is always good to know, the original intention was to request support for basic operations, in order to make them possible with a single simple sd invocation.

P.S.: when the operation becomes that complex

> .. matching, piping, transforming, and finally replacing

I prefer to write a full-fledged program, that can be a Rust one if efficiency is required (using regex is the programmatic alternative to sd), or even with a simple Python script (at least I have full and structured access to the match object and location, without parsing strings)
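
As a rough illustration of the "structured access to the match object and location" point, here is a small Python sketch. The helper name `transform_matches` and the sample input are assumptions made for this example; the transformation is an arbitrary callable, which could just as well shell out to another command.

```python
import re


def transform_matches(pattern, text, transform):
    """Replace each match with transform(match) and record where it was.

    Returns (new_text, log); each log entry is
    (line_number, start, end, original, replacement).
    """
    pieces, log, last = [], [], 0
    for m in re.finditer(pattern, text):
        start, end = m.span()
        # 1-based line number of the match, from the newlines before it.
        line_no = text.count("\n", 0, start) + 1
        replacement = transform(m)
        log.append((line_no, start, end, m.group(0), replacement))
        pieces.append(text[last:start])
        pieces.append(replacement)
        last = end
    pieces.append(text[last:])
    return "".join(pieces), log


if __name__ == "__main__":
    # Hypothetical input: lowercase every all-caps word and report its location.
    new_text, log = transform_matches(
        r"\b[A-Z]+\b", "Hello World\nfoo BAR", lambda m: m.group(0).lower()
    )
    print(new_text)
    for entry in log:
        print(entry)
```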


Nate-Wilkins commented Jan 1, 2023

> I prefer to write a full-fledged program, that can be a Rust one if efficiency is required (using regex is the programmatic alternative to sd), or even with a simple Python script (at least I have full and structured access to the match object and location, without parsing strings)

I think that's fair but I do like operating in the shell to reduce dependencies :)

@stellarpower

Personally, I quite like how much you can do with jq. The syntax is terse, if a bit hard to remember, not unlike sed or other "original" tools really. But you can do quite a lot quickly if you can google it or get used to it.

I reckon some commands in the replace expression might work well with sd's approach. As we have the $1 syntax for replacing capture groups, I'll use the syntax ${command} for a second to represent these commands:

sd '^.*,[[:space:]]*,.*$' '${delete}' - delete any lines from a CSV file with empty elements (represented by two commas)
sd 'SIG[[:alpha:]]+ received' '${keep}' - delete any lines that don't contain a nasty error
sd "(hello|hi|g'day|buenos dias)" '${1.upcase}!' - make a greeting a bit more shouty

I'm sure there are more examples, but I am too busy as I type to think of any.

Where sed uses a different command represented by a letter (e.g. /.../d), I am wondering whether representing some simple operations in the second argument (using the dollar sign as the escape, which should help compatibility since it is already a special character) would open up more ways to use sd without cluttering it too much. Line deletion is a real need I have come across: without being able to remove lines, I have had to use something else in the shell to cut the lines that are left completely empty after using sd for the original replacement. I like the cleanliness, and regexes are bad enough as they are without extra stuff on top, but I would also like to be able to use sd for more things.


alecandido commented Jan 17, 2023

@stellarpower thanks for the syntax proposal, I like the idea of putting the command inside some delimiters. I would always keep the group identifier and use a different separator, like ${0:upper}, ${1:lower}, or ${dollars:rev} (to reverse the order of the characters; I would have suggested mirror for the sake of clarity, but rev is already a shell command that does this job).
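
To make the `${group:operation}` idea concrete, here is a small Python sketch of how such a replacement template could be expanded. Everything in it is hypothetical: sd does not implement this syntax, and the operation names (`upper`, `lower`, `rev`) and the helpers `expand` and `replace` are assumptions taken from the suggestions in this thread.

```python
import re

# Hypothetical operation names, taken from the suggestions in this thread.
OPERATIONS = {
    "upper": str.upper,
    "lower": str.lower,
    "rev": lambda s: s[::-1],
}

# Matches ${group} or ${group:op}; group is a number or a group name.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::(\w+))?\}")


def expand(template: str, match: re.Match) -> str:
    """Fill every ${group[:op]} placeholder using the given match."""

    def fill(ph: re.Match) -> str:
        group, op = ph.group(1), ph.group(2)
        key = int(group) if group.isdigit() else group
        value = match.group(key) or ""  # unmatched groups expand to ""
        if op is None:
            return value
        return OPERATIONS[op](value)  # an unknown op raises KeyError

    return PLACEHOLDER.sub(fill, template)


def replace(pattern: str, template: str, text: str) -> str:
    """Regex replace where the template may apply operations to groups."""
    return re.sub(pattern, lambda m: expand(template, m), text)


if __name__ == "__main__":
    print(replace(r"(hello|hi)", "${1:upper}!", "hi there, hello"))
    print(replace(r"(?P<dollars>\d+)", "${dollars:rev}", "cost 120"))
```

Keeping the group identifier mandatory means the placeholder grammar stays a strict extension of the plain `${1}` form, so templates without an operation behave as before.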

However, to delete or keep lines you don't need a specific command: you just have to use multi-line mode (the m flag, which you can pass with sd -f m), match the text you want to remove, and replace it with ''.

E.g.:

# delete: remove every line that contains an empty CSV element
sd --flags m '^.*,[[:space:]]*,.*$\n' '' your-file.txt
# keep: remove every line that does NOT contain the pattern
# not really working with sd, because the `regex` crate does not support look-around
sd --flags m '^(?!.*SIG[[:alpha:]]+ received).*$\n' '' your-file.txt

For more information about look-around, have a look at this SO answer and the suggested replacement: http://www.formauri.es/personal/pgimeno/misc/non-match-regex/?word=foo

However, a practical answer is that, if you want to operate line-wise, just use ripgrep:

# delete
rg -v ',[[:space:]]*,' your-file.txt > new-file.txt
# keep
rg 'SIG[[:alpha:]]+ received' your-file.txt > new-file.txt

It is based on the same regex crate as sd, and it has been created exactly for this task (not file editing, but line-wise selection).
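
For completeness, the same line-wise delete/keep can be sketched in Python, whose `re` module does support negative look-ahead. The function names and sample data below are made up for illustration. Note that Python's `re` has no POSIX classes such as `[[:alpha:]]`, so the sketch uses `\s` and `[A-Za-z]` instead.

```python
import re


def filter_lines(pattern: str, text: str, keep: bool = True) -> str:
    """Keep (or, with keep=False, drop) the lines in which pattern is found."""
    regex = re.compile(pattern)
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if bool(regex.search(line)) == keep)


def keep_lines_lookahead(pattern: str, text: str) -> str:
    """Same "keep" result via a single substitution with negative look-ahead.

    Assumes every line, including the last one, ends with a newline.
    """
    return re.sub(rf"(?m)^(?!.*(?:{pattern})).*\n", "", text)


if __name__ == "__main__":
    # Hypothetical sample data.
    csv = "a,b,c\nd, ,f\ng,,h\n"
    log = "start\nSIGTERM received\nok\nSIGINT received by worker\n"
    print(filter_lines(r",\s*,", csv, keep=False), end="")
    print(filter_lines(r"SIG[A-Za-z]+ received", log), end="")
    print(keep_lines_lookahead(r"SIG[A-Za-z]+ received", log), end="")
```

The first function mirrors `rg` / `rg -v`; the second shows what the look-ahead pattern would do in an engine that supports it.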

@CosmicHorrorDev CosmicHorrorDev added the M-needs triage Meta: Maintainer label me! label May 17, 2023
@CosmicHorrorDev CosmicHorrorDev added C-enhancement Category: New feature or request and removed M-needs triage Meta: Maintainer label me! labels Oct 22, 2023