
When a literal ] appears in square brackets in a regular expression, base R functions find nothing within the range unless perl=TRUE (R for Data Science could mention this) #1629

Open
markpurver opened this issue Feb 8, 2024 · 0 comments


Section 15.4.3 in R for Data Science (https://r4ds.hadley.nz/regexps.html#character-classes) says this about regular expressions:
\ escapes special characters, so [\^\-\]] matches ^, -, or ].
However, this example does not seem to hold in base R unless perl=TRUE is set (I am using R 4.2.1).
Section 15.7.2 notes that base R and stringr differ slightly in general. This particular quirk may still be worth mentioning in 15.4.3, because the example there runs into one of those differences.

For example:
grepl("[\\^\\-\\]]", "]")
returns FALSE.
And:
grepl("[\\^\\-\\]]", "^-]")
also returns FALSE, even though the string contains all three characters. None of the characters in the class seem to be matched.
Only the ] character appears to cause this. For example:
grepl("[\\^\\-\\[]", "^-]")
returns TRUE, apparently because the class no longer contains ]. Here ] has been replaced by [, but removing it altogether would work just as well.

The problem seems to disappear entirely with perl=TRUE:
grepl("[\\^\\-\\]]", "]", perl=TRUE)
and
grepl("[\\^\\-\\]]", "-", perl=TRUE)
both return TRUE.

Perhaps the book could include a note about this. Alternatively, it may be an issue with base R or the TRE engine.
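One possible workaround avoids backslashes inside the brackets entirely. It is not from the book and assumes the standard POSIX bracket-expression rules, which PCRE also follows for these cases:
- ] is literal when it comes first inside the brackets.
- - is literal when it comes last.
- ^ is literal when it is not first.

The sketch below compares that pattern under the default engine and under perl=TRUE. The variable names are only illustrative.

```r
# Test strings: the three special characters plus one that should not match.
chars <- c("]", "^", "-", "a")

# Position-based class (assumed POSIX rules): "]" first, "^" not first, "-" last.
portable <- "[]^-]"

default_engine <- grepl(portable, chars)              # base R default engine (TRE)
perl_engine    <- grepl(portable, chars, perl = TRUE) # PCRE

# The backslash-escaped pattern from the book, which behaves as expected under PCRE.
escaped_perl <- grepl("[\\^\\-\\]]", chars, perl = TRUE)

print(default_engine)
print(perl_engine)
print(escaped_perl)
```

If the position-based pattern behaves the same under both engines, it could serve as a portable alternative in the book's example.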
