Correct Regular Expressions Behavior Related to Annex B #58320
This PR doesn't have any linked issues. Please open an issue that references this PR. From there we can discuss and prioritise.
src/compiler/scanner.ts
Outdated
@@ -3390,7 +3400,7 @@ export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean
  error(Diagnostics.Unicode_property_value_expressions_are_only_available_when_the_Unicode_u_flag_or_the_Unicode_Sets_v_flag_is_set, start, pos - start);
  }
  }
- else if (unicodeMode) {
+ else if (!annexB) {
In Annex B, braces after `\p` actually should not be parsed at all, but parsing them does provide helpful errors like #58275 (comment)
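For reference, a minimal sketch of the runtime behavior this comment refers to, assuming an engine that implements Annex B (Node.js or a browser, for example). The helper name `matches` is made up for illustration and is not part of the scanner.

```typescript
// Hypothetical helper: compiles a pattern from source text and tests one input.
function matches(source: string, flags: string, input: string): boolean {
    return new RegExp(source, flags).test(input);
}

// Without the u flag, \p is an identity escape for "p" and the braces are literal text.
console.log(matches("\\p{L}", "", "p{L}")); // true
console.log(matches("\\p{L}", "", "a"));    // false

// With the u flag, \p{L} is a Unicode property escape that matches any letter.
console.log(matches("\\p{L}", "u", "a"));   // true
```

So a scanner that follows Annex B to the letter would treat `{L}` as plain characters, which is why parsing the braces anyway is a deliberate choice made for the sake of diagnostics.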
Not really outdated
The outdated label is just because the comment is on an old revision of the PR and GitHub can't figure out where the comment goes after.
/\q\u\i\c\k\_\f\o\x\-\j\u\m\p\s/,
!!! error TS1125: Hexadecimal digit expected.
~~
!!! error TS1510: '\k' must be followed by a capturing group name enclosed in angle brackets.
!!! error TS1125: Hexadecimal digit expected.
!!! error TS1125: Hexadecimal digit expected.
These are all valid in Annex B. Even `/\u{1/` is. In Annex B essentially all weird things are valid, you know. Currently (after #58295) `/[\1]/` is valid but `/[\8]/` isn’t. I don’t think it’s ideal.
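A quick way to check these claims against a real engine, sketched under the assumption that the runtime implements Annex B (Node.js and browsers do). The helper name `compiles` is hypothetical and only wraps the `RegExp` constructor.

```typescript
// Hypothetical helper: reports whether the engine accepts a pattern with the given flags.
function compiles(source: string, flags: string = ""): boolean {
    try {
        new RegExp(source, flags);
        return true;
    }
    catch {
        return false;
    }
}

// Accepted without the u flag, because Annex B falls back to identity escapes.
console.log(compiles("\\u{1"));      // true
console.log(compiles("[\\1]"));      // true
console.log(compiles("[\\8]"));      // true

// Rejected once the u flag turns the Annex B extensions off.
console.log(compiles("\\u{1", "u")); // false
console.log(compiles("[\\8]", "u")); // false
```

The point of the comparison is that `[\1]` and `[\8]` are both accepted by the engine without the `u` flag, so reporting an error for only one of them is inconsistent.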
This will probably need to be rebased.
@jakebailey is correct. #58339 also made some changes to this code, so a rebase or merge from
I know, I just haven’t got the time to do so. It would be faster if you could do that for me (I will be back soon).
  error(Diagnostics.Numbers_out_of_order_in_quantifier, digitsStart, pos - digitsStart);
  }
  }
  else if (!min) {
- if (unicodeMode) {
+ if (!annexB) {
Though it may be redundant, I think it might be better to still indicate `unicodeMode` here so that someone editing this code in the future doesn't mistakenly think this only applies to non-Annex B code. It may be better to use `unicodeMode || !annexB` and remove the `if (unicodeMode) { annexB = false; }` at the top of `scanRegularExpressionWorker`. The same would go for other uses of `annexB` as well.
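To make the suggestion concrete, here is a small sketch comparing the two styles. It is not the scanner's code: `ModeFlags`, `isStrictImplicit`, and `isStrictExplicit` are hypothetical names, and the only thing taken from the review is the `if (unicodeMode) { annexB = false; }` normalization and the `unicodeMode || !annexB` condition.

```typescript
// Hypothetical container for the two mode switches discussed in the review.
interface ModeFlags {
    unicodeMode: boolean;
    annexB: boolean;
}

// Style under review: annexB is cleared up front whenever unicodeMode is set,
// so later checks only mention annexB.
function isStrictImplicit(flags: ModeFlags): boolean {
    let annexB = flags.annexB;
    if (flags.unicodeMode) {
        annexB = false;
    }
    return !annexB;
}

// Suggested style: no up-front normalization, each check names both conditions.
function isStrictExplicit(flags: ModeFlags): boolean {
    return flags.unicodeMode || !flags.annexB;
}

// Both styles agree on all four combinations.
for (const unicodeMode of [false, true]) {
    for (const annexB of [false, true]) {
        const flags: ModeFlags = { unicodeMode, annexB };
        console.log(unicodeMode, annexB, isStrictImplicit(flags) === isStrictExplicit(flags));
    }
}
```

The two are behaviorally equivalent, so the choice is about readability: the explicit form keeps the dependence on `unicodeMode` visible at each call site.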
@@ -2801,7 +2811,10 @@ export function createScanner(languageVersion: ScriptTarget, skipTrivia: boolean
  scanGroupName(/*isReference*/ true);
  scanExpectedChar(CharacterCodes.greaterThan);
  }
- else if (unicodeMode) {
+ else {
+ // This is actually allowed in Annex B if there are no named capturing groups in the regex,
Could we keep track of whether we encountered a `(?<` during `reScanSlashToken` and add an entry to the `RegularExpressionFlags` enum? The spec passes `NamedCaptureGroups` as a production parameter just as it does for `UnicodeMode` and `UnicodeSetsMode`, but only ever passes it as `~NamedCaptureGroups` in Annex B.
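One possible shape for that pre-scan, as a rough sketch only. The function `hasNamedCapturingGroup` is hypothetical and works on the pattern text alone; how the result would be stored (for example as a new flag next to the existing ones) is left open here.

```typescript
// Hypothetical pre-scan: reports whether a pattern body contains a named
// capturing group "(?<name>...)", ignoring escapes, character classes,
// and the lookbehind forms "(?<=" and "(?<!".
function hasNamedCapturingGroup(pattern: string): boolean {
    let inClass = false;
    for (let i = 0; i < pattern.length; i++) {
        const ch = pattern[i];
        if (ch === "\\") {
            i++; // skip the escaped character
            continue;
        }
        if (inClass) {
            if (ch === "]") inClass = false;
            continue;
        }
        if (ch === "[") {
            inClass = true;
            continue;
        }
        if (ch === "(" && pattern.startsWith("(?<", i)) {
            const next = pattern[i + 3];
            if (next !== "=" && next !== "!") return true;
        }
    }
    return false;
}

console.log(hasNamedCapturingGroup("(?<a>x)\\k<a>")); // true
console.log(hasNamedCapturingGroup("\\k"));           // false
console.log(hasNamedCapturingGroup("(?<=x)y"));       // false
```

With such a result available, a bare `\k` could be accepted as an identity escape only when the pattern has no named groups and neither the `u` nor the `v` flag is set, which matches what Annex B engines do at runtime.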
Follow-up to #58295.
This fixes issues like the `'}' expected` cases given at #58275 (comment).