Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make no-misleading-character-class report more granular errors #18082

Merged
merged 19 commits into from
Mar 5, 2024

Conversation

fasttime
Copy link
Member

@fasttime fasttime commented Feb 4, 2024

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[X] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[ ] Add something to the core
[ ] Other, please explain:

What rule do you want to change?

no-misleading-character-class

What change do you want to make (place an "X" next to just one item)?

[X] Generate more warnings
[ ] Generate fewer warnings
[ ] Implement autofix
[ ] Implement suggestions

How will the change be implemented (place an "X" next to just one item)?

[ ] A new option
[X] A new default behavior
[ ] Other

Please provide some example code that this change will affect:

/* eslint no-misleading-character-class: "error" */

RegExp("[ \\ufe0f][ \\ufe0f]");

RegExp(`"[👍]"`);

What does the rule currently do for this code?

Reports one problem per RegExp expression (Playground link).

What will the rule do after it's changed?

A granular problem will be reported for each problematic character class. For the first RegExp, two problems will be reported.

What changes did you make? (Give an overview)

This is an attempt to extend granular error reporting for the no-misleading-character-class rule to string and template literal patterns whose source code doesn't match the string value. This includes literals with escape sequences, line continuations, and unescaped CRLF line breaks in templates.

This change was suggested during the implementation of granular errors, but it was postponed to keep the work manageable: #17515 (comment).

Is there anything you'd like reviewers to focus on?

  • The new implementation only reports more problems on code that already has one problem with the current implementation, so I think it doesn't qualify as a breaking change, but I'd like others to confirm.
  • I'm using a self-written dependency to track characters back to their positions in source code. While I'm happy to continue maintaining that project on my own, I think it would be better to extract the core of the code to ESLint for licensing reasons. Alternatively, the repo and npm package could be transferred to the eslint org or to eslint-community.
  • The new implementation could be further extended to templates with interpolated expression like new RegExp(`[ \\ufe0f]${0}`), but I haven't explored that yet.

@eslint-github-bot eslint-github-bot bot added the feature This change adds a new feature to ESLint label Feb 4, 2024
Copy link

netlify bot commented Feb 4, 2024

Deploy Preview for docs-eslint canceled.

Name Link
🔨 Latest commit 64813fb
🔍 Latest deploy log https://app.netlify.com/sites/docs-eslint/deploys/65e5666e9c6e36000853e830

@fasttime fasttime added the rule Relates to ESLint's core rules label Feb 4, 2024
@fasttime fasttime marked this pull request as ready for review February 5, 2024 15:51
@fasttime fasttime requested a review from a team as a code owner February 5, 2024 15:51
Copy link
Member

@nzakas nzakas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to produce more errors here and not mark it as breaking. It seems more like a bug fix to me.

default:
return null;
// Only literals and expression-less templates generate granular errors.
if (!(node.type === "TemplateLiteral" && !node.expressions.length || node.type === "Literal")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding this a bit difficult to understand, maybe this?

Suggested change
if (!(node.type === "TemplateLiteral" && !node.expressions.length || node.type === "Literal")) {
const isTemplateWithoutExpressions = node.type === "TemplateLiteral" && node.expressions.length === 0;
if (!isTemplateWithoutExpressions && node.type !== "Literal")) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny, I was going to suggest if (node.type !== "TemplateLiteral" || (node.type === "Literal" && node.expressions.length)) {. No preference between the two from me. 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I realized that that check was always falsy, so I just removed it. It's because strings and templates without expressions always produce granular reports now, and so the default case isn't hit any more. Somehow I missed that while testing.

package.json Outdated
@@ -72,6 +72,7 @@
"@nodelib/fs.walk": "^1.2.8",
"ajv": "^6.12.4",
"chalk": "^4.0.0",
"char-source": "^0.0.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, since you're the author of this package, is there any reason it's just not included in this PR as a utility?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that that package contains logic that isn't needed in ESLint. The whole validation part for one, including 50% of the unit tests, is not necessary because the syntax of the arguments can be assumed to be valid since it was checked by the parser. There's also additional information about used features (for example, to tell if a string literal is valid in strict mode or not) which seems superfluous unless we expect to use it in ESLint somewhere later.

So if we are going to extract a utility from that package we should probably do some refactoring to remove the unnecessary logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's remove the unnecessary code, of course. As it seems like the package was created just for this purpose (?), I'm assuming that's not an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 53e262a, thanks!

Copy link
Contributor

@JoshuaKGoldberg JoshuaKGoldberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great start! 👏 on going through the effort of making the parser. Looks from the files on GitHub like it was a lot of tricky work.

Since you mentioned bringing the code inline, requesting changes on that before doing a deeper review. Also I think I might have missed something with the charInfos caching?

}]
},

/* eslint-disable lines-around-comment -- see https://github.com/eslint/eslint/issues/18081 */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] Why not put the comment outside the code? It's not necessary for the test, so I'd think that preferable even ignoring the lines-around-comment shenanigan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's not for the comment inside the code, it's this comment that is producing errors now. I though that the comments in the code would make it easier to locate a test in the source code given its title, because the code as it's calculated here doesn't match what is printed in the console.

lib/rules/no-misleading-character-class.js Outdated Show resolved Hide resolved
default:
return null;
// Only literals and expression-less templates generate granular errors.
if (!(node.type === "TemplateLiteral" && !node.expressions.length || node.type === "Literal")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny, I was going to suggest if (node.type !== "TemplateLiteral" || (node.type === "Literal" && node.expressions.length)) {. No preference between the two from me. 🙂

lib/rules/no-misleading-character-class.js Show resolved Hide resolved
* @returns {Generator<CodeUnit>} Zero, one or two `CodeUnit` elements.
*/
function *mapEscapeSequenceOrLineContinuation(reader) {
const start = reader.pos++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little unclear here...are you intentionally moving the reader position by one? Or did you intend for const start = reader.pos + 1?

Either way, I think this line needs reworking to make its intention clear.

/**
* An object used to keep track of the position in a source text where the next characters will be read.
*/
class SourceReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might rename this as TextSource. SourceReader implies that there's a read() method or something that this object is supposed to be doing, but it's just a data structure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I've added a read() method in 448d9a9 to read characters relative to the current position. So it's no longer necessary to store the position in a variable before accessing the source like it was done before. And I'm using += for relative position changes. I think this makes the code a little clearer but if that's not the case we can revert to how it was done previously and just address the suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the read()!

SourceReader is still a little ambiguous IMO. Maybe, CharsReader? CharactersReader? TextReader? Not a blocker from my end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I don't have a preference for the name, so pick one that sounds better to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for TextReader then, to align with #18082 (comment).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in 072f256.

@@ -32,6 +32,12 @@ class SourceReader {
this.source = source;
this.pos = 0;
}

read(offset = 0, length = 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some JSDoc here?


reader.pos = posAfterBackslash + octalStr.length;
reader.pos += octalStr.length - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you refactored to include a read() method in SourceReader, it might make things clearer if you also created a advance() method that moves pos rather than augmenting the value like this. So this code would become:

Suggested change
reader.pos += octalStr.length - 1;
reader.advance(octalStr.length - 1);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea 👍🏻

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All done in b1cf05f.

@fasttime fasttime requested a review from nzakas February 15, 2024 08:28
Copy link
Member

@nzakas nzakas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Would like @JoshuaKGoldberg to review his changes before merging.

@mdjermanovic mdjermanovic added the accepted There is consensus among the team that this change meets the criteria for inclusion label Feb 15, 2024
codeUnits ??= parseStringLiteral(source);
start = offset + codeUnits[firstIndex].start;
end = offset + codeUnits[lastIndex].end;
} else { // RegExp Literal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else can be anything, for example:

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7bb7e55.

let start;
let end;

if (node.type === "TemplateLiteral") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check if the template literal has expressions. For example, this crashes:

/* eslint no-misleading-character-class: "error" */

new RegExp(`${"[👍]"}`);
TypeError: Cannot read properties of undefined (reading 'start')
Occurred while linting C:\projects\eslint\foo.js:3
Rule: "no-misleading-character-class"
    at C:\projects\eslint\lib\rules\no-misleading-character-class.js:294:64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! Those cases should be covered by unit tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7bb7e55.

@fasttime
Copy link
Member Author

This could be a bug in the current implementation:

RegExp(/[👍]/u);

Unexpected surrogate pair in character class. Use 'u' flag. (no-misleading-character-class)

The rule complains about a missing 'u' flag which is actually there:

console.log(RegExp(/[👍]/u).flags);
// "u"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion feature This change adds a new feature to ESLint rule Relates to ESLint's core rules
Projects
Status: Complete
Development

Successfully merging this pull request may close these issues.

None yet

4 participants