Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add replaceSeparatorsForGlob utility #142

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

feat: add replaceSeparatorsForGlob utility #142

wants to merge 1 commit into from

Conversation

SimenB
Copy link

@SimenB SimenB commented Jan 14, 2019

Description

We've had some issues in Jest with paths using mixed path separators due to both running on windows and some incomplete normalization. So we've added a utility which we use everywhere we use micromatch

See jestjs/jest#6650 where this regex comes from. (if you have any feedback on that PR, it'd be most welcome btw 馃榾)

Note that we should probably be better at normalization in Jest as well as use path.posix.join etc to avoid the backslashes. But it doesn't always make sense (and doesn't necessarily always work) to use forward slashes, especially when doing FS operation, so being able to just pass things through a helper for multimatch is simpler.

Checklist

  • If this is a big feature with breaking changes, consider [opening an issue][issues] to discuss first. This is completely up to you, but please keep in mind that your pr might not be accepted.
  • Run unit tests to ensure all existing tests are still passing
  • Add new passing unit tests to cover the code introduced by your pr
  • Update the readme (see readme advice)
  • Add your info to the contributors array in package.json!

throw new TypeError('expected a string: "' + util.inspect(path) + '"');
}

return path.replace(/\\(?![{}()+?.^$])/g, '/');
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copied from the linked Jest PR, all credit goes to @aldarund

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to use this regex in the toPosixSlashes utility method in picomatch. However, I think we'll need to remove the ., since that would cause dotfiles and directories like foo\\.git to incorrect. I think we also need * to be added to the regex.

@SimenB SimenB mentioned this pull request Mar 14, 2019
@jonschlinkert
Copy link
Member

@SimenB I don't know how I missed this. I must have marked it as "read" at some point! My apologies. I think this is resolved by the .format function (which can do any kind of modification to the string before it's matched).

If .format() doesn't resolve your issue, let's discuss what needs to be done and I'll get it fixed! Thanks so much for this PR, I feel really bad that I missed it! I guess I've been too focused on refactoring.

@SimenB
Copy link
Author

SimenB commented Mar 30, 2019

No worries, v4 is way more important!

While I agree format is a good place to put this sort of logic, I would still love to se the logic itself live in micromatch (or some other place that's easy to discover) - it's super hard to get right

@jonschlinkert
Copy link
Member

it's super hard to get right

Agreed, there is an unescape option that removes backslashes in the actual glob.

That said, I hate sounding like a broken record, but globs are regular expressions, not paths. There are too many edge cases where removing backslashes like that will cause the regex to be wrong, and/or could make the regular expression unsafe. It's possible to intentionally create malicious regular expressions resulting in DDoS attacks by defining a glob that seems fine until backslashes are removed. Not to mention, filenames may start with several of those characters, and removing \\ will most likely cause the regex to be invalid.

The good news is that there is an easier way to solve it. The path that you're prefixing onto the glob needs to be converted to posix slashes before joining it to the glob. You can use one of these methods to join the glob (I'm exposing them on micromatch too), or just always use path.posix.join(). Don't use path.join() because that will add backslashes to the regular expression.

Picomatch and micromatch convert paths to / when matching on windows, so if the glob has a bunch of \\ separators, they shouldn't match anyway.

Think about it this way: if we do what you're suggesting, we're literally potentially changing the intent of the glob defined by the user. Which means that - in some downstream lib like rimraf - when the user accidentally deletes a bunch of files that shouldn't have been matched, it will be 100% our fault. If we do it the way I'm suggesting, then yes, we need to remind people that globs are not file paths and that they need to correctly use forward slashes as path separators, and I know it's a pain (I see all of the issues for it too), but it's safer, and it's correct.

@jonschlinkert
Copy link
Member

@SimenB I have some ideas for how to make this easier, but as you mentioned it's pretty difficult to get right. Can you give me some scenarios that are used in jest, so that I can reason through different options for how we can handle this in picomatch or micromatch?

Meaning, can you provide some examples of how jest handles paths and globs, and scenarios where problems have occurred?

@SimenB
Copy link
Author

SimenB commented Apr 10, 2019

Sure! E.g. from jestjs/jest#7814

const rootDir = 'C:\\Program files (x86)\\dir'; // maybe a result of doing `process.cwd()` on win32

const filePath = '<rootDir>/test/{TestCasesNormal,StatsTestCases,ConfigTestCases}.test.js';

const result = path.resolve(
      rootDir,
      path.normalize('./' + filePath.substr('<rootDir>'.length)),
);

// result should be 'C:/Program files \(x86\)/dir/test/{TestCasesNormal,StatsTestCases,ConfigTestCases}.test.js' I think? escaping the parens

@jonschlinkert
Copy link
Member

jonschlinkert commented Apr 10, 2019

@SimenB I accidentally closed this.

Thanks! That helps, I'll review and come back with thoughts after I play around with some code. I appreciate it!

@jonschlinkert jonschlinkert reopened this Apr 10, 2019
@jonschlinkert
Copy link
Member

jonschlinkert commented Apr 14, 2019

@SimenB a couple of questions:

  • Is <rootDir> the only path variable that's supported?
  • Are there any other nuances you can think of that have made this difficult, besides the one in your example? (perhaps you can link to any unit tests that address this?)

What if we added support for path variables to picomatch? We could allow path variables to be passed on the options. Since the picomatch parser already iterates over the entire string, it would take a trivial amount of processing to do this, and IMHO it might be useful for other projects.

Thoughts?

edit: on a related note, I coincidentally worked on a library for handling tabstops and variable expansion in snippets (like the ones in TextMate, Sublime Text, VS Code etc), and I had to add logic for recursively resolving path variables. Might be useful for this.

@SimenB
Copy link
Author

SimenB commented Apr 14, 2019

Is the only path variable that's supported?

Yup!

Are there any other nuances you can think of that have made this difficult, besides the one in your example? (perhaps you can link to any unit tests that address this?)

Not of the top of my head. The main difficulties have been how to escape things that look like globs but are not in paths (like parens) and successfully injecting the path replacement without messing up slashes or relative paths

Our unit tests are mostly in https://github.com/facebook/jest/blob/master/packages/jest-config/src/__tests__/normalize.test.js. We also have some e2e tests, but I don't think those test any more edge cases

What if we added support for path variables to picomatch?

That'd be great! It would really help. We also need to do resolution of relative paths (e.g. resolving /hello/there/../world -> /hello/world), but that's way simpler than the path replacement since we don't have to worry about escaping.

@jonschlinkert
Copy link
Member

Got it, thanks for the detail. I might have a solution, I'll come back with some examples! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants