-
Can you provide a minimal example of markdown code where the limit of MAX_VERSION_COUNT is an issue?
-
Thanks for all the interest. First of all, thank you for complimenting my test coverage, but I have to give credit to the actual spec, where all the test cases came from, and also to ikatyang, who implemented an earlier markdown grammar from which I got my initial version of the tests.
@maxbrunsfeld Getting an example that results in many parse stack versions is quite a bit more complex than I originally thought. It happens mostly with inputs where a lot of (potential) delimiter environments are open at the same time, e.g.
But of course this is not really a good minimal example.
@alemuller Addressing your comments:
As I said, I'm refactoring right now, but I would love for some people to review my code after that. It should also be a lot easier to contribute once I have commented everything properly.
-
I see your point, but I personally believe the Unix philosophy (do one thing and do it well) would be a better fit for tree-sitter, for many reasons. And doing it well, in a tree-sitter parser, means doing less work. Instead of validating HTML entities in the parser, use an injection. Same for email: just check whether it contains an @ and delegate the parsing to another parser that implements the RFC. But I'm not going to discuss those now. You seem to care a lot about performance. I avoided talking about this for a few reasons:
-
Personally, I think your current design is good. Markdown is all one language; the block structure by itself isn't really its own language from the user's perspective. The two-phase split is just an implementation strategy.
-
Yeah, I agree 100%. But I think some things should be delegated to specialized parsers (not only in Markdown), like:
Again, I'm not talking about a markdown parser that exceptionally tries to validate them. Having specialized parsers would lead to more consistent highlighting across different languages, make writing plugins easier, and so on. Things like email addresses are not trivial to parse; most regexes don't conform to the RFC. Other things, like Unicode code point escape sequences, are not hard, but not many people care to implement them.
-
On the topic of performance, I'm obviously just guessing. But at least for some use cases there is definitely some overhead in using different grammars. For example, when the file is too large to fit in memory and must be streamed, there would be two passes: one to parse the block structure and one to parse the inline structure. I am pretty sure this would be slower, since IO is often the bottleneck. I agree it would be nice to have specialized parsers for things like emails, which could also be used for markdown files via language injections. But just so we're on the same page: the markdown grammar still needs to decide whether something counts as an email or not, using the regex from the spec.
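To make the last point concrete, here is a minimal Python sketch of that decision. The pattern is my transcription of the email autolink regex in the CommonMark spec and should be checked against the spec before anyone relies on it; the name is_email_autolink is made up for illustration and is not part of the grammar. It only answers whether the text between < and > counts as an email autolink, and leaves real RFC-level parsing to whatever specialized parser gets injected.

```python
import re

# Assumption: transcribed from the CommonMark spec's email autolink
# definition; verify against the spec before relying on it.
EMAIL_AUTOLINK = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def is_email_autolink(text: str) -> bool:
    """Hypothetical helper: True if the whole string (without the
    surrounding angle brackets) matches the pattern above."""
    return EMAIL_AUTOLINK.fullmatch(text) is not None
```

For instance, is_email_autolink("foo@bar.example.com") accepts the address, while a string containing a backslash or a space is rejected, because neither character is in the allowed set.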
-
My grammar in https://github.com/MDeiml/tree-sitter-markdown uses conflicts very heavily, as markdown is a language with a very non-strict syntax. For example, every * could be the beginning of an emphasis, but sometimes this can only be decided after parsing the whole paragraph.
This means that, to parse everything 100% correctly, MAX_VERSION_COUNT would need to be infinite, since emphasis can be nested. Right now it is statically set to 6, with no way to change this number.
Now it obviously makes sense to restrict the version count in most cases, even for this markdown grammar when it is used for highlighting, for example. But it would be nice to have the option to increase or disable the maximum version count, e.g. to use the grammar as a very primitive but fast markdown compiler.
What do you think about this?
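As a rough illustration of why a fixed cap can be hit, here is a small Python sketch. It assumes, naively, that every * in a paragraph can independently be read as either literal text or a delimiter, so n asterisks allow up to 2^n readings. This is only an upper bound for intuition: it does not model how tree-sitter actually forks, merges, or prunes stack versions, and the function names are hypothetical.

```python
MAX_VERSION_COUNT = 6  # the value quoted in this post


def naive_reading_count(paragraph: str) -> int:
    """Naive upper bound: each '*' is independently literal or a delimiter."""
    return 2 ** paragraph.count("*")


def may_exceed_cap(paragraph: str, cap: int = MAX_VERSION_COUNT) -> bool:
    """True if the naive bound is larger than the version cap."""
    return naive_reading_count(paragraph) > cap
```

Under this assumption three undecided asterisks already give 8 possible readings, which is more than 6, whereas two give only 4.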