Help with assembly grammar #731
Hi, I was wondering if you could help me figure out how to make a grammar for assembly, specifically this assembly. Unfortunately it uses LLVM's assembly parser which supports a dozen different variants and is very ad-hoc. In other words there's no formal grammar. Here's the gist though:
So the issues I'm having are:
Thanks! Sorry if this is the wrong place to ask!
Replies: 2 comments
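You can handle newlines in `grammar.js` too. The Go grammar does this, for example. It can work even though newlines can also function as whitespace: the lexer will only produce a `\n` token in states where that token is valid.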
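To make the "only in states where that token is valid" idea concrete, here is a toy sketch in plain JavaScript. It is not Tree-sitter's implementation; `nextToken` and the `validTokens` set are made-up names that stand in for the lexer and for the set of tokens the parser would accept in its current state.

```javascript
// Toy context-aware lexer (illustration only, not Tree-sitter's real code).
// validTokens: hypothetical set of token types the parser accepts right now.
function nextToken(input, pos, validTokens) {
  while (pos < input.length) {
    const ch = input[pos];
    // A newline becomes a real token only when the current state allows it.
    if (ch === '\n' && validTokens.has('newline')) {
      return { type: 'newline', end: pos + 1 };
    }
    // Otherwise newlines are skipped like any other whitespace.
    if (ch === ' ' || ch === '\t' || ch === '\n') {
      pos++;
      continue;
    }
    const m = /^[A-Za-z_.][\w.]*/.exec(input.slice(pos));
    if (m && validTokens.has('word')) {
      return { type: 'word', text: m[0], end: pos + m[0].length };
    }
    return { type: 'error', end: pos + 1 };
  }
  return { type: 'eof', end: pos };
}

// The same input lexes differently depending on what the parser expects.
const atEndOfLine = nextToken('\nadd', 0, new Set(['newline', 'word']));
const insideOperands = nextToken('\nadd', 0, new Set(['word']));
console.log(atEndOfLine.type, insideOperands.type);
```

In the first call the newline terminates a line; in the second it is treated as whitespace and the lexer moves on to the next word.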
The C preprocessor is tricky. I think you'll need to approximate it as if it were part of the same language by explicitly modeling certain preprocessor constructs and usage patterns. This is a fundamental limitation of parsing arbitrary source files without knowing the values of compiler flags. Of course, you could also run the preprocessor before parsing your file with Tree-sitter, but usually when people are using Tree-sitter, it's because they want their grammar to be applicable within the limitations of lightweight text editors and other programming tools that operate on raw files.
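As a rough sketch of what "explicitly modeling certain preprocessor constructs" could look like in `grammar.js`, something along these lines might be a starting point. The grammar name and all rule names are made up for illustration, the fragment has not been run through `tree-sitter generate`, and it assumes that `#ifdef` blocks always wrap whole lines and that every line ends with a newline.

```javascript
// Sketch of a grammar.js fragment (untested assumption, not a finished grammar).
// grammar(), seq(), repeat(), choice() and optional() are globals that the
// tree-sitter CLI provides when it loads grammar.js.
const sketch = {
  name: 'asm_with_cpp', // hypothetical grammar name

  // Only spaces and tabs are skippable; '\n' is a real token below.
  extras: $ => [/[ \t]/],

  rules: {
    source_file: $ => repeat($._line),

    _line: $ => choice(
      $.preproc_include,
      $.preproc_ifdef,
      $.instruction_line,
    ),

    // Preprocessor directives modeled as ordinary line-oriented rules.
    preproc_include: $ => seq('#include', $.string, '\n'),

    preproc_ifdef: $ => seq(
      choice('#ifdef', '#ifndef'), $.identifier, '\n',
      repeat($._line),
      optional(seq('#else', '\n', repeat($._line))),
      '#endif', '\n',
    ),

    // A possibly empty instruction followed by the line terminator.
    instruction_line: $ => seq(optional($.instruction), '\n'),
    instruction: $ => seq($.identifier, repeat($.identifier)),

    identifier: $ => /[A-Za-z_.][A-Za-z0-9_.]*/,
    string: $ => /"[^"\n]*"/,
  },
};

// Only build the grammar when the tree-sitter DSL is actually available.
if (typeof grammar === 'function') {
  module.exports = grammar(sketch);
}
```

This only covers conditionals that line up with whole lines. Macros that expand to partial instructions or operands cannot be modeled this way, which is the limitation described above.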
I haven't given this a super careful read yet, but this seems doable with normal regexes or normal grammar constructs. Let me know if you continue to have trouble with this and are able to narrow it down to a tree-sitter-specific issue or limitation.
Ah... very clever. It might be worth explaining this in the docs since it is pretty confusing! Something like this, if I'm understanding it correctly (feel free to copy/paste):
Ah I figured as much. I guess it might be worth pointing this limitation out in the introduction?
Yeah exactly, and I imagine tracking spans/line numbers would get very complicated!
I think I figured this out actually. Thanks for all the help!