Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile-time kind and field ids that are generated by binding generation. #3276

Open
VonTum opened this issue Apr 10, 2024 · 8 comments
Open
Labels
enhancement Feature request rust

Comments

@VonTum
Copy link

VonTum commented Apr 10, 2024

Problem

So while it's possible to use Language::id_for_node_kind and Language::field_id_for_name to request ids for kinds and fields, these are runtime functions. This means that it is not possible to use these in compile-time contexts, such as in match arms. It would be cleaner to provide some macros or compile time constants to allow getting these values for these contexts, and removes the need to write a mess like this. As an added benefit, compile time ids allow for more optimizations to code walking the tree 😄

I was able to build such proc_macros myself for my own language 'SUS', and they work wonderfully. Perhaps it would be a good idea to incorporate this into mainline tree-sitter?

Expected behavior

It should be possible to write code like this:

use mylang::{kind,kw,field};

fn test_fn(cursor : &TreeCursor) {
  if cursor.field() == field!("sub_expr") {
    let node = cursor.node();
    match node.kind_id() {
      kind!("array_expr") => {},
      kind!("binary_expr") => {},
      kind!("unary_expr") => {},
      // etc
    }
  }
}

So proc_macros such as these need to be part of a separate crate. During my discussion with @WillLillis on this issue, we found it difficult to find a good spot to place this new crate.

Sadly it cannot be part of the tree_sitter core crate. It has to be part of the generated code, since for the proc macros to work they need to have the user's Language as a dependency. Sadly again, we couldn't simply include such proc macros into the generated bindings crate itself, because rust makes a strict distinction between proc_macro and non-proc_macro crates. No other functions are allowed in non-proc_macro crates.

That leaves the issue of where to put it. Following the directory structure, we'd need to put them in tree-sitter-myLang/bindings/rust/proc/Cargo.toml, but this is ugly, and brittle. Not to mention it requires having a dependency that goes up the file tree instead of down.

We also discussed simply adding it on the root of the myLang folder, but that's cluttering up that folder further. If you can make a decision on where and how it should be placed, I'd be happy to create a PR to implement it.

@VonTum VonTum added the enhancement Feature request label Apr 10, 2024
@ObserverOfTime
Copy link
Member

Ideally this would be implemented in a grammar-agnostic way in or depending on the tree-sitter-language crate (#3069).

@VonTum
Copy link
Author

VonTum commented Apr 12, 2024

The issue with trying to fit these proc_macros into any grammar-agnostic shared library crate is that the language's id_for_node_kind and field_id_for_name functions must be available at compiletime to the crate containing the proc_macros. You can't pass a runtime Language object to a macro. In a proc_macro you only have access to a raw TokenStream.

What might be possible, but would create much uglier & brittle code, is passing the fully qualified name of the fn my_lang::language function to the macro, and having it name-resolve it itself and run it to fetch the language. Then probably to make it useable we'd want to have shorthand macros regardless. In any case I would prefer the simplicitly of simply including it in the generated bindings.

@WillLillis
Copy link
Contributor

What might be possible, but would create much uglier & brittle code, is passing the fully qualified name of the fn my_lang::language function to the macro, and having it name-resolve it itself and run it to fetch the language.

I tried this approach but wasn't able to get it to work. How would this look in the macro's definition?

@ObserverOfTime
Copy link
Member

Would #2380 be possible to implement instead?

@VonTum
Copy link
Author

VonTum commented Apr 12, 2024

I think either approach would do. Though I do prefer the proc_macro approach, as that allows these to be used in more contexts (most notably match arms), and it's arguably more readable. (kw!("<=") vs Keyword::LESS_EQUAL)

@WillLillis
Copy link
Contributor

I think either method is fine. If the enum was part of the generated rust code (rather than constants translated over ffi), it could still be used in match arms. I tend to lean towards the proc macro approach as it leverages existing portions of the library (the proc macro just calls id_for_node_kind()) that have already been extensively tested. I threw together an example proc crate for tree-sitter-c using VonTum's macros here. This secondary crate could be generated in the same manner as the existing grammar crates with a simple template and replacements for the grammar's name, e.g.

use proc_macro::TokenStream;

use quote::{quote, quote_spanned};
use syn::{parse_macro_input, LitStr};

#[proc_macro]
pub fn kind(token_stream : TokenStream) -> TokenStream {
    let string_literal : LitStr = parse_macro_input!(token_stream);

    // Get the string value and calculate its length
    let requested_kind = string_literal.value();

    let language = tree_sitter_LOWER_PARSER_NAME_PLACEHOLDER::language();
    let found_id = language.id_for_node_kind(&requested_kind, true);

    if found_id != 0 {
        quote! {
            #found_id
        }
    } else {
        quote_spanned!(
            string_literal.span() =>
            compile_error!("This is not a valid node kind in the LOWER_PARSER_NAME_PLACEHOLDER grammar")
        )
    }.into()
}

Maybe there's a cleaner/ easier way? Thoughts?

@WillLillis
Copy link
Contributor

Also as an aside, if the proc macro approach is used, should kind!() be altered to accept a second boolean argument to forward to id_for_node_kind(), rather than hard-coding true for its second argument?

@VonTum
Copy link
Author

VonTum commented Apr 12, 2024

// Get the string value and calculate its length
Oh, I see now there was a holdover from the example I started from 😂

But yes exactly, the macro code itself isn't difficult to generate, it's just that block three times, for kind, keyword, and field. The main question is "Where should we put it?"

Maybe one thing about the macros that could be discussed is specifically the field! macro. tree-sitter seems to have switched to NonZeroU16 recently for fields, but that means if the macro also produces a NonZeroU16 then it wouldn't be useable in match statements. (Constructing a nonzero requires an expression, which is forbidden in rust patterns. )

Also as an aside, if the proc macro approach is used, should kind!() be altered to accept a second boolean argument to forward to id_for_node_kind(), rather than hard-coding true for its second argument?

So kind! passes true, and kw! passes false.

This point of interface would allow us to share some code between kind! and kw! though.

It is a bit strange, the field in id_for_node_kind is called named : bool, but only through experimentation did I figure out that that distinguished explicit kinds and implicit kinds generated for keywords.

I don't think it's possible to get kinds for hidden names, like _expression, as nodes with these kinds aren't accessible to the user of the parse tree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature request rust
Projects
None yet
Development

No branches or pull requests

3 participants