Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow contextual keywords to be used as identifiers #276

Open
12 of 43 tasks
tamasvajk opened this issue Jan 11, 2023 · 4 comments
Open
12 of 43 tasks

Allow contextual keywords to be used as identifiers #276

tamasvajk opened this issue Jan 11, 2023 · 4 comments

Comments

@tamasvajk
Copy link
Collaborator

tamasvajk commented Jan 11, 2023

This is a tracking issue to see the progress on validating which contextual keywords can be used as identifiers. All contextual keywords are listed below from https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/#contextual-keywords.

Tree sitter is performing context aware lexing, so some of the keywords can already be used as identifiers even without listing them in _contextual_keywords. Some of them need special handling, and some of them are never going to be correctly handled. Examples for the latter:

  • var in implicit_type could also be parsed as a named type.
  • nint, nuint in predefined_type could also be parsed as a named type.
  • notnull in type_parameter_constraint can be parsed as a notnull constraint or a type constraint.
  • ...
    In these cases we should choose the most likely option as the preferred parse rule.

Keywords:

@damieng
Copy link
Contributor

damieng commented Jan 11, 2023

Do you think it is likely that each keyword is going to require additional rules and scenarios much like scoped has?

My concern here is how much this is going to blow the grammar up in terms of size/complexity. I think we are already the biggest tree-sitter grammar and if the PR for the 'scoped' rule is anything to go by this could really cause a big expansion not just in terms of maintaining it but also in terms of conflict rules and execution overhead/parser size (e.g. the current LFS warning)

This is going to come down I guess to the trade-off between being very strict/accurate vs being fast/lightweight. Given that Roslyn exists for the former there was definitely a push from Max to be the latter especially given the current usage of tree-sitter to provide fast syntatical analysis over a huge amount of code at GitHub for the semantic feature.

Worth a discussion.

@tamasvajk
Copy link
Collaborator Author

tamasvajk commented Jan 11, 2023

I don't have a definitive answer to this. I hoped that most of these wouldn't even be required in _contextual_keywords, but that doesn't seem to be the case so far.

These are the ones that I already looked at and might give some rough idea on the changes needed:

  • scoped: as mentioned above, needs additional rules
  • partial, required: probably doesn't need additional rules (based on this commit)
  • yield: probably doesn't need additional rules (based on this commit)
  • var, nint, nuint: needs additional rules (see here)
  • async: needs additional rules (see here)

Some of these "contextual keyword as identifier" problems seem more important than others. For example file should be allowed as a parameter name and as an identifier anywhere a local variable can show up. (I'm not sure if this is already the case.) var, group, from, set, value are somewhat similar, but probably less frequent.

I don't know anything about the performance impact of the recent changes that have been introduced. I guess these can be two fold: (i) compiling the parser might take longer, (ii) the performance of the parser degrades. How much do we care about (i)? Should we add some tests for (ii)?

Regarding grammar size and complexity: I agree with you, I think it's already too big. (And somewhat fragile) At the same time, we're still mixing up generic types and expressions (mentioned in #176), which somewhat saddens me as even a simple syntax highlighting task would be inaccurate. And fixing it will probably make the grammar even more complex.

@tamasvajk
Copy link
Collaborator Author

I continued looking into this a bit more:

@tamasvajk
Copy link
Collaborator Author

I just came across this tweet, which shows a great test case for contextual keywords: Sharplab link.

The interesting bits:

class where { }
    
class String {
    public static implicit operator String(dynamic var) => null;
}

class dynamic { }
public class var { 

    dynamic dynamic => null;
          
    void record(where String = null, int y = default)
    {
        String s = this.dynamic;
    }    
         
    async Task<int> M() {
        var v = new();
        await v.async(new async());
        return 0;
    }
    
    async async async(async async) => 
        await async;
}

class await : INotifyCompletion {
  ...
}

class async { 
    public await GetAwaiter() => throw null;
}

@tamasvajk tamasvajk mentioned this issue Feb 5, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants