Allow contextual keywords to be used as identifiers #276

tamasvajk · 2023-01-11T10:58:20Z

This is a tracking issue to see the progress on validating which contextual keywords can be used as identifiers. All contextual keywords are listed below from https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/#contextual-keywords.

Tree sitter is performing context aware lexing, so some of the keywords can already be used as identifiers even without listing them in _contextual_keywords. Some of them need special handling, and some of them are never going to be correctly handled. Examples for the latter:

var in implicit_type could also be parsed as a named type.
nint, nuint in predefined_type could also be parsed as a named type.
notnull in type_parameter_constraint can be parsed as a notnull constraint or a type constraint.
...
In these cases we should choose the most likely option as the preferred parse rule.

Keywords:

The text was updated successfully, but these errors were encountered:

damieng · 2023-01-11T12:53:36Z

Do you think it is likely that each keyword is going to require additional rules and scenarios much like scoped has?

My concern here is how much this is going to blow the grammar up in terms of size/complexity. I think we are already the biggest tree-sitter grammar and if the PR for the 'scoped' rule is anything to go by this could really cause a big expansion not just in terms of maintaining it but also in terms of conflict rules and execution overhead/parser size (e.g. the current LFS warning)

This is going to come down I guess to the trade-off between being very strict/accurate vs being fast/lightweight. Given that Roslyn exists for the former there was definitely a push from Max to be the latter especially given the current usage of tree-sitter to provide fast syntatical analysis over a huge amount of code at GitHub for the semantic feature.

Worth a discussion.

tamasvajk · 2023-01-11T13:31:10Z

I don't have a definitive answer to this. I hoped that most of these wouldn't even be required in _contextual_keywords, but that doesn't seem to be the case so far.

These are the ones that I already looked at and might give some rough idea on the changes needed:

scoped: as mentioned above, needs additional rules
partial, required: probably doesn't need additional rules (based on this commit)
yield: probably doesn't need additional rules (based on this commit)
var, nint, nuint: needs additional rules (see here)
async: needs additional rules (see here)

Some of these "contextual keyword as identifier" problems seem more important than others. For example file should be allowed as a parameter name and as an identifier anywhere a local variable can show up. (I'm not sure if this is already the case.) var, group, from, set, value are somewhat similar, but probably less frequent.

I don't know anything about the performance impact of the recent changes that have been introduced. I guess these can be two fold: (i) compiling the parser might take longer, (ii) the performance of the parser degrades. How much do we care about (i)? Should we add some tests for (ii)?

Regarding grammar size and complexity: I agree with you, I think it's already too big. (And somewhat fragile) At the same time, we're still mixing up generic types and expressions (mentioned in #176), which somewhat saddens me as even a simple syntax highlighting task would be inaccurate. And fixing it will probably make the grammar even more complex.

tamasvajk · 2023-01-12T12:32:59Z

I continued looking into this a bit more:

file didn't need any changes Add test cases with file identifiers #277
get, set, ... accessor keywords could be removed from the list, and also from the conflicts table. Remove contextual keyword set, get, init, add, remove #279
value is not a dedicated token in the grammar, so it needed no change.
dynamic and nameof were only present in _contextual_keywords, so they could be removed without any other changes - Remove dynamic and nameof from contextual keywords #281

tamasvajk · 2023-01-13T23:02:53Z

I just came across this tweet, which shows a great test case for contextual keywords: Sharplab link.

The interesting bits:

class where { }
    
class String {
    public static implicit operator String(dynamic var) => null;
}

class dynamic { }
public class var { 

    dynamic dynamic => null;
          
    void record(where String = null, int y = default)
    {
        String s = this.dynamic;
    }    
         
    async Task<int> M() {
        var v = new();
        await v.async(new async());
        return 0;
    }
    
    async async async(async async) => 
        await async;
}

class await : INotifyCompletion {
  ...
}

class async { 
    public await GetAwaiter() => throw null;
}

tamasvajk mentioned this issue Feb 5, 2023

WIP: Remove implicit_type #263

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow contextual keywords to be used as identifiers #276

Allow contextual keywords to be used as identifiers #276

tamasvajk commented Jan 11, 2023 •

edited

damieng commented Jan 11, 2023 •

edited

tamasvajk commented Jan 11, 2023 •

edited

tamasvajk commented Jan 12, 2023

tamasvajk commented Jan 13, 2023

Allow contextual keywords to be used as identifiers #276

Allow contextual keywords to be used as identifiers #276

Comments

tamasvajk commented Jan 11, 2023 • edited

damieng commented Jan 11, 2023 • edited

tamasvajk commented Jan 11, 2023 • edited

tamasvajk commented Jan 12, 2023

tamasvajk commented Jan 13, 2023

tamasvajk commented Jan 11, 2023 •

edited

damieng commented Jan 11, 2023 •

edited

tamasvajk commented Jan 11, 2023 •

edited