Matching order of external tokens is not preserved during generation #860

Open
js2xxx opened this issue Dec 23, 2023 · 2 comments

@js2xxx

js2xxx commented Dec 23, 2023

Multiple terminal strings can map to the same variant of the token enum, but with patterns of differing specificity, like this:

extern {
    enum Token {
        // Keywords
        "while" => Token::Ident { symbol: kw::WHILE, is_raw: false },
        ...
        // Non-keywords
        "ident" => Token::Ident { symbol: <Symbol>, is_raw: <bool> }, // Will be checked in a production rule.
        ...
    }
}

However, the current implementation reorders the terminal strings alphabetically, together with their token patterns. A more specific pattern whose terminal sorts later, such as "while", can therefore no longer be matched before a more general pattern whose terminal sorts earlier, such as "ident" ('i' < 'w' in this case).
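
To illustrate why the order matters, here is a minimal Rust sketch. It assumes the generated token conversion tries the patterns one after another in the sorted order, which is how the behavior appears from the outside; it is not the actual generated code. `Symbol`, `kw::WHILE`, and the two `classify_*` functions are hypothetical stand-ins for the types in the grammar above.

```rust
// Hypothetical stand-ins for the types used in the grammar above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Symbol(pub u32);

pub mod kw {
    use super::Symbol;
    // Assumed symbol value for the `while` keyword.
    pub const WHILE: Symbol = Symbol(0);
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Token {
    Ident { symbol: Symbol, is_raw: bool },
}

/// Arms in declaration order: the specific keyword pattern is tried first.
pub fn classify_declared(tok: Token) -> &'static str {
    match tok {
        Token::Ident { symbol: kw::WHILE, is_raw: false } => "while",
        Token::Ident { .. } => "ident",
    }
}

/// Arms in alphabetical order of the terminal string ("ident" < "while"):
/// the general pattern comes first, so the keyword arm can never match.
#[allow(unreachable_patterns)]
pub fn classify_sorted(tok: Token) -> &'static str {
    match tok {
        Token::Ident { .. } => "ident",
        Token::Ident { symbol: kw::WHILE, is_raw: false } => "while",
    }
}
```

Under these assumptions, a token built as `Token::Ident { symbol: kw::WHILE, is_raw: false }` is classified as "while" by `classify_declared` but as "ident" by `classify_sorted`, which is the mismatch described in this issue.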

Users could add prefixes to those terminal strings to preserve order manually, but I think changing the implementation is the optimal solution.
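
As a rough sketch of that workaround, the snippet below sorts terminal names the way the issue describes (plain alphabetical order) and shows that prefixed names keep their declared order. The prefixes `0_` and `1_` and the helper `sorted_terminals` are made up for illustration; the exact comparison LALRPOP uses is assumed here to be an ordinary string sort.

```rust
/// Returns the terminal names in plain (byte-wise) alphabetical order,
/// mimicking the reordering described in this issue.
pub fn sorted_terminals(declared: &[&str]) -> Vec<String> {
    let mut names: Vec<String> = declared.iter().map(|s| s.to_string()).collect();
    names.sort();
    names
}
```

With this helper, `sorted_terminals(&["while", "ident"])` puts "ident" first, while `sorted_terminals(&["0_while", "1_ident"])` keeps "0_while" first. Because the comparison is lexicographic, more than ten terminals would need zero-padded prefixes such as `00_` and `01_`.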

If the current behavior is intended and not going to be changed, an explanation of that behavior should be placed in the manual to avoid confusion.

@Pat-Lafon
Contributor

Hmmm, I think this is related to #671 and #670. From reading those issues, I think the sorting behavior is removable? It's still a little unclear.

@js2xxx
Author

js2xxx commented Dec 23, 2023

Oh, I should have checked those issues.

I wonder what the purpose of the sorting behavior is in the first place. It does not seem to make the subsequent processing any more efficient, yet it makes the external token conversion unpredictable. Therefore, in my opinion, it should be removed.
