Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Pratt parsing with spans #507

Open
stefnotch opened this issue Aug 22, 2023 · 8 comments
Open

[Feature request] Pratt parsing with spans #507

stefnotch opened this issue Aug 22, 2023 · 8 comments

Comments

@stefnotch
Copy link
Contributor

stefnotch commented Aug 22, 2023

When using the new pratt parsing feature, how would I create a syntax tree which includes spans?

e.g. How does one get spans that include the operators? And how does one get the spans of the operators themselves?

chumsky/src/pratt.rs

Lines 709 to 719 in 01b96cd

let parser = atom
.pratt(operator)
.with_postfix_ops(choice((
// Because we defined '+' and '-' as left associative operators,
// in order to get these to function as expected, their strength
// must be higher, i.e. they must bind tighter
postfix(just('!'), 2, |lhs| Expr::Factorial(Box::new(lhs))),
// Or weirdness happens
postfix(just('$'), 0, |lhs| Expr::Value(Box::new(lhs))),
)))
.map(|x| x.to_string());

@stefnotch
Copy link
Contributor Author

stefnotch commented Aug 22, 2023

I did try map_with_span, but that gives me a PrattOpOutput. I'm not sure if I can do anything with it.
image

On that note, why is type InfixBuilder<E> = fn(lhs: E, rhs: E) -> E; a fn? Most other chumsky APIs accept a Fn, which makes some tasks much easier.

Edit: I see, it's because getting a Fn to work in that case is tricky #464 (comment)
Would it be possible to extend the InfixBuilder to also have access to the infix token parser result? Same for prefix and postfix parsers. Then it would at least be possible to add some info to the token parser, and pass it into the infix builder.

@stefnotch
Copy link
Contributor Author

This is my attempt.

https://github.com/zesterer/chumsky/compare/main...stefnotch:pratt-with-op?expand=1

I hope it's roughly the right direction, but one thing that's missing is the ability to write build functions that take an operator that has a different type. For example, here the build function takes a char as the operator, and two Exprs as the children. Supporting this probably requires passing another generic type through all the functions.
image

@Zij-IT
Copy link
Contributor

Zij-IT commented Aug 23, 2023

I’ll give it a look today to see what I can do to get it to give you the op spanned. A quick and dirty way to do it would be to use the span’s of the expressions to calculate the span of the operator, as the operator’s span is the space between the two expr’s (though whitespace makes that not 100% true)

@stefnotch
Copy link
Contributor Author

@Zij-IT

I think there's a certain value in having access to the entire parsed operator. For example, when parsing sufficiently complex mathematical expressions, then there is Knuth's up arrow notation.
https://en.m.wikipedia.org/wiki/Knuth%27s_up-arrow_notation
Which has an operator with an exponent. That would be most easily parsed by writing a "up arrow followed by exponent" parser for the operator.

@Zij-IT
Copy link
Contributor

Zij-IT commented Aug 23, 2023

@Zij-IT

I think there's a certain value in having access to the entire parsed operator. For example, when parsing sufficiently complex mathematical expressions, then there is Knuth's up arrow notation. https://en.m.wikipedia.org/wiki/Knuth%27s_up-arrow_notation Which has an operator with an exponent. That would be most easily parsed by writing a "up arrow followed by exponent" parser for the operator.

Excellent point. I’ll give it a shot tonight and see what I can do!

@Zij-IT
Copy link
Contributor

Zij-IT commented Aug 24, 2023

So, here is some API questions that I have. How does this look?

let parser = atom 
    .pratt(choice((
        left_infix(just('-'), 0, |lhs, op, rhs| Expr::Sub(Box::new(lhs), Box::new(rhs))),
        left_infix(just('+'), 0, |lhs, op, rhs| Expr::Add(Box::new(lhs), Box::new(rhs))),
    ))) 
    .with_postfix_ops(
        // For postfix the `op` is on the right
        postfix(just('!'), 0, |lhs, op| Expr::Factorial(Box::new(lhs))),
    ) 
    .with_prefix_ops(
        // For prefix the `op` is on the left
        prefix(just('-'), 0, |op, rhs| Expr::Negate(Box::new(rhs))),
    )
    .map(|x| x.to_string()); 

Or do we follow str::split_once and str::rsplit_once and keep the op placement consistent so that both use fn(Op, Expr) -> Expr

@stefnotch
Copy link
Contributor Author

stefnotch commented Aug 24, 2023

That looks like a good API. I'd keep the |lhs, op| design, if the infix operator's callback is |lhs, op, rhs|.

If it's |op, lhs|, then I'd expect the infix operator's callback to also look roughly like that. That's actually what I did above, mostly because I wanted to have an easy time implementing it. There I have an API that looks like left_infix(just('-'), 0, |op, [lhs, rhs]| Expr::Sub(Box::new(lhs), Box::new(rhs))). That made using the current Mode API very easy.
https://github.com/zesterer/chumsky/compare/main...stefnotch:pratt-with-op?expand=1#diff-350f258948b06701ad68bdfb8528d5fdc35c29e5d6a4bdbd0c7673eea00a7e3fR365-R366

@stefnotch
Copy link
Contributor Author

Did you have any luck with a proper implementation so far, or are some parts trickier than expected? Is there anything I could do to help?

In my little experiments with extending the Pratt parsing implementation, I never managed to figure out what exactly Expr in impl<'a, P, Expr, I, O, E> ParserSealed<'a, I, PrattOpOutput<PrefixBuilder<Expr>>, E> was used for, which sadly limited my understanding of that bit of code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants