Documentation request: State and Context #534

Open
stefnotch opened this issue Sep 26, 2023 · 10 comments

@stefnotch
Contributor

Chumsky has a concept of state and context, quite a few combinators that do things with them, and a standalone map_ctx function.

I personally never quite managed to wrap my head around that feature set. What is it intended to be used for, and how does one work with it?
E.g., how do I take an existing context and change it into a new type of context?

@zesterer
Owner

'State' is just a mutable value that gets carried around during parsing. It can contain arbitrary data and can be whatever you like: common choices include string interners and arena allocators (or both! Just make a struct containing both of them). There are good examples of both in the doc examples of fold_with_state and map_with_state.
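To make the idea of state concrete without depending on chumsky's API, here is a minimal std-only sketch. `Interner` and `parse_idents` are hypothetical names invented for this illustration; they are not chumsky types or combinators. The point is only that the state is a single mutable value that every parsing step can read and update, and that it is still available once parsing finishes.

```rust
use std::collections::HashMap;

/// Hypothetical parser state: a tiny string interner (not part of chumsky).
#[derive(Default)]
struct Interner {
    ids: HashMap<String, usize>,
    names: Vec<String>,
}

impl Interner {
    /// Return the existing id for `name`, or assign the next free one.
    fn intern(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        self.ids.insert(name.to_string(), id);
        self.names.push(name.to_string());
        id
    }
}

/// Split the input on whitespace and intern each word.
/// The `&mut Interner` plays the role of the state: each step may
/// mutate it, and the caller keeps it after the parse.
fn parse_idents(input: &str, state: &mut Interner) -> Vec<usize> {
    input
        .split_whitespace()
        .map(|word| state.intern(word))
        .collect()
}
```

Parsing `"a b a c b"` with a fresh `Interner` yields the ids `[0, 1, 0, 2, 1]`, and the interner afterwards holds the three distinct names in first-seen order.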

Context is different: it's some information passed from a parser earlier in the input to one later in the input that changes how the latter behaves. For example, an indentation-sensitive parser might pass the indentation level from the first indentation parser to all of the trailing indentation in a block, allowing it to properly parse indentation (despite this usually being a context-sensitive thing that recursive descent parsers struggle with). Here's an example of this.
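Separately from the linked example, the indentation case can be sketched as a toy in plain Rust. This is not chumsky code: `indent_of`, `block_line`, and `block` are hypothetical helpers, and the "context" is just a `usize` handed from the step that measures the first line to the step that parses every later line.

```rust
/// Count the leading spaces of a line.
fn indent_of(line: &str) -> usize {
    line.len() - line.trim_start_matches(' ').len()
}

/// Parse one line of a block. `ctx` is the indentation level measured
/// earlier; only non-empty lines indented by exactly that amount match.
fn block_line<'a>(line: &'a str, ctx: usize) -> Option<&'a str> {
    if indent_of(line) == ctx && !line.trim().is_empty() {
        Some(&line[ctx..])
    } else {
        None
    }
}

/// Parse a block: the first line decides the indentation, and that
/// value is passed as context to the parser for every following line.
fn block(input: &str) -> Vec<&str> {
    let mut lines = input.lines();
    let Some(first) = lines.next() else {
        return Vec::new();
    };
    let ctx = indent_of(first); // the context is created here...
    let mut out = vec![&first[ctx..]];
    for line in lines {
        match block_line(line, ctx) {
            // ...and consumed here, changing what the later parser accepts.
            Some(stmt) => out.push(stmt),
            None => break,
        }
    }
    out
}
```

With the input `"  a\n  b\n    c\n  d"`, the block stops at the more deeply indented line, so the result is `["a", "b"]`. The same `block_line` function behaves differently depending on the context it is given, which is the property described above.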

Hopefully I'll get the time to write up proper long-form guides for both of these soon.

@stefnotch
Contributor Author

stefnotch commented Sep 27, 2023

Btw, I think there's a typo in the documentation for ignore_with_ctx. It should probably say "if you do":

/// the first. If you don't need the context in the output, use [`Parser::then_with_ctx`].

@zesterer
Owner

Yep, you're right. I think this was due to the combinator roles being switched at one point.

@stefnotch
Contributor Author

For when you're going to write up a guide, here are a few questions that I personally have:

  • How do I take an existing context, and change it into a new type of context? I did eventually stumble upon .map_with_ctx(|_, ctx| ctx.clone()).ignore_with_ctx(map_ctx(|ctx| do something with the ctx)), but that seems pretty convoluted.

  • How do I start a parser with a given context? parser.with_ctx(...).parse(input) doesn't immediately work.

use chumsky::{extra, prelude::EmptyErr, text, IterParser, Parser};

#[derive(Clone, Debug, Default)]
struct TestContext {
    value: char,
}

#[test]
fn test_chumsky_recursive_context() {
    let number = text::digits::<char, &str, extra::Full<EmptyErr, (), TestContext>>(10)
        .exactly(1)
        .collect::<Vec<_>>()
        .map_with_ctx(|result, ctx| {
            println!("result: {:?}, ctx: {:?}", result, ctx);
            if ctx.value == result[0] {
                Some(result[0])
            } else {
                None
            }
        })
        .boxed();

    assert_eq!(
        number
            .with_ctx(TestContext { value: '1' })
            .parse("1")
            .into_output(),
        Some(Some('1'))
    );
}
  • How do I only temporarily change the context? Like how do I express "then run indent parser with current context plus 2"?

@zesterer
Owner

> How do I take an existing context, and change it into a new type of context?

There is a map_ctx function. Unlike most combinators, it's a free function used in prefix position, wrapping the parser as an argument (i.e: to indicate that it's changing the context that's getting passed into the parser, not mapping some output).

> How do I start a parser with a given context?

with_ctx should work. What problem are you running up against?

> How do I only temporarily change the context? Like how do I express "then run indent parser with current context plus 2"?

Context is local to a section of the parser tree. When using map_ctx, say, the mapped context will only apply to the parser given by the second argument. The new context will only be observable by that parser.
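The locality described here can be modelled in a few lines of plain Rust. This is a toy, not chumsky's implementation: a "parser" is assumed to be any function from an input and a borrowed context to an output, and this `map_ctx` is a hypothetical stand-in that only borrows the name. The mapped context exists solely for the duration of the inner parser's call, so the "current context plus 2" case needs no undo step.

```rust
/// Toy stand-in for a context-mapping combinator (not chumsky's real
/// signature): build a parser that accepts the outer context `C1`,
/// maps it to `C2`, and runs `inner` with the mapped value only.
fn map_ctx<C1, C2, O>(
    f: impl Fn(&C1) -> C2,
    inner: impl Fn(&str, &C2) -> O,
) -> impl Fn(&str, &C1) -> O {
    move |input, ctx| inner(input, &f(ctx))
}

/// Hypothetical leaf parser: succeed only if the line is indented by
/// exactly `*indent` spaces.
fn indented(line: &str, indent: &usize) -> bool {
    let spaces = line.len() - line.trim_start_matches(' ').len();
    spaces == *indent
}

/// Run a nested parser at "current indentation plus 2", then run the
/// plain parser again to show the outer context was never modified.
fn indent_demo(outer_indent: usize) -> (bool, bool) {
    let nested = map_ctx(|indent: &usize| indent + 2, indented);
    let inner_ok = nested("    x", &outer_indent);
    let outer_ok = indented("  y", &outer_indent);
    (inner_ok, outer_ok)
}
```

Calling `indent_demo(2)` gives `(true, true)`: the nested parser sees an indentation of 4, while the sibling parser that runs afterwards still sees 2. Because `C1` and `C2` are independent type parameters, the same shape also covers changing the context to a different type.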

@CraftSpider
Collaborator

If you imagine a parser as a tree with ordered children, context is 'created' by a node, and can only be used by nodes that share the same parent and come after the creator, and children of those nodes. It flows 'forwards and down', which is the opposite of most parse results, which flow 'backwards and up' from where they are generated until they are returned as output at the top of the tree.

@stefnotch
Contributor Author

> > How do I take an existing context, and change it into a new type of context?
>
> There is a map_ctx function. Unlike most combinators, it's a free function used in prefix position, wrapping the parser as an argument (i.e: to indicate that it's changing the context that's getting passed into the parser, not mapping some output).

> > How do I start a parser with a given context?
>
> with_ctx should work. What problem are you running up against?

I see. Apparently I'm running into the same type inference issue in both cases. The piece of code above should demonstrate the problem. map_ctx and with_ctx allow for an arbitrary, unrelated output type. This means that in most cases, I run into the Rust compiler saying that it doesn't know which type to use.

For example:

error[E0284]: type annotations needed
  --> parser\tests\chumsky_tests.rs:99:27
   |
99 |         let base_number = map_ctx(|ctx: &TestContext| ctx.clone(), number).boxed();
   |                           ^^^^^^^ cannot infer type of the type parameter `E` declared on the function `map_ctx`
   |
   = note: cannot satisfy `<_ as ParserExtra<'_, &str>>::Context == TestContext`
help: consider specifying the generic arguments
   |
99 |         let base_number = map_ctx::<Boxed<'_, '_, &str, char, chumsky::extra::Full<EmptyErr, (), TestContext>>, char, &str, E, chumsky::extra::Full<EmptyErr, (), TestContext>, _>(|ctx: &TestContext| ctx.clone(), number).boxed();
   |                                  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Without an explanation, my default assumption was "I'm doing something wrong". But I'm glad to see that map_ctx::<_, _, _, extra::Full<EmptyErr, (), _>, _, _> does indeed work, even if it's pretty verbose.

The with_ctx example would be: [image not preserved]

> > How do I only temporarily change the context? Like how do I express "then run indent parser with current context plus 2"?
>
> Context is local to a section of the parser tree. When using map_ctx, say, the mapped context will only apply to the parser given by the second argument. The new context will only be observable by that parser.

Ah, I see. Thank you!

@zesterer
Owner

Oh hmm, looks like we need to thread E through as a phantom type parameter. Let me do that now.
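For readers unfamiliar with the technique, the following std-only sketch shows in general terms what carrying a type as a phantom parameter can look like. It is an assumption-laden toy and not the actual chumsky change: the `Parser` trait, `MapCtx` struct, `Expect` parser, and this `map_ctx` are all hypothetical. The wrapper stores no value of the outer or inner context types, so they are recorded with `PhantomData`; once they are part of the wrapper's type, the closure argument and the inner parser are enough for the compiler to infer them.

```rust
use std::marker::PhantomData;

/// Hypothetical parser trait, generic over the context it expects.
trait Parser<Ctx> {
    fn run(&self, input: &str, ctx: &Ctx) -> bool;
}

/// Wrapper that maps an outer context to an inner one. `Outer` and
/// `Inner` are not stored anywhere, so they are carried as phantom
/// type parameters to keep them part of the wrapper's type.
struct MapCtx<P, F, Outer, Inner> {
    inner: P,
    mapper: F,
    _ctx: PhantomData<fn(&Outer) -> Inner>,
}

impl<P, F, Outer, Inner> Parser<Outer> for MapCtx<P, F, Outer, Inner>
where
    P: Parser<Inner>,
    F: Fn(&Outer) -> Inner,
{
    fn run(&self, input: &str, ctx: &Outer) -> bool {
        // The mapped context lives only for this call.
        self.inner.run(input, &(self.mapper)(ctx))
    }
}

/// Toy constructor: no turbofish is needed at the call site because
/// every type parameter is determined by the two arguments.
fn map_ctx<P, F, Outer, Inner>(mapper: F, inner: P) -> MapCtx<P, F, Outer, Inner>
where
    P: Parser<Inner>,
    F: Fn(&Outer) -> Inner,
{
    MapCtx { inner, mapper, _ctx: PhantomData }
}

/// Same shape as the TestContext from earlier in the thread.
struct TestContext {
    value: char,
}

/// Hypothetical leaf parser whose context is the character to expect.
struct Expect;

impl Parser<char> for Expect {
    fn run(&self, input: &str, ctx: &char) -> bool {
        input.starts_with(*ctx)
    }
}
```

With these definitions, `map_ctx(|c: &TestContext| c.value, Expect)` compiles without explicit generic arguments, and running it on `"1"` with `TestContext { value: '1' }` succeeds. In this toy, removing the phantom parameters from the struct would leave `Inner` unconstrained in the `impl`, which the compiler rejects.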

@zesterer
Owner

If you hitch yourself up to the latest commit, it should work fine without the explicit type parameter now.

@stefnotch
Contributor Author

Woah, map_ctx is really nice to use now! Thank you very much.
