enable using a tokenized input as a Stream input #513

Draft: wants to merge 3 commits into base: main

cosmicexplorer

Problem

I'm parsing a string into tokens with chumsky, and I would like to also use chumsky to parse those tokens into something else. While select! { ... } is intended to enable this, it assumes that the stream of tokens is produced externally to chumsky, as in the logos example:

chumsky/examples/logos.rs

Lines 130 to 145 in 56762fe

let token_iter = Token::lexer(SRC)
    .spanned()
    // Convert logos errors into tokens. We want parsing to be recoverable and not fail at the lexing stage, so
    // we have a dedicated `Token::Error` variant that represents a token error that was previously encountered
    .map(|(tok, span)| match tok {
        // Turn the `Range<usize>` spans logos gives us into chumsky's `SimpleSpan` via `Into`, because it's easier
        // to work with
        Ok(tok) => (tok, span.into()),
        Err(()) => (Token::Error, span.into()),
    });
// Turn the token iterator into a stream that chumsky can use for things like backtracking
let token_stream = Stream::from_iter(token_iter)
    // Tell chumsky to split the (Token, SimpleSpan) stream into its parts so that it can handle the spans for us
    // This involves giving chumsky an 'end of input' span: we just use a zero-width span at the end of the string
    .spanned((SRC.len()..SRC.len()).into());

Solution

  • Expose .parse_iter() outside of #[cfg(test)] and use it to construct a Stream instance.
  • Expose .stream(input) as a public method of IterParser to generate a stream of transformed input.
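
For readers unfamiliar with the pattern being discussed, here is a minimal sketch of a two-stage pipeline in which a lazy lexer feeds a token-level parser directly, with no intermediate vector. It uses only the Rust standard library and is not chumsky's API: `Token`, `Lexer`, and `parse_sum` are hypothetical names invented for illustration, and the grammar (`Num (Plus Num)*`) is an assumption chosen to keep the example short.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical token type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Plus,
}

// Stage 1: a lazy lexer. Tokens are produced only when the consumer asks for them.
struct Lexer<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { chars: src.chars().peekable() }
    }
}

impl<'a> Iterator for Lexer<'a> {
    // Lexing errors are yielded as items rather than dropped.
    type Item = Result<Token, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // Skip leading whitespace.
        while self.chars.next_if(|c| c.is_whitespace()).is_some() {}
        // End of input ends the iterator.
        let c = *self.chars.peek()?;
        if c == '+' {
            self.chars.next();
            return Some(Ok(Token::Plus));
        }
        if c.is_ascii_digit() {
            // Accumulate a decimal integer (no overflow handling in this sketch).
            let mut n: i64 = 0;
            while let Some(d) = self.chars.next_if(|c| c.is_ascii_digit()) {
                n = n * 10 + d.to_digit(10).unwrap() as i64;
            }
            return Some(Ok(Token::Num(n)));
        }
        // Consume the offending character so the lexer keeps making progress.
        self.chars.next();
        Some(Err(format!("unexpected character {:?}", c)))
    }
}

// Stage 2: parse `Num (Plus Num)*` from any token iterator and return the sum.
fn parse_sum<I>(mut tokens: I) -> Result<i64, String>
where
    I: Iterator<Item = Result<Token, String>>,
{
    let mut total = match tokens.next() {
        Some(Ok(Token::Num(n))) => n,
        Some(Ok(t)) => return Err(format!("expected number, found {:?}", t)),
        Some(Err(e)) => return Err(e),
        None => return Err("expected number, found end of input".to_string()),
    };
    loop {
        match tokens.next() {
            None => return Ok(total),
            Some(Err(e)) => return Err(e),
            Some(Ok(Token::Plus)) => {}
            Some(Ok(t)) => return Err(format!("expected '+', found {:?}", t)),
        }
        match tokens.next() {
            Some(Ok(Token::Num(n))) => total += n,
            Some(Ok(t)) => return Err(format!("expected number, found {:?}", t)),
            Some(Err(e)) => return Err(e),
            None => return Err("expected number after '+', found end of input".to_string()),
        }
    }
}
```

Calling `parse_sum(Lexer::new("1 + 2 + 39"))` drives the lexer one token at a time. A real combinator library also needs backtracking and span tracking over the token stream, which this sketch deliberately leaves out.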

@zesterer
Owner

One thing I worry about is that the API seems to imply that the parser gets turned into a Stream, when in reality it's used to parse elements, which are collected into a vector, and then those elements are used as a Stream. #399 discusses the former use-case and what problems we've run up against when trying to do this.

Did you have an example of the sort of patterns that this enables?

@zesterer
Owner

Edit: It seems I misread the implementation earlier; I see now that it turns the parser directly into a stream. As mentioned, #399 discusses some of these issues. In particular, ParseIter currently just swallows parser errors, pretending they don't exist.
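
To make the error-swallowing concern concrete, here is a small standard-library sketch contrasting an iterator that silently stops at the first failure with one that surfaces each failure to the consumer. This is not chumsky's `ParseIter` implementation: `parse_item`, `swallowing`, and `preserving` are hypothetical names, and parsing a decimal integer stands in for running an arbitrary parser on each element.

```rust
// Hypothetical stand-in for a per-item parser: parses one decimal integer.
fn parse_item(s: &str) -> Result<i64, String> {
    s.trim()
        .parse::<i64>()
        .map_err(|e| format!("bad item {:?}: {}", s, e))
}

// Error-swallowing iteration: stops at the first failure and drops the error,
// so the consumer cannot tell a parse failure apart from end of input.
fn swallowing<'a>(items: &'a [&'a str]) -> impl Iterator<Item = i64> + 'a {
    items.iter().map_while(|s| parse_item(s).ok())
}

// Error-preserving iteration: every failure is yielded as an `Err` item,
// leaving the consumer to decide whether to stop, skip, or recover.
fn preserving<'a>(items: &'a [&'a str]) -> impl Iterator<Item = Result<i64, String>> + 'a {
    items.iter().map(|s| parse_item(s))
}
```

Given the input `["1", "2", "x", "4"]`, `swallowing` yields only `1` and `2` and then ends as though the input were exhausted, while `preserving` yields four items, the third of which is an `Err`. A stream built on the first shape hides failures from any parser that consumes it.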

@cosmicexplorer
Author

Ah, I see #399 (mentioned directly above parse_iter()) covers exactly this issue, not a different one. I'll see if I can page into that.

@cosmicexplorer cosmicexplorer marked this pull request as draft September 1, 2023 23:11
@cosmicexplorer
Author

Converted this into a draft as this is really just the easier part of #399.
