Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested_in() + repeated() produce incorrect error spans. #577

Open
ktsimpso opened this issue Nov 27, 2023 · 5 comments
Open

nested_in() + repeated() produce incorrect error spans. #577

ktsimpso opened this issue Nov 27, 2023 · 5 comments

Comments

@ktsimpso
Copy link

Using 1.0.0.alpha6 using a combination of nested_in + repeated() if there is a parse error in the inner parser, the error is reported with a span relative to the start of the chunk that the outer parser parsed rather than the start of the entire input. (This makes sense from the inner parser's perspective as it does not know about the other input). Though my expectation would be for the combinator to reframe error spans to be consistent with the outer span context.

Example:

let blank_line = text::newline::<&str, extra::Err<Rich<'_, char>>>()
    .repeated()
    .exactly(2)
    .ignored();
let chunker = any()
    .and_is(blank_line.not())
    .repeated()
    .at_least(1)
    .to_slice()
    .then_ignore(blank_line.or(end()));
let numbers = text::int(10)
    .separated_by(text::newline())
    .allow_trailing()
    .collect::<Vec<_>>();
let parser = numbers.nested_in(chunker).repeated().collect::<Vec<_>>();

let result = parser.parse("1\n2\n\n3\n4\n\n").into_result();
assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]]));

let result = parser.parse("1\na\n\n3\n4\n\n").into_result();
assert_eq!(
    result.map_err(|e| *e.first().unwrap().span()),
    Err(SimpleSpan::new(2usize, 3))
);

let result = parser.parse("1\n2\n\n3\na\n\n").into_result();
assert_eq!(
    result.map_err(|e| *e.first().unwrap().span()),
    Err(SimpleSpan::new(7usize, 8))
); // fails with left: Err(2..3)

Alternatively is there a nice way for me to manually map the span to the correct outer context?

@zesterer
Copy link
Owner

Yes, you could use Input::spanned, like this example.

That aside, it's not clear to me why nested_in is being used here: it's designed primarily for recursive inputs like token trees, not for performing extra validation upon an existing output.

@ktsimpso
Copy link
Author

ktsimpso commented Nov 27, 2023

That aside, it's not clear to me why nested_in is being used here: it's designed primarily for recursive inputs like token trees, not for performing extra validation upon an existing output.

You'd think I'd be able to something like the following for the parser.

let parser = numbers
    .separated_by(blank_line)
    .allow_trailing()
    .collect::<Vec<_>>();

However this assertion will fail.

let result = parser.parse("1\n2\n\n3\n4\n\n").into_result();
assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]])); // fails with left: Err([found end of input at 5..6 expected something else])

I think it's because the trailing member of the numbers parser consumes the last \n token which then causes the blank_line parser to fail. If I modify numbers to be the following to remove the trailing newline issue, the above test case passes.

let numbers = text::int(10)
    .separated_by(text::newline())
    .at_least(1)
    .collect::<Vec<_>>();

However the following test case would fail

let result = parser.parse("1\n2\n\n3\n4\n").into_result();
assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]])); // fails with left: Err([found end of input at 9..9 expected ''0'', ''\r'', or ''\n''])

This would imply I need to add allow_trailing back into the numbers parser which reintroduces the previous error.

I originally attempted to write this parser a year ago on version 0.8.0 but could not get any combination of combinators to parse all cases correctly (though I don't recall exactly all the combinations I attempted). I ended up solving this by doing two parsing passes. When I saw nested_in in 1.0.0 I was quite happy as that added a combinator that essentially did what I have to manually do in 2 passes in 0.8.0 and it works for the cases above though with the quirks with the error spans. Though if I can find some combination that works and avoids having to muck around with nested_in (and thus SpannedInput if I want to have proper error handling) that would be great.

Yes, you could use Input::spanned, like this example.

Thanks I'll take a look at this. My initial attempt stalled out after converting the input to a spanned input. (Which doesn't feel the best)

let input = "1\n2\n\n3\n4\n\n";
let spanned = input
    .chars()
    .enumerate()
    .map(|(i, c)| (c, SimpleSpan::new(i, i + 1)))
    .collect::<Vec<_>>()
    .spanned(SimpleSpan::new(input.len(), input.len()));

All my parsers are defined currently with input as &str, however changing them to SpannedInput<char, SimpleSpan, &[(char, SimpleSpan)]> results in an error with the text::int combinator which only works with a StrInput

It's not immediately clear to me the next step from there.

@Zij-IT
Copy link
Contributor

Zij-IT commented Nov 27, 2023

Correct me if I am wrong, but wouldn't it be possible to deny allowing trailing, and then allow the last one to have trailing newlines? Something like:

    let blank_line = text::newline::<&str, extra::Err<Rich<'_, char>>>()
        .repeated()
        .exactly(2)
        .ignored();
    let numbers = text::int::<_, _, extra::Err<Rich<'_, _>>>(10)
        .separated_by(text::newline())
        .at_least(1)
        .collect::<Vec<_>>();
    let parser = numbers
        .separated_by(blank_line)
        .collect::<Vec<_>>()
        .then_ignore(text::newline().repeated());

    let result = parser.parse("1\n2\n\n3\n4\n\n").into_result();
    assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]]));

    let result: Result<Vec<Vec<&str>>, Vec<Rich<'_, char>>> =
        parser.parse("1\na\n\n3\n4\n\n").into_result();
    assert_eq!(
        result.map_err(|e: Vec<_>| *e.first().unwrap().span()),
        Err(SimpleSpan::new(2usize, 3))
    );

    let result = parser.parse("1\n2\n\n3\n4\n").into_result();
    assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]]));

@ktsimpso
Copy link
Author

ktsimpso commented Nov 28, 2023

.then_ignore(text::newline().repeated());

This works and passes all my tests. Thanks!

@zesterer I still think the default nested_in error span behavior is a bit surprising and does not seem consistent with the other combinators. At the very least I think this should be called out in the docs, though I think ideally the combinator would internally track the proper spans. I'll leave it to you if you want to close this issue or not.

@zesterer
Copy link
Owner

zesterer commented Dec 6, 2023

@zesterer I still think the default nested_in error span behavior is a bit surprising and does not seem consistent with the other combinators. At the very least I think this should be called out in the docs, though I think ideally the combinator would internally track the proper spans. I'll leave it to you if you want to close this issue or not.

I'm not sure how that could work in the general case. Chumsky doesn't intuitively 'understand' spans, they're entirely determined by the input you're parsing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants