`nested_in()` + `repeated()` produce incorrect error spans. #577

ktsimpso · 2023-11-27T07:10:09Z

Using 1.0.0.alpha6 using a combination of nested_in + repeated() if there is a parse error in the inner parser, the error is reported with a span relative to the start of the chunk that the outer parser parsed rather than the start of the entire input. (This makes sense from the inner parser's perspective as it does not know about the other input). Though my expectation would be for the combinator to reframe error spans to be consistent with the outer span context.

Example:

let blank_line = text::newline::<&str, extra::Err<Rich<'_, char>>>()
    .repeated()
    .exactly(2)
    .ignored();
let chunker = any()
    .and_is(blank_line.not())
    .repeated()
    .at_least(1)
    .to_slice()
    .then_ignore(blank_line.or(end()));
let numbers = text::int(10)
    .separated_by(text::newline())
    .allow_trailing()
    .collect::<Vec<_>>();
let parser = numbers.nested_in(chunker).repeated().collect::<Vec<_>>();

let result = parser.parse("1\n2\n\n3\n4\n\n").into_result();
assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]]));

let result = parser.parse("1\na\n\n3\n4\n\n").into_result();
assert_eq!(
    result.map_err(|e| *e.first().unwrap().span()),
    Err(SimpleSpan::new(2usize, 3))
);

let result = parser.parse("1\n2\n\n3\na\n\n").into_result();
assert_eq!(
    result.map_err(|e| *e.first().unwrap().span()),
    Err(SimpleSpan::new(7usize, 8))
); // fails with left: Err(2..3)

Alternatively is there a nice way for me to manually map the span to the correct outer context?

The text was updated successfully, but these errors were encountered:

zesterer · 2023-11-27T11:10:31Z

Yes, you could use Input::spanned, like this example.

That aside, it's not clear to me why nested_in is being used here: it's designed primarily for recursive inputs like token trees, not for performing extra validation upon an existing output.

ktsimpso · 2023-11-27T16:33:14Z

That aside, it's not clear to me why nested_in is being used here: it's designed primarily for recursive inputs like token trees, not for performing extra validation upon an existing output.

You'd think I'd be able to something like the following for the parser.

let parser = numbers
    .separated_by(blank_line)
    .allow_trailing()
    .collect::<Vec<_>>();

However this assertion will fail.

let result = parser.parse("1\n2\n\n3\n4\n\n").into_result();
assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]])); // fails with left: Err([found end of input at 5..6 expected something else])

I think it's because the trailing member of the numbers parser consumes the last \n token which then causes the blank_line parser to fail. If I modify numbers to be the following to remove the trailing newline issue, the above test case passes.

let numbers = text::int(10)
    .separated_by(text::newline())
    .at_least(1)
    .collect::<Vec<_>>();

However the following test case would fail

let result = parser.parse("1\n2\n\n3\n4\n").into_result();
assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]])); // fails with left: Err([found end of input at 9..9 expected ''0'', ''\r'', or ''\n''])

This would imply I need to add allow_trailing back into the numbers parser which reintroduces the previous error.

I originally attempted to write this parser a year ago on version 0.8.0 but could not get any combination of combinators to parse all cases correctly (though I don't recall exactly all the combinations I attempted). I ended up solving this by doing two parsing passes. When I saw nested_in in 1.0.0 I was quite happy as that added a combinator that essentially did what I have to manually do in 2 passes in 0.8.0 and it works for the cases above though with the quirks with the error spans. Though if I can find some combination that works and avoids having to muck around with nested_in (and thus SpannedInput if I want to have proper error handling) that would be great.

Yes, you could use Input::spanned, like this example.

Thanks I'll take a look at this. My initial attempt stalled out after converting the input to a spanned input. (Which doesn't feel the best)

let input = "1\n2\n\n3\n4\n\n";
let spanned = input
    .chars()
    .enumerate()
    .map(|(i, c)| (c, SimpleSpan::new(i, i + 1)))
    .collect::<Vec<_>>()
    .spanned(SimpleSpan::new(input.len(), input.len()));

All my parsers are defined currently with input as &str, however changing them to SpannedInput<char, SimpleSpan, &[(char, SimpleSpan)]> results in an error with the text::int combinator which only works with a StrInput

It's not immediately clear to me the next step from there.

Zij-IT · 2023-11-27T18:03:24Z

Correct me if I am wrong, but wouldn't it be possible to deny allowing trailing, and then allow the last one to have trailing newlines? Something like:

    let blank_line = text::newline::<&str, extra::Err<Rich<'_, char>>>()
        .repeated()
        .exactly(2)
        .ignored();
    let numbers = text::int::<_, _, extra::Err<Rich<'_, _>>>(10)
        .separated_by(text::newline())
        .at_least(1)
        .collect::<Vec<_>>();
    let parser = numbers
        .separated_by(blank_line)
        .collect::<Vec<_>>()
        .then_ignore(text::newline().repeated());

    let result = parser.parse("1\n2\n\n3\n4\n\n").into_result();
    assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]]));

    let result: Result<Vec<Vec<&str>>, Vec<Rich<'_, char>>> =
        parser.parse("1\na\n\n3\n4\n\n").into_result();
    assert_eq!(
        result.map_err(|e: Vec<_>| *e.first().unwrap().span()),
        Err(SimpleSpan::new(2usize, 3))
    );

    let result = parser.parse("1\n2\n\n3\n4\n").into_result();
    assert_eq!(result, Ok(vec![vec!["1", "2"], vec!["3", "4"]]));

ktsimpso · 2023-11-28T01:49:11Z

.then_ignore(text::newline().repeated());

This works and passes all my tests. Thanks!

@zesterer I still think the default nested_in error span behavior is a bit surprising and does not seem consistent with the other combinators. At the very least I think this should be called out in the docs, though I think ideally the combinator would internally track the proper spans. I'll leave it to you if you want to close this issue or not.

zesterer · 2023-12-06T20:23:56Z

@zesterer I still think the default nested_in error span behavior is a bit surprising and does not seem consistent with the other combinators. At the very least I think this should be called out in the docs, though I think ideally the combinator would internally track the proper spans. I'll leave it to you if you want to close this issue or not.

I'm not sure how that could work in the general case. Chumsky doesn't intuitively 'understand' spans, they're entirely determined by the input you're parsing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`nested_in()` + `repeated()` produce incorrect error spans. #577

`nested_in()` + `repeated()` produce incorrect error spans. #577

ktsimpso commented Nov 27, 2023

zesterer commented Nov 27, 2023

ktsimpso commented Nov 27, 2023 •

edited

Zij-IT commented Nov 27, 2023 •

edited

ktsimpso commented Nov 28, 2023 •

edited

zesterer commented Dec 6, 2023

nested_in() + repeated() produce incorrect error spans. #577

nested_in() + repeated() produce incorrect error spans. #577

Comments

ktsimpso commented Nov 27, 2023

zesterer commented Nov 27, 2023

ktsimpso commented Nov 27, 2023 • edited

Zij-IT commented Nov 27, 2023 • edited

ktsimpso commented Nov 28, 2023 • edited

zesterer commented Dec 6, 2023

`nested_in()` + `repeated()` produce incorrect error spans. #577

`nested_in()` + `repeated()` produce incorrect error spans. #577

ktsimpso commented Nov 27, 2023 •

edited

Zij-IT commented Nov 27, 2023 •

edited

ktsimpso commented Nov 28, 2023 •

edited