nom::bytes::complete::escaped_transform woes? #1679

Open
kitchen opened this issue Aug 3, 2023 · 1 comment

kitchen commented Aug 3, 2023

I'm trying to use nom::bytes::complete::escaped_transform and running into some trouble.

Specifically, the function wants an escape char, but I am trying to give it an escape byte (0xDB), and that byte doesn't seem to play nicely with a cast to char (FESC as char).

It seems that in Rust a char is a Unicode scalar value rather than a single byte. If I'm understanding things correctly, 0xDB is above decimal 127, so as a char (U+00DB) it needs more than one byte in UTF-8, and the parser ends up treating the escape as two bytes wide, something like 0xDB00? I wrote a little test case to check that, and sure enough, the version with an extra byte after 0xDB passes.
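
To make the char-versus-byte mismatch concrete, here is a small sketch that uses only the standard library (no nom involved). The helper name utf8_bytes_of is made up for illustration. It shows that casting 0xDB to char gives U+00DB, whose UTF-8 encoding is two bytes long. If escaped_transform advances by the escape char's UTF-8 length (an assumption about nom's internals, not something confirmed in this issue), that would explain an extra byte being skipped after the escape.

```rust
/// Returns the UTF-8 encoding of a byte after casting it to `char`.
/// `utf8_bytes_of` is a hypothetical helper, used here only for illustration.
pub fn utf8_bytes_of(byte: u8) -> Vec<u8> {
    // `u8 as char` maps the byte to the code point with the same numeric value
    // (U+0000 through U+00FF); it does not reinterpret the byte as UTF-8.
    let c = byte as char;
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Returns how many bytes the cast `char` occupies in UTF-8.
pub fn utf8_len_of(byte: u8) -> usize {
    (byte as char).len_utf8()
}
```

For ASCII bytes such as 0x61 the length is 1, which is why ordinary escape characters like a backslash work as expected, while any byte from 0x80 upward becomes a two-byte char.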

Anywho, this possibly raises a bigger issue: maybe this function should live in nom::character::complete instead of bytes, since it's clearly character oriented, with a byte-oriented version placed in nom::bytes::complete? I also wonder how hard it would be to make the escape char argument another parser, so you could use tag or something else in its place (not that I need that, but it might be useful to make it more generic).

Thanks!

Prerequisites

❯ rustc --version
rustc 1.71.0 (8ede3aae2 2023-07-12)

❯ grep nom Cargo.toml
nom = "7.1.3"

Test case

use nom::branch::alt;
use nom::bytes::complete::{escaped_transform, is_not, tag};
use nom::combinator::value;
use nom::IResult;

const FEND: u8 = 0xC0;
const FESC: u8 = 0xDB;
const TFEND: u8 = 0xDC;
const TFESC: u8 = 0xDD;

pub fn unescape(input: &[u8]) -> IResult<&[u8], Vec<u8>> {
    escaped_transform(
        is_not([FESC]),
        FESC as char,
        alt((
            value(&[FEND][..], tag(&[TFEND])),
            value(&[FESC][..], tag(&[TFESC])),
        )),
    )(input)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn try_fesc() {
        let res = unescape(&[0x61, 0x62, FESC, TFEND, 0x63, 0x64, 0x65]);
        assert_eq!(res, Ok((&[][..], vec![0x61, 0x62, FEND, 0x63, 0x64, 0x65])))
    }

    #[test]
    fn try_fesczerozero() {
        // FESC as char is U+00DB, which is two bytes long in UTF-8, so the
        // parser appears to skip an extra byte after FESC (here 0x00).
        // This is *not* desired behavior; it is here for insight into the implementation.
        let res = unescape(&[0x61, FESC, 0x00, TFEND, 0x63, 0x64]);
        assert_eq!(res, Ok((&[][..], vec![0x61, FEND, 0x63, 0x64])));
    }

    #[test]
    fn try_noesc() {
        let res = unescape(&[0x61, 0x62, 0x63]);
        assert_eq!(res, Ok((&[][..], vec![0x61, 0x62, 0x63])));
    }
}

Output of test run:

❯ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.00s
     Running unittests src/lib.rs (target/debug/deps/nomplayground-ec796cae7e096d2e)

running 3 tests
test tests::try_noesc ... ok
test tests::try_fesczerozero ... ok
test tests::try_fesc ... FAILED

failures:

---- tests::try_fesc stdout ----
thread 'tests::try_fesc' panicked at 'assertion failed: `(left == right)`
  left: `Err(Error(Error { input: [99, 100, 101], code: Tag }))`,
 right: `Ok(([], [97, 98, 192, 99, 100, 101]))`', src/lib.rs:29:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::try_fesc

test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--lib`
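
Until there is a byte-oriented variant, one possible workaround is to do the unescaping by hand. The following is a minimal sketch using only the standard library. unescape_bytes is a hypothetical name, it returns an Option rather than nom's IResult, and it assumes the same four constants as the test case above.

```rust
const FEND: u8 = 0xC0;
const FESC: u8 = 0xDB;
const TFEND: u8 = 0xDC;
const TFESC: u8 = 0xDD;

/// Hypothetical byte-oriented stand-in for the nom-based `unescape` parser.
/// Returns `None` if an escape byte ends the input or is followed by a byte
/// other than TFEND or TFESC.
pub fn unescape_bytes(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len());
    let mut bytes = input.iter().copied();
    while let Some(b) = bytes.next() {
        if b == FESC {
            // Consume exactly one byte after the escape byte and translate it.
            match bytes.next()? {
                TFEND => out.push(FEND),
                TFESC => out.push(FESC),
                _ => return None,
            }
        } else {
            // Any other byte is copied through unchanged.
            out.push(b);
        }
    }
    Some(out)
}
```

With this sketch, the input from try_fesc, [0x61, 0x62, FESC, TFEND, 0x63, 0x64, 0x65], yields [0x61, 0x62, 0xC0, 0x63, 0x64, 0x65], because the escape is treated as exactly one byte wide.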

Geal (Collaborator) commented Aug 9, 2023

Right, it looks like it's missing something when looking at UTF-8 input.
