I'm trying to use nom::bytes::complete::escaped_transform and running into some trouble.
Specifically, the function wants an escape `char`, but I am trying to give it an escape byte, one that doesn't seem to play nicely with `as char` (specifically, `0xDB`).
It seems as though in Rust, a `char` is actually a multi-byte representation of a Unicode character. And if I'm understanding things correctly, `0xDB` is above decimal 127, which triggers the "there's another byte to this character" UTF-8 encoding thing, so it's more like `0xDB00` internally? Now that I think of that, I actually wrote a little test case to check, and sure enough the escape behaves as though it were two bytes long.
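For reference, here is a small standard-library check of what `0xDB as char` actually produces. It suggests the value is the code point U+00DB, which UTF-8 encodes as the two bytes `0xC3 0x9B`, rather than `0xDB00`; the two-byte length alone may be enough to explain the behavior. The helper name `utf8_bytes_of` is made up for this illustration.

```rust
/// Hypothetical helper: cast a byte to `char` and return its UTF-8 encoding.
fn utf8_bytes_of(byte: u8) -> Vec<u8> {
    // `u8 as char` maps the byte to the code point with the same numeric value.
    let c = byte as char;
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    const FESC: u8 = 0xDB;
    let c = FESC as char;

    // The code point keeps the numeric value of the byte.
    assert_eq!(c as u32, 0xDB);

    // Code points above 0x7F need more than one byte in UTF-8.
    assert_eq!(c.len_utf8(), 2);
    assert_eq!(utf8_bytes_of(FESC), vec![0xC3u8, 0x9B]);

    // ASCII bytes stay one byte long, so they do not show the problem.
    assert_eq!(utf8_bytes_of(b'\\'), vec![0x5Cu8]);
}
```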
Anywho, this possibly raises a bigger issue: maybe this function should live in `nom::character::complete` instead of `bytes`, since it's clearly character oriented, with a byte-oriented version placed in `nom::bytes::complete`? Also, I wonder how hard it would be to have the escape char argument be another parser, so you could use `tag` or something else in its place (not that I need that, but it might be useful to make it more generic?)
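To make the idea of a byte-oriented version concrete, here is a minimal hand-rolled sketch that uses only the standard library. It is not nom's API and not a proposed implementation: the function name `unescape_bytes` and the `(code, replacement)` table are assumptions made for illustration, and it returns `Option` instead of nom's `IResult`.

```rust
/// Hypothetical byte-oriented unescape, written without nom.
/// `table` maps the byte that follows the escape byte to its replacement.
/// Returns `None` if the escape byte is followed by an unknown byte
/// or by the end of the input.
fn unescape_bytes(input: &[u8], esc: u8, table: &[(u8, u8)]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len());
    let mut iter = input.iter().copied();
    while let Some(b) = iter.next() {
        if b != esc {
            // Ordinary byte: copy it through unchanged.
            out.push(b);
            continue;
        }
        // Exactly one byte follows the escape byte and selects the replacement.
        let code = iter.next()?;
        let &(_, replacement) = table.iter().find(|&&(from, _)| from == code)?;
        out.push(replacement);
    }
    Some(out)
}

fn main() {
    const FEND: u8 = 0xC0;
    const FESC: u8 = 0xDB;
    const TFEND: u8 = 0xDC;
    const TFESC: u8 = 0xDD;
    let table = [(TFEND, FEND), (TFESC, FESC)];

    let res = unescape_bytes(&[0x61, 0x62, FESC, TFEND, 0x63, 0x64, 0x65], FESC, &table);
    assert_eq!(res, Some(vec![0x61, 0x62, FEND, 0x63, 0x64, 0x65]));
}
```

Because the escape is compared as a `u8`, its length is always one byte, which is the property the `char`-based signature cannot give for values above `0x7F`.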
Thanks!
Prerequisites
❯ rustc --version
rustc 1.71.0 (8ede3aae2 2023-07-12)
❯ grep nom Cargo.toml
nom = "7.1.3"
Test case
use nom::branch::alt;
use nom::bytes::complete::{escaped_transform, is_not, tag};
use nom::combinator::value;
use nom::IResult;

const FEND: u8 = 0xC0;
const FESC: u8 = 0xDB;
const TFEND: u8 = 0xDC;
const TFESC: u8 = 0xDD;

pub fn unescape(input: &[u8]) -> IResult<&[u8], Vec<u8>> {
    escaped_transform(
        is_not([FESC]),
        FESC as char,
        alt((
            value(&[FEND][..], tag(&[TFEND])),
            value(&[FESC][..], tag(&[TFESC])),
        )),
    )(input)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn try_fesc() {
        let res = unescape(&[0x61, 0x62, FESC, TFEND, 0x63, 0x64, 0x65]);
        assert_eq!(res, Ok((&[][..], vec![0x61, 0x62, FEND, 0x63, 0x64, 0x65])))
    }

    #[test]
    fn try_fesczerozero() {
        // 0xDB as char internally gets turned into 0xDB00, it seems
        // this test case is *not* desired behavior, but I put it here
        // for insight into the implementation details
        let res = unescape(&[0x61, FESC, 0x00, TFEND, 0x63, 0x64]);
        assert_eq!(res, Ok((&[][..], vec![0x61, FEND, 0x63, 0x64])));
    }

    #[test]
    fn try_noesc() {
        let res = unescape(&[0x61, 0x62, 0x63]);
        assert_eq!(res, Ok((&[][..], vec![0x61, 0x62, 0x63])));
    }
}
output of test run:
❯ cargo test
Finished test [unoptimized + debuginfo] target(s) in 0.00s
Running unittests src/lib.rs (target/debug/deps/nomplayground-ec796cae7e096d2e)
running 3 tests
test tests::try_noesc ... ok
test tests::try_fesczerozero ... ok
test tests::try_fesc ... FAILED
failures:
---- tests::try_fesc stdout ----
thread 'tests::try_fesc' panicked at 'assertion failed: `(left == right)`
left: `Err(Error(Error { input: [99, 100, 101], code: Tag }))`,
right: `Ok(([], [97, 98, 192, 99, 100, 101]))`', src/lib.rs:29:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::try_fesc
test result: FAILED. 2 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
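One possible explanation for the `try_fesc` failure, sketched under an assumption: if `escaped_transform` in nom 7 advances past the control character by its `len_utf8()` (this is a reading of the implementation, not something the documentation states), then the transform parser starts two bytes after `FESC` rather than one. The function `transform_start` below is a made-up model of that step, not nom code.

```rust
/// Hypothetical model of where the transform parser would start,
/// ASSUMING the control character is skipped by its UTF-8 length.
fn transform_start(input: &[u8], control: char) -> Option<usize> {
    // Find the first byte that equals the control char after a `u8 as char` cast.
    let idx = input.iter().position(|&b| b as char == control)?;
    Some(idx + control.len_utf8())
}

fn main() {
    const FESC: u8 = 0xDB;
    const TFEND: u8 = 0xDC;

    // Input from try_fesc: FESC sits at index 2 and is skipped as 2 bytes,
    // so TFEND is jumped over and the transform parser sees [0x63, 0x64, 0x65].
    let input = [0x61, 0x62, FESC, TFEND, 0x63, 0x64, 0x65];
    let start = transform_start(&input, FESC as char).unwrap();
    assert_eq!(input[start..].to_vec(), vec![99u8, 100, 101]);

    // Input from try_fesczerozero: the extra 0x00 absorbs the second skipped
    // byte, so the transform parser lands on TFEND as intended.
    let padded = [0x61, FESC, 0x00, TFEND, 0x63, 0x64];
    let start = transform_start(&padded, FESC as char).unwrap();
    assert_eq!(padded[start], TFEND);
}
```

Under that assumption, the leftover input `[99, 100, 101]` in the error above is what remains after skipping both `FESC` and `TFEND`, and the `Tag` error comes from the `alt` of `tag` parsers failing on `0x63`.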