Encoding::decode_to_utf16 ? #24

Open
SimonSapin opened this issue Oct 31, 2017 · 2 comments

SimonSapin commented Oct 31, 2017

I’ve just written this function:

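use encoding_rs::Encoding;
use std::slice;

fn decode_to_utf16(bytes: &[u8], encoding: &'static Encoding) -> Vec<u16> {
    let mut decoder = encoding.new_decoder();
    // Worst-case output length in UTF-16 code units; None means the size computation overflowed.
    let capacity = decoder.max_utf16_buffer_length(bytes.len()).expect("Overflow");
    let mut utf16 = Vec::with_capacity(capacity);
    // View the vector's spare capacity as a mutable slice for the decoder to write into.
    let uninitialized = unsafe {
        slice::from_raw_parts_mut(utf16.as_mut_ptr(), capacity)
    };
    let last = true;
    let (_, read, written, _) = decoder.decode_to_utf16(bytes, uninitialized, last);
    assert!(read == bytes.len());
    // Only the first `written` code units were filled in by the decoder.
    unsafe {
        utf16.set_len(written)
    }
    utf16
}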

Do you think it would belong as a method of Encoding?
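
For comparison, here is a minimal sketch of the same allocate, decode, truncate shape written without `unsafe`: the buffer is zero-filled up front and then truncated to the number of code units written, which sidesteps the question of whether it is sound to form a slice over uninitialized memory. To keep the example free of external crates, `decode_latin1_to_utf16` is a hypothetical stand-in for `Decoder::decode_to_utf16` (it only handles ISO-8859-1), and `decode_to_utf16_safe` is likewise an illustrative name, not part of encoding_rs.

```rust
/// Hypothetical stand-in for `Decoder::decode_to_utf16`: decodes ISO-8859-1,
/// where every byte maps to the code point with the same value.
/// Returns (bytes read, code units written).
fn decode_latin1_to_utf16(src: &[u8], dst: &mut [u16]) -> (usize, usize) {
    let n = src.len().min(dst.len());
    for (out, &byte) in dst.iter_mut().zip(src) {
        *out = u16::from(byte);
    }
    (n, n)
}

/// Same overall shape as the function above, but without `unsafe`.
fn decode_to_utf16_safe(bytes: &[u8]) -> Vec<u16> {
    // Deliberately over-estimate by one unit, mimicking a worst-case
    // length estimate such as `max_utf16_buffer_length`.
    let capacity = bytes.len().checked_add(1).expect("Overflow");
    // Zero-fill instead of leaving the buffer uninitialized.
    let mut utf16 = vec![0u16; capacity];
    let (read, written) = decode_latin1_to_utf16(bytes, &mut utf16);
    assert_eq!(read, bytes.len());
    // Drop the unused tail so only decoded code units remain.
    utf16.truncate(written);
    utf16
}

fn main() {
    let utf16 = decode_to_utf16_safe(b"caf\xE9");
    assert_eq!(String::from_utf16(&utf16).unwrap(), "café");
}
```

The trade-off is one extra pass to zero the buffer, which is usually small next to the decoding work itself.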

hsivonen commented Nov 1, 2017

> Do you think it would belong as a method of Encoding?

It doesn't exist as a method on Encoding at present, because I thought Rust programs would want to decode to UTF-8 and encode from UTF-16.

If there's a reason to believe that wishing to decode to UTF-16 in the non-streaming manner (with infallible allocation) has utility for Rust programs beyond one isolated case, then it would make sense to add UTF-16 variants of the non-streaming API to Rust, too. (Currently those variants are in C++ only.)

What's the context of your function? That is, should we expect it to represent a recurring use case or a one-time oddity?

SimonSapin commented

I’ve used this in the Servo implementation of https://xhr.spec.whatwg.org/#json-response which takes a Vec<u8> that was earlier read from the network, and calls a SpiderMonkey function that takes const char16_t* chars, uint32_t len. So it is a rather isolated case.

(By the way we’re switching Servo to encoding_rs: servo/servo#19073)
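
For an isolated case like this one, a standard-library-only route may be enough. The sketch below assumes the response bytes are UTF-8 and produces the pointer and length pair that a `const char16_t* chars, uint32_t len` parameter list expects. The name `utf8_bytes_to_utf16` is hypothetical, and the actual SpiderMonkey call is left out. Unlike a dedicated decoder, this goes through an intermediate `String` and does not strip a byte order mark.

```rust
/// Decodes UTF-8 bytes (malformed sequences become U+FFFD)
/// and re-encodes the result as UTF-16 code units.
fn utf8_bytes_to_utf16(bytes: &[u8]) -> Vec<u16> {
    String::from_utf8_lossy(bytes).encode_utf16().collect()
}

fn main() {
    // Stand-in for a response body that was read from the network earlier.
    let body: Vec<u8> = r#"{"name": "Zoë"}"#.as_bytes().to_vec();

    let chars = utf8_bytes_to_utf16(&body);

    // The pair a `const char16_t* chars, uint32_t len` signature takes.
    // `chars` must stay alive for as long as the pointer is in use.
    let ptr: *const u16 = chars.as_ptr();
    assert!(chars.len() <= u32::MAX as usize, "response too large for uint32_t");
    let len = chars.len() as u32;

    println!("{} UTF-16 code units at {:p}", len, ptr);
}
```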
