Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: can frame.into_data() be incomplete? #101

Open
kristof-mattei opened this issue Nov 28, 2023 · 4 comments
Open

Question: can frame.into_data() be incomplete? #101

kristof-mattei opened this issue Nov 28, 2023 · 4 comments

Comments

@kristof-mattei
Copy link

kristof-mattei commented Nov 28, 2023

I use http-body to parse the body of an endless Transfer-Encoding: Chunked stream.

let frame = response.frame().await.expect("Stream ended").expect("Failed to read frame");

let Ok(data) = frame.into_data() else {
    // frame is trailers, ignored
    continue;
};

let decoded = serde_json::from_slice(&data)?;

// ...

But as I've discovered, under certain conditions data is incomplete. When complete it ends in \n.

To fix it I have a buffer that I only parse out the part of [0..(index of first b'\n'] and remove it from the buffer.

This leaves me with the following questions:

  • Is this expected behavior from Frame? Having a partial piece in there?
  • Is the \n a left-over from the Chunked separator \r\n?
@davidpdrsn
Copy link
Member

Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:

body.collect().await?.to_bytes()

@seanmonstar
Copy link
Member

It's not that it's incomplete, but this a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can make a write get cut up into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.

You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.

Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.

@kristof-mattei
Copy link
Author

Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:

body.collect().await?.to_bytes()

No, the body is endless.

It's not that it's incomplete, but this a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can make a write get cut up into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.

You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.

Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.

Okay Frame is a lower level than the CRLF-separated Chunk.

Looking at the spec a little bit more: https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Encoded_data it seems that my \n detection is probably not correct and I need to do something a little bit smarter taking the chunk size into account.

@kristof-mattei
Copy link
Author

@seanmonstar reading more I think I found where I got confused.

In HTTP2, which doesn't have Chunked, but it has Frames: https://httpwg.org/specs/rfc7540.html#FrameTypes

So the name Frame in http1 shorted my brain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants