"Unmasked client to server frame" causes 502s #566

Open
alexandergunnarson opened this issue Mar 29, 2024 · 9 comments

Comments

@alexandergunnarson

alexandergunnarson commented Mar 29, 2024

Hey @ptaoussanis — huge fan of http-kit. We use http-kit (both server and client) at my company, with Sente on top.

Wanted to figure out together why a particularly nasty issue is happening. Every time I see the following stack trace show up in our logs:

at java.base/java.lang.Thread.run(Thread.java:1583)
at org.httpkit.server.HttpServer.run(HttpServer.java:425)
at org.httpkit.server.HttpServer.doRead(HttpServer.java:307)
at org.httpkit.server.HttpServer.decodeWs(HttpServer.java:260)
at org.httpkit.server.WSDecoder.decode(WSDecoder.java:75)
org.httpkit.ProtocolException: unmasked client to server frame
Fri Mar 29 05:05:36 UTC 2024 [server-loop] ERROR - null

we get 502s in various subsequent requests to that server, even unrelated to WebSockets. The problem appears to linger for a few minutes, but it could be less. I believe there are still some successful requests that go through in that time frame — not every request to the affected server will 502 during that time, as far as I can tell, but I can double check this. (I haven't yet traced which requests 502 and which don't, and why.)

We run on AWS, and have our servers load balanced behind CloudFront. The 502 is being emitted from the AWS load balancer due to its failure to connect to a server that experienced the unmasked client to server frame issue.

Given that you have much more familiarity with http-kit's code than I, are you aware of some mechanism that might cause it to 502? It's almost as if it's defensively dropping requests for a time, or an event loop dies (I do see server-loop in the logs, after all) and needs a little time to pick itself back up.

Meanwhile I'll go dig into http-kit's code to see if I can gather any clues.

Also curious what you might recommend to mitigate this problem. Maybe a relevant exception needs to be caught somewhere? Presumably can't do much to mitigate the issue on the client end, as bad payloads happen sometimes.

Thanks! 🙏

@ptaoussanis
Member

@alexandergunnarson Hi Alex,

A few questions in the meantime:

  • What version of http-kit are you running?
  • I presume this behaviour is new? Did anything obvious change prior to the new behaviour?
  • Are you able to easily check whether http-kit is receiving the HTTP "Upgrade" and "Connection: upgrade" headers? Off-hand, I seem to recall having seen this error before when behind a proxy that wasn't forwarding those headers.
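
For reference, the header check asked about above can be sketched roughly as follows. This is a minimal illustration in Python, not http-kit's code: the function name `is_websocket_upgrade` and the plain dict of headers are assumptions made for the example. It relies only on the RFC 6455 rule that an opening handshake carries an Upgrade header containing "websocket" and a Connection header whose token list includes "Upgrade", both compared case-insensitively.

```python
def is_websocket_upgrade(headers):
    """Return True if the request headers ask for a WebSocket upgrade.

    `headers` is a plain dict of header name -> value. That input shape is
    hypothetical; a real server exposes request headers in its own way.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Upgrade may list several protocols, e.g. "websocket, foo".
    upgrade_tokens = [t.strip().lower() for t in lowered.get("upgrade", "").split(",")]

    # Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
    connection_tokens = [t.strip().lower() for t in lowered.get("connection", "").split(",")]

    return "websocket" in upgrade_tokens and "upgrade" in connection_tokens
```

Logging the result of a check like this at the point where the request reaches the application would show whether a proxy in front of the server dropped or rewrote either header.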

Thanks

@alexandergunnarson
Author

alexandergunnarson commented Mar 29, 2024

Hey! Thanks for your quick response.

  • Running 2.8.0-beta3. I believe there was some reason why we're using a beta version, but I don't remember precisely why. Could have been because the client used virtual threads, and that was a desirable feature for us? I'm going to upgrade now to 2.8.0-RC1 to rule out issues.
  • This behavior has happened occasionally ever since we introduced WebSockets to our system back in December. So not new, just newly reported here :)
  • CloudFront is forwarding all headers as far as I can tell (based on config and logs). However, it just occurred to me that Upgrade and Connection: upgrade are specific to HTTP/1.1 (not HTTP/2), so maybe they aren't being handled correctly somewhere and the problem is hidden most of the time. That is, maybe >99% of our clients use HTTP/2 in the browser, and the "Unmasked client to server frame" error only appears when a client uses HTTP/1.1. This is highly conjectural, though. I will see whether I can check in our logs.

@alexandergunnarson
Author

alexandergunnarson commented Mar 29, 2024

If CloudFront logs are to be believed, the following headers were forwarded for the most recent offending WS requests (all such requests yield 401 or 502 errors):

Accept-Encoding
Accept-Language
Cache-Control
Connection
Cookie
Host
Origin
Pragma
Sec-WebSocket-Extensions
Sec-WebSocket-Key
Sec-WebSocket-Version
Upgrade
User-Agent

The load balancer we have is "dumb" in that it forwards everything and is unlikely to affect anything.

@alexandergunnarson
Author

alexandergunnarson commented Mar 29, 2024

Just checked and the vast majority of our WebSocket requests (both failed and successful) are HTTP/1.1, so we can seemingly rule out the headers issue.

@ptaoussanis
Member

@alexandergunnarson Hi Alex, apologies for the delay replying!

Unfortunately this isn't an area I'm familiar with so it's not clear to me what might be going on here.

From some very light digging, what I gather is:

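  • Per RFC 6455, every frame a client sends to the server must be masked: the frame header sets a mask bit and carries a 4-byte masking key chosen fresh for that frame, and the payload is XORed with the key. The key isn't negotiated during the handshake/upgrade, and the masking is a security measure that protects intermediaries such as proxies from cache poisoning rather than hiding the content.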

  • So http-kit server expects every frame the client sends after the handshake to be masked.

  • But the error you're seeing appears to indicate that http-kit server is seeing a frame that isn't masked as expected (?).

  • It seems possible causes for an unmasked or incorrectly masked frame from the client would include (in ~order of expected likelihood):

  1. Proxy or firewall interference (i.e. modifying WS traffic in a way that's interfering with masking)
  2. An issue with http-kit server (presumably some edge-case?)
  3. An issue with the sending WebSocket client
  4. Malicious activity
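
To make the masking rule concrete, here is a minimal Python sketch of encoding and decoding a single frame as laid out in RFC 6455 (mask bit in the high bit of the second byte, optional extended length, 4-byte masking key, payload XORed with the key). It is not http-kit's WSDecoder: the names `encode_frame`, `decode_client_frame` and `ProtocolError` are assumptions for the example, and the exception message merely mirrors the text in the stack trace above. It handles only one complete, unfragmented frame.

```python
import os
import struct


class ProtocolError(Exception):
    """Raised when a frame violates the rule this sketch checks."""


def encode_frame(payload, mask=True, opcode=0x1):
    """Build a single final (FIN=1) WebSocket frame around `payload` bytes."""
    header = bytes([0x80 | opcode])  # FIN bit set, no RSV bits, given opcode
    mask_bit = 0x80 if mask else 0x00
    length = len(payload)
    if length < 126:
        header += bytes([mask_bit | length])
    elif length < 1 << 16:
        header += bytes([mask_bit | 126]) + struct.pack("!H", length)
    else:
        header += bytes([mask_bit | 127]) + struct.pack("!Q", length)
    if not mask:
        return header + payload
    key = os.urandom(4)  # a fresh random key for every frame
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return header + key + masked


def decode_client_frame(frame):
    """Decode one complete client-to-server frame and return its payload."""
    masked = bool(frame[1] & 0x80)  # high bit of the second byte is MASK
    length = frame[1] & 0x7F
    offset = 2
    if length == 126:
        (length,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif length == 127:
        (length,) = struct.unpack_from("!Q", frame, offset)
        offset += 8
    if not masked:
        # The situation the error message in this thread describes.
        raise ProtocolError("unmasked client to server frame")
    key = frame[offset:offset + 4]
    offset += 4
    body = frame[offset:offset + length]
    return bytes(b ^ key[i % 4] for i, b in enumerate(body))
```

One property worth noting: every ASCII byte has its high bit clear, so plain text such as the start of an HTTP request would also fail the mask check in a decoder written this way. Whether anything like that happens inside http-kit is not established in this thread.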

I'm just engaged with some other work atm so won't have the opportunity to investigate further right now, but perhaps you or someone else might be able to?

Some quick ideas-

Re: possible cause 1

  • Is it possible in your setup that a proxy or firewall may be interfering? Is there perhaps some pattern in the logs for affected requests (e.g. geographical source, etc.)?

Re: possible cause 2

  • Someone would need to dig into the relevant http-kit code and/or protocols to better understand what might be happening. A reproducible example would be ideal, though I understand that one might be very difficult to put together.

Re: possible cause 3

  • Do you control the client sending the requests? Is it always the same client? Is it always configured the same way?
  • If the client varies and/or isn't under your control, might it be possible to identify the client in your logs in the cases with the error?

Re: possible cause 4

  • Again I'd probably start by looking for patterns in the logs of affected requests. Perhaps something stands out? Related to cause 3 above, I'd especially look for unexpected clients, etc. Is it possible to tell from the logs whether the affected requests appear to be properly authenticated?
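
As a rough illustration of the pattern hunting suggested above, the following Python sketch counts how often each value of a field occurs among affected requests. Everything in it is an assumption for the example: the `tally` helper, the field names, and the hand-written records, which only stand in for whatever parsed CloudFront or load balancer log rows are actually available.

```python
from collections import Counter


def tally(records, field):
    """Count how often each value of `field` appears across `records`."""
    return Counter(record.get(field, "<missing>") for record in records)


# Hypothetical, hand-written records standing in for parsed access-log rows.
affected = [
    {"user_agent": "client-a", "country": "US", "authenticated": True},
    {"user_agent": "client-b", "country": "US", "authenticated": False},
    {"user_agent": "client-b", "country": "DE", "authenticated": False},
]

for field in ("user_agent", "country", "authenticated"):
    print(field, tally(affected, field).most_common())
```

Running the same tally over unaffected requests gives a baseline, so a value that dominates only the affected set stands out.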

Next steps

I've marked this issue as help wanted. Hopefully someone is available to help investigate this.

Otherwise I'll try to take a closer look myself when I'm next on batched http-kit work.

In the meantime any additional info you could provide (esp. re: questions above) would be helpful!

@alexandergunnarson
Author

Thanks so much for your help @ptaoussanis! Unfortunately after some deliberation we decided to move to Jetty 11 and put together a working Sente adapter for it. Took a bit of tweaking but works great now!

@alexandergunnarson
Author

Cross-posting taoensso/sente#426 (comment)

@ptaoussanis
Member

@alexandergunnarson You're very welcome, I'm sorry that you needed to put effort into a move! But happy it worked out (and happy for Sente to get Jetty 11 support!). Thanks for updating 👍

Will keep this issue open until someone can investigate further and hopefully find a fix on http-kit's side if necessary/possible.

@alexandergunnarson
Author

No worries — appreciate your support :)


2 participants