Add test for HTTP2 CONNECT termination #3655

Merged (4 commits) on May 17, 2024

howardjohn
Contributor

@howardjohn howardjohn commented May 1, 2024

For #3652.

In #3647, I thought I had tracked down the root cause of #3652. However, with the help of @seanmonstar in #3652 (comment), I realized that the fix there terminated the connection too forcefully, even when things other than SendRequest were still alive. I also realized that the original test case I wrote was simply wrong and failed to reproduce the issue.

This adds a test that properly reproduces the issue. On my machine, it fails about 5% of the time:

1876 runs so far, 100 failures (94.94% pass rate). 95.197349ms avg, 1.097347435s max, 5.398457ms min

With further investigation, I believe this bug actually originates in h2 itself; see hyperium/h2#772. With that PR, this test is 100% reliable:

64010 runs so far, 0 failures (100.00% pass rate). 44.484057ms avg, 121.454709ms max, 1.872657ms min

howardjohn added a commit to howardjohn/h2 that referenced this pull request May 1, 2024
See hyperium/hyper#3652.

What I have found is that if the final reference to a stream is dropped
after `maybe_close_connection_if_no_streams` runs but before
`inner.poll()` completes, the connection can dangle forever without
making any forward progress. No streams/references are alive, but the
connection is not complete and never wakes up again. This seems like a
classic TOCTOU race condition.

In this fix, I check again at the end of poll and if this state is
detected, wake up the task again.
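
As a rough illustration of the re-check described above, here is a minimal sketch. It is not h2's actual code: `Shared`, `Connection`, `drop_mid_poll`, `CountingWaker`, and `run_demo` are hypothetical names, and the mid-poll drop is simulated on a single thread rather than raced.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical shared state: the number of live stream/handle references.
struct Shared {
    live_refs: AtomicUsize,
}

/// Hypothetical connection future (a stand-in, not h2's `Connection`).
struct Connection {
    shared: Arc<Shared>,
    /// Demo hook: pretend the last reference is dropped while poll is running.
    drop_mid_poll: bool,
}

impl Future for Connection {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = &mut *self;

        // Time of check: finish if no references are alive.
        if this.shared.live_refs.load(Ordering::SeqCst) == 0 {
            return Poll::Ready(());
        }

        // Stand-in for the inner poll. In the real race, another thread
        // drops the last reference somewhere in this window.
        if this.drop_mid_poll {
            this.shared.live_refs.fetch_sub(1, Ordering::SeqCst);
            this.drop_mid_poll = false;
        }

        // Re-check at the end of poll. Without this, the future would return
        // Pending with nothing left to wake it, and would hang forever.
        if this.shared.live_refs.load(Ordering::SeqCst) == 0 {
            cx.waker().wake_by_ref();
        }
        Poll::Pending
    }
}

/// Waker that only counts how often it was woken.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls twice by hand; returns (first poll pending, wake count, second poll ready).
fn run_demo() -> (bool, usize, bool) {
    let shared = Arc::new(Shared { live_refs: AtomicUsize::new(1) });
    let mut conn = Connection { shared, drop_mid_poll: true };

    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let first = Pin::new(&mut conn).poll(&mut cx);
    let wakes = counter.0.load(Ordering::SeqCst);
    let second = Pin::new(&mut conn).poll(&mut cx);
    (first.is_pending(), wakes, second.is_ready())
}

fn main() {
    let (first_pending, wakes, second_ready) = run_demo();
    println!("first pending: {first_pending}, wakes: {wakes}, second ready: {second_ready}");
}
```

The self-wake only schedules another poll; the actual shutdown decision still happens in the check at the top of the next poll, so the future never completes on stale information.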

With the test in hyperium/hyper#3655, on my machine, it fails about 5% of the time:
```
1876 runs so far, 100 failures (94.94% pass rate). 95.197349ms avg, 1.097347435s max, 5.398457ms min
```

With this fix, the test is 100% reliable:
```
64010 runs so far, 0 failures (100.00% pass rate). 44.484057ms avg, 121.454709ms max, 1.872657ms min
```

Note: we also have reproduced this using `h2` directly outside of `hyper`, which is what gives me
confidence this issue lies in `h2` and not `hyper`.
@howardjohn howardjohn marked this pull request as ready for review May 1, 2024 22:35
seanmonstar pushed a commit to hyperium/h2 that referenced this pull request May 2, 2024
See hyperium/hyper#3652.
@seanmonstar
Member

Thank you! I was waiting to release h2 with the fix, so that we wouldn't add a flaky test to hyper. h2 v0.4.5 is out now :)

@seanmonstar seanmonstar merged commit a8f9e06 into hyperium:master May 17, 2024
22 checks passed
@howardjohn
Contributor Author

Ah makes sense. Thanks!
