We might benefit from a "network should continue working in cases that resemble syn_cookie or syn_flood" test... #120979
Comments
/sig network
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/good-first-issue — since this is experimental, I'm thinking it would be a good exploratory issue, even if all we did was write a shell script or something that simulated this as a blog post, and didn't commit it to core k/k.
@jayunit100: Guidelines: please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed. In response to this:
/sig scalability
Hello @jayunit100, I would like to work on this issue.
OK, I suppose this would be experimental for now. If you're able to build a cluster on Ubuntu kernel version 5.4.0-123, that might allow you to simulate this type of corner case. We also need to see from others whether they agree such a test would be worthwhile and, if so, whether k/k is the right place for it.
The problem is that the test will depend a lot on the environment, and you may hit other bottlenecks or impact other things. This can only run reliably in a very specific and controlled environment.
Yeah, in my initial attempt I hit IP exhaustion before I could get a stack trace :). I'm OK closing this issue if folks feel it's not reproducible. I felt like maybe there's a clever way to do this that I haven't thought of, though?
Still, as @aojea says, this depends a lot on the environment.
Yes, I don't think we need to go down to the level of reproducing it reliably. But it feels like it would be nice to have a network flood test of some sort. Maybe there's a combination of e2es that can do that as is? I'm OK to keep this open or close it; I think leaving it open until the bot closes it is fine, in case someone decides they have an idea for it.
@jayunit100 May I work on this issue? |
Sure, but see the earlier comments, as it will be hard to reproduce. See if you can detect a SYN flood on your Ubuntu box by running something locally. If you can, then the next step would be to containerize it and see if you can get it to trigger when hitting a pod instead of hitting localhost. I think it would be a good experiment to explore either way, and to write up the results here. Lars has an idea that sounds like a good first pass.
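The "detect it locally" step above can be sketched in shell. This is a sketch under assumptions: Linux only, and it reads the `TcpExt` counters (`SyncookiesSent` and friends) that the kernel exposes in `/proc/net/netstat`; the helper name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: has this node ever fallen back to sending SYN cookies?
# Linux exposes TcpExt counters in /proc/net/netstat as a header line
# (field names) followed by a values line; we pair the two lines up.

syncookies_sent() {
  # Takes an optional file argument so it can be pointed at a saved sample;
  # defaults to the live kernel counters.
  awk '/^TcpExt:/ {
         if (!seen) { split($0, hdr); seen = 1 }
         else { for (i = 2; i <= NF; i++)
                  if (hdr[i] == "SyncookiesSent") print $i }
       }' "${1:-/proc/net/netstat}"
}

if [ -r /proc/net/netstat ]; then
  echo "SyncookiesSent=$(syncookies_sent)"
fi
```

If the counter is nonzero, the node has hit the syncookie path at least once since boot, which is the signal the experiment is looking for.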
What happened?
There are kernel bugs such as https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.4/+bug/1981658, which we've found on some Ubuntu versions, that are only visible if a node is pushed to the point that it "switches" the way it processes packets, onto the syn_cookie kernel path.
Now, looking at https://access.redhat.com/solutions/30453, it doesn't look like we really want this to be a normal scenario.
Status
I ran our existing e2es (iperf, networkpolicies, and sig-network tests) at high concurrency, and wasn't able to force a SYN flood / syncookie fallback at any point, but I know of clusters where kubelets have failed due to that path in the kernel.
Goal
It would be nice if we had a sig-network e2e that simulated this. Feel free to close this if one of the existing tests already does.
This may or may not be possible. I THINK that the way to do this would be to fire off lots of TCP connections and somehow keep them open for a long time (i.e. maybe serve very large packets?).
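The "lots of connections, held open" idea can be sketched with bash's `/dev/tcp` pseudo-device; the function name and any target host/port are illustrative. One caveat worth noting: fully established connections land in the server's accept queue rather than the SYN backlog, so on its own this loads the target without necessarily triggering the syncookie path.

```shell
#!/usr/bin/env bash
# Sketch: open n TCP connections to host:port and hold them open by
# keeping the file descriptors around. bash's /dev/tcp/<host>/<port>
# redirection performs the connect; {fd} auto-allocates a descriptor,
# which stays open (and the connection with it) for the shell's lifetime.

hold_connections() {
  local host="$1" port="$2" n="$3"
  local fd opened=0
  for ((i = 0; i < n; i++)); do
    if exec {fd}<>"/dev/tcp/${host}/${port}" 2>/dev/null; then
      opened=$((opened + 1))
    fi
  done
  echo "$opened"   # how many connections we are now holding open
}
```

In an e2e this would run from many pods at once against a Service VIP, scaling `n` until the node's TCP counters start moving.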
What did you expect to happen?
Kubernetes sig-network or similar e2es would be able to simulate overloaded services/endpoints where non-normal TCP connection handling starts happening.
How can we reproduce it (as minimally and precisely as possible)?
Not sure; that's the purpose of this test :). But I suppose we might be able to reproduce SYN floods if there are too many TCP connection attempts coming in at a given time, in parallel (see the RH article linked above).
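One way to generate that parallel SYN load is a raw SYN burst from `hping3`. A sketch with made-up numbers, assuming `hping3` is installed, root privileges, and a lab target you own; the `--rand-source` spoofing is what keeps the half-open queue full, since SYN-ACKs sent to spoofed addresses are never answered.

```shell
#!/usr/bin/env bash
# Sketch: bounded SYN burst against a lab target, plus a check (run on the
# target node) for the kernel's syncookie fallback message. Counts and
# intervals are illustrative, not tuned.

syn_burst() {
  local target="$1" port="${2:-80}"
  # -S: send SYNs only; -c: bound the packet count; -i u100: one packet
  # every 100 microseconds; --rand-source: spoof random source addresses
  # so handshakes never complete and the SYN backlog fills.
  hping3 -S -p "$port" -c 200000 -i u100 --rand-source "$target"
}

check_fallback() {
  # Run this on the target node after the burst.
  dmesg | grep -i "possible SYN flooding" || echo "no syncookie fallback logged"
}
```

If `check_fallback` shows the "possible SYN flooding on port N. Sending cookies." message, the node took the syn_cookie path this issue is about.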
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)