Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connectivity test concurrent run #2496

Merged
merged 9 commits into from
May 16, 2024
Merged

Conversation

viktor-kurchenko
Copy link
Contributor

@viktor-kurchenko viktor-kurchenko commented Apr 22, 2024

This PR introduces connectivity test concurrent run.

The new test-concurrency input param can be used to specify a count of K8S test namespaces.
So, most connectivity tests will run concurrently across test namespaces.

The new parameter is marked hidden because of the following reasons:

  1. test and stabilisation internally (e.g.: in CI)
  2. report output must be improved to avoid mess (especially for the debug and verbose modes). I want to address this in a separate PR.

Command run example: cilium connectivity test --test-concurrency=5

Local test in a 4 node kind cluster result:

@michi-covalent
Copy link
Contributor

@viktor-kurchenko could you take a look at kind test failures? from-cidr-host-netns/from-cidr-to-pod/host-netns-to-pod seems to be failing consistently.

Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a matrix for the kind job and run with both --test-concurrency=1 and --test-concurrency=5: https://github.com/cilium/cilium-cli/blob/main/.github/workflows/kind.yaml#L24

cli/connectivity.go Outdated Show resolved Hide resolved
connectivity/suite.go Outdated Show resolved Hide resolved
connectivity/check/deployment.go Show resolved Hide resolved
cli/install.go Outdated Show resolved Hide resolved
Copy link
Member

@nebril nebril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome change overall, very happy to see it! I think the only blocking comment from me is showing the user which test Fatald.

connectivity/suite.go Show resolved Hide resolved
connectivity/check/policy.go Show resolved Hide resolved
michi-covalent added a commit that referenced this pull request Apr 25, 2024
This commit changes the behavior of --test-namespace flag. Instead of
only deleting the namespace that matches --test-namespace flag exactly,
delete all the namespaces with the prefix specified in --test-namespace
flag. This is in preparation to support running connectivity tests in
parallel using multiple namespaces (#2496).

Signed-off-by: Michi Mutsuzaki <[email protected]>
@viktor-kurchenko viktor-kurchenko requested a review from a team as a code owner May 2, 2024 08:10
@viktor-kurchenko viktor-kurchenko requested a review from aanm May 2, 2024 08:10
@asauber asauber self-requested a review May 10, 2024 19:42
cli/connectivity.go Show resolved Hide resolved
connectivity/builder/builder.go Outdated Show resolved Hide resolved
connectivity/builder/builder.go Outdated Show resolved Hide resolved
connectivity/suite.go Outdated Show resolved Hide resolved
connectivity/builder/builder.go Show resolved Hide resolved
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my codeowners. I think the new policy lock retains the same assumptions we have today where only one piece of software should be manipulating policies at the time that you do the connectivity test (ie, only the CLI does it). And that should be good enough for now. It would be cool to explore better mechanisms for ensuring that datapath updates have occurred for a specific policy, but for now we don't really have a solution to that problem and it's not worth holding back this effort until we do that.

Comment on lines 28 to 31
fail-fast: false
matrix:
include:
# run connectivity tests explicitly without concurrency
- test-concurrency: 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to retain fail-fast for test-concurrency: 1 for quicker iteration and keep it off for higher concurrency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I've disabled fail-fast for the debugging purpose.
I think it's better to remove it at all for the workflow.

New `test-concurrency` input param added for the connectivity
test command to provide namespace count for concurrent tests
execution.
The hidden flag will be removed after internal testing.

Signed-off-by: viktor-kurchenko <[email protected]>
This PR introduces connectivity test concurrent run:
- Parameters struct extended with ExternalDeploymentPort.
- ConnectivityTest.Cleanup method implemented for instance reusability.
- ConnectivityTest instantiated concerning test-concurrency input param.
- GetTestSuites function implemented to return test suite functions.
- Run function refactored to properly setup and run ConnectivityTest instances.

Signed-off-by: viktor-kurchenko <[email protected]>
Package global policy mutex added to guarantee proper
policy provisioning and deletion in case of concurrent run.

Signed-off-by: viktor-kurchenko <[email protected]>
Signed-off-by: viktor-kurchenko <[email protected]>
The MultiError struct implemented and used instead of `hashicorp/go-multierror`
library. The MultiError allows running many goroutine concurrently and returns
a joined error if at least one goroutine returns an error.

Signed-off-by: viktor-kurchenko <[email protected]>
The kind workflow improved with matrix to test `test-concurrency` parameter.

Signed-off-by: viktor-kurchenko <[email protected]>
Signed-off-by: viktor-kurchenko <[email protected]>
The check.EchoServerHostPort constant moved into check.Parameters
to make it unique for each test namespace in case of concurrent run.

Signed-off-by: viktor-kurchenko <[email protected]>
ClusterMesh connectivity tests setup fixed to support test-concurrency
param. Kind workflow (clustermesh part) updated with test-concurrency.

Signed-off-by: viktor-kurchenko <[email protected]>
Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do it

@michi-covalent michi-covalent merged commit 1855f6c into main May 16, 2024
15 checks passed
@michi-covalent michi-covalent deleted the pr/vk/conn/tests/concurrency branch May 16, 2024 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants