Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trt-1538: Wait for monitor resources cleanup #28760

Closed
wants to merge 1 commit into from

Conversation

jluhrsen
Copy link
Contributor

@jluhrsen jluhrsen commented May 1, 2024

No description provided.

@openshift-ci-robot
Copy link

openshift-ci-robot commented May 1, 2024

@jluhrsen: This pull request references trt-1538 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.z" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 1, 2024
@openshift-ci openshift-ci bot requested review from csrwng and jwforres May 1, 2024 00:45
Copy link
Contributor

openshift-ci bot commented May 1, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign spadgett for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -186,7 +186,7 @@ func (o *RunMonitorOptions) Run() error {

fmt.Fprintf(o.Out, "Monitor shutting down, this may take up to five minutes...\n")

cleanupContext, cleanupCancel := context.WithTimeout(context.Background(), 5*time.Minute)
cleanupContext, cleanupCancel := context.WithTimeout(context.Background(), 15*time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wound up bumping this to 20 minutes master run_monitor_command

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

func (w *availability) Cleanup(ctx context.Context) error {
if w.imageRegistryRoute != nil {
err := w.routeClient.RouteV1().Routes("openshift-image-registry").Delete(ctx, w.imageRegistryRoute.Name, metav1.DeleteOptions{})
if err != nil {
return fmt.Errorf("failed to delete route: %w", err)
}

startTime := time.Now()
err = wait.PollUntilContextTimeout(ctx, 15*time.Second, 15*time.Minute, true, w.routeDeleted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

20

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

func (pna *podNetworkAvalibility) Cleanup(ctx context.Context) error {
if len(pna.namespaceName) > 0 && pna.kubeClient != nil {
if err := pna.kubeClient.CoreV1().Namespaces().Delete(ctx, pna.namespaceName, metav1.DeleteOptions{}); err != nil {
return err
}

startTime := time.Now()
err := wait.PollUntilContextTimeout(ctx, 15*time.Second, 15*time.Minute, true, pna.namespaceDeleted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

20

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

func (w *availability) Cleanup(ctx context.Context) error {
if len(w.namespaceName) > 0 && w.kubeClient != nil {
if err := w.kubeClient.CoreV1().Namespaces().Delete(ctx, w.namespaceName, metav1.DeleteOptions{}); err != nil {
return err
}
}

startTime := time.Now()
err := wait.PollUntilContextTimeout(ctx, 15*time.Second, 15*time.Minute, true, w.namespaceDeleted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and 20 minutes here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

func (w *availability) Cleanup(ctx context.Context) error {
if len(w.namespaceName) > 0 && w.kubeClient != nil {
if err := w.kubeClient.CoreV1().Namespaces().Delete(ctx, w.namespaceName, metav1.DeleteOptions{}); err != nil {
return err
}
}

startTime := time.Now()
err := wait.PollUntilContextTimeout(ctx, 20*time.Second, 15*time.Minute, true, w.namespaceDeleted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to bump the timeout (minutes) not the interval

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sloppy... fixed.

@neisw
Copy link
Contributor

neisw commented May 3, 2024

/retest-required

1 similar comment
@jluhrsen
Copy link
Contributor Author

jluhrsen commented May 3, 2024

/retest-required

@jluhrsen
Copy link
Contributor Author

jluhrsen commented May 3, 2024

@neisw , is there any value in this PR? the purpose of it was to prove that our OCPBUGS-31868 is not a regression in 4.16. So assuming it's not a regression, this PR has the chance of making new failures show up in 4.15. Or, we can get this in and backport the change to allow for some errors in the checking of namespaces.

@neisw
Copy link
Contributor

neisw commented May 3, 2024

@jluhrsen The issue in 4.16 is occurring infrequently I believe. If you don't have a strong push to get this in and compare the results it seems reasonable to hold off. We didn't backport it originally even though the root issue that surfaced the need for this was in 4.15. I don't think it is super risky since we have been running it for a while but if you are ok without it we could close this.

@jluhrsen
Copy link
Contributor Author

jluhrsen commented May 3, 2024

totally agree, @neisw. closing this now.

@jluhrsen jluhrsen closed this May 3, 2024
Copy link
Contributor

openshift-ci bot commented May 3, 2024

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-etcd-recovery 2a07b9f link false /test e2e-aws-etcd-recovery
ci/prow/e2e-openstack-ovn 2a07b9f link false /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node-serial 2a07b9f link false /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-aws-ovn-single-node-upgrade 2a07b9f link false /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-aws-ovn-single-node 2a07b9f link false /test e2e-aws-ovn-single-node
ci/prow/e2e-metal-ipi-sdn 2a07b9f link false /test e2e-metal-ipi-sdn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants