Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-31868: allow for some errors checking namespace delete #28761

Merged
merged 1 commit into from
May 4, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neisw , how about something like this.

bumped the poller to 30s and then added 6 tries in the namespaceDeleted() in case it hits an error. So, if we did hit some error 6 times in a row then we log all of them and pass the last one back out to the poller which will exit and fail the test case.

Original file line number Diff line number Diff line change
Expand Up @@ -275,17 +275,21 @@ func (w *availability) WriteContentToStorage(ctx context.Context, storageDir, ti
}

func (w *availability) namespaceDeleted(ctx context.Context) (bool, error) {
_, err := w.kubeClient.CoreV1().Namespaces().Get(ctx, w.namespaceName, metav1.GetOptions{})
if apierrors.IsNotFound(err) {
return true, nil
}
var lastError error
for retry := 0; retry < 6; retry++ {
_, err := w.kubeClient.CoreV1().Namespaces().Get(ctx, w.namespaceName, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think being in the loop here will just hammer it 6 times in a row without a pause in between and then exit. It is the PollUntilContextTimeout that waits in between the calls.

err := wait.PollUntilContextTimeout(ctx, 30*time.Second, 20*time.Minute, true, w.namespaceDeleted)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I missed the sleep that you have but still not sure this is the best way.

Copy link
Contributor

@neisw neisw May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so if the error is nil we return false, nil

If the error is not nil and not IsNotFound we try up to 6 times with a 5 second sleep in between. But we still always return false, nil. So really we just try a few more times in the same call with a smaller interval when we see an non IsNotFound but ultimately keep going until we return true or timeout..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you wanting to quit polling after 6 consecutive failures? Maybe preserve the error and return it at the end.
so var err error outside the loop and return false,err at the end of the function. Should only return there after the 6 tries and no IsNotFound or nil error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that was my intention originally. I've updated it now. look ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I figured that was what you were going for, it just took me a couple of read throughs to get caught up...

if apierrors.IsNotFound(err) {
return true, nil
}

if err != nil {
if err == nil {
return false, nil
}
logrus.Errorf("Error checking for deleted namespace: %s, %s", w.namespaceName, err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to suppress the return false here? Looks like this just move the logging up but the comment in the jira had me thinking you wanted to keep polling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it would be return false, nil to keep polling but still log the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I confused myself that a return false, err would not stop the polling. but that's actually what was already happening and how we got to this bug.

but, if we return false, nil then I'm not sure we have a way to keep polling. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to go look it up earlier myself. I think by returning false, nil it will continue to process. So you could keep the log entry but return nil for the error. This doesn't accomplish my other suggestion but that was just a thought.

PollUntilContextTimeout

PollUntilContextCancel tries a condition func until it returns true, an error, or the context is cancelled or hits a deadline

return false, err
lastError = err
time.Sleep(5 * time.Second)
}

return false, nil
return false, lastError
}

func (w *availability) Cleanup(ctx context.Context) error {
Expand All @@ -299,9 +303,9 @@ func (w *availability) Cleanup(ctx context.Context) error {

startTime := time.Now()
log.Info("waiting for namespace deletion to complete")
err := wait.PollUntilContextTimeout(ctx, 15*time.Second, 20*time.Minute, true, w.namespaceDeleted)
err := wait.PollUntilContextTimeout(ctx, 30*time.Second, 20*time.Minute, true, w.namespaceDeleted)
if err != nil {
log.WithError(err).Error("error waiting for namespace to delete")
log.Errorf("Encountered error while waiting for deleted namespace: %s, %s", w.namespaceName, err)
return err
}
log.Infof("namespace deleted in %.2f seconds", time.Now().Sub(startTime).Seconds())
Expand Down