Integration test intermittent failures #1067
Comments
Looks like an interesting case. Do you know if this happens to multiple tests, or is it always the same flaky one?
Hi @erickisos, unfortunately I don't remember. I didn't record previous ones, and I don't think I've seen any new ones since creating this issue. But it can only occur in tests where we make another call to Vault. So in this search: https://github.com/search?q=repo%3Ahvac%2Fhvac+self.assertNotIn+path%3A%2F%5Etests%5C%2Fintegration_tests%5C%2F%2F&type=code the asserts where one of the operands is a call that reaches out to Vault are the ones that would be susceptible. The specific call in the CI run in this issue is this one:
One way that we might be able to add retries is with the method described here: But it might have to be used in selected cases only, because we'd be tweaking things like the backoff and number of retries, and in particular, we'd be retrying on successful response codes and not on failures like usual. Bit of a weird situation! Anyway, just an idea. Thanks for your interest!
Example:
These come up from time to time, and typically re-running the tests works fine.
The issue seems to be that the tests delete a value from Vault, then check to ensure it was deleted. Sometimes, the value was "unexpectedly found" when checked.
My guess is that this is a sort of race condition: Vault accepts the delete request but has not actually deleted the value yet, and the next request arrives so quickly that it returns the value before the deletion happens.
If so, we could solve it in tests by either delaying before checking, or retrying the check on failure (that is, retrying the request if it succeeds and returns the value we expect should not be there).
I prefer the retry mechanism rather than a sleep.
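As a rough illustration of the retry idea, here is a minimal sketch of a polling helper. It is not existing hvac code: `wait_until_absent`, its parameters, and the default attempt count and delay are all hypothetical, and `fetch_keys` stands in for whatever call reaches out to Vault in a given test.

```python
import time


def wait_until_absent(fetch_keys, key, attempts=5, delay=0.2):
    """Poll fetch_keys() until key is no longer in the result.

    fetch_keys is a hypothetical zero-argument callable that queries
    Vault and returns a container of names (list, dict, set, ...).
    Returns True as soon as the key is absent, or False if it is
    still present after the final attempt.
    """
    for attempt in range(attempts):
        if key not in fetch_keys():
            return True
        # Only sleep if another attempt is coming.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

In a test, an `assertNotIn` on a fresh Vault call could then become something like `self.assertTrue(wait_until_absent(lambda: list_things(), "my-thing"))`, where `list_things` is a placeholder for the real call. Because the helper returns on the first successful check, the common case costs no extra time, unlike a fixed sleep.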
If we still see failures with a retry, then we might have a different problem, where intermittently the delete request itself never gets to Vault or is never acted on. Seems unlikely, but we won't know until we do some retries.
I don't think I've ever seen this running the tests locally, but it's possible that it could happen that way too.