Integration test intermittent failures #1067

briantist · 2023-09-20T13:54:40Z

Example:

These come up from time to time, and typically re-running the tests works fine.

The issue seems to be that the tests delete a value from Vault, then check to ensure it was deleted. Sometimes, the value was "unexpectedly found" when checked.

My guess is that this is a sort of race condition: Vault accepts the delete request but has not finished deleting the value yet, and the next request arrives so quickly that Vault still returns the value.

If so, we could solve it in tests by either delaying before checking, or retrying the check on failure (that is, retrying the request if it succeeds and returns the value we expect should not be there).

I prefer the retry mechanism rather than a sleep.
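To make the retry idea concrete, here is a minimal sketch of a polling helper that re-runs the check until the value is gone or the attempts run out. Everything in it is hypothetical and not part of hvac or its test suite: the `assert_eventually_absent` name, the `attempts` and `delay` defaults, and the `FakeBackend` class, which only stands in for a Vault that keeps returning the deleted value for a couple of reads.

```python
import time


def assert_eventually_absent(fetch, key, attempts=5, delay=0.2):
    """Poll fetch() until key is no longer in its result, or fail.

    fetch is any zero-argument callable returning a container. In a real
    test it would wrap the Vault call that lists or reads the value.
    Returns the number of checks that were needed.
    """
    for attempt in range(1, attempts + 1):
        result = fetch()
        if key not in result:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # give the delete time to take effect
    raise AssertionError(
        f"{key!r} unexpectedly found after {attempts} attempts"
    )


class FakeBackend:
    """Hypothetical stand-in for Vault: the deleted key lingers briefly."""

    def __init__(self, stale_reads):
        self.stale_reads = stale_reads

    def list_keys(self):
        if self.stale_reads > 0:
            self.stale_reads -= 1
            return ["other", "deleted-key"]
        return ["other"]


backend = FakeBackend(stale_reads=2)
checks_needed = assert_eventually_absent(
    backend.list_keys, "deleted-key", delay=0.01
)
```

In a test this would replace a bare `self.assertNotIn(...)` on the result of a Vault call. If the helper still fails after several attempts, that points at the delete itself rather than at timing.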

If we still see failures with a retry, then we might have a different problem, where intermittently the delete request itself never gets to Vault or is never acted on. Seems unlikely, but we won't know until we do some retries.

I don't think I've ever seen this running the tests locally, but it's possible that it could happen that way too.

erickisos · 2023-10-03T15:52:30Z

Looks like an interesting case. Do you know if this happens in multiple tests, or is it always the same flaky one?

briantist · 2023-10-03T17:12:47Z

Hi @erickisos, unfortunately I don't remember. I didn't record the previous ones, and I don't think I've seen any new ones since creating this issue. But it can only occur in tests where we make another call to Vault.

So in this search: https://github.com/search?q=repo%3Ahvac%2Fhvac+self.assertNotIn+path%3A%2F%5Etests%5C%2Fintegration_tests%5C%2F%2F&type=code

The asserts where one of the operands is some call that reaches out to Vault are the ones that would be susceptible.

The specific call in the CI run in this issue is this one:

hvac/tests/integration_tests/v1/test_system_backend.py

Line 115 in 9befaa4

self.assertNotIn(

One way that we might be able to add retries is with the method described here:
https://hvac.readthedocs.io/en/stable/advanced_usage.html#retrying-failed-requests

But it might have to be used in selected cases only, because we'd be tweaking things like the backoff and number of retries, and in particular, we'd be retrying on successful response codes and not on failures like usual. Bit of a weird situation!

Anyway just an idea, thanks for your interest!

briantist added the help wanted, CI/CD, tests, and developer experience labels Sep 20, 2023

briantist mentioned this issue Mar 22, 2024

LDAP secret engine support (#1032) #1033

Merged
