Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase service startup delay for tests that fail the health checks #1583

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jefferbrecht
Copy link
Member

@jefferbrecht jefferbrecht commented Jan 19, 2024

Description

The health checks can take up to 30s to time out when they fail. We use a service startup delay of 20s. which is adequate in passing scenarios. However, for tests that deliberately fail the health checks, we need to make sure that the full 30s can lapse before validating the health check results.

Related issue

b/321001728

How has this been tested?

Will let presubmits run.

Checklist:

  • Unit tests
    • Unit tests do not apply.
    • Unit tests have been added/modified and passed for this PR.
  • Integration tests
    • Integration tests do not apply.
    • Integration tests have been added/modified and passed for this PR.
  • Documentation
    • This PR introduces no user visible changes.
    • This PR introduces user visible changes and the corresponding documentation change has been made.
  • Minor version bump
    • This PR introduces no new features.
    • This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
    • This PR bumps the version.

@jefferbrecht jefferbrecht changed the title Block on service startup before restarting on Windows Increase service startup delay for Windows Jan 22, 2024
@jefferbrecht jefferbrecht force-pushed the jefferbrecht-windows-flakes branch 2 times, most recently from c927d4f to 95aec32 Compare January 23, 2024 14:53
@jefferbrecht jefferbrecht changed the title Increase service startup delay for Windows Increase service startup delay for tests that fail the health checks Jan 23, 2024
The health checks have a backoff of up to 30s in some cases,
which should lapse before we perform any validations of the
health check results.
Cleaning up some compiler warnings
@@ -4505,6 +4510,10 @@ func TestNetworkHealthCheck(t *testing.T) {
t.Fatal(err)
}

// Use a longer service startup delay to allow the health check backoffs to complete.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous remote command startCommandForPlatform won't continue until the Health Checks finish executing. So, there is not need to account for the network check backoffs within this tests.

s.log.Info(EngineEventID, "generated configuration files")
s.runHealthChecks()
changes <- svc.Status{State: svc.Running, Accepts: cmdsAccepted}
if err := s.startSubagents(); err != nil {
s.log.Error(EngineEventID, fmt.Sprintf("failed to start subagents: %v", err))
// TODO: Ignore failures for partial startup?
}
s.log.Info(EngineEventID, "started subagents")
defer func() {

AFAIU, the flakiness arises (as @jefferbrecht pointed offline) when querying for the service status within getRecentServiceOutputForPlatform(). We could still add some delays, to be sure the status reaches the Windows Event Log or journald.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants