Unreachable hosts not reset on platform group failure. #6100
Comments
Thanks for finding a nice simple reproducer. I would not have thought to use the
There aren't very many things deliberately breaking your SSH config is good for, but it's a very easy way to simulate broken connections.
Description
When all hosts in a platform are uncontactable, we remove all the hosts on that platform from the list of bad hosts to allow submission retries.
This does not appear to happen when all the platforms in a group are exhausted, which makes submission retries ineffective at handling short-term network blips.
Reproducible Example
foo has succeeded, then remove the lines in the SSH config.
Expected Behaviour
If all the hosts of all the platforms in a group are bad, all the hosts of all the platforms should be removed from the bad-hosts set to allow resubmission.
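The expected behaviour can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function name `reset_bad_hosts_for_group`, the dict-of-lists shape of a platform group, and the host names are all hypothetical.

```python
def reset_bad_hosts_for_group(group, bad_hosts):
    """Return bad_hosts with the group's hosts removed if every host is bad.

    group: dict mapping platform name -> list of host names (assumed shape).
    bad_hosts: set of host names currently considered unreachable.
    """
    # Collect every host of every platform in the group.
    group_hosts = {host for hosts in group.values() for host in hosts}
    if group_hosts and group_hosts <= bad_hosts:
        # Every host of every platform is bad: clear them all so that
        # the next submission retry can try them again.
        return bad_hosts - group_hosts
    # At least one host is still usable: leave the set unchanged.
    return bad_hosts


# Hypothetical group with two platforms.
group = {"platform_a": ["host1", "host2"], "platform_b": ["host3"]}

# All group hosts are bad, so they are removed; unrelated hosts stay.
exhausted = reset_bad_hosts_for_group(group, {"host1", "host2", "host3", "other"})

# host2 is still good, so nothing is reset.
partial = reset_bad_hosts_for_group(group, {"host1", "host3"})
```

Returning a new set rather than mutating the input keeps the sketch easy to test; a real fix would more likely update the existing bad-hosts set in place.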
Note on logging
Logging, especially error logging, is not always clear when selection of hosts from platform groups has failed: