Replies: 2 comments
-
Preliminaries:
These things point to slow volume i/o (i.e. slow filesystem):
Do you have any kind of system probe that can tell whether disk i/o has dropped? Things like telegraf and the datadog agent measure this pretty nicely, but I don't think bosh does out of the box. If you don't, then the only way to get a scientific measurement would be to downgrade, install such a probe, establish a baseline, and then re-upgrade. And that will only be conclusive if the issue is actually correlated with the upgrade.

This all reminds me a bit of a customer incident where a vSphere stemcell got upgraded and the baggageclaim driver, through automated detection, got switched from one driver to another.

EDIT: in general, perhaps there are some errors in the worker logs? Particularly those from baggageclaim.
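If no monitoring agent is available, a crude stand-in for the baseline idea above is to time a synced sequential write on the disk in question. The following is only a sketch under assumptions: it presumes Python 3 is available on the worker VM, the function name `measure_write_throughput` is made up for illustration, and the directory to test (whatever path backs the worker's volumes in your deployment) has to be supplied by you. It measures one narrow thing, sequential write throughput including the final `fsync`, so it is not a substitute for a real probe.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_bytes=64 * 1024 * 1024,
                             block_size=1024 * 1024):
    """Write total_bytes to a temp file in `directory` and return MiB/s.

    The timing includes flush + fsync, so the data has been handed to the
    storage device (not just the page cache) before the clock stops.
    """
    block = os.urandom(block_size)
    written = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                chunk = block[: total_bytes - written]
                f.write(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the write failed.
        os.remove(path)
    # Guard against a zero interval on very small writes.
    return (written / (1024 * 1024)) / max(elapsed, 1e-9)
```

Calling `measure_write_throughput("/some/dir")` a few times on a healthy worker and on an affected one (or before and after the upgrade) gives numbers to compare. A single run is noisy, so the comparison is only meaningful across several samples.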
-
I'm also reminded of #5298, but I have no idea how that helps.
-
Hello concourse folks,
We started noticing issues with container creation on Windows and Linux vSphere workers following an upgrade to 6.1. We believed that issue to be resolved, after a change to the task run params led to the next several builds going green. However, we've seen this come up again with some delay, even after deleting and recreating the deployment.
We have seen these issues on a variety of tasks run on these workers. The issues manifest in two ways:
bosh ssh
onto an affected worker and try to download something usingcurl
, we see download speeds of about ~1.1 Mi/BPut "/volumes/{volume-id}/stream-in?path=.": net/http: timeout awaiting response headers
leading to a failed job.contract-test
task in this job.Things we tried:
bosh recreate
did not fix this issuebosh vms --vitals
did not reveal anything abnormal:Could we get some suggestions of things to try or look for to diagnose and resolve our issue?
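To make the second symptom easier to reason about, one option is to tally how often the timeout message appears and which volumes it involves. The sketch below assumes the relevant log has been saved as plain text and that the error lines look like the `Put "/volumes/{volume-id}/stream-in?path=."` message quoted above. The function name `summarize_stream_in_timeouts` is hypothetical, and the actual log location and format depend on the deployment.

```python
import re
from collections import Counter

# Substring taken from the error message quoted in this post.
MARKER = "timeout awaiting response headers"
# Captures the volume id from a path like /volumes/<id>/stream-in
VOLUME_RE = re.compile(r"/volumes/([^/\s\"]+)/stream-in")


def summarize_stream_in_timeouts(lines):
    """Count timeout lines per volume id; lines without an id go to 'unknown'."""
    per_volume = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = VOLUME_RE.search(line)
        per_volume[match.group(1) if match else "unknown"] += 1
    return per_volume
```

Passing an open log file (for example `summarize_stream_in_timeouts(open(path))`, with `path` supplied by you) returns a counter. If the timeouts cluster on a few volumes, that points somewhere different than if they are spread evenly across all of them.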