Buildbot infrastructure instability caused by time synchronization on latent workers #269
I just checked, and it seems that ntpd was started incorrectly on service3. However, I don't see anything in the logs about time issues, and when I restarted it the adjustment was only microseconds.
I'm not sure whether you use VMs for running the worker machines.
For example, https://buildbot.buildbot.net/#/builders/108/builds/2120 shows an interesting situation where the time was probably shifted twice.
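One way to confirm a suspected time shift like this is to sample the wall clock (time.time()) and the monotonic clock (time.monotonic()) together and check whether they advance by the same amount between samples; a large difference means the wall clock was stepped. This is a minimal sketch, assuming the monotonic clock itself is trustworthy (which may not hold if the container runtime is at fault). The function names, the 1-second tolerance and the sample data are hypothetical and only illustrate a forward shift followed by a backward one.

```python
import time
from typing import List, Tuple


def take_sample() -> Tuple[float, float]:
    """Read the wall clock and the monotonic clock back to back."""
    return time.time(), time.monotonic()


def detect_clock_steps(
    samples: List[Tuple[float, float]], tolerance: float = 1.0
) -> List[float]:
    """Return the size of every wall-clock step found in (wall, monotonic) samples.

    Between consecutive samples both clocks should advance by roughly the same
    amount. A difference larger than `tolerance` seconds is reported as a step:
    positive means the wall clock jumped forward, negative means backward.
    """
    steps = []
    for (w0, m0), (w1, m1) in zip(samples, samples[1:]):
        drift = (w1 - w0) - (m1 - m0)
        if abs(drift) > tolerance:
            steps.append(drift)
    return steps


# Hypothetical samples taken once a minute: the wall clock jumps forward by
# 16490 s between the 2nd and 3rd sample and back again before the 4th.
samples = [(1000.0, 10.0), (1060.0, 70.0), (17610.0, 130.0), (1180.0, 190.0)]
print(detect_clock_steps(samples))  # [16490.0, -16490.0]
```

On a worker, take_sample() could be called periodically and the detected steps logged next to the build timestamps, which would show whether the shifts coincide with the failing steps.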
Interesting. These workers are on a machine I boot up when I want faster test execution. I recently migrated them to podman containers using the gVisor container runtime. Probably gVisor doesn't emulate syscalls well enough.
Just checking in: this isn't an issue with time on the master? It sounds like it's not, but I want to make sure. If there's anything I need to do, let me know.
@verm There are no issues with time on the master. For any issue with the p12-* workers, the worker setup is the first suspect.
@p12tic Okay, great!
Now it looks like the p12-pd-? workers have run out of disk space on /home, judging by errors like these:
error Error: ENOSPC: no space left on device, mkdir '/home/buildbot/...
WARNING: Building wheel for buildbot failed: [Errno 28] No space left on device: '/home/buildbot/.cache/pip/wheels/62'
This applies at least to
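Running out of space on /home could be caught before a build starts instead of surfacing as ENOSPC halfway through a step. A minimal sketch using Python's standard shutil.disk_usage; the has_free_space helper, checking the current user's home directory, and the 5 GiB threshold are hypothetical choices, not existing Buildbot configuration.

```python
import os
import shutil


def has_free_space(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes


if __name__ == "__main__":
    # Hypothetical pre-build check for the worker's home directory
    # (assumed to be /home/buildbot when run as the buildbot user).
    home = os.path.expanduser("~")
    free_gib = shutil.disk_usage(home).free / 2**30
    print(f"{home}: {free_gib:.1f} GiB free")
    if not has_free_space(home, 5 * 2**30):
        print("warning: less than 5 GiB free; builds may fail with ENOSPC")
```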
Hi
I suspect that the Buildbot infrastructure instability is caused by time synchronization on the latent workers.
On the latent workers p12-pd-?? I'm seeing bizarre errors that seem to be related to time synchronization.
It looks as if the time synchronization happened while steps were executing.
I suspect a time sync issue because I randomly see these problems:
Negative elapsed time:
elapsedTime=-16489.823580
see line 38 in https://buildbot.buildbot.net/#/builders/100/builds/3298/steps/1/logs/stdio
Elapsed time much larger than the 1200-second timeout:
elapsedTime=16497.727639
see https://buildbot.buildbot.net/#/builders/108/builds/2120/steps/7/logs/stdio
Time-related Node.js assertion:
node[2513]: ../src/env.cc:1288:v8::Local<v8::Value> node::Environment::GetNow(): Assertion `(now) >= (timer_base())' failed.
see https://buildbot.buildbot.net/#/builders/126/builds/92
Inconsistencies between the step duration reported in the master web UI and the timeout reported by the worker:
The master web UI reports that the "coverage tests" step ran for 8:28 (508 seconds), but the log says: command timed out: 1200 seconds without output running b'/tmp/bbvenv/bin/coverage run
see https://buildbot.buildbot.net/#/builders/83/builds/4711
The master web UI reports that the "set -e" step ran for 5 seconds, but the log says: command timed out: 1200 seconds without output running
see https://buildbot.buildbot.net/#/builders/108/builds/2120
...
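Both kinds of symptom above (a negative elapsed time, and a "1200 seconds without output" timeout after only a few real seconds) are what one would expect if durations were computed from a wall clock that gets stepped. The sketch below is a hypothetical model, not Buildbot's worker code; whether Buildbot or Twisted actually measures these intervals with the wall clock is an assumption here. SteppableClock and NoOutputWatchdog are illustrative names, and the step sizes are made up.

```python
from typing import Callable


class SteppableClock:
    """Fake clock for illustration; step() mimics ntpd (or a sandbox) jumping the time."""

    def __init__(self, now: float) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def step(self, seconds: float) -> None:
        self.now += seconds


class NoOutputWatchdog:
    """Hypothetical "N seconds without output" check driven by an arbitrary clock."""

    def __init__(self, timeout: float, clock: Callable[[], float]) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_output = clock()

    def on_output(self) -> None:
        # Called whenever the command prints something.
        self.last_output = self.clock()

    def timed_out(self) -> bool:
        return self.clock() - self.last_output >= self.timeout


# Scenario 1: the wall clock is stepped backward while a step runs.
wall = SteppableClock(1_700_000_000.0)  # stands in for time.time()
start = wall()
wall.step(10.0)        # 10 real seconds of work
wall.step(-16500.0)    # clock stepped back mid-step
print(wall() - start)  # -16490.0, a "negative elapsed time"

# Scenario 2: the wall clock is stepped forward while a command is silent.
wall = SteppableClock(1_700_000_000.0)
mono = SteppableClock(0.0)              # stands in for time.monotonic()
wall_dog = NoOutputWatchdog(1200, wall)
mono_dog = NoOutputWatchdog(1200, mono)
for clock in (wall, mono):
    clock.step(5.0)    # 5 real seconds without output
wall.step(16490.0)     # only the wall clock jumps
print(wall_dog.timed_out(), mono_dog.timed_out())  # True False
```

The watchdog driven by the monotonic clock is unaffected because, on a kernel that implements it correctly, CLOCK_MONOTONIC does not jump when the system time is set. The Node.js assertion above is a monotonic-time check, so if it fails as well, that would point more toward the container runtime than toward ntpd.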
@p12tic, what do you think?