prioritizeBuilders not functioning well #6285

Open
rpurdie opened this issue Oct 30, 2021 · 0 comments · May be fixed by #6286
rpurdie commented Oct 30, 2021

Yocto Project implemented a prioritizeBuilders function roughly as per an example tardyp was kind enough to share:
https://gist.github.com/tardyp/f5ba4591f1c65b5d823ef5ba1dc3f399
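
For context, Buildbot's `prioritizeBuilders` configuration hook is a callable that receives the build master and a list of Builder objects, and returns the same builders in the order they should be considered for workers. The following is a minimal sketch of the sorting logic only. It is not the Yocto Project function or the linked gist: `BUILDER_PRIORITY`, `DEFAULT_PRIORITY`, the builder names, and the `FakeBuilder` stand-in are hypothetical, and the only attribute assumed of a builder is `name`.

```python
from dataclasses import dataclass

# Hypothetical priority table: lower number = consider for workers first.
BUILDER_PRIORITY = {
    "a-full": 0,
    "a-quick": 1,
}
DEFAULT_PRIORITY = 100  # builders not listed above go last


@dataclass
class FakeBuilder:
    """Stand-in for a Buildbot builder; only the name attribute is used."""
    name: str


def prioritizeBuilders(buildmaster, builders):
    """Return builders sorted by the priority table, ties broken by name."""
    return sorted(
        builders,
        key=lambda b: (BUILDER_PRIORITY.get(b.name, DEFAULT_PRIORITY), b.name),
    )


# In master.cfg the hook would be registered as:
#   c['prioritizeBuilders'] = prioritizeBuilders

pending = [FakeBuilder("misc"), FakeBuilder("a-quick"), FakeBuilder("a-full")]
ordered = [b.name for b in prioritizeBuilders(None, pending)]
print(ordered)
```

A function like this can only reorder the builders it is handed in a single call, which is why the size of that list matters for the problem described here.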

Unfortunately, in daily use this does not do what it should. It took us a while to track down, but the problem is this: if the system has a single builder running which then, through a sched.Triggerable, triggers a set of builderNames (say 30 of them), the 30 builds are not allocated in prioritizeBuilders order, and some may stall because the allocation order is poor.

Debugging suggests that the build requests appear in a 'random' order (database ID?) and are assigned to a worker faster than they appear. I tried ordering the list we use to add the builders, but that doesn't change anything. I also tried changing the builderNames in the Triggerable, but that has no influence either.

What also makes prioritizeBuilders() function poorly is that the code in buildrequestdistributor:_activityLoop gobbles the data from self._pending_builders into its own pending_builders list. This means the picture prioritizeBuilders() has at any given time can be extremely limited, the rest having already been gobbled by the activity loop. I proved this was a problem by adding some code:

--- buildrequestdistributor1.py	2021-10-30 15:38:08.916281551 +0000
+++ buildrequestdistributor.py	2021-10-30 15:44:54.984008401 +0000
@@ -29,6 +29,8 @@
 from buildbot.process.buildrequest import BuildRequest
 from buildbot.util import epoch2datetime
 from buildbot.util import service
+from twisted.internet import reactor
+from twisted.internet.task import deferLater
 
 class BuildChooserBase:
     #
@@ -440,6 +442,9 @@
         timer.stop()
         return rv
 
+    def sleep(self, secs):
+        return deferLater(reactor, secs, lambda: None)
+
     @defer.inlineCallbacks
     def _activityLoop(self):
         self.active = True
@@ -480,6 +485,9 @@
 
             self.activity_lock.release()
 
+            if not pending_builders:
+                yield self.sleep(3)
+
         timer.stop()

This limits how quickly the loop retries the gobbling of data. Whilst the initial build may not be ideally prioritized, it leaves time for the other requests to back up and allows them to be prioritised to workers correctly, which for us is worth the slight delay.
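
The effect of the throttle can be illustrated with a toy model, written under the assumption that the distributor sorts only the batch it has just taken from the pending list. This is not Buildbot code: `allocation_order`, `drain_every`, and the builder names are hypothetical, and real workers, locks, and timing are ignored.

```python
def allocation_order(arrivals, priority, drain_every):
    """Model a distributor that empties its pending list every `drain_every` arrivals.

    Each drained batch is sorted by priority before allocation, so the sort
    can only reorder requests that are in the same batch.
    """
    order = []
    pending = []
    for name in arrivals:
        pending.append(name)
        if len(pending) >= drain_every:
            order.extend(sorted(pending, key=priority.get))
            pending = []
    # Allocate whatever is left after the last arrival.
    order.extend(sorted(pending, key=priority.get))
    return order


# Hypothetical builders; lower number = should get a worker first.
priority = {"slow-1": 0, "slow-2": 1, "fast-1": 2, "fast-2": 3}
# Requests appear in an order unrelated to priority.
arrivals = ["fast-2", "slow-2", "fast-1", "slow-1"]

eager = allocation_order(arrivals, priority, drain_every=1)
throttled = allocation_order(arrivals, priority, drain_every=len(arrivals))
print(eager)
print(throttled)
```

With `drain_every=1` every batch holds a single request, so sorting changes nothing and the allocation order equals the arrival order. When the batch is allowed to cover all four arrivals, the priority order is recovered, which is the behaviour the added sleep is trying to approximate.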

I'm not sure what the right fix is but sadly it does seem prioritizeBuilders doesn't function as well as you'd first think.

tardyp linked a pull request on Nov 1, 2021 that will close this issue