Add GNU make jobserver style "fifo" support #2263

stefanb2 · 2023-02-22T19:37:39Z

Part 3 in the implementation series for PR #1139: add support for the POSIX jobserver style fifo, introduced in GNU make 4.4, that uses a named pipe as communication channel between master & clients.

add an optional argument to the command line option -m / --tokenpool-master to specify the jobserver style.
- POSIX supports fifo (default) and pipe
- Win32 supports sem (default)
- specifying an unknown style will cause ninja to abort with a fatal error
update client & server parts of GNUmakeTokenPoolPosix to support the fifo style
add tests for fifo style

Documentation for GNU make jobserver

http://make.mad-scientist.net/papers/jobserver-implementation/
https://www.gnu.org/software/make/manual/html_node/Job-Slots.html#Job-Slots

Fixes #1139

- add new TokenPool interface - GNU make implementation for TokenPool parses and verifies the magic information from the MAKEFLAGS environment variable - RealCommandRunner tries to acquire TokenPool * if no token pool is available then there is no change in behaviour - When a token pool is available then RealCommandRunner behaviour changes as follows * CanRunMore() only returns true if TokenPool::Acquire() returns true * StartCommand() calls TokenPool::Reserve() * WaitForCommand() calls TokenPool::Release() Documentation for GNU make jobserver http://make.mad-scientist.net/papers/jobserver-implementation/ Fixes ninja-build#1139

Improve on the original jobserver client implementation. This makes ninja a more aggressive GNU make jobserver client. - add monitor interface to TokenPool - TokenPool is passed down when main loop indicates that more work is ready and would be allowed to start if a token becomes available - posix: update DoWork() to monitor TokenPool read file descriptor - WaitForCommand() exits when DoWork() sets token flag - Main loop starts over when WaitForCommand() sets token exit status

This emulates the behaviour of GNU make. - add parallelism_from_cmdline flag to build configuration - set the flag when -jN is given on command line - pass the flag to TokenPool::Get() - GNUmakeTokenPool::Setup() * prints a warning when the flag is true and jobserver was detected * returns false, i.e. jobserver will be ignored - ignore config.parallelism in CanRunMore() when we have a valid TokenPool, because it gets always initialized to a default when not given on the command line

This emulates the behaviour of GNU make. - build: make a copy of max_load_average and pass it to TokenPool. - GNUmakeTokenPool: if we detect a jobserver and a valid -lN argument in MAKEFLAGS then set max_load_average to N.

- replace printf() with calls to LinePrinter - print GNU make jobserver message only when verbose build is requested

- fix Windows build error in no-op TokenPool implementation - improve Acquire() to block for a maximum of 100ms - address review comments

- TokenPool setup - GetMonitorFd() API - implicit token and tokens in jobserver pipe - Acquire() / Reserve() / Release() protocol - Clear() API

- add TokenPoolTest stub to provide TokenPool::GetMonitorFd() - add two tests * both tests set up a dummy GNUmake jobserver pipe * both tests call DoWork() with TokenPoolTest * test 1: verify that DoWork() detects when a token is available * test 2: verify that DoWork() works as before without a token - the tests are not compiled in under Windows

Add tests that verify the token functionality of the builder main loop. We replace the default fake command runner with a special version where the tests can control each call to AcquireToken(), CanRunMore() and WaitForCommand().

GNU make uses a semaphore as jobserver protocol on Win32. See also https://www.gnu.org/software/make/manual/html_node/Windows-Jobserver.html Usage is pretty simple and straightforward, i.e. WaitForSingleObject() to obtain a token and ReleaseSemaphore() to return it. Unfortunately subprocess-win32.cc uses an I/O completion port (IOCP). IOCPs aren't waitable objects, i.e. we can't use WaitForMultipleObjects() to wait on the IOCP and the token semaphore at the same time. Therefore GNUmakeTokenPoolWin32 creates a child thread that waits on the token semaphore and posts a dummy I/O completion status on the IOCP when it was able to obtain a token. That unblocks SubprocessSet::DoWork() and it can then check if a token became available or not. - split existing GNUmakeTokenPool into common and platform bits - add GNUmakeTokenPool interface - move the Posix bits to GNUmakeTokenPoolPosix - add the Win32 bits as GNUmakeTokenPoolWin32 - move Setup() method up to TokenPool interface - update Subprocess & TokenPool tests accordingly

- remove unnecessary "struct" from TokenPool - add PAPCFUNC cast to QueryUserAPC() - remove hard-coded MAKEFLAGS string from win32 - remove useless build test CompleteNoWork - rename TokenPoolTest to TestTokenPool - add tokenpool modules to CMake build - remove unused no-op TokenPool implementation - fix errors flagged by codespell & clang-tidy - POSIX GNUmakeTokenPool should return same token - address review comments from ninja-build#1140 (comment) ninja-build#1140 (review) ninja-build#1140 (review) ninja-build#1140 (comment) ninja-build#1140 (comment)

Make space to add new API to set up token pool master.

This method will set up to the token pool master.

When this option is given on the command line then ninja will set up a token pool master instead of being a token pool client.

- don't set up token pool for serial builds - add implementation specific CreatePool() & SetEnv() methods - generate contents for MAKEFLAGS variable to pass down to children

Set up a pipe (POSIX) or semaphore (win32) with N tokens.

GNU make 4.4 introduced a new jobserver style "fifo" for POSIX systems which passes a named pipe down to the clients. - update auth parser to recognize "fifo:<name>" format - open named pipe for reading and writing - make sure the file descriptors are closed in the destructor - add 2 tests that aren't compiled for WIN32

GNU make 4.4 introduced a new command line option --jobserver-style with which the user can choose a different style than the default. For ninja we make the style an optional argument to the -m/--tokenpool-master option instead. - add argument value to BuildConfig and pass it down via SetupMaster() to CreatePool() - POSIX supports the styles "fifo" (default) and "pipe" - Win32 only supports the style "sem" (default) - an unsupported style causes ninja to abort with a fatal error - as the "fifo" style isn't implemented yet, hard-code the tests to the "pipe" style to make them pass - replace "OPTIONAL_ARG" with "optional_argument" in the getopt implementation to match the getopt_long() man page.

GNU make 4.4 introduced a new jobserver style "fifo" for POSIX systems which passes a named pipe down to the clients. - split CreatePool() into CreateFifo(), CreatePipe() & CreateTokens() - add implementation to CreateFifo() which creates a named pipe in the temp directory - make sure the named pipe ise removed in the destructor - update non-WIN32 tests to support "fifo" style as default - add a test for "pipe" style that isn't compiled for WIN32

Ext3h · 2024-03-18T08:52:38Z

src/tokenpool-gnu-make.cc

+ double max_load_average,
+ const char* style) {
+ // no need to set up token pool for serial builds
+ if (parallelism == 1)


That's not true - you still need to setup a job server, as it's the responsibility of the job server to communicate the serial nature to child processes. There's not only make which is default-serial, but also gcc and co. which are default-parallel.

Ext3h · 2024-03-18T09:02:46Z

src/build.cc

+ bool can_run_more =
+ failures_allowed &&
+ plan_.more_ready() &&
+ command_runner_->CanRunMore();


This CanRunMore() is kind of wrong. If there is a token pool, it's the sole authority on parallelism. No matter whether it's doing this based on load measurement or token counting.

If you have an AcquireToken() now, then you should only be calling CanRunMore() as an implementation detail within AcquireToken in the fallback case when a token pool is absent.

Ext3h · 2024-03-18T10:40:17Z

src/build.cc

+ // token became available
+ if (subproc == NULL) {
+ result->status = ExitTokenAvailable;
+ return true;


~~This can't be right... subprocs_.NextFinished() in that loop above has popped any number of completed processes.~~

~~But tokens_->Release() is only being called conditionally and only once at most.~~

~~So this is leaving tokens incorrectly marked as "in-use". You would have needed to release the token together with finished_.push(*i);.~~

... no, that loop above is looping while it does not pop anything. And you are also only quitting if no process had finished, and you were certain to have gotten a token instead.

Okay, this is working then, just really hard to follow and the naming got seriously confusing.

You might have wanted to put it in the else to the above if, and put a comment on the loop explaining the non-trivial exit condition.

Also the methods used here are definitely overdue for renaming, their names don't reflect what they do.

Ext3h · 2024-03-18T10:42:56Z

src/tokenpool-gnu-make.cc

+void GNUmakeTokenPool::Release() {
+ available_++;
+ used_--;
+ if (available_ > 1)


You shouldn't do that synchronously on every Release(). Return tokens in bulk when you start blocking on sub-process completion, but before that, keep holding on!

Ext3h · 2024-03-18T11:03:13Z

src/tokenpool-gnu-make.cc

+bool GNUmakeTokenPool::SetupClient(bool ignore,
+ bool verbose,
+ double& max_load_average) {
+ const char* value = GetEnv("MAKEFLAGS");


You appear to have missed the part where those flags can also occur on the command line, and have yet to be copied into the environment block for child processes.

Accidental variable expansion of $(MAKEFLAGS) is common enough.

Ext3h · 2024-03-18T11:47:28Z

src/tokenpool-gnu-make.cc

+ const char* jobserver = strstr(value, "--jobserver-fds=");
+ if (!jobserver)
+ // GNU make => 4.2
+ jobserver = strstr(value, "--jobserver-auth=");


Assume --jobserver-auth= has precedence over --jobserver-fds= please.

Ext3h · 2024-03-18T12:11:25Z

src/tokenpool-gnu-make.cc

+}
+
+void GNUmakeTokenPool::Clear() {
+ while (used_ > 0)


If you got it right, you should always end up with used_ == 0 at this point, given that all sub-processes must have been terminated.

Ext3h · 2024-03-18T12:34:23Z

src/tokenpool-gnu-make-posix.cc

+ // Temporarily replace SIGCHLD handler with our own
+ memset(&act, 0, sizeof(act));
+ act.sa_handler = CloseDupRfd;
+ if (sigaction(SIGCHLD, &act, &old_act) == 0) {


Why?... Unlike make, ninja doesn't monitor the child as a child process via SIGCHLD, but merely tracks the pipe opened to it. Meaning it's not even subject to the same race conditions which make had to handle.

Ah... this code used to provide non-blocking reads in a legacy portable way, not recognizing O_NONBLOCK as a legit way of doing this.

The SIGCHLD part is merely an optimization.

Still, nothing you wouldn't had rather done with a non-blocking handle instead. There are a couple of bad edge-cases where this will now result in a 100ms stall.

Ext3h · 2024-03-18T12:45:37Z

src/tokenpool-gnu-make-posix.cc

+
+ free(filename);
+
+ rfd_ = rfd;


Didn't you forget SetCloseOnExec for the FIFO case? You almost certainly didn't want them to be inherited for the named-pipe case.

If you did, then you should have also exported --jobserver-fds= for downward compatibility.

Ext3h · 2024-03-18T12:56:20Z

src/tokenpool-gnu-make-posix.cc

+ // Temporarily replace SIGCHLD handler with our own
+ memset(&act, 0, sizeof(act));
+ act.sa_handler = CloseDupRfd;
+ if (sigaction(SIGCHLD, &act, &old_act) == 0) {


Ah... this code used to provide non-blocking reads in a legacy portable way, not recognizing O_NONBLOCK as a legit way of doing this.

The SIGCHLD part is merely an optimization.

Still, nothing you wouldn't had rather done with a non-blocking handle instead. There are a couple of bad edge-cases where this will now result in a 100ms stall.

Ext3h · 2024-03-18T13:02:08Z

src/tokenpool-gnu-make-posix.cc

+ struct timeval timeout = { 0, 0 };
+ FD_ZERO(&set);
+ FD_SET(rfd_, &set);
+ int ret = select(rfd_ + 1, &set, NULL, NULL, &timeout);


Duplicated code between the code updating SubprocessSet::token_available_ and this select here.

Ext3h · 2024-03-18T13:11:44Z

src/subprocess-posix.cc

+ pollfd *pfd = &fds[nfds - 1];
+ if (pfd->fd >= 0) {
+ assert(pfd->fd == tokens->GetMonitorFd());
+ if (pfd->revents != 0)


You did only care for POLLIN | POLLPRI event bits, unlike for the fds[cur_nfd++].revents where also the error states were of interest.

Ext3h · 2024-03-18T13:38:46Z

src/build.cc

+
+ // command completed
+ if (tokens_)
+ tokens_->Release();


This has to be done after result->status = subproc->Finish() - the process is still running, only the pipe was broken yet.

Unrelated - but this is also a spot for optimization, as Ninja will get stalled here if a long-running process detaches from terminal early.

Stefan Becker and others added 20 commits February 19, 2023 16:01

Honor -lN from MAKEFLAGS

3d15060

This emulates the behaviour of GNU make. - build: make a copy of max_load_average and pass it to TokenPool. - GNUmakeTokenPool: if we detect a jobserver and a valid -lN argument in MAKEFLAGS then set max_load_average to N.

Use LinePrinter for TokenPool messages

1157e43

- replace printf() with calls to LinePrinter - print GNU make jobserver message only when verbose build is requested

Prepare PR for merging

6100863

- fix Windows build error in no-op TokenPool implementation - improve Acquire() to block for a maximum of 100ms - address review comments

Add tests for TokenPool

008bc6e

- TokenPool setup - GetMonitorFd() API - implicit token and tokens in jobserver pipe - Acquire() / Reserve() / Release() protocol - Clear() API

Add tests for build module

f99d747

Add tests that verify the token functionality of the builder main loop. We replace the default fake command runner with a special version where the tests can control each call to AcquireToken(), CanRunMore() and WaitForCommand().

Rename TokenPool::Setup() to SetupClient()

6ec8fb0

Make space to add new API to set up token pool master.

Add TokenPool::SetupMaster() method

89e4aef

This method will set up to the token pool master.

Add command line option -m/--tokenpool-master

0ec8c4a

When this option is given on the command line then ninja will set up a token pool master instead of being a token pool client.

Implement GNUmakeTokenPool::SetupMaster()

8d21910

- don't set up token pool for serial builds - add implementation specific CreatePool() & SetEnv() methods - generate contents for MAKEFLAGS variable to pass down to children

Implement GNUmakeTokenPool*::CreatePool()

6602155

Set up a pipe (POSIX) or semaphore (win32) with N tokens.

Add tests for TokenPool::SetupMaster()

bee56b6

milahu mentioned this pull request Aug 11, 2023

update cc code: tokenpool-client, tokenpool-master, jobserver-fifo milahu/gnumake-tokenpool#2

Open

Ext3h reviewed Mar 18, 2024

View reviewed changes

hundeboll mentioned this pull request Apr 4, 2024

Jobserver fifo Kitware/ninja#2

Open

hundeboll mentioned this pull request May 17, 2024

Implement GNU Make 4.4+ jobserver fifo / semaphore client support #2450

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add GNU make jobserver style "fifo" support #2263

Add GNU make jobserver style "fifo" support #2263

stefanb2 commented Feb 22, 2023

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024

Ext3h Mar 18, 2024

Add GNU make jobserver style "fifo" support #2263

Are you sure you want to change the base?

Add GNU make jobserver style "fifo" support #2263

Conversation

stefanb2 commented Feb 22, 2023

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading

Ext3h Mar 18, 2024 •

edited

Loading