[PATCH v4] tools: jobserver: Prevent deadlock caused by incorrect jobserver configuration and enhance error reporting

Changbin Du posted 1 patch 1 month ago
tools/lib/python/jobserver.py | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
Posted by Changbin Du 1 month ago
When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
propagation caused "--jobserver-auth=r,w" to reference an unintended file
descriptor. This led to an infinite loop in jobserver-exec, because os.read()
kept returning an empty token.

My shell opened /etc/passwd for some reason without closing it, and as a
result, all child processes inherited this fd 3.

$ ls -l /proc/self/fd
total 0
lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd

In this case, `make` should have opened a new file descriptor for jobserver
control, but it clearly did not, and instead still passed fd 3 as
"--jobserver-auth=3,4" in MAKEFLAGS. (My GNU Make version is 4.3.)

This update makes jobserver-exec robust against invalid jobserver
configurations, even when `make` incorrectly passes non-pipe file descriptors:
 * Rejecting empty reads to prevent infinite loops on EOF.
 * Clearing `self.jobs` to avoid writing to incorrect files if invalid tokens
   are detected.
 * Printing detailed error messages to stderr to inform the user.

Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Changbin Du <changbin.du@huawei.com>

---
  v4: set self.jobs to b"" instead of [] (so no type change).
  v3: format exception with repr(e).
  v2: remove validation for all bytes are '+' characters. (Mauro)
---
 tools/lib/python/jobserver.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
index a24f30ef4fa8..616411087725 100755
--- a/tools/lib/python/jobserver.py
+++ b/tools/lib/python/jobserver.py
@@ -91,6 +91,10 @@ class JobserverExec:
             while True:
                 try:
                     slot = os.read(self.reader, 8)
+                    if not slot:
+                        # Clear self.jobs to prevent us from probably writing incorrect file.
+                        self.jobs = b""
+                        raise ValueError("unexpected empty token from jobserver fd, invalid '--jobserver-auth=' setting?")
                     self.jobs += slot
                 except (OSError, IOError) as e:
                     if e.errno == errno.EWOULDBLOCK:
@@ -105,7 +109,8 @@ class JobserverExec:
             # to sit here blocked on our child.
             self.claim = len(self.jobs) + 1
 
-        except (KeyError, IndexError, ValueError, OSError, IOError):
+        except (KeyError, IndexError, ValueError, OSError, IOError) as e:
+            print(f"jobserver: warning: {repr(e)}", file=sys.stderr)
             # Any missing environment strings or bad fds should result in just
             # not being parallel.
             self.claim = None
-- 
2.43.0
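
For readers who want to see the failure mode in isolation, here is a minimal
sketch of the empty-read guard. It is not the kernel's JobserverExec code:
drain_tokens(), demo(), and the max_reads safety limit are hypothetical names
made up for illustration, and a temporary file stands in for the inherited
/etc/passwd descriptor. On a real nonblocking pipe the loop ends with
EWOULDBLOCK; on a regular file the bytes are consumed as if they were tokens
and then every read returns b"", which is what the guard rejects.

```python
import os
import tempfile


def drain_tokens(reader_fd, max_reads=1000):
    """Read job tokens from reader_fd until the read would block.

    Hypothetical helper for illustration only. max_reads exists solely so
    that this sketch can never spin forever.
    """
    jobs = b""
    for _ in range(max_reads):
        try:
            slot = os.read(reader_fd, 8)
        except BlockingIOError:
            # Empty nonblocking pipe: no more tokens are available.
            return jobs
        if not slot:
            # EOF, e.g. a regular file that has been read to its end.
            raise ValueError("unexpected empty token from jobserver fd")
        jobs += slot
    raise RuntimeError("read limit reached")


def demo():
    # A real pipe behaves like a jobserver: two tokens, then EWOULDBLOCK.
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.write(w, b"++")
    pipe_tokens = drain_tokens(r)
    os.close(r)
    os.close(w)

    # A regular file stands in for the stray fd 3 from the report.
    with tempfile.TemporaryFile() as f:
        f.write(b"root:x:0:0\n")
        f.flush()
        os.lseek(f.fileno(), 0, os.SEEK_SET)
        try:
            drain_tokens(f.fileno())
            rejected = False
        except ValueError:
            rejected = True
    return pipe_tokens, rejected
```

Without the "if not slot" check, the second case would keep appending empty
reads until the safety limit is hit; in a loop with no such limit it would
never terminate.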
Re: [PATCH v4] tools: jobserver: Prevent deadlock caused by incorrect jobserver configuration and enhance error reporting
Posted by Jonathan Corbet 3 weeks, 5 days ago
Changbin Du <changbin.du@huawei.com> writes:

> When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> propagation caused "--jobserver-auth=r,w" to reference an unintended file
> descriptor. This led to infinite loops in jobserver-exec's os.read() calls
> due to empty token.
>
> My shell opened /etc/passwd for some reason without closing it, and as a
> result, all child processes inherited this fd 3.
>
> $ ls -l /proc/self/fd
> total 0
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
>
> In this case, the `make` should open a new file descriptor for jobserver
> control, but clearly, it did not do so and instead still passed fd 3 as
> "--jobserver-auth=3,4" in MAKEFLAGS. (The version of my gnu make is 4.3)
>
> This update ensures robustness against invalid jobserver configurations,
> even when `make` incorrectly pass non-pipe file descriptors.
>  * Rejecting empty reads to prevent infinite loops on EOF.
>  * Clearing `self.jobs` to avoid writing to incorrect files if invalid tokens
>    are detected.
>  * Printing detailed error messages to stderr to inform the user.
>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Changbin Du <changbin.du@huawei.com>

So I've applied this; it appears to work, though I can't really test the
error case that it is intended to fix.

However, it adds a new warning to a standard "make htmldocs" build:

> jobserver: warning: IndexError('list index out of range')

You have not added the exception; you just put in a print that brought
it to the surface.

The warning comes from JobserverExec::open(), for an exception that
appears to be expected.  This is the sort of use of exceptions that has
made me almost swear off them entirely in Python - it's a huge try block
that is using exceptions to hide a bunch of the assumptions and logic.
I'll be posting a patch shortly to remove this non-exceptional exception
case.

Thanks,

jon
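
A small sketch of where that IndexError plausibly comes from, assuming the
try block takes the last "--jobserver" option from the filtered list without
checking that the list is non-empty. The function names below are
hypothetical and only mimic that pattern; they are not copied from
jobserver.py.

```python
def last_jobserver_opt(flags):
    """Pick the last --jobserver* option without checking for an empty list."""
    opts = [x for x in flags.split(" ") if x.startswith("--jobserver")]
    return opts[-1]  # raises IndexError when MAKEFLAGS has no such option


def describe(flags):
    """Format the outcome the way the new warning print does."""
    try:
        return last_jobserver_opt(flags)
    except IndexError as e:
        return f"jobserver: warning: {repr(e)}"
```

With MAKEFLAGS set but containing no jobserver option, which is a normal
situation for a non-parallel build, describe() produces the warning text
rather than an option string.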
Re: [PATCH v4] tools: jobserver: Prevent deadlock caused by incorrect jobserver configuration and enhance error reporting
Posted by duchangbin 3 weeks, 5 days ago
On Mon, Jan 12, 2026 at 09:08:58AM -0700, Jonathan Corbet wrote:
> Changbin Du <changbin.du@huawei.com> writes:
> 
> > When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> > propagation caused "--jobserver-auth=r,w" to reference an unintended file
> > descriptor. This led to infinite loops in jobserver-exec's os.read() calls
> > due to empty token.
> >
> > My shell opened /etc/passwd for some reason without closing it, and as a
> > result, all child processes inherited this fd 3.
> >
> > $ ls -l /proc/self/fd
> > total 0
> > lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> > lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> > lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> > lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> > lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
> >
> > In this case, the `make` should open a new file descriptor for jobserver
> > control, but clearly, it did not do so and instead still passed fd 3 as
> > "--jobserver-auth=3,4" in MAKEFLAGS. (The version of my gnu make is 4.3)
> >
> > This update ensures robustness against invalid jobserver configurations,
> > even when `make` incorrectly pass non-pipe file descriptors.
> >  * Rejecting empty reads to prevent infinite loops on EOF.
> >  * Clearing `self.jobs` to avoid writing to incorrect files if invalid tokens
> >    are detected.
> >  * Printing detailed error messages to stderr to inform the user.
> >
> > Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Signed-off-by: Changbin Du <changbin.du@huawei.com>
> 
> So I've applied this; it appears to work, though I can't really test the
> error case that it is intended to fix.
> 

Here's my local test result in case you need it.

$ make ARCH=riscv LLVM=1 LLVM_IAS=1 -j$(nproc) Image
  ...
  AR      drivers/built-in.a
  AR      built-in.a
  AR      vmlinux.a
  GEN     .tmp_initcalls.lds
jobserver: warning: ValueError("unexpected empty token from jobserver fd, invalid '--jobserver-auth=' setting?")
  LD      vmlinux.o

-- 
Cheers,
Changbin Du
Re: [PATCH v4] tools: jobserver: Prevent deadlock caused by incorrect jobserver configuration and enhance error reporting
Posted by Jonathan Corbet 3 weeks, 5 days ago
Jonathan Corbet <corbet@lwn.net> writes:

> The warning comes from JobserverExec::open(), for an exception that
> appears to be expected.  This is the sort of use of exceptions that has
> made me almost swear off them entirely in Python - it's a huge try block
> that is using exceptions to hide a bunch of the assumptions and logic.
> I'll be posting a patch shortly to remove this non-exceptional exception
> case.

Here's a first step, just to show what I have in mind.

jon

From bdbb48e153714ae1c9e5214ba3ecd6142536ee6f Mon Sep 17 00:00:00 2001
From: Jonathan Corbet <corbet@lwn.net>
Date: Mon, 12 Jan 2026 09:19:49 -0700
Subject: [PATCH] jobserver: Begin to split up the big try: block

The parsing of jobserver options is done in a massive try: block that hides
problems and (perhaps) bugs.  Start to split up that block and make the
logic explicit by moving the initial parsing of MAKEFLAGS out of that
block.

Among other things, this removes the warning:

  jobserver: warning: IndexError('list index out of range')

Seen after the application of bbf8c67aa6ae8.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 tools/lib/python/jobserver.py | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
index 616411087725e..c674319f6cb1f 100755
--- a/tools/lib/python/jobserver.py
+++ b/tools/lib/python/jobserver.py
@@ -58,15 +58,27 @@ class JobserverExec:
 
         if self.is_open:
             return
-
+        self.claim = None
+        self.is_open = True  # We only try once
+        #
+        # See what they have told us to do here.
+        #
+        try:
+            flags = os.environ['MAKEFLAGS']
+        except KeyError:
+            return
+        #
+        # Look for "--jobserver=R,W"
+        # Note that GNU Make has used --jobserver-fds and --jobserver-auth
+        # so this handles all of them.
+        #
+        opts = [x for x in flags.split(" ") if x.startswith("--jobserver")]
+        if not opts:
+            return
+        #
+        # OK, parse the result.
+        #
         try:
-            # Fetch the make environment options.
-            flags = os.environ["MAKEFLAGS"]
-            # Look for "--jobserver=R,W"
-            # Note that GNU Make has used --jobserver-fds and --jobserver-auth
-            # so this handles all of them.
-            opts = [x for x in flags.split(" ") if x.startswith("--jobserver")]
-
             # Parse out R,W file descriptor numbers and set them nonblocking.
             # If the MAKEFLAGS variable contains multiple instances of the
             # --jobserver-auth= option, the last one is relevant.
-- 
2.52.0
Re: [PATCH v4] tools: jobserver: Prevent deadlock caused by incorrect jobserver configuration and enhance error reporting
Posted by Mauro Carvalho Chehab 3 weeks, 5 days ago
On Mon, 12 Jan 2026 09:23:30 -0700
Jonathan Corbet <corbet@lwn.net> wrote:

> Jonathan Corbet <corbet@lwn.net> writes:
> 
> > The warning comes from JobserverExec::open(), for an exception that
> > appears to be expected.  This is the sort of use of exceptions that has
> > made me almost swear off them entirely in Python - it's a huge try block
> > that is using exceptions to hide a bunch of the assumptions and logic.
> > I'll be posting a patch shortly to remove this non-exceptional exception
> > case.  
> 
> Here's a first step, just to show what I have in mind.
> 
> jon
> 
> From bdbb48e153714ae1c9e5214ba3ecd6142536ee6f Mon Sep 17 00:00:00 2001
> From: Jonathan Corbet <corbet@lwn.net>
> Date: Mon, 12 Jan 2026 09:19:49 -0700
> Subject: [PATCH] jobserver: Begin to split up the big try: block
> 
> The parsing of jobserver options is done in a massive try: block that hides
> problems and (perhaps) bugs.  Start to split up that block and make the
> logic explicit by moving the initial parsing of MAKEFLAGS out of that
> block.

Agreed. Still, I would try to minimize the number of try/except blocks, as
those make the code harder to read.

FYI, when I converted it to a class so that it could be reused by the doc
build, I opted, on purpose, to preserve the code as much as possible, to be
bug-compatible with the original version. But yeah, this big try/except
would work better if split.

> Among other things, this removes the warning:
> 
>   jobserver: warning: IndexError('list index out of range')
> 
> Seen after the application of bbf8c67aa6ae8.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  tools/lib/python/jobserver.py | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> index 616411087725e..c674319f6cb1f 100755
> --- a/tools/lib/python/jobserver.py
> +++ b/tools/lib/python/jobserver.py
> @@ -58,15 +58,27 @@ class JobserverExec:
>  
>          if self.is_open:
>              return
> -
> +        self.claim = None
> +        self.is_open = True  # We only try once
> +        #
> +        # See what they have told us to do here.
> +        #
> +        try:
> +            flags = os.environ['MAKEFLAGS']
> +        except KeyError:
> +            return

I would use, instead:

	flags = os.environ.get('MAKEFLAGS', '')

No need to have a return there...

> +        #
> +        # Look for "--jobserver=R,W"
> +        # Note that GNU Make has used --jobserver-fds and --jobserver-auth
> +        # so this handles all of them.
> +        #
> +        opts = [x for x in flags.split(" ") if x.startswith("--jobserver")]
> +        if not opts:
> +            return

... as, if MAKEFLAGS is an empty string, opts will be an empty list, causing
it to return.
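
A minimal sketch of that variant, pulled out into a standalone function so it
can be tried outside the class. find_jobserver_opts() and its environ
parameter are hypothetical names used only for this illustration.

```python
import os


def find_jobserver_opts(environ=None):
    """Return the --jobserver* options found in MAKEFLAGS, possibly empty."""
    environ = os.environ if environ is None else environ
    # A missing MAKEFLAGS is treated like an empty one, so no KeyError
    # handling is needed here.
    flags = environ.get("MAKEFLAGS", "")
    return [x for x in flags.split(" ") if x.startswith("--jobserver")]
```

Both an unset and an empty MAKEFLAGS yield an empty list, so the single
"if not opts: return" check covers both cases.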

> +        #
> +        # OK, parse the result.
> +        #
>          try:
> -            # Fetch the make environment options.
> -            flags = os.environ["MAKEFLAGS"]
> -            # Look for "--jobserver=R,W"
> -            # Note that GNU Make has used --jobserver-fds and --jobserver-auth
> -            # so this handles all of them.
> -            opts = [x for x in flags.split(" ") if x.startswith("--jobserver")]
> -
>              # Parse out R,W file descriptor numbers and set them nonblocking.
>              # If the MAKEFLAGS variable contains multiple instances of the
>              # --jobserver-auth= option, the last one is relevant.



Thanks,
Mauro
Re: [PATCH v4] tools: jobserver: Prevent deadlock caused by incorrect jobserver configuration and enhance error reporting
Posted by Jonathan Corbet 3 weeks, 5 days ago
Jonathan Corbet <corbet@lwn.net> writes:

> Jonathan Corbet <corbet@lwn.net> writes:
>
>> The warning comes from JobserverExec::open(), for an exception that
>> appears to be expected.  This is the sort of use of exceptions that has
>> made me almost swear off them entirely in Python - it's a huge try block
>> that is using exceptions to hide a bunch of the assumptions and logic.
>> I'll be posting a patch shortly to remove this non-exceptional exception
>> case.
>
> Here's a first step, just to show what I have in mind.

So FYI I'm going to add this into docs-next just to avoid adding a
potentially worrisome build warning to linux-next.  It's not yet in
docs-mw, so it's easily removable if a better way comes along.

jon