tools/lib/python/jobserver.py | 2 ++ 1 file changed, 2 insertions(+)
Add validation for jobserver tokens to prevent infinite loops on invalid fds
When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
propagation caused "--jobserver-auth=3,4" to reference an unintended file
descriptor (here, fd 3 was inherited from a shell command that opened
"/etc/passwd" instead of being a valid pipe). This led to infinite loops in
jobserver-exec's os.read() calls due to empty or corrupted tokens. (My make
version is 4.3.)
$ ls -l /proc/self/fd
total 0
lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
The modified code now explicitly validates tokens:
1. Rejects empty reads (prevents infinite loops on EOF)
2. Checks all bytes are '+' characters (catches fd reuse issues)
3. Raises ValueError with clear diagnostics for debugging
This ensures robustness against invalid jobserver configurations, even when
external tools (like make) incorrectly pass non-pipe file descriptors.
Signed-off-by: Changbin Du <changbin.du@huawei.com>
---
tools/lib/python/jobserver.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
index a24f30ef4fa8..88d005f96bed 100755
--- a/tools/lib/python/jobserver.py
+++ b/tools/lib/python/jobserver.py
@@ -91,6 +91,8 @@ class JobserverExec:
while True:
try:
slot = os.read(self.reader, 8)
+ if not slot or any(c != b'+'[0] for c in slot):
+ raise ValueError("empty or unexpected token from jobserver")
self.jobs += slot
except (OSError, IOError) as e:
if e.errno == errno.EWOULDBLOCK:
--
2.43.0
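The token check added by the patch is compact and easy to misread. Iterating over a `bytes` object yields integers, so `b'+'[0]` is the integer 43 (`ord('+')`), and each byte is compared against it. The sketch below expresses the same condition as a hypothetical helper, `is_valid_token_batch()`, which does not exist in jobserver.py. It returns the negation of the patch's `if` test:

```python
def is_valid_token_batch(slot: bytes) -> bool:
    """Mirror the patch's condition: reject an empty read or any non-'+' byte.

    Hypothetical helper for illustration only; the patch inlines this check.
    """
    plus = b'+'[0]  # indexing bytes gives an int: ord('+') == 43
    # bool(slot) rejects the empty read seen at EOF; all() rejects any other byte.
    return bool(slot) and all(c == plus for c in slot)


print(is_valid_token_batch(b'+++'))     # True
print(is_valid_token_batch(b''))        # False: EOF on a regular file
print(is_valid_token_batch(b'root:x'))  # False: e.g. the start of /etc/passwd
```

As the GNU make manual quoted later in this thread points out, tokens are not required to be '+', so this check is stricter than the protocol.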
Changbin Du <changbin.du@huawei.com> writes:
> Add validation for jobserver tokens to prevent infinite loops on invalid fds
> When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> propagation caused "--jobserver-auth=3,4" to reference an unintended file
> descriptor (Here, fd 3 was inherited from a shell command that opened
> "/etc/passwd" instead of a valid pipe). This led to infinite loops in
> jobserver-exec's os.read() calls due to empty or corrupted tokens. (The
> version of my make is 4.3)
>
> $ ls -l /proc/self/fd
> total 0
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
>
> The modified code now explicitly validates tokens:
> 1. Rejects empty reads (prevents infinite loops on EOF)
> 2. Checks all bytes are '+' characters (catches fd reuse issues)
> 3. Raises ValueError with clear diagnostics for debugging
> This ensures robustness against invalid jobserver configurations, even when
> external tools (like make) incorrectly pass non-pipe file descriptors.
>
> Signed-off-by: Changbin Du <changbin.du@huawei.com>
> ---
> tools/lib/python/jobserver.py | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> index a24f30ef4fa8..88d005f96bed 100755
> --- a/tools/lib/python/jobserver.py
> +++ b/tools/lib/python/jobserver.py
> @@ -91,6 +91,8 @@ class JobserverExec:
> while True:
> try:
> slot = os.read(self.reader, 8)
> + if not slot or any(c != b'+'[0] for c in slot):
> + raise ValueError("empty or unexpected token from jobserver")
So I had to stare at this for a while to figure out what it was doing; a
comment might help.
But if it finds something that's not b'+', it simply crashes the whole
thing? Is that really what we want to do? It would seem better to
proceed if we got any slots at all, and to emit a message telling the
poor user what they might want to do about the situation?
> self.jobs += slot
> except (OSError, IOError) as e:
> if e.errno == errno.EWOULDBLOCK:
Thanks,
jon
On Tue, Jan 06, 2026 at 02:52:06PM -0700, Jonathan Corbet wrote:
> Changbin Du <changbin.du@huawei.com> writes:
>
> > Add validation for jobserver tokens to prevent infinite loops on invalid fds
> > When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> > propagation caused "--jobserver-auth=3,4" to reference an unintended file
> > descriptor (Here, fd 3 was inherited from a shell command that opened
> > "/etc/passwd" instead of a valid pipe). This led to infinite loops in
> > jobserver-exec's os.read() calls due to empty or corrupted tokens. (The
> > version of my make is 4.3)
> >
> > $ ls -l /proc/self/fd
> > total 0
> > lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> > lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> > lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> > lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> > lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
> >
> > The modified code now explicitly validates tokens:
> > 1. Rejects empty reads (prevents infinite loops on EOF)
> > 2. Checks all bytes are '+' characters (catches fd reuse issues)
> > 3. Raises ValueError with clear diagnostics for debugging
> > This ensures robustness against invalid jobserver configurations, even when
> > external tools (like make) incorrectly pass non-pipe file descriptors.
> >
> > Signed-off-by: Changbin Du <changbin.du@huawei.com>
> > ---
> > tools/lib/python/jobserver.py | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> > index a24f30ef4fa8..88d005f96bed 100755
> > --- a/tools/lib/python/jobserver.py
> > +++ b/tools/lib/python/jobserver.py
> > @@ -91,6 +91,8 @@ class JobserverExec:
> > while True:
> > try:
> > slot = os.read(self.reader, 8)
> > + if not slot or any(c != b'+'[0] for c in slot):
> > + raise ValueError("empty or unexpected token from jobserver")
>
> So I had to stare at this for a while to figure out what it was doing; a
> comment might help.
>
> But if it finds something that's not b'+', it simply crashes the whole
> thing? Is that really what we want to do? It would seem better to
> proceed if we got any slots at all, and to emit a message telling the
> poor user what they might want to do about the situation?
>
I suspect that Make versions up to 4.3, when generating the "--jobserver-auth=r,w"
parameter, fail to handle the case where file descriptor 3 is already occupied by
the parent process (in my case, fd 3 was open on /etc/passwd). Make appears to
use fd 3 regardless of whether it is available, although I haven't checked how
Make implements this. In contrast, Make 4.4 and later use named pipes by
default, which avoids this issue entirely.
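For context, a tool finds the jobserver through MAKEFLAGS. The sketch below uses a hypothetical `parse_jobserver_auth()` helper, not the code in jobserver.py. It recognizes the two POSIX forms discussed here:

- `--jobserver-auth=R,W`, with inherited pipe descriptors (what make 4.3 passed in this report).
- `--jobserver-auth=fifo:PATH`, for the named pipe that make 4.4 uses by default.

If the option appears more than once, the helper takes the last occurrence; the GNU make manual notes that only the last instance is relevant. The helper does not check that the descriptors are actually usable.

```python
import re


def parse_jobserver_auth(makeflags: str):
    """Return the jobserver endpoint advertised in MAKEFLAGS, or None.

    Hypothetical helper for illustration. Handles two POSIX forms:
      --jobserver-auth=R,W        (inherited pipe file descriptors)
      --jobserver-auth=fifo:PATH  (named pipe, make 4.4 default)
    """
    matches = re.findall(r'--jobserver-auth=(\S+)', makeflags)
    if not matches:
        return None  # no jobserver advertised
    value = matches[-1]  # only the last instance is relevant
    if value.startswith('fifo:'):
        return ('fifo', value[len('fifo:'):])
    # Nothing here proves that fds R and W are really make's pipe.
    reader, _, writer = value.partition(',')
    return ('fds', int(reader), int(writer))


print(parse_jobserver_auth(' -j8 --jobserver-auth=3,4'))
# ('fds', 3, 4)
print(parse_jobserver_auth('-j8 --jobserver-auth=fifo:/tmp/GMfifo1234'))
# ('fifo', '/tmp/GMfifo1234')
```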
When this happens, the current implementation hangs: for a regular file,
os.read() returns empty bytes once it reaches EOF, so the loop spins forever
instead of exiting. My workaround is to ignore this error condition so the loop
cannot hang, although this means the jobserver protocol will no longer be honored.
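The hang can be reproduced without make. On a regular file, `os.read()` returns an empty `bytes` object at EOF rather than raising, so a loop that only exits through an exception never ends. The sketch below uses a temporary file as a stand-in for the inherited /etc/passwd descriptor:

```python
import os
import tempfile

# A temporary file stands in for a regular file (such as /etc/passwd)
# that ended up on the descriptor make advertised as the jobserver.
with tempfile.TemporaryFile() as f:
    f.write(b'root:x:0:0:root:/root:/bin/bash\n')
    f.flush()
    f.seek(0)
    fd = f.fileno()

    first = os.read(fd, 8)  # 8 bytes of file data, none of them '+'
    # Drain the rest of the file.
    while os.read(fd, 8):
        pass
    # At EOF, every further read returns b'' immediately; nothing is raised.
    after_eof = [os.read(fd, 8) for _ in range(3)]

print(first)      # b'root:x:0'
print(after_eof)  # [b'', b'', b'']
```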
As you suggested, we can print an error message to stderr to inform users, but
we must not use stdout, as that would corrupt the tool's normal output. For
example, scripts/Makefile.vmlinux_o has:

quiet_cmd_gen_initcalls_lds = GEN $@
      cmd_gen_initcalls_lds = \
	$(PYTHON3) $(srctree)/scripts/jobserver-exec \
	$(PERL) $(real-prereqs) > $@
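This minimal sketch shows why the warning has to go to stderr: in a rule like the one above, only stdout is redirected into `$@`. The child script and warning text are made up for illustration; they stand in for jobserver-exec and the command it wraps.

```python
import subprocess
import sys

# Hypothetical child: a warning on stderr, the "real" output on stdout.
CHILD = (
    "import sys\n"
    "print('warning: jobserver fd is not usable', file=sys.stderr)\n"
    "print('generated output')\n"
)

result = subprocess.run([sys.executable, "-c", CHILD],
                        capture_output=True, text=True, check=True)

# With "> $@" in the Makefile rule, only stdout would land in the target file.
print(repr(result.stdout))  # 'generated output\n'
```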
--
Cheers,
Changbin Du
On Wed, 7 Jan 2026 08:11:29 +0000
duchangbin <changbin.du@huawei.com> wrote:
> On Tue, Jan 06, 2026 at 02:52:06PM -0700, Jonathan Corbet wrote:
> > Changbin Du <changbin.du@huawei.com> writes:
> >
> > > Add validation for jobserver tokens to prevent infinite loops on invalid fds
> > > When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> > > propagation caused "--jobserver-auth=3,4" to reference an unintended file
> > > descriptor (Here, fd 3 was inherited from a shell command that opened
> > > "/etc/passwd" instead of a valid pipe). This led to infinite loops in
> > > jobserver-exec's os.read() calls due to empty or corrupted tokens. (The
> > > version of my make is 4.3)
> > >
> > > $ ls -l /proc/self/fd
> > > total 0
> > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
> > >
> > > The modified code now explicitly validates tokens:
> > > 1. Rejects empty reads (prevents infinite loops on EOF)
> > > 2. Checks all bytes are '+' characters (catches fd reuse issues)
> > > 3. Raises ValueError with clear diagnostics for debugging
> > > This ensures robustness against invalid jobserver configurations, even when
> > > external tools (like make) incorrectly pass non-pipe file descriptors.
> > >
> > > Signed-off-by: Changbin Du <changbin.du@huawei.com>
> > > ---
> > > tools/lib/python/jobserver.py | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> > > index a24f30ef4fa8..88d005f96bed 100755
> > > --- a/tools/lib/python/jobserver.py
> > > +++ b/tools/lib/python/jobserver.py
> > > @@ -91,6 +91,8 @@ class JobserverExec:
> > > while True:
> > > try:
> > > slot = os.read(self.reader, 8)
> > > + if not slot or any(c != b'+'[0] for c in slot):
> > > + raise ValueError("empty or unexpected token from jobserver")
> >
> > So I had to stare at this for a while to figure out what it was doing; a
> > comment might help.
> >
> > But if it finds something that's not b'+', it simply crashes the whole
> > thing? Is that really what we want to do? It would seem better to
> > proceed if we got any slots at all, and to emit a message telling the
> > poor user what they might want to do about the situation?
> >
> I suspect that in Make versions prior to 4.3, when generating the "--jobserver-auth=r,w"
> parameter, the implementation fails to properly handle situations where file descriptor 3
> is already occupied by the parent process (as I encountered where fd 3 was actually used to
> open /etc/passwd). This appears to force Make to always use fd3 regardless of its
> availability (I'm not sure how Make was written). In contrast, Make 4.4+ versions
> default to using named pipes, which avoids this issue entirely.
It would be nice if you could provide more details on how to reproduce it.
Are you doing anything special? What distro are you using? What Python version?
> When this problem occurs, the current implementation deadlocks because for regular files,
> os.read() returns empty bytes after reaching EOF, creating an infinite loop. My workaround
> is to ignore this error condition to prevent deadlock, although this means the jobserver
> protocol will no longer be honored.
Testing whether slot is empty makes sense, but why test whether it is "+"?
>
> As you suggested above, We can output an error message to stderr to inform users, but
> must not use stdout, as it would corrupt the tool's normal output stream.
You could do something like (untested):
 while True:
     try:
         slot = os.read(self.reader, 8)
+        if not slot:
+            # Stop at the end of the jobserver queue.
+            break
+        # Why do we need this?
+        if any(c != b'+'[0] for c in slot):
+            print("Warning: invalid jobserver slots", file=sys.stderr)
+            break
         self.jobs += slot
     except (OSError, IOError) as e:
         if e.errno == errno.EWOULDBLOCK:
             # Stop at the end of the jobserver queue.
             break
         # If something went wrong, give back the jobs.
         if self.jobs:
             os.write(self.writer, self.jobs)
         raise e
Yet, if os.read() fails or reaches EOF, I would expect the "except" block
to catch it. It sounds to me like this could be an issue with the Python
version you're using.
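For reference, `os.read()` is documented to return an empty `bytes` object at end of file instead of raising. As a result, EOF never reaches the except block; only errors do, such as EWOULDBLOCK on an empty non-blocking pipe. The sketch below assumes a non-blocking reader, as the EWOULDBLOCK branch implies. It shows two cases:

- An empty pipe raises `BlockingIOError`, a subclass of `OSError`.
- A pipe whose write end has been closed returns `b''` and raises nothing.

```python
import errno
import os

r, w = os.pipe()
# Assumption: the jobserver reader is non-blocking, which is what the
# loop's EWOULDBLOCK check relies on.
os.set_blocking(r, False)

# 1. Empty pipe with the write end still open: os.read() raises.
try:
    os.read(r, 8)
    empty_pipe_raised = False
except BlockingIOError as e:
    empty_pipe_raised = e.errno in (errno.EAGAIN, errno.EWOULDBLOCK)

# 2. Write end closed: os.read() returns b'' (EOF) and raises nothing,
#    so an except block never sees this case.
os.close(w)
eof_result = os.read(r, 8)
os.close(r)

print(empty_pipe_raised)  # True
print(eof_result)         # b''
```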
> For
> example, in scripts/Makefile.vmlinux_o we have:
>
> quiet_cmd_gen_initcalls_lds = GEN $@
> cmd_gen_initcalls_lds = \
> $(PYTHON3) $(srctree)/scripts/jobserver-exec \
> $(PERL) $(real-prereqs) > $@
Thanks,
Mauro
On Wed, Jan 07, 2026 at 10:29:10AM +0100, Mauro Carvalho Chehab wrote:
> On Wed, 7 Jan 2026 08:11:29 +0000
> duchangbin <changbin.du@huawei.com> wrote:
>
> > On Tue, Jan 06, 2026 at 02:52:06PM -0700, Jonathan Corbet wrote:
> > > Changbin Du <changbin.du@huawei.com> writes:
> > >
> > > > Add validation for jobserver tokens to prevent infinite loops on invalid fds
> > > > When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> > > > propagation caused "--jobserver-auth=3,4" to reference an unintended file
> > > > descriptor (Here, fd 3 was inherited from a shell command that opened
> > > > "/etc/passwd" instead of a valid pipe). This led to infinite loops in
> > > > jobserver-exec's os.read() calls due to empty or corrupted tokens. (The
> > > > version of my make is 4.3)
> > > >
> > > > $ ls -l /proc/self/fd
> > > > total 0
> > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> > > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> > > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
> > > >
> > > > The modified code now explicitly validates tokens:
> > > > 1. Rejects empty reads (prevents infinite loops on EOF)
> > > > 2. Checks all bytes are '+' characters (catches fd reuse issues)
> > > > 3. Raises ValueError with clear diagnostics for debugging
> > > > This ensures robustness against invalid jobserver configurations, even when
> > > > external tools (like make) incorrectly pass non-pipe file descriptors.
> > > >
> > > > Signed-off-by: Changbin Du <changbin.du@huawei.com>
> > > > ---
> > > > tools/lib/python/jobserver.py | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> > > > index a24f30ef4fa8..88d005f96bed 100755
> > > > --- a/tools/lib/python/jobserver.py
> > > > +++ b/tools/lib/python/jobserver.py
> > > > @@ -91,6 +91,8 @@ class JobserverExec:
> > > > while True:
> > > > try:
> > > > slot = os.read(self.reader, 8)
> > > > + if not slot or any(c != b'+'[0] for c in slot):
> > > > + raise ValueError("empty or unexpected token from jobserver")
> > >
> > > So I had to stare at this for a while to figure out what it was doing; a
> > > comment might help.
> > >
> > > But if it finds something that's not b'+', it simply crashes the whole
> > > thing? Is that really what we want to do? It would seem better to
> > > proceed if we got any slots at all, and to emit a message telling the
> > > poor user what they might want to do about the situation?
> > >
> > I suspect that in Make versions prior to 4.3, when generating the "--jobserver-auth=r,w"
> > parameter, the implementation fails to properly handle situations where file descriptor 3
> > is already occupied by the parent process (as I encountered where fd 3 was actually used to
> > open /etc/passwd). This appears to force Make to always use fd3 regardless of its
> > availability (I'm not sure how Make was written). In contrast, Make 4.4+ versions
> > default to using named pipes, which avoids this issue entirely.
>
> It would be nice if you could provide more details about how to reproduce it.
> Are you doing anything special? What distro are you using? what python version?
>
> > When this problem occurs, the current implementation deadlocks because for regular files,
> > os.read() returns empty bytes after reaching EOF, creating an infinite loop. My workaround
> > is to ignore this error condition to prevent deadlock, although this means the jobserver
> > protocol will no longer be honored.
>
> testing if slot is empty makes sense, but why testing if it is "+"?
>
> >
> > As you suggested above, We can output an error message to stderr to inform users, but
> > must not use stdout, as it would corrupt the tool's normal output stream.
>
After thinking a little more about this, IMHO the best approach is to have
two separate patches (assuming there is a good reason to ensure that the
slot's character is "+"):
> You could do something like (untested):
>
> while True:
> try:
> slot = os.read(self.reader, 8)
> + if not slot:
> + # Stop at the end of the jobserver queue.
> + break
This would be patch 1, to work around some issue (probably due to the Python
version) where reading past EOF doesn't raise an exception. I would very much
like to know which Python version you're using, and whether some other
exception (like EOFError) was raised, properly described in the patch description.
> + # Why do we need this?
> + if any(c != b'+'[0] for c in slot):
> + print("Warning: invalid jobserver slots", file=sys.stderr)
> + break
This seems to be a separate issue. Why do we need to enforce that the slot data
is "+"? If it isn't, why would that be a problem?
BTW, the GNU make manual at
https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
says:
"In both implementations of the jobserver, the pipe will be pre-loaded with
one single-character token for each available job. To obtain an extra slot
you must read a single character from the jobserver; to release a slot you
must write a single character back into the jobserver.
It’s important that when you release the job slot, you write back the same
character you read. Don’t assume that all tokens are the same character;
different characters may have different meanings to GNU make. The order is
not important, since make has no idea in what order jobs will complete anyway."
So fully compliant POSIX jobserver code should not test for "+"; instead, it
must preserve whatever character it reads.
If checking for "+" is really needed, please add a rationale to the patch
description justifying it. In that case, we should still:
- release the slot(s) we don't want by writing the same character(s) back
  via os.write();
- print a warning message explaining why we rejected the slot(s).
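Following the manual's rule, this sketch gives tokens back without rewriting them: the bytes that were read are written back unchanged, and partial writes are retried. `release_tokens()` is a hypothetical name. The `'-'` is an arbitrary non-'+' character chosen for illustration; it is not a token make is known to use.

```python
import os


def release_tokens(writer: int, tokens: bytes) -> None:
    """Give every claimed token back exactly as it was read.

    Tokens are single characters that need not all be '+', so the original
    bytes are written back rather than regenerated.
    """
    while tokens:
        written = os.write(writer, tokens)
        tokens = tokens[written:]  # retry whatever a partial write left over


r, w = os.pipe()
os.write(w, b'+-+')        # pre-loaded tokens; '-' is an arbitrary stand-in
claimed = os.read(r, 8)    # claim everything currently available
release_tokens(w, claimed)
returned = os.read(r, 8)
os.close(r)
os.close(w)

print(claimed, returned)   # b'+-+' b'+-+'
```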
> self.jobs += slot
> except (OSError, IOError) as e:
> if e.errno == errno.EWOULDBLOCK:
> # Stop at the end of the jobserver queue.
> break
> # If something went wrong, give back the jobs.
> if self.jobs:
> os.write(self.writer, self.jobs)
> raise e
>
> Yet, if os.read() fails or reaches EOF, I would expect that the "except" block
> would pick it. It sounds to me that it could be some issue with the python
> version you're using.
>
> > For
> > example, in scripts/Makefile.vmlinux_o we have:
> >
> > quiet_cmd_gen_initcalls_lds = GEN $@
> > cmd_gen_initcalls_lds = \
> > $(PYTHON3) $(srctree)/scripts/jobserver-exec \
> > $(PERL) $(real-prereqs) > $@
--
Thanks,
Mauro
On Wed, Jan 07, 2026 at 11:42:38AM +0100, Mauro Carvalho Chehab wrote:
> >
> > It would be nice if you could provide more details about how to reproduce it.
> > Are you doing anything special? What distro are you using? what python version?
> >
> > > When this problem occurs, the current implementation deadlocks because for regular files,
> > > os.read() returns empty bytes after reaching EOF, creating an infinite loop. My workaround
> > > is to ignore this error condition to prevent deadlock, although this means the jobserver
> > > protocol will no longer be honored.
> >
> > testing if slot is empty makes sense, but why testing if it is "+"?
> >
> > >
> > > As you suggested above, We can output an error message to stderr to inform users, but
> > > must not use stdout, as it would corrupt the tool's normal output stream.
> >
>
> After thinking a little bit more about this, IMHO the best is to have
> two separate patches (assuming that there is a good reason why ensuring that the
> slot's character is "+"):
>
> > You could do something like (untested):
> >
> > while True:
> > try:
> > slot = os.read(self.reader, 8)
> > + if not slot:
> > + # Stop at the end of the jobserver queue.
> > + break
>
> This would be patch 1, to overcome some issue (probably due to Python
> version) that reading past EOF won't rise an exception. I would very much
> want to see what python version you're using and see if some other
> exception arose (like EOFError), properly described at the patch description.
>
I'm using Python 3.12.3 and GNU Make 4.3, but I don't think the issue is caused
by the Python interpreter. Instead, my shell opened /etc/passwd for some reason
without closing it, and as a result all child processes inherited it as file
descriptor 3.
$ ls -l /proc/self/fd
total 0
lrwx------ 1 changbin changbin 64 Jan 8 10:40 0 -> /dev/pts/0
lrwx------ 1 changbin changbin 64 Jan 8 10:40 1 -> /dev/pts/0
lrwx------ 1 changbin changbin 64 Jan 8 10:40 2 -> /dev/pts/0
lr-x------ 1 changbin changbin 64 Jan 8 10:40 3 -> /etc/passwd
lr-x------ 1 changbin changbin 64 Jan 8 10:40 4 -> /proc/2468162/fd
In this case, make should have used a different file descriptor for jobserver
control, but it clearly did not and passed fd 3 anyway. When I get a chance,
I'll look into how Make 4.3 actually works.
> > + # Why do we need this?
> > + if any(c != b'+'[0] for c in slot):
> > + print("Warning: invalid jobserver slots", file=sys.stderr)
> > + break
>
> This seems to be a separate issue. Why do we need to enforce that the slot data
> is "+"? If it doesn't, why this would be a problem?
>
> Btw, reading:
>
> https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
>
> We have:
>
> "In both implementations of the jobserver, the pipe will be pre-loaded with
> one single-character token for each available job. To obtain an extra slot
> you must read a single character from the jobserver; to release a slot you
> must write a single character back into the jobserver.
>
> It’s important that when you release the job slot, you write back the same
> character you read. Don’t assume that all tokens are the same character;
> different characters may have different meanings to GNU make. The order is
> not important, since make has no idea in what order jobs will complete anyway."
>
> So, a 100% compliant POSIX jobserver code shall not test for "+", but, instead,
> preserve whatever character is there.
>
> Yet, checking for "+" is really needed, please add a rationale at the patch
> description justifying why. On such case, we should still:
>
> - release the slot(s) we don't want by writing the character via
> os.write();
> - print a warning message about why we rejected the slot(s).
>
Thank you for the information. I had wrongly assumed that reading from the
jobserver only ever returns '+' characters; that is clearly not the case.
So we seem unable to verify whether a descriptor is a valid jobserver file
descriptor, and we have to keep reading until EOF.
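The thread did not discuss this, but one possible check is to `fstat()` the descriptor and require a pipe or FIFO. This is an assumption, not something the participants proposed. It cannot prove that the pipe belongs to make, but it would reject a regular file such as the inherited /etc/passwd, as well as a descriptor that is not open at all. `looks_like_pipe()` is a hypothetical helper.

```python
import os
import stat
import tempfile


def looks_like_pipe(fd: int) -> bool:
    """Hypothetical sanity check: is fd an open pipe or FIFO?

    A pipe that happens to sit on the same fd number would still pass,
    so this only filters out the obvious non-jobserver cases.
    """
    try:
        mode = os.fstat(fd).st_mode
    except OSError:  # e.g. EBADF: the descriptor is not open at all
        return False
    return stat.S_ISFIFO(mode)


r, w = os.pipe()
with tempfile.TemporaryFile() as f:
    print(looks_like_pipe(r))           # True
    print(looks_like_pipe(f.fileno()))  # False: regular file
os.close(r)
os.close(w)
print(looks_like_pipe(r))               # False: r is closed now
```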
> > self.jobs += slot
> > except (OSError, IOError) as e:
> > if e.errno == errno.EWOULDBLOCK:
> > # Stop at the end of the jobserver queue.
> > break
> > # If something went wrong, give back the jobs.
> > if self.jobs:
> > os.write(self.writer, self.jobs)
> > raise e
> >
> > Yet, if os.read() fails or reaches EOF, I would expect that the "except" block
> > would pick it. It sounds to me that it could be some issue with the python
> > version you're using.
> >
> > > For
> > > example, in scripts/Makefile.vmlinux_o we have:
> > >
> > > quiet_cmd_gen_initcalls_lds = GEN $@
> > > cmd_gen_initcalls_lds = \
> > > $(PYTHON3) $(srctree)/scripts/jobserver-exec \
> > > $(PERL) $(real-prereqs) > $@
--
Cheers,
Changbin Du
On Thu, 8 Jan 2026 02:58:05 +0000
duchangbin <changbin.du@huawei.com> wrote:
> On Wed, Jan 07, 2026 at 11:42:38AM +0100, Mauro Carvalho Chehab wrote:
> > >
> > > It would be nice if you could provide more details about how to reproduce it.
> > > Are you doing anything special? What distro are you using? what python version?
> > >
> > > > When this problem occurs, the current implementation deadlocks because for regular files,
> > > > os.read() returns empty bytes after reaching EOF, creating an infinite loop. My workaround
> > > > is to ignore this error condition to prevent deadlock, although this means the jobserver
> > > > protocol will no longer be honored.
> > >
> > > testing if slot is empty makes sense, but why testing if it is "+"?
> > >
> > > >
> > > > As you suggested above, We can output an error message to stderr to inform users, but
> > > > must not use stdout, as it would corrupt the tool's normal output stream.
> > >
> >
> > After thinking a little bit more about this, IMHO the best is to have
> > two separate patches (assuming that there is a good reason why ensuring that the
> > slot's character is "+"):
> >
> > > You could do something like (untested):
> > >
> > > while True:
> > > try:
> > > slot = os.read(self.reader, 8)
> > > + if not slot:
> > > + # Stop at the end of the jobserver queue.
> > > + break
> >
> > This would be patch 1, to overcome some issue (probably due to Python
> > version) that reading past EOF won't rise an exception. I would very much
> > want to see what python version you're using and see if some other
> > exception arose (like EOFError), properly described at the patch description.
> >
>
> My Python is 3.12.3, and GNU Make is 4.3. But I don't think the issue is caused
> by the Python interpreter. Instead, my shell opened /etc/passwd for some reason
> without closing it, and as a result, all child processes inherited this fd3 file
> descriptor.
Maybe there's something odd with your PAM settings and/or /etc/nsswitch.conf. I've
seen some issues in the past related to Kerberos/LDAP/RADIUS/SSO timeouts.
>
> $ ls -l /proc/self/fd
> total 0
> lrwx------ 1 changbin changbin 64 Jan 8 10:40 0 -> /dev/pts/0
> lrwx------ 1 changbin changbin 64 Jan 8 10:40 1 -> /dev/pts/0
> lrwx------ 1 changbin changbin 64 Jan 8 10:40 2 -> /dev/pts/0
> lr-x------ 1 changbin changbin 64 Jan 8 10:40 3 -> /etc/passwd
> lr-x------ 1 changbin changbin 64 Jan 8 10:40 4 -> /proc/2468162/fd
>
> In this case, make should open a new file descriptor for jobserver control, but
> clearly, it did not do so and instead still passed fd 3. Once I get a chance,
> I'll look into how Make 4.3 actually works.
>
>
> > > + # Why do we need this?
> > > + if any(c != b'+'[0] for c in slot):
> > > + print("Warning: invalid jobserver slots", file=sys.stderr)
> > > + break
> >
> > This seems to be a separate issue. Why do we need to enforce that the slot data
> > is "+"? If it doesn't, why this would be a problem?
> >
> > Btw, reading:
> >
> > https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
> >
> > We have:
> >
> > "In both implementations of the jobserver, the pipe will be pre-loaded with
> > one single-character token for each available job. To obtain an extra slot
> > you must read a single character from the jobserver; to release a slot you
> > must write a single character back into the jobserver.
> >
> > It’s important that when you release the job slot, you write back the same
> > character you read. Don’t assume that all tokens are the same character;
> > different characters may have different meanings to GNU make. The order is
> > not important, since make has no idea in what order jobs will complete anyway."
> >
> > So, a 100% compliant POSIX jobserver code shall not test for "+", but, instead,
> > preserve whatever character is there.
> >
> > Yet, checking for "+" is really needed, please add a rationale at the patch
> > description justifying why. On such case, we should still:
> >
> > - release the slot(s) we don't want by writing the character via
> > os.write();
> > - print a warning message about why we rejected the slot(s).
> >
> Thank you for the information. I previously misunderstood that reading from the
> jobserver would only return a '+' symbol, but now it's obviously not the case.
> At this point, we seem unable to verify whether it's a valid jobserver file
> descriptor, so we have to read the entire file contents util EOF.
Agreed. I guess that just checking "if not slot" should be enough for that
purpose.
If you can still reproduce the original issue, please try that and
see whether it works; something like this should be enough:
slot = os.read(self.reader, 8)
if not slot:
    # Stop at the end of the jobserver queue.
    break
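This self-contained sketch shows the suggested change in context: the read loop with the `if not slot: break` check added, run against both a pre-loaded pipe and a regular file. `claim_slots()` is a hypothetical stand-in for the method in jobserver.py. It assumes a non-blocking reader, catches `OSError` (which `IOError` aliases in Python 3), and omits the give-back-on-error path for brevity.

```python
import errno
import os
import tempfile


def claim_slots(reader: int) -> bytes:
    """Sketch of the read loop with the suggested EOF check added."""
    jobs = b''
    while True:
        try:
            slot = os.read(reader, 8)
            if not slot:
                # EOF: a regular file, or a pipe with no writers left.
                break
            jobs += slot
        except OSError as e:
            if e.errno == errno.EWOULDBLOCK:
                # Stop at the end of the jobserver queue.
                break
            raise
    return jobs


# Pipe pre-loaded with three tokens: EWOULDBLOCK ends the loop.
r, w = os.pipe()
os.set_blocking(r, False)
os.write(w, b'+++')
from_pipe = claim_slots(r)
os.close(r)
os.close(w)

# Regular file: EOF ends the loop instead of spinning forever.
with tempfile.TemporaryFile() as f:
    f.write(b'root:x:0:0\n')
    f.flush()
    f.seek(0)
    from_file = claim_slots(f.fileno())

print(from_pipe)  # b'+++'
print(from_file)  # b'root:x:0:0\n'
```

The loop now terminates on the regular file. However, the file's bytes are still collected as if they were tokens. This is the trade-off Changbin mentioned earlier: the jobserver protocol is no longer honored in that case.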
>
> > > self.jobs += slot
> > > except (OSError, IOError) as e:
> > > if e.errno == errno.EWOULDBLOCK:
> > > # Stop at the end of the jobserver queue.
> > > break
> > > # If something went wrong, give back the jobs.
> > > if self.jobs:
> > > os.write(self.writer, self.jobs)
> > > raise e
> > >
> > > Yet, if os.read() fails or reaches EOF, I would expect that the "except" block
> > > would pick it. It sounds to me that it could be some issue with the python
> > > version you're using.
> > >
> > > > For
> > > > example, in scripts/Makefile.vmlinux_o we have:
> > > >
> > > > quiet_cmd_gen_initcalls_lds = GEN $@
> > > > cmd_gen_initcalls_lds = \
> > > > $(PYTHON3) $(srctree)/scripts/jobserver-exec \
> > > > $(PERL) $(real-prereqs) > $@
Thanks,
Mauro
On Thu, Jan 08, 2026 at 09:24:17AM +0100, Mauro Carvalho Chehab wrote:
> On Thu, 8 Jan 2026 02:58:05 +0000
> duchangbin <changbin.du@huawei.com> wrote:
>
> > On Wed, Jan 07, 2026 at 11:42:38AM +0100, Mauro Carvalho Chehab wrote:
> > > >
> > > > It would be nice if you could provide more details about how to reproduce it.
> > > > Are you doing anything special? What distro are you using? what python version?
> > > >
> > > > > When this problem occurs, the current implementation deadlocks because for regular files,
> > > > > os.read() returns empty bytes after reaching EOF, creating an infinite loop. My workaround
> > > > > is to ignore this error condition to prevent deadlock, although this means the jobserver
> > > > > protocol will no longer be honored.
> > > >
> > > > testing if slot is empty makes sense, but why testing if it is "+"?
> > > >
> > > > >
> > > > > As you suggested above, We can output an error message to stderr to inform users, but
> > > > > must not use stdout, as it would corrupt the tool's normal output stream.
> > > >
> > >
> > > After thinking a little bit more about this, IMHO the best is to have
> > > two separate patches (assuming that there is a good reason why ensuring that the
> > > slot's character is "+"):
> > >
> > > > You could do something like (untested):
> > > >
> > > > while True:
> > > > try:
> > > > slot = os.read(self.reader, 8)
> > > > + if not slot:
> > > > + # Stop at the end of the jobserver queue.
> > > > + break
> > >
> > > This would be patch 1, to overcome some issue (probably due to Python
> > > version) that reading past EOF won't rise an exception. I would very much
> > > want to see what python version you're using and see if some other
> > > exception arose (like EOFError), properly described at the patch description.
> > >
> >
> > My Python is 3.12.3, and GNU Make is 4.3. But I don't think the issue is caused
> > by the Python interpreter. Instead, my shell opened /etc/passwd for some reason
> > without closing it, and as a result, all child processes inherited this fd3 file
> > descriptor.
>
> Maybe there's something weird with your PAM settings and/or /etc/nsswitch.conf. I
> saw some issues in the past related to kerberos/ldap/radius/sso timeouts.
>
> >
> > $ ls -l /proc/self/fd
> > total 0
> > lrwx------ 1 changbin changbin 64 Jan 8 10:40 0 -> /dev/pts/0
> > lrwx------ 1 changbin changbin 64 Jan 8 10:40 1 -> /dev/pts/0
> > lrwx------ 1 changbin changbin 64 Jan 8 10:40 2 -> /dev/pts/0
> > lr-x------ 1 changbin changbin 64 Jan 8 10:40 3 -> /etc/passwd
> > lr-x------ 1 changbin changbin 64 Jan 8 10:40 4 -> /proc/2468162/fd
> >
> > In this case, make should open a new file descriptor for jobserver control, but
> > clearly, it did not do so and instead still passed fd 3. Once I get a chance,
> > I'll look into how Make 4.3 actually works.
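One heuristic not proposed in the thread: a pipe-based jobserver always hands out pipe (or FIFO) descriptors, so a child could check the fd type with os.fstat() before trusting it. This cannot prove that the fd belongs to make's jobserver, but it would reject a regular file such as the inherited /etc/passwd above. A minimal sketch, assuming a POSIX system; is_pipe_fd() is a hypothetical helper name, not part of jobserver.py.

```python
import os
import stat
import tempfile


def is_pipe_fd(fd: int) -> bool:
    """Return True if fd refers to a pipe or FIFO (hypothetical helper)."""
    try:
        mode = os.fstat(fd).st_mode
    except OSError:
        # Closed or otherwise unusable descriptor: certainly not a jobserver.
        return False
    # Anonymous pipes and named FIFOs both report S_IFIFO.
    return stat.S_ISFIFO(mode)


if __name__ == "__main__":
    r, w = os.pipe()
    print("pipe:", is_pipe_fd(r))  # pipe: True
    os.close(r)
    os.close(w)

    # Stand-in for the inherited /etc/passwd descriptor.
    with tempfile.TemporaryFile() as f:
        print("regular file:", is_pipe_fd(f.fileno()))  # regular file: False
```

With the named-pipe method used by make 4.4+, the descriptor is opened from a FIFO on disk, which also passes this check.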
> >
> >
> > > > +           # Why do we need this?
> > > > +           if any(c != b'+'[0] for c in slot):
> > > > +               print("Warning: invalid jobserver slots", file=sys.stderr)
> > > > +               break
> > >
> > > This seems to be a separate issue. Why do we need to enforce that the slot data
> > > is "+"? If it isn't, why would this be a problem?
> > >
> > > Btw, reading:
> > >
> > > https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
> > >
> > > We have:
> > >
> > > "In both implementations of the jobserver, the pipe will be pre-loaded with
> > > one single-character token for each available job. To obtain an extra slot
> > > you must read a single character from the jobserver; to release a slot you
> > > must write a single character back into the jobserver.
> > >
> > > It’s important that when you release the job slot, you write back the same
> > > character you read. Don’t assume that all tokens are the same character;
> > > different characters may have different meanings to GNU make. The order is
> > > not important, since make has no idea in what order jobs will complete anyway."
> > >
> > > So, a 100% compliant POSIX jobserver code shall not test for "+", but, instead,
> > > preserve whatever character is there.
> > >
> > > If checking for "+" is really needed, please add a rationale to the patch
> > > description justifying why. In such a case, we should still:
> > >
> > > - release the slot(s) we don't want by writing the character via
> > > os.write();
> > > - print a warning message about why we rejected the slot(s).
> > >
> > Thank you for the information. I previously misunderstood that reading from the
> > jobserver would only return '+' symbols, but that is clearly not the case.
> > At this point, we seem unable to verify whether it's a valid jobserver file
> > descriptor, so we have to read the entire file contents until EOF.
>
> Agreed. I guess that just checking "if not slot" should be enough for that
> purpose.
>
> If you can still reproduce the original issue, I would try that and
> see if it works, e.g. this should likely be enough:
>
>     slot = os.read(self.reader, 8)
>     if not slot:
>         # Stop at the end of the jobserver queue.
>         break
>
>
I have tested the changes below and they work. They also prevent us from possibly
writing to an incorrect file.
@@ -91,6 +91,10 @@ class JobserverExec:
while True:
try:
slot = os.read(self.reader, 8)
+ if not slot:
+ # Clear self.jobs to prevent us from probably writing incorrect file.
+ self.jobs = []
+ raise ValueError("unexpected empty token from jobserver fd, invalid '--jobserver-auth=' setting?")
self.jobs += slot
except (OSError, IOError) as e:
if e.errno == errno.EWOULDBLOCK:
@@ -105,7 +109,8 @@ class JobserverExec:
# to sit here blocked on our child.
self.claim = len(self.jobs) + 1
- except (KeyError, IndexError, ValueError, OSError, IOError):
+ except (KeyError, IndexError, ValueError, OSError, IOError) as e:
+ print(f"Warning: {e}", file=sys.stderr)
# Any missing environment strings or bad fds should result in just
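The loop with the empty-read check tested above can be exercised outside jobserver.py. A sketch, assuming the reader fd is non-blocking (the EWOULDBLOCK branch in the diff relies on that); read_jobserver_slots() and claim_slots() are hypothetical stand-alone versions of the logic, not the actual jobserver.py code. Like the patch, it drops the tokens on EOF instead of writing them to a writer fd that may be just as bogus, and it warns on stderr only.

```python
import errno
import os
import sys
import tempfile


def read_jobserver_slots(reader: int, writer: int) -> bytes:
    """Drain all tokens currently available on a non-blocking jobserver fd.

    Raises ValueError on EOF: a live jobserver pipe does not report EOF
    while make holds its write end, so EOF means the fd is something else.
    """
    jobs = b""
    while True:
        try:
            slot = os.read(reader, 8)
        except OSError as e:
            if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                # No more tokens right now: normal end of the queue.
                return jobs
            # Any other error: give back what we took, then re-raise.
            if jobs:
                os.write(writer, jobs)
            raise
        if not slot:
            # Drop what we read; writing it anywhere is not trustworthy.
            raise ValueError("unexpected empty token from jobserver fd, "
                             "invalid '--jobserver-auth=' setting?")
        jobs += slot


def claim_slots(reader: int, writer: int) -> int:
    """Return how many jobs we may run, falling back to 1 on a bad fd."""
    try:
        jobs = read_jobserver_slots(reader, writer)
    except (ValueError, OSError) as e:
        # stderr, not stdout: stdout may be redirected into a build artifact.
        print(f"Warning: {e}", file=sys.stderr)
        return 1
    # Plus one for the slot our caller already holds.
    return len(jobs) + 1


if __name__ == "__main__":
    # A regular file standing in for the inherited /etc/passwd fd.
    with tempfile.TemporaryFile() as f:
        f.write(b"root:x:0:0:root:/root:/bin/bash\n")
        f.flush()
        os.lseek(f.fileno(), 0, os.SEEK_SET)
        print(claim_slots(f.fileno(), f.fileno()))  # warning on stderr, then 1
```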
--
Cheers,
Changbin Du
On Wed, Jan 07, 2026 at 11:42:38AM +0100, Mauro Carvalho Chehab wrote:
> On Wed, Jan 07, 2026 at 10:29:10AM +0100, Mauro Carvalho Chehab wrote:
> > Em Wed, 7 Jan 2026 08:11:29 +0000
> > duchangbin <changbin.du@huawei.com> escreveu:
> >
> > > On Tue, Jan 06, 2026 at 02:52:06PM -0700, Jonathan Corbet wrote:
> > > > Changbin Du <changbin.du@huawei.com> writes:
> > > >
> > > > > Add validation for jobserver tokens to prevent infinite loops on invalid fds
> > > > > When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> > > > > propagation caused "--jobserver-auth=3,4" to reference an unintended file
> > > > > descriptor (here, fd 3 was inherited from a shell command that opened
> > > > > "/etc/passwd" instead of a valid pipe). This led to infinite loops in
> > > > > jobserver-exec's os.read() calls due to empty or corrupted tokens. (My
> > > > > make version is 4.3.)
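For context, "--jobserver-auth=3,4" in MAKEFLAGS tells children to read tokens from fd 3 and write them back to fd 4; with the named-pipe default of make 4.4+ the value has the form "fifo:PATH" instead. Parsing the option only tells a child which fds make advertised; nothing guarantees those numbers still refer to make's pipe in the child, which is exactly the failure reported here. A sketch; parse_jobserver_auth() is a hypothetical helper, and treating "--jobserver-fds=" as the older spelling of the option is an assumption about older make releases.

```python
def parse_jobserver_auth(makeflags: str):
    """Extract the jobserver endpoint from a MAKEFLAGS string.

    Returns ("pipe", (read_fd, write_fd)) for the "R,W" form,
    ("fifo", path) for the "fifo:PATH" form, or None if absent.
    """
    # "--jobserver-fds=" is assumed to be the spelling used by older make.
    prefixes = ("--jobserver-auth=", "--jobserver-fds=")
    opts = [w for w in makeflags.split() if w.startswith(prefixes)]
    if not opts:
        return None
    # If the option appears more than once, this sketch trusts the last one.
    value = opts[-1].split("=", 1)[1]
    if value.startswith("fifo:"):
        return ("fifo", value[len("fifo:"):])
    read_fd, write_fd = (int(x) for x in value.split(","))
    return ("pipe", (read_fd, write_fd))
```

For example, parse_jobserver_auth(" -j8 --jobserver-auth=3,4") returns ("pipe", (3, 4)) whether or not fd 3 is actually a pipe.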
> > > > >
> > > > > $ ls -l /proc/self/fd
> > > > > total 0
> > > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> > > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> > > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> > > > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> > > > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
> > > > >
> > > > > The modified code now explicitly validates tokens:
> > > > > 1. Rejects empty reads (prevents infinite loops on EOF)
> > > > > 2. Checks all bytes are '+' characters (catches fd reuse issues)
> > > > > 3. Raises ValueError with clear diagnostics for debugging
> > > > > This ensures robustness against invalid jobserver configurations, even when
> > > > > external tools (like make) incorrectly pass non-pipe file descriptors.
> > > > >
> > > > > Signed-off-by: Changbin Du <changbin.du@huawei.com>
> > > > > ---
> > > > > tools/lib/python/jobserver.py | 2 ++
> > > > > 1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> > > > > index a24f30ef4fa8..88d005f96bed 100755
> > > > > --- a/tools/lib/python/jobserver.py
> > > > > +++ b/tools/lib/python/jobserver.py
> > > > > @@ -91,6 +91,8 @@ class JobserverExec:
> > > > > while True:
> > > > > try:
> > > > > slot = os.read(self.reader, 8)
> > > > > + if not slot or any(c != b'+'[0] for c in slot):
> > > > > + raise ValueError("empty or unexpected token from jobserver")
> > > >
> > > > So I had to stare at this for a while to figure out what it was doing; a
> > > > comment might help.
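For readers who also need a moment with the new condition: iterating over a bytes object yields integers, so b'+'[0] is 43 and the generator asks whether any byte differs from '+'. A sketch with hypothetical function names, showing the patch's expression next to an equivalent, commented version.

```python
def patch_rejects(slot: bytes) -> bool:
    """The condition added by the patch, unchanged."""
    return not slot or any(c != b"+"[0] for c in slot)


def patch_rejects_commented(slot: bytes) -> bool:
    """The same condition, spelled out."""
    if not slot:
        # Empty read: os.read() hit EOF, so this fd is not a live pipe.
        return True
    # Otherwise reject unless every byte is the '+' token character.
    return slot.count(b"+") != len(slot)
```

Per the GNU make manual quoted further down in this thread, tokens other than '+' are legitimate, so the second half of the condition can also reject a valid jobserver.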
> > > >
> > > > But if it finds something that's not b'+', it simply crashes the whole
> > > > thing? Is that really what we want to do? It would seem better to
> > > > proceed if we got any slots at all, and to emit a message telling the
> > > > poor user what they might want to do about the situation?
> > > >
> > > I suspect that in Make versions prior to 4.3, when generating the "--jobserver-auth=r,w"
> > > parameter, the implementation fails to properly handle situations where file descriptor 3
> > > is already occupied by the parent process (as I encountered where fd 3 was actually used to
> > > open /etc/passwd). This appears to force Make to always use fd3 regardless of its
> > > availability (I'm not sure how Make was written). In contrast, Make 4.4+ versions
> > > default to using named pipes, which avoids this issue entirely.
> >
> > It would be nice if you could provide more details about how to reproduce it.
> > Are you doing anything special? What distro are you using? what python version?
> >
> > > When this problem occurs, the current implementation deadlocks because for regular files,
> > > os.read() returns empty bytes after reaching EOF, creating an infinite loop. My workaround
> > > is to ignore this error condition to prevent deadlock, although this means the jobserver
> > > protocol will no longer be honored.
> >
> > testing if slot is empty makes sense, but why testing if it is "+"?
> >
> > >
> > > As you suggested above, we can output an error message to stderr to inform users, but
> > > must not use stdout, as it would corrupt the tool's normal output stream.
> >
>
> After thinking a little bit more about this, IMHO the best approach is to have
> two separate patches (assuming that there is a good reason to ensure that the
> slot's character is "+"):
>
> > You could do something like (untested):
> >
> >     while True:
> >         try:
> >             slot = os.read(self.reader, 8)
> > +           if not slot:
> > +               # Stop at the end of the jobserver queue.
> > +               break
>
> This would be patch 1, to overcome some issue (probably due to the Python
> version) where reading past EOF doesn't raise an exception. I would very much
> like to know which Python version you're using and whether some other
> exception arose (like EOFError), properly described in the patch description.
Answering myself: EOFError is only raised by the input() function:
https://docs.python.org/3/library/exceptions.html#EOFError
Reading past EOF with os.read() returns an empty bytes object, so the above
check is indeed needed to avoid an endless loop.
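The difference can be reproduced in isolation. A sketch with hypothetical helper names: on a regular file (standing in for the inherited /etc/passwd), os.read() at EOF returns b"" and never raises, while an empty non-blocking pipe whose write end is still open raises BlockingIOError with EAGAIN/EWOULDBLOCK, the case the jobserver loop's except block expects.

```python
import errno
import os
import tempfile


def drain(fd: int) -> bytes:
    """Read fd in 8-byte chunks, like the jobserver loop, until EOF."""
    chunks = []
    while True:
        data = os.read(fd, 8)
        if not data:
            # EOF on a regular file: os.read() returns b"" rather than raising.
            # Without this check the loop would spin forever.
            return b"".join(chunks)
        chunks.append(data)


def empty_pipe_errno():
    """Return the errno raised when reading an empty non-blocking pipe."""
    r, w = os.pipe()
    os.set_blocking(r, False)
    try:
        os.read(r, 8)
    except BlockingIOError as e:
        return e.errno
    finally:
        os.close(r)
        os.close(w)
    return None


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"root:x:0:0:root:/root:/bin/bash\n")
        f.flush()
        os.lseek(f.fileno(), 0, os.SEEK_SET)
        print(drain(f.fileno()))
    print(empty_pipe_errno() in (errno.EAGAIN, errno.EWOULDBLOCK))  # True
```

A pipe whose write end has been closed behaves like the regular file here: os.read() returns b"".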
>
> > +           # Why do we need this?
> > +           if any(c != b'+'[0] for c in slot):
> > +               print("Warning: invalid jobserver slots", file=sys.stderr)
> > +               break
>
> This seems to be a separate issue. Why do we need to enforce that the slot data
> is "+"? If it isn't, why would this be a problem?
>
> Btw, reading:
>
> https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
>
> We have:
>
> "In both implementations of the jobserver, the pipe will be pre-loaded with
> one single-character token for each available job. To obtain an extra slot
> you must read a single character from the jobserver; to release a slot you
> must write a single character back into the jobserver.
>
> It’s important that when you release the job slot, you write back the same
> character you read. Don’t assume that all tokens are the same character;
> different characters may have different meanings to GNU make. The order is
> not important, since make has no idea in what order jobs will complete anyway."
>
> So, a 100% compliant POSIX jobserver code shall not test for "+", but, instead,
> preserve whatever character is there.
>
> If checking for "+" is really needed, please add a rationale to the patch
> description justifying why. In such a case, we should still:
>
> - release the slot(s) we don't want by writing the character via
> os.write();
> - print a warning message about why we rejected the slot(s).
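If the '+' check does turn out to be needed, the two conditions above could look roughly like this. A sketch only: keeping '+' and returning everything else is the policy under discussion, not something the manual prescribes, and write_all() / reject_unexpected_tokens() are hypothetical names. Rejected tokens are written back byte-for-byte, as the manual requires.

```python
import os
import sys


def write_all(fd: int, data: bytes) -> None:
    """os.write() may write fewer bytes than asked; loop until all are out."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def reject_unexpected_tokens(writer: int, jobs: bytes) -> bytes:
    """Keep '+' tokens; hand every other token back to make unchanged."""
    keep = bytes(c for c in jobs if c == ord("+"))
    other = bytes(c for c in jobs if c != ord("+"))
    if other:
        # Release the unwanted slots with the exact characters we read.
        write_all(writer, other)
        # stderr, since stdout may be redirected into a build artifact.
        print(f"Warning: returned {len(other)} non-'+' jobserver token(s)",
              file=sys.stderr)
    return keep
```

The kept tokens must still be released later with write_all(), exactly as read.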
>
> >             self.jobs += slot
> >         except (OSError, IOError) as e:
> >             if e.errno == errno.EWOULDBLOCK:
> >                 # Stop at the end of the jobserver queue.
> >                 break
> >             # If something went wrong, give back the jobs.
> >             if self.jobs:
> >                 os.write(self.writer, self.jobs)
> >             raise e
> >
> > Yet, if os.read() fails or reaches EOF, I would expect the "except" block
> > to catch it. It sounds to me like it could be an issue with the Python
> > version you're using.
> >
> > > For
> > > example, in scripts/Makefile.vmlinux_o we have:
> > >
> > > quiet_cmd_gen_initcalls_lds = GEN $@
> > > cmd_gen_initcalls_lds = \
> > > $(PYTHON3) $(srctree)/scripts/jobserver-exec \
> > > $(PERL) $(real-prereqs) > $@
> > >
> > >
> > > > > self.jobs += slot
> > > > > except (OSError, IOError) as e:
> > > > > if e.errno == errno.EWOULDBLOCK:
> > > >
> > > > Thanks,
> > > >
> > > > jon
> > >
> >
> >
> >
> > Thanks,
> > Mauro
>
> --
> Thanks,
> Mauro
--
Thanks,
Mauro
Kindly ping for this fix. This patch resolves the issue where kernel compilation
gets stuck in certain situations.
On Thu, Dec 25, 2025 at 02:26:22PM +0800, Changbin Du wrote:
> Add validation for jobserver tokens to prevent infinite loops on invalid fds
> When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> propagation caused "--jobserver-auth=3,4" to reference an unintended file
> descriptor (here, fd 3 was inherited from a shell command that opened
> "/etc/passwd" instead of a valid pipe). This led to infinite loops in
> jobserver-exec's os.read() calls due to empty or corrupted tokens. (My
> make version is 4.3.)
>
> $ ls -l /proc/self/fd
> total 0
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
>
> The modified code now explicitly validates tokens:
> 1. Rejects empty reads (prevents infinite loops on EOF)
> 2. Checks all bytes are '+' characters (catches fd reuse issues)
> 3. Raises ValueError with clear diagnostics for debugging
> This ensures robustness against invalid jobserver configurations, even when
> external tools (like make) incorrectly pass non-pipe file descriptors.
>
> Signed-off-by: Changbin Du <changbin.du@huawei.com>
> ---
> tools/lib/python/jobserver.py | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> index a24f30ef4fa8..88d005f96bed 100755
> --- a/tools/lib/python/jobserver.py
> +++ b/tools/lib/python/jobserver.py
> @@ -91,6 +91,8 @@ class JobserverExec:
> while True:
> try:
> slot = os.read(self.reader, 8)
> + if not slot or any(c != b'+'[0] for c in slot):
> + raise ValueError("empty or unexpected token from jobserver")
> self.jobs += slot
> except (OSError, IOError) as e:
> if e.errno == errno.EWOULDBLOCK:
> --
> 2.43.0
>
--
Cheers,
Changbin Du
duchangbin <changbin.du@huawei.com> writes:

> Kindly ping for this fix. This patch resolves the issue where kernel compilation
> gets stuck in certain situations.

Patience, we're just coming out of the holiday period.

Thanks,

jon