In a prior patch series we tried to cleanly differentiate between:
(1) The task has already been reaped.
(2) The caller requested a pidfd for a thread-group leader but the pid
actually references a struct pid that isn't used as a thread-group
leader.
as this was causing issues for non-threaded workloads.
But there's cases where the current simple logic is wrong. Specifically,
if the pid was a leader pid and the check races with __unhash_process().
Stabilize this by using the pidfd waitqueue lock.
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
kernel/fork.c | 31 +++++++++++++------------------
1 file changed, 13 insertions(+), 18 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 4a2080b968c8..cde960fd0c71 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2108,28 +2108,23 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- int err = 0;
-
- if (!(flags & PIDFD_THREAD)) {
+ scoped_guard(spinlock_irq, &pid->wait_pidfd.lock) {
+ /*
+ * If this wasn't a thread-group leader struct pid or
+ * the task already been reaped report ESRCH to
+ * userspace.
+ */
+ if (!pid_has_task(pid, PIDTYPE_PID))
+ return -ESRCH;
/*
- * If this is struct pid isn't used as a thread-group
- * leader pid but the caller requested to create a
- * thread-group leader pidfd then report ENOENT to the
- * caller as a hint.
+ * If this struct pid isn't used as a thread-group
+ * leader but the caller requested to create a
+ * thread-group leader pidfd then report ENOENT.
*/
- if (!pid_has_task(pid, PIDTYPE_TGID))
- err = -ENOENT;
+ if (!(flags & PIDFD_THREAD) && !pid_has_task(pid, PIDTYPE_TGID))
+ return -ENOENT;
}
- /*
- * If this wasn't a thread-group leader struct pid or the task
- * got reaped in the meantime report -ESRCH to userspace.
- */
- if (!pid_has_task(pid, PIDTYPE_PID))
- err = -ESRCH;
- if (err)
- return err;
-
return __pidfd_prepare(pid, flags, ret);
}
--
2.47.2
For both patches:
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
a minor nit below...
On 04/11, Christian Brauner wrote:
>
> int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> {
> - int err = 0;
> -
> - if (!(flags & PIDFD_THREAD)) {
> + scoped_guard(spinlock_irq, &pid->wait_pidfd.lock) {
> + /*
> + * If this wasn't a thread-group leader struct pid or
> + * the task already been reaped report ESRCH to
> + * userspace.
> + */
> + if (!pid_has_task(pid, PIDTYPE_PID))
> + return -ESRCH;
The "If this wasn't a thread-group leader struct pid" part of the
comment looks a bit confusing to me, as if pid_has_task(PIDTYPE_PID)
should return false in this case.
OTOH, perhaps it makes sense to explain scoped_guard(wait_pidfd.lock)?
Something like "see unhash_process -> wake_up_all(), detach_pid(TGID)
isn't possible if pid_has_task(PID) succeeds".
Oleg.
On Fri, Apr 11, 2025 at 03:54:45PM +0200, Oleg Nesterov wrote:
> For both patches:
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>
> a minor nit below...
>
> On 04/11, Christian Brauner wrote:
> >
> > int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
> > {
> > - int err = 0;
> > -
> > - if (!(flags & PIDFD_THREAD)) {
> > + scoped_guard(spinlock_irq, &pid->wait_pidfd.lock) {
> > + /*
> > + * If this wasn't a thread-group leader struct pid or
> > + * the task already been reaped report ESRCH to
> > + * userspace.
> > + */
> > + if (!pid_has_task(pid, PIDTYPE_PID))
> > + return -ESRCH;
>
> The "If this wasn't a thread-group leader struct pid" part of the
> comment looks a bit confusing to me, as if pid_has_task(PIDTYPE_PID)
> should return false in this case.
Ok.
>
> OTOH, perhaps it makes sense to explain scoped_guard(wait_pidfd.lock)?
> Something like "see unhash_process -> wake_up_all(), detach_pid(TGID)
> isn't possible if pid_has_task(PID) succeeds".
I'm verbose. I hope you can live with it:
/*
* While holding the pidfd waitqueue lock removing the task
* linkage for the thread-group leader pid (PIDTYPE_TGID) isn't
* possible. Thus, if there's still task linkage for PIDTYPE_PID
* not having thread-group leader linkage for the pid means it
* wasn't a thread-group leader in the first place.
*/
:)
On 04/11, Christian Brauner wrote: > > I'm verbose. I hope you can live with it: > > /* > * While holding the pidfd waitqueue lock removing the task > * linkage for the thread-group leader pid (PIDTYPE_TGID) isn't > * possible. Thus, if there's still task linkage for PIDTYPE_PID > * not having thread-group leader linkage for the pid means it > * wasn't a thread-group leader in the first place. > */ > > :) LGTM ;) Oleg.
© 2016 - 2025 Red Hat, Inc.