From: Jinliang Zheng <alexjlzheng@tencent.com>
When reading /proc/[pid]/stat, do_task_stat() accesses task->real_parent
without proper RCU protection, which leads to the following race:
	cpu 0				cpu 1
	-----				-----
	do_task_stat
	  var = task->real_parent
					release_task
					  call_rcu(delayed_put_task_struct)
	  task_tgid_nr_ns(var)
	    rcu_read_lock	<--- Too late!
	    task_pid_ptr	<--- UAF!
	    rcu_read_unlock
This fix adds proper RCU protection, similar to getppid() in kernel/sys.c,
which correctly uses rcu_dereference() when accessing current->real_parent.
Fixes: 06fffb1267c9 ("do_task_stat: don't take rcu_read_lock()")
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/proc/array.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 42932f88141a..3c2eea2c551a 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	}
 
 	sid = task_session_nr_ns(task, ns);
-	ppid = task_tgid_nr_ns(task->real_parent, ns);
+	rcu_read_lock();
+	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
+	rcu_read_unlock();
 	pgid = task_pgrp_nr_ns(task, ns);
 
 	unlock_task_sighand(task, &flags);
--
2.39.3
On 01/27, alexjlzheng@gmail.com wrote:
>
> From: Jinliang Zheng <alexjlzheng@tencent.com>
>
> When reading /proc/[pid]/stat, do_task_stat() accesses task->real_parent
> without proper RCU protection, which leads:

Thanks for the patch...

>	cpu 0				cpu 1
>	-----				-----
>	do_task_stat
>	  var = task->real_parent
>					release_task
>					  call_rcu(delayed_put_task_struct)
>	  task_tgid_nr_ns(var)
>	    rcu_read_lock	<--- Too late!
>	    task_pid_ptr	<--- UAF!
>	    rcu_read_unlock

Almost off-topic, but I can't resist. This looks confusing to me.
It is not "Too late", this rcu_read_lock() protects another thing.
Nevermind.

I think that the changelog could be more clear. It should probably
mention that forget_original_parent() doesn't take child->signal->siglock
and thus we have a race... I dunno.

> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
>  	}
>
>  	sid = task_session_nr_ns(task, ns);
> -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> +	rcu_read_lock();
> +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> +	rcu_read_unlock();

But this can't really help. If task->real_parent has already exited and
it was reaped, then it is actually "Too late!" for rcu_read_lock().

Please use task_ppid_nr_ns() which does the necessary pid_alive() check.

Oleg.
On Tue, 27 Jan 2026 18:25:25 +0100, oleg@redhat.com wrote:
> On 02/27, alexjlzheng@gmail.com wrote:
> >
> > From: Jinliang Zheng <alexjlzheng@tencent.com>
> >
> > When reading /proc/[pid]/stat, do_task_stat() accesses task->real_parent
> > without proper RCU protection, which leads:
>
> Thanks for the patch...
>
> >	cpu 0				cpu 1
> >	-----				-----
> >	do_task_stat
> >	  var = task->real_parent
> >					release_task
> >					  call_rcu(delayed_put_task_struct)
> >	  task_tgid_nr_ns(var)
> >	    rcu_read_lock	<--- Too late!
>
> Almost off-topic, but I can't resist. This looks confusing to me.
> It is not "Too late", this rcu_read_lock() protects another thing.
> Nevermind.

Yes, and would "Too late to protect task->real_parent!" be better?

>
> I think that the changelog could be more clear. It should probably
> mention that forget_original_parent() doesn't take child->signal->siglock
> and thus we have a race... I dunno.
>
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> >  	}
> >
> >  	sid = task_session_nr_ns(task, ns);
> > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > +	rcu_read_lock();
> > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > +	rcu_read_unlock();
>
> But this can't really help. If task->real_parent has already exited and
> it was reaped, then it is actually "Too late!" for rcu_read_lock().

I think this is acceptable, because we tolerate obtaining a stale value
as long as it doesn't lead to a Use-After-Free (UAF) bug. This is
similar to the comments in the syscall getppid().

With the protection of rcu_read_lock()/rcu_read_unlock() for loading
task->real_parent, we can guarantee that the task_struct of real_parent
itself will not be freed.

Or, do I miss something?

Thanks,
Jinliang Zheng. :)

>
> Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
>
> Oleg.
On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> On 01/27, alexjlzheng@gmail.com wrote:
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> >  	}
> >
> >  	sid = task_session_nr_ns(task, ns);
> > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > +	rcu_read_lock();
> > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > +	rcu_read_unlock();
>
> But this can't really help. If task->real_parent has already exited and
> it was reaped, then it is actually "Too late!" for rcu_read_lock().
>
> Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
>

That routine looks bogus in its own right though.

Suppose it fits the time window between the current parent exiting and
the task being reassigned to init. Then you transiently see 0 as the
pid, instead of 1 (or whatever). This reads like a bug to me.

But suppose task_ppid_nr_ns() managed to get the right value at the
time. As per usual, such an exit + reaping could have happened before
the caller even looks at the returned pid.

Or to put it differently, imo the check in the routine not only does not
help, but introduces a corner case with a bogus result.

It probably should do precisely the same thing proposed in this patch,
as in:

	rcu_read_lock();
	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
	rcu_read_unlock();
On 01/27, Mateusz Guzik wrote:
>
> On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> > On 01/27, alexjlzheng@gmail.com wrote:
> > > --- a/fs/proc/array.c
> > > +++ b/fs/proc/array.c
> > > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> > >  	}
> > >
> > >  	sid = task_session_nr_ns(task, ns);
> > > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > > +	rcu_read_lock();
> > > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > > +	rcu_read_unlock();
> >
> > But this can't really help. If task->real_parent has already exited and
> > it was reaped, then it is actually "Too late!" for rcu_read_lock().
> >
> > Please use task_ppid_nr_ns() which does the necessary pid_alive() check.

Ah, I was wrong, I forgot about lock_task_sighand(task). So in this case
pid_alive() is not necessary, and the patch is fine.

But unless you have a strong opinion, I'd still suggest to use
task_ppid_nr_ns(), see below.

> Suppose it fits the time window between the current parent exiting and
> the task being reassigned to init. Then you transiently see 0 as the pid,
> instead of 1 (or whatever). This reads like a bug to me.

But we can't avoid this. Without tasklist_lock even

	task_tgid_nr_ns(current->real_parent, ns);

can return zero if we race with reparenting. If ->real_parent is reaped
right after we read the ->real_parent pointer, it has no pids. See
__unhash_process() -> detach_pid().

> It probably should do precisely the same thing proposed in this patch,
> as in:
> 	rcu_read_lock();
> 	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> 	rcu_read_unlock();

No, task_ppid_nr_ns(tsk) does need the pid_alive() check. If tsk exits,
tsk->real_parent points to nowhere, rcu_read_lock() can't help.

This all needs cleanups. ->real_parent and ->group_leader need the
helpers (probably with some CONFIG_PROVE_RCU checks) and they should be
moved to signal_struct.

So far I have only sent some trivial initial cleanups/preparations, see
https://lore.kernel.org/all/aXY_h8i78n6yD9JY@redhat.com/

I'll try to do the next step this week. If I have time ;) I am on a
forced PTO caused by renovations in our apartment.

Oleg.
On Wed, 28 Jan 2026 09:01:35 +0100, oleg@redhat.com wrote:
> On 01/27, Mateusz Guzik wrote:
> >
> > On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> > > On 01/27, alexjlzheng@gmail.com wrote:
> > > > --- a/fs/proc/array.c
> > > > +++ b/fs/proc/array.c
> > > > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> > > >  	}
> > > >
> > > >  	sid = task_session_nr_ns(task, ns);
> > > > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > > > +	rcu_read_lock();
> > > > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > > > +	rcu_read_unlock();
> > >
> > > But this can't really help. If task->real_parent has already exited and
> > > it was reaped, then it is actually "Too late!" for rcu_read_lock().
> > >
> > > Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
>
> Ah, I was wrong, I forgot about lock_task_sighand(task). So in this case
> pid_alive() is not necessary, and the patch is fine.
>
> But unless you have a strong opinion, I'd still suggest to use
> task_ppid_nr_ns(), see below.

I don't have a strong opinion on this. Your suggestion makes sense -
task_ppid_nr_ns() is more maintainable. I'm happy to update the patch
as you recommend.

Thanks,
Jinliang Zheng. :)

> > Suppose it fits the time window between the current parent exiting and
> > the task being reassigned to init. Then you transiently see 0 as the pid,
> > instead of 1 (or whatever). This reads like a bug to me.
>
> But we can't avoid this. Without tasklist_lock even
>
> 	task_tgid_nr_ns(current->real_parent, ns);
>
> can return zero if we race with reparenting. If ->real_parent is reaped
> right after we read the ->real_parent pointer, it has no pids. See
> __unhash_process() -> detach_pid().
>
> > It probably should do precisely the same thing proposed in this patch,
> > as in:
> > 	rcu_read_lock();
> > 	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > 	rcu_read_unlock();
>
> No, task_ppid_nr_ns(tsk) does need the pid_alive() check. If tsk exits,
> tsk->real_parent points to nowhere, rcu_read_lock() can't help.
>
> This all needs cleanups. ->real_parent and ->group_leader need the helpers
> (probably with some CONFIG_PROVE_RCU checks) and they should be moved to
> signal_struct.
>
> So far I have only sent some trivial initial cleanups/preparations, see
> https://lore.kernel.org/all/aXY_h8i78n6yD9JY@redhat.com/
>
> I'll try to do the next step this week. If I have time ;) I am on a
> forced PTO caused by renovations in our apartment.
>
> Oleg.
On Tue, 27 Jan 2026 19:49:11 +0100, mjguzik@gmail.com wrote:
> On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> > On 01/27, alexjlzheng@gmail.com wrote:
> > > --- a/fs/proc/array.c
> > > +++ b/fs/proc/array.c
> > > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> > >  	}
> > >
> > >  	sid = task_session_nr_ns(task, ns);
> > > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > > +	rcu_read_lock();
> > > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > > +	rcu_read_unlock();
> >
> > But this can't really help. If task->real_parent has already exited and
> > it was reaped, then it is actually "Too late!" for rcu_read_lock().
> >
> > Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
> >
>
> That routine looks bogus in its own right though.
>
> Suppose it fits the time window between the current parent exiting and
> the task being reassigned to init. Then you transiently see 0 as the pid,
> instead of 1 (or whatever). This reads like a bug to me.
>
> But suppose task_ppid_nr_ns() managed to get the right value at the
> time. As per usual, such an exit + reaping could have happened before
> the caller even looks at the returned pid.
>
> Or to put it differently, imo the check in the routine not only does not
> help, but introduces a corner case with a bogus result.
>
> It probably should do precisely the same thing proposed in this patch,
> as in:
> 	rcu_read_lock();
> 	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> 	rcu_read_unlock();

I agree.

Thanks,
Jinliang Zheng. :)