From: Jinliang Zheng <alexjlzheng@tencent.com>
When reading /proc/[pid]/stat, do_task_stat() accesses task->real_parent
without proper RCU protection, which leads to the following race:
	cpu 0				cpu 1
	-----				-----
	do_task_stat
	  var = task->real_parent
					release_task
					  call_rcu(delayed_put_task_struct)
	  task_tgid_nr_ns(var)
	    rcu_read_lock	<--- Too late!
	    task_pid_ptr	<--- UAF!
	    rcu_read_unlock
This fix adds proper RCU protection, similar to getppid() in kernel/sys.c,
which correctly uses rcu_dereference() when accessing current->real_parent.
Fixes: 06fffb1267c9 ("do_task_stat: don't take rcu_read_lock()")
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
fs/proc/array.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 42932f88141a..3c2eea2c551a 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	}
 
 	sid = task_session_nr_ns(task, ns);
-	ppid = task_tgid_nr_ns(task->real_parent, ns);
+	rcu_read_lock();
+	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
+	rcu_read_unlock();
 	pgid = task_pgrp_nr_ns(task, ns);
 
 	unlock_task_sighand(task, &flags);
--
2.39.3
On 01/27, alexjlzheng@gmail.com wrote:
>
> From: Jinliang Zheng <alexjlzheng@tencent.com>
>
> When reading /proc/[pid]/stat, do_task_stat() accesses task->real_parent
> without proper RCU protection, which leads:

Thanks for the patch...

>	cpu 0				cpu 1
>	-----				-----
>	do_task_stat
>	  var = task->real_parent
>					release_task
>					  call_rcu(delayed_put_task_struct)
>	  task_tgid_nr_ns(var)
>	    rcu_read_lock	<--- Too late!
>	    task_pid_ptr	<--- UAF!
>	    rcu_read_unlock

Almost off-topic, but I can't resist. This looks confusing to me.
It is not "Too late", this rcu_read_lock() protects another thing.
Nevermind.

I think that the changelog could be more clear. It should probably
mention that forget_original_parent() doesn't take child->signal->siglock
and thus we have a race... I dunno.

> --- a/fs/proc/array.c
> +++ b/fs/proc/array.c
> @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
>  	}
>
>  	sid = task_session_nr_ns(task, ns);
> -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> +	rcu_read_lock();
> +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> +	rcu_read_unlock();

But this can't really help. If task->real_parent has already exited and
it was reaped, then it is actually "Too late!" for rcu_read_lock().

Please use task_ppid_nr_ns() which does the necessary pid_alive() check.

Oleg.
On Tue, 27 Jan 2026 18:25:25 +0100, oleg@redhat.com wrote:
> On 02/27, alexjlzheng@gmail.com wrote:
> >
> > From: Jinliang Zheng <alexjlzheng@tencent.com>
> >
> > When reading /proc/[pid]/stat, do_task_stat() accesses task->real_parent
> > without proper RCU protection, which leads:
>
> Thanks for the patch...
>
> >	cpu 0				cpu 1
> >	-----				-----
> >	do_task_stat
> >	  var = task->real_parent
> >					release_task
> >					  call_rcu(delayed_put_task_struct)
> >	  task_tgid_nr_ns(var)
> >	    rcu_read_lock	<--- Too late!
>
> Almost off-topic, but I can't resist. This looks confusing to me.
> It is not "Too late", this rcu_read_lock() protects another thing.
> Nevermind.

Yes, and would "Too late to protect task->real_parent!" be better?

>
> I think that the changelog could be more clear. It should probably
> mention that forget_original_parent() doesn't take child->signal->siglock
> and thus we have a race... I dunno.
>
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> >  	}
> >
> >  	sid = task_session_nr_ns(task, ns);
> > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > +	rcu_read_lock();
> > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > +	rcu_read_unlock();
>
> But this can't really help. If task->real_parent has already exited and
> it was reaped, then it is actually "Too late!" for rcu_read_lock().

I think this is acceptable, because we tolerate obtaining a stale value
as long as it doesn't lead to a Use-After-Free (UAF) bug. This is
similar to the comments in the syscall getppid().

With the protection of rcu_read_lock()/rcu_read_unlock() for loading
task->real_parent, we can guarantee that the task_struct of real_parent
itself will not be freed.

Or, do I miss something?

Thanks,
Jinliang Zheng. :)

>
> Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
>
> Oleg.
On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> On 01/27, alexjlzheng@gmail.com wrote:
> > --- a/fs/proc/array.c
> > +++ b/fs/proc/array.c
> > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> >  	}
> >
> >  	sid = task_session_nr_ns(task, ns);
> > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > +	rcu_read_lock();
> > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > +	rcu_read_unlock();
>
> But this can't really help. If task->real_parent has already exited and
> it was reaped, then it is actually "Too late!" for rcu_read_lock().
>
> Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
>

That routine looks bogus in its own right though.

Suppose it fits the time window between the current parent exiting and
the task being reassigned to init. Then you transiently see 0 as the
pid, instead of 1 (or whatever). This reads like a bug to me.

But suppose task_ppid_nr_ns() managed to get the right value at the
time. As per usual, such an exit + reaping could have happened before
the caller even looks at the returned pid.

Or to put it differently, imo the check in the routine not only does not
help, but introduces a corner case with a bogus result.

It probably should do precisely the same thing proposed in this patch,
as in:

	rcu_read_lock();
	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
	rcu_read_unlock();
On 01/27, Mateusz Guzik wrote:
>
> On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> > On 01/27, alexjlzheng@gmail.com wrote:
> > > --- a/fs/proc/array.c
> > > +++ b/fs/proc/array.c
> > > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> > >  	}
> > >
> > >  	sid = task_session_nr_ns(task, ns);
> > > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > > +	rcu_read_lock();
> > > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > > +	rcu_read_unlock();
> >
> > But this can't really help. If task->real_parent has already exited and
> > it was reaped, then it is actually "Too late!" for rcu_read_lock().
> >
> > Please use task_ppid_nr_ns() which does the necessary pid_alive() check.

Ah, I was wrong, I forgot about lock_task_sighand(task). So in this case
pid_alive() is not necessary, and the patch is fine.

But unless you have a strong opinion, I'd still suggest to use
task_ppid_nr_ns(), see below.

> Suppose it fits the time window between the current parent exiting and
> the task being reassigned to init. Then you transiently see 0 as the pid,
> instead of 1 (or whatever). This reads like a bug to me.

But we can't avoid this. Without tasklist_lock even

	task_tgid_nr_ns(current->real_parent, ns);

can return zero if we race with reparenting. If ->real_parent is reaped
right after we read the ->real_parent pointer, it has no pids. See
__unhash_process() -> detach_pid().

> It probably should do precisely the same thing proposed in this patch,
> as in:
> 	rcu_read_lock();
> 	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> 	rcu_read_unlock();

No, task_ppid_nr_ns(tsk) does need the pid_alive() check. If tsk exits,
tsk->real_parent points to nowhere, rcu_read_lock() can't help.

This all needs cleanups. ->real_parent and ->group_leader need the
helpers (probably with some CONFIG_PROVE_RCU checks) and they should be
moved to signal_struct.

So far I have only sent some trivial initial cleanups/preparations, see
https://lore.kernel.org/all/aXY_h8i78n6yD9JY@redhat.com/

I'll try to do the next step this week. If I have time ;) I am on a
forced PTO caused by renovations in our apartment.

Oleg.
On Wed, 28 Jan 2026 09:01:35 +0100, oleg@redhat.com wrote:
> On 01/27, Mateusz Guzik wrote:
> >
> > On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> > > On 01/27, alexjlzheng@gmail.com wrote:
> > > > --- a/fs/proc/array.c
> > > > +++ b/fs/proc/array.c
> > > > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> > > >  	}
> > > >
> > > >  	sid = task_session_nr_ns(task, ns);
> > > > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > > > +	rcu_read_lock();
> > > > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > > > +	rcu_read_unlock();
> > >
> > > But this can't really help. If task->real_parent has already exited and
> > > it was reaped, then it is actually "Too late!" for rcu_read_lock().
> > >
> > > Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
>
> Ah, I was wrong, I forgot about lock_task_sighand(task). So in this case
> pid_alive() is not necessary, and the patch is fine.
>
> But unless you have a strong opinion, I'd still suggest to use
> task_ppid_nr_ns(), see below.

I don't have a strong opinion on this. Your suggestion makes sense -
task_ppid_nr_ns() is more maintainable. I'm happy to update the patch
as you recommend.

Thanks,
Jinliang Zheng. :)

> > Suppose it fits the time window between the current parent exiting and
> > the task being reassigned to init. Then you transiently see 0 as the pid,
> > instead of 1 (or whatever). This reads like a bug to me.
>
> But we can't avoid this. Without tasklist_lock even
>
> 	task_tgid_nr_ns(current->real_parent, ns);
>
> can return zero if we race with reparenting. If ->real_parent is reaped
> right after we read the ->real_parent pointer, it has no pids. See
> __unhash_process() -> detach_pid().
>
> > It probably should do precisely the same thing proposed in this patch,
> > as in:
> > 	rcu_read_lock();
> > 	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > 	rcu_read_unlock();
>
> No, task_ppid_nr_ns(tsk) does need the pid_alive() check. If tsk exits,
> tsk->real_parent points to nowhere, rcu_read_lock() can't help.
>
> This all needs cleanups. ->real_parent and ->group_leader need the helpers
> (probably with some CONFIG_PROVE_RCU checks) and they should be moved to
> signal_struct.
>
> So far I have only sent some trivial initial cleanups/preparations, see
> https://lore.kernel.org/all/aXY_h8i78n6yD9JY@redhat.com/
>
> I'll try to do the next step this week. If I have time ;) I am on a
> forced PTO caused by renovations in our apartment.
>
> Oleg.
On Tue, 27 Jan 2026 19:49:11 +0100, mjguzik@gmail.com wrote:
> On Tue, Jan 27, 2026 at 06:25:25PM +0100, Oleg Nesterov wrote:
> > On 01/27, alexjlzheng@gmail.com wrote:
> > > --- a/fs/proc/array.c
> > > +++ b/fs/proc/array.c
> > > @@ -528,7 +528,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
> > >  	}
> > >
> > >  	sid = task_session_nr_ns(task, ns);
> > > -	ppid = task_tgid_nr_ns(task->real_parent, ns);
> > > +	rcu_read_lock();
> > > +	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> > > +	rcu_read_unlock();
> >
> > But this can't really help. If task->real_parent has already exited and
> > it was reaped, then it is actually "Too late!" for rcu_read_lock().
> >
> > Please use task_ppid_nr_ns() which does the necessary pid_alive() check.
> >
>
> That routine looks bogus in its own right though.
>
> Suppose it fits the time window between the current parent exiting and
> the task being reassigned to init. Then you transiently see 0 as the pid,
> instead of 1 (or whatever). This reads like a bug to me.
>
> But suppose task_ppid_nr_ns() managed to get the right value at the
> time. As per usual, such an exit + reaping could have happened before
> the caller even looks at the returned pid.
>
> Or to put it differently, imo the check in the routine not only does not
> help, but introduces a corner case with a bogus result.
>
> It probably should do precisely the same thing proposed in this patch,
> as in:
> 	rcu_read_lock();
> 	ppid = task_tgid_nr_ns(rcu_dereference(task->real_parent), ns);
> 	rcu_read_unlock();

I agree.

Thanks,
Jinliang Zheng. :)