[PATCH v1] pid: annotate data-races around pid_ns->pid_allocated

Jiayuan Chen posted 1 patch 7 months, 4 weeks ago
There is a newer version of this series
[PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
Posted by Jiayuan Chen 7 months, 4 weeks ago
Suppress syzbot reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().

Reported-by: syzbot+adcaa842b762a1762e7d@syzkaller.appspotmail.com
Reported-by: syzbot+fab52e3459fa2f95df57@syzkaller.appspotmail.com
Reported-by: syzbot+0718f65353d72efaac1e@syzkaller.appspotmail.com
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 kernel/fork.c          | 2 +-
 kernel/pid.c           | 7 ++++---
 kernel/pid_namespace.c | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b..1966ddea150d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2584,7 +2584,7 @@ __latent_entropy struct task_struct *copy_process(
 	rseq_fork(p, clone_flags);
 
 	/* Don't start children in a dying pid namespace */
-	if (unlikely(!(ns_of_pid(pid)->pid_allocated & PIDNS_ADDING))) {
+	if (unlikely(!(READ_ONCE(ns_of_pid(pid)->pid_allocated) & PIDNS_ADDING))) {
 		retval = -ENOMEM;
 		goto bad_fork_core_free;
 	}
diff --git a/kernel/pid.c b/kernel/pid.c
index 4ac2ce46817f..47e74457572f 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
 	for (i = 0; i <= pid->level; i++) {
 		struct upid *upid = pid->numbers + i;
 		struct pid_namespace *ns = upid->ns;
-		switch (--ns->pid_allocated) {
+		WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
+		switch (READ_ONCE(ns->pid_allocated)) {
 		case 2:
 		case 1:
 			/* When all that is left in the pid namespace
@@ -271,13 +272,13 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	upid = pid->numbers + ns->level;
 	idr_preload(GFP_KERNEL);
 	spin_lock(&pidmap_lock);
-	if (!(ns->pid_allocated & PIDNS_ADDING))
+	if (!(READ_ONCE(ns->pid_allocated) & PIDNS_ADDING))
 		goto out_unlock;
 	pidfs_add_pid(pid);
 	for ( ; upid >= pid->numbers; --upid) {
 		/* Make the PID visible to find_pid_ns. */
 		idr_replace(&upid->ns->idr, pid, upid->nr);
-		upid->ns->pid_allocated++;
+		WRITE_ONCE(upid->ns->pid_allocated, READ_ONCE(upid->ns->pid_allocated) + 1);
 	}
 	spin_unlock(&pidmap_lock);
 	idr_preload_end();
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 7098ed44e717..148f7789d6f3 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -268,7 +268,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
 	 */
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);
-		if (pid_ns->pid_allocated == init_pids)
+		if (READ_ONCE(pid_ns->pid_allocated) == init_pids)
 			break;
 		schedule();
 	}
-- 
2.47.1
Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
Posted by Oleg Nesterov 7 months, 4 weeks ago
On 04/23, Jiayuan Chen wrote:
>
> Suppress syzbot reports by annotating these accesses using
> READ_ONCE() / WRITE_ONCE().
...
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
>  	for (i = 0; i <= pid->level; i++) {
>  		struct upid *upid = pid->numbers + i;
>  		struct pid_namespace *ns = upid->ns;
> -		switch (--ns->pid_allocated) {
> +		WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> +		switch (READ_ONCE(ns->pid_allocated)) {

I keep forgetting how kcsan works, but we don't need
READ_ONCE(ns->pid_allocated) under pidmap_lock?

Same for other functions which read/modify ->pid_allocated with
this lock held.

Oleg.
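
Oleg's point can be illustrated with a minimal userspace sketch. It is not the kernel code: mini_ns, mini_lock and mini_free_one() are hypothetical names, the spinlock stands in for pidmap_lock, and WRITE_ONCE() is a simplified stand-in for the kernel macro. Under these assumptions, every writer holds the lock, so the load inside the critical section can stay plain; only the store is marked, for the benefit of readers that do not take the lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Simplified stand-in for the kernel macro (assumption, GCC/Clang only). */
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical miniature of struct pid_namespace. */
struct mini_ns {
	unsigned int pid_allocated;
};

/* Hypothetical spinlock standing in for pidmap_lock. */
static atomic_flag mini_lock = ATOMIC_FLAG_INIT;

static void mini_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&mini_lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void mini_lock_release(void)
{
	atomic_flag_clear_explicit(&mini_lock, memory_order_release);
}

/* Drop one reference and return the new count. */
static unsigned int mini_free_one(struct mini_ns *ns)
{
	unsigned int left;

	mini_lock_acquire();
	assert(ns->pid_allocated > 0);
	/* Plain load: all writers are serialized by the lock. */
	left = ns->pid_allocated - 1;
	/* Marked store: lockless readers may observe this field. */
	WRITE_ONCE(ns->pid_allocated, left);
	mini_lock_release();

	/* The caller can switch on the local value instead of re-reading. */
	return left;
}
```

Keeping the decremented value in a local also avoids reading the field a second time for the switch statement, which is what the two-line replacement in the hunk above does.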
Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
Posted by Jiayuan Chen 7 months, 4 weeks ago
April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
> On 04/23, Jiayuan Chen wrote:
> >
> > Suppress syzbot reports by annotating these accesses using
> > READ_ONCE() / WRITE_ONCE().
> ...
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> >  	for (i = 0; i <= pid->level; i++) {
> >  		struct upid *upid = pid->numbers + i;
> >  		struct pid_namespace *ns = upid->ns;
> > -		switch (--ns->pid_allocated) {
> > +		WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> > +		switch (READ_ONCE(ns->pid_allocated)) {
>
> I keep forgetting how kcsan works, but we don't need
> READ_ONCE(ns->pid_allocated) under pidmap_lock?
>
> Same for other functions which read/modify ->pid_allocated with
> this lock held.
>
> Oleg.

However, not all places that read/write pid_allocated are locked,
for example:
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602

So, in fact, the pidmap_lock is not effective. And if we were to add locks
to all these places, it would be too heavy.

There's no actual impact on usage without locks, so I think it might be more
suitable to add these macros: KCSAN recognizes READ_ONCE() and WRITE_ONCE()
and suppresses the warnings.

Thanks.
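
The lockless read side described in this message can be sketched as follows. This is a userspace illustration under stated assumptions: READ_ONCE() is a simplified stand-in for the kernel macro, and mini_ns, MINI_ADDING and mini_ns_accepts_children() are hypothetical names modelled loosely on the copy_process() check, not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel macro (assumption, GCC/Clang only). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical flag bit playing the role of PIDNS_ADDING. */
#define MINI_ADDING (1U << 31)

/* Hypothetical miniature of struct pid_namespace. */
struct mini_ns {
	unsigned int pid_allocated;
};

/*
 * Lockless check: the caller does not hold the lock that serializes
 * writers, so the load is marked to tell the compiler (and, in the
 * kernel, KCSAN) that a concurrent update is expected here.
 */
static bool mini_ns_accepts_children(struct mini_ns *ns)
{
	return (READ_ONCE(ns->pid_allocated) & MINI_ADDING) != 0;
}
```

The marking documents the race at the one place where it can occur; it does not make the check atomic with respect to whatever the caller does next.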
Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
Posted by Oleg Nesterov 7 months, 4 weeks ago
On 04/23, Jiayuan Chen wrote:
>
> April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
> > On 04/23, Jiayuan Chen wrote:
> > >
> > > Suppress syzbot reports by annotating these accesses using
> > > READ_ONCE() / WRITE_ONCE().
> > ...
> > > --- a/kernel/pid.c
> > > +++ b/kernel/pid.c
> > > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> > >  	for (i = 0; i <= pid->level; i++) {
> > >  		struct upid *upid = pid->numbers + i;
> > >  		struct pid_namespace *ns = upid->ns;
> > > -		switch (--ns->pid_allocated) {
> > > +		WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> > > +		switch (READ_ONCE(ns->pid_allocated)) {
> >
> > I keep forgetting how kcsan works, but we don't need
> > READ_ONCE(ns->pid_allocated) under pidmap_lock?
> >
> > Same for other functions which read/modify ->pid_allocated with
> > this lock held.
> >
> > Oleg.
>
> However, not all places that read/write pid_allocated are locked,
> for example:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
>
> So, in fact, the pidmap_lock is not effective. And if we were to add locks
> to all these places, it would be too heavy.

It seems you misunderstood me. I didn't argue with the lockless READ_ONCE()s
outside of pidmap_lock.

Oleg.
Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
Posted by Christian Brauner 7 months, 4 weeks ago
On Wed, Apr 23, 2025 at 06:38:18PM +0200, Oleg Nesterov wrote:
> On 04/23, Jiayuan Chen wrote:
> >
> > April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
> > > On 04/23, Jiayuan Chen wrote:
> > > >
> > > > Suppress syzbot reports by annotating these accesses using
> > > > READ_ONCE() / WRITE_ONCE().
> > > ...
> > > > --- a/kernel/pid.c
> > > > +++ b/kernel/pid.c
> > > > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> > > >  	for (i = 0; i <= pid->level; i++) {
> > > >  		struct upid *upid = pid->numbers + i;
> > > >  		struct pid_namespace *ns = upid->ns;
> > > > -		switch (--ns->pid_allocated) {
> > > > +		WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> > > > +		switch (READ_ONCE(ns->pid_allocated)) {
> > >
> > > I keep forgetting how kcsan works, but we don't need
> > > READ_ONCE(ns->pid_allocated) under pidmap_lock?
> > >
> > > Same for other functions which read/modify ->pid_allocated with
> > > this lock held.
> > >
> > > Oleg.
> >
> > However, not all places that read/write pid_allocated are locked,
> > for example:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
> >
> > So, in fact, the pidmap_lock is not effective. And if we were to add locks
> > to all these places, it would be too heavy.
> 
> It seems you misunderstood me. I didn't argue with the lockless READ_ONCE()s
> outside of pidmap_lock.

Agreed. We should only add those annotations where they're really
needed (someone once taught me ;).
Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
Posted by Jiayuan Chen 7 months, 3 weeks ago
April 24, 2025 at 17:38, "Christian Brauner" <brauner@kernel.org> wrote:

> 
> On Wed, Apr 23, 2025 at 06:38:18PM +0200, Oleg Nesterov wrote:
> > On 04/23, Jiayuan Chen wrote:
> > >
> > > April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
> > > > On 04/23, Jiayuan Chen wrote:
> > > > >
> > > > > Suppress syzbot reports by annotating these accesses using
> > > > > READ_ONCE() / WRITE_ONCE().
> > > > ...
> > > > > --- a/kernel/pid.c
> > > > > +++ b/kernel/pid.c
> > > > > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> > > > >  	for (i = 0; i <= pid->level; i++) {
> > > > >  		struct upid *upid = pid->numbers + i;
> > > > >  		struct pid_namespace *ns = upid->ns;
> > > > > -		switch (--ns->pid_allocated) {
> > > > > +		WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> > > > > +		switch (READ_ONCE(ns->pid_allocated)) {
> > > >
> > > > I keep forgetting how kcsan works, but we don't need
> > > > READ_ONCE(ns->pid_allocated) under pidmap_lock?
> > > >
> > > > Same for other functions which read/modify ->pid_allocated with
> > > > this lock held.
> > > >
> > > > Oleg.
> > >
> > > However, not all places that read/write pid_allocated are locked,
> > > for example:
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> > > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
> > >
> > > So, in fact, the pidmap_lock is not effective. And if we were to add locks
> > > to all these places, it would be too heavy.
> >
> > It seems you misunderstood me. I didn't argue with the lockless READ_ONCE()s
> > outside of pidmap_lock.
>
> Agreed. We should only add those annotations where they're really
> needed (someone once taught me ;).

Thank you for your suggestion, it makes sense to me.
Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
Posted by Michal Koutný 7 months, 4 weeks ago
On Wed, Apr 23, 2025 at 02:33:37PM +0000, Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> However, not all places that read/write pid_allocated are locked,
> for example:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
> 
> So, in fact, the pidmap_lock is not effective. And if we were to add locks
> to all these places, it would be too heavy.
> 
> There's no actual impact on usage without locks, so I think it might be more
> suitable to add these macros: KCSAN recognizes READ_ONCE() and WRITE_ONCE()
> and suppresses the warnings.

Wouldn't it be nicer to add data_race() to mark those places where the
race (presumably) doesn't matter? (Instead of _ONCE'ing places that are
under the lock.)

0.02€,
Michal
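
What this suggestion might look like is sketched below, again in userspace and under stated assumptions. The kernel does provide a data_race() macro, but the definition here is only a stand-in that evaluates its argument; the real macro additionally tells KCSAN to ignore the access. mini_ns and mini_ns_drained() are hypothetical names modelled loosely on the wait condition in zap_pid_ns_processes().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in (assumption): only evaluates the expression.
 * The kernel macro also disables KCSAN reporting for the access.
 */
#define data_race(expr) (expr)

/* Hypothetical miniature of struct pid_namespace. */
struct mini_ns {
	unsigned int pid_allocated;
};

/*
 * Lockless wait condition. The read is left as a plain access and is
 * only annotated as an intentional race, instead of converting the
 * accesses made under the lock to READ_ONCE()/WRITE_ONCE().
 */
static bool mini_ns_drained(struct mini_ns *ns, unsigned int init_pids)
{
	return data_race(ns->pid_allocated) == init_pids;
}
```

With this approach the locked paths keep their plain `--` and `++` operations, and only the lockless sites carry an annotation. Whether a plain access is really sufficient at each of those sites is the "presumably" in the message above and would need to be checked case by case.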