Reads and writes of pcpu_nr_populated should be performed under
pcpu_lock. However, pcpu_nr_pages() reads pcpu_nr_populated without any
protection, which causes a read/write data race.

Therefore, protect the read of pcpu_nr_populated in pcpu_nr_pages() with
pcpu_lock.

Reported-by: syzbot+e5bd32b79413e86f389e@syzkaller.appspotmail.com
Fixes: 7e8a6304d541 ("/proc/meminfo: add percpu populated pages count")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
mm/percpu.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/percpu.c b/mm/percpu.c
index b35494c8ede2..0f98b857fb36 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3355,7 +3355,13 @@ void __init setup_per_cpu_areas(void)
 */
 unsigned long pcpu_nr_pages(void)
 {
-	return pcpu_nr_populated * pcpu_nr_units;
+	unsigned long flags, ret;
+
+	spin_lock_irqsave(&pcpu_lock, flags);
+	ret = pcpu_nr_populated * pcpu_nr_units;
+	spin_unlock_irqrestore(&pcpu_lock, flags);
+
+	return ret;
 }
 
 /*
--
On Wed, Jul 02, 2025 at 05:27:49PM +0900, Jeongjun Park wrote:
> [...]
>  unsigned long pcpu_nr_pages(void)
>  {
> -	return pcpu_nr_populated * pcpu_nr_units;

No need for the lock as I think the race is fine here. Use something like
the following and add a comment.

	data_race(READ_ONCE(pcpu_nr_populated)) * pcpu_nr_units;
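A rough userspace analogue of the two readers being debated, offered only as a sketch: C11 has no data_race() or READ_ONCE(), so a relaxed atomic load stands in for them, and an atomic_flag spinlock stands in for pcpu_lock. All model_* names are hypothetical. One difference from the kernel matters: in C11 a plain store that races with an atomic load is still undefined behaviour, so the writers here also store atomically, whereas in the kernel data_race() is only a hint telling KCSAN that the race is intended.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for pcpu_lock, pcpu_nr_populated and pcpu_nr_units. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;
static atomic_ulong model_nr_populated = 0;
static const unsigned long model_nr_units = 4;

static void model_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&model_lock, memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void model_spin_unlock(void)
{
	atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

/* Writers: the read-modify-write runs under the lock, so writers never race
 * with each other; the final store is a single, untorn atomic store. */
static void model_chunk_populated(unsigned long nr)
{
	model_spin_lock();
	unsigned long cur = atomic_load_explicit(&model_nr_populated, memory_order_relaxed);
	atomic_store_explicit(&model_nr_populated, cur + nr, memory_order_relaxed);
	model_spin_unlock();
}

static void model_chunk_depopulated(unsigned long nr)
{
	model_spin_lock();
	unsigned long cur = atomic_load_explicit(&model_nr_populated, memory_order_relaxed);
	assert(cur >= nr); /* cannot depopulate more pages than are populated */
	atomic_store_explicit(&model_nr_populated, cur - nr, memory_order_relaxed);
	model_spin_unlock();
}

/* v1-style reader: takes the lock just to copy one word. */
static unsigned long model_nr_pages_locked(void)
{
	model_spin_lock();
	unsigned long ret = atomic_load_explicit(&model_nr_populated, memory_order_relaxed) *
			    model_nr_units;
	model_spin_unlock();
	return ret;
}

/* Shakeel's suggestion in C11 terms: one relaxed load, the analogue of
 * READ_ONCE(). A concurrent writer can make the result stale, never torn. */
static unsigned long model_nr_pages_lockless(void)
{
	return atomic_load_explicit(&model_nr_populated, memory_order_relaxed) *
	       model_nr_units;
}
```

After model_chunk_populated(3) with four units, both readers return 12. The lockless version avoids a lock round trip (and, in the kernel, disabling interrupts via spin_lock_irqsave()) just to copy a single word.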
Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Wed, Jul 02, 2025 at 05:27:49PM +0900, Jeongjun Park wrote:
> > [...]
> >  unsigned long pcpu_nr_pages(void)
> >  {
> > -	return pcpu_nr_populated * pcpu_nr_units;
>
> No need for the lock as I think the race is fine here. Use something like
> the following and add a comment.
>
> 	data_race(READ_ONCE(pcpu_nr_populated)) * pcpu_nr_units;
>

This race itself is not a critical security vulnerability, but it is a
read/write race that actually occurs. Writes to pcpu_nr_populated are
already systematically protected by pcpu_lock, so why do you think the
data race can be ignored only on the read side?

--
Regards,
Jeongjun Park
On Thu, Jul 03, 2025 at 02:19:34PM +0900, Jeongjun Park wrote:
> Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > [...]
> > No need for the lock as I think the race is fine here. Use something like
> > the following and add a comment.
> >
> > 	data_race(READ_ONCE(pcpu_nr_populated)) * pcpu_nr_units;
> >
>
> This race itself is not a critical security vulnerability, but it is a
> read/write race that actually occurs. Writes to pcpu_nr_populated are
> already systematically protected by pcpu_lock, so why do you think the
> data race can be ignored only on the read side?
>

As mentioned in the other thread, the reader of this value is
/proc/meminfo, and reading a stale value isn't the end of the world
either.

Thanks,
Dennis
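The cost of a stale read is easier to judge with numbers. As a sketch, assuming 4 KiB pages (page_shift = 12) and a hypothetical helper percpu_meminfo_kb(): pcpu_nr_pages() returns pcpu_nr_populated * pcpu_nr_units pages, and the Percpu: line of /proc/meminfo shows that page count in kB.

```c
#include <assert.h>

/* Hypothetical helper: nr_populated and nr_units stand in for
 * pcpu_nr_populated and pcpu_nr_units; page_shift is assumed to be 12. */
static unsigned long percpu_meminfo_kb(unsigned long nr_populated,
				       unsigned long nr_units,
				       unsigned int page_shift)
{
	assert(page_shift >= 10); /* a page is at least 1 kB */
	unsigned long pages = nr_populated * nr_units; /* what pcpu_nr_pages() returns */
	return pages << (page_shift - 10);             /* pages -> kB */
}
```

With 10 populated pages per unit and 4 units this gives 160 kB; a read that races with one more page being populated would report 176 kB instead. The two figures differ by one page across all units (16 kB), and each is a value the counter really held at some moment.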
On Wed, 2 Jul 2025, Jeongjun Park wrote:

> [...]
>  unsigned long pcpu_nr_pages(void)
>  {
> -	return pcpu_nr_populated * pcpu_nr_units;
> +	unsigned long flags, ret;
> +
> +	spin_lock_irqsave(&pcpu_lock, flags);
> +	ret = pcpu_nr_populated * pcpu_nr_units;
> +	spin_unlock_irqrestore(&pcpu_lock, flags);

Ummm.. What? You are protecting a single read with a spinlock? There needs
to be some updating of data somewhere for this to make sense.

Unless a different critical section protected by the lock sets the value
intermittently to something you are not allowed to see before a final
store of a valid value. But that would be unusual.

Is this an academic exercise or did you really see a problem?

What is racing?
Christoph Lameter (Ampere) <cl@gentwo.org> wrote:
>
> On Wed, 2 Jul 2025, Jeongjun Park wrote:
> > [...]
> > +	spin_lock_irqsave(&pcpu_lock, flags);
> > +	ret = pcpu_nr_populated * pcpu_nr_units;
> > +	spin_unlock_irqrestore(&pcpu_lock, flags);
>
> Ummm.. What? You are protecting a single read with a spinlock? There needs
> to be some updating of data somewhere for this to make sense.
>
> Unless a different critical section protected by the lock sets the value
> intermittently to something you are not allowed to see before a final
> store of a valid value. But that would be unusual.
>
> Is this an academic exercise or did you really see a problem?
>
> What is racing?
>

This patch is by no means an academic exercise.

As the Reported-by tag shows, this race has actually been reported by
syzbot [1].

[1]: https://syzkaller.appspot.com/bug?extid=e5bd32b79413e86f389e

pcpu_nr_populated is currently written in pcpu_chunk_populated() and
pcpu_chunk_depopulated(), and since these two functions write
pcpu_nr_populated under the protection of pcpu_lock, there is no
write/write race.

However, since pcpu_nr_pages(), which reads pcpu_nr_populated, is not
protected by pcpu_lock, read/write races can easily occur.

Therefore, I think it is appropriate to protect it with pcpu_lock, as the
comment on the definition of pcpu_nr_populated says.

--
Regards,
Jeongjun Park
Hello,

On Thu, Jul 03, 2025 at 01:45:36PM +0900, Jeongjun Park wrote:
> Christoph Lameter (Ampere) <cl@gentwo.org> wrote:
> > [...]
> > Is this an academic exercise or did you really see a problem?
> >
> > What is racing?
>
> This patch is by no means an academic exercise.
>
> As the Reported-by tag shows, this race has actually been reported by
> syzbot [1].
>
> [1]: https://syzkaller.appspot.com/bug?extid=e5bd32b79413e86f389e
>

A report by syzbot doesn't mean it is a real problem. A production
problem or broken test case is much more urgent.

> pcpu_nr_populated is currently written in pcpu_chunk_populated() and
> pcpu_chunk_depopulated(), and since these two functions write
> pcpu_nr_populated under the protection of pcpu_lock, there is no
> write/write race.
>
> However, since pcpu_nr_pages(), which reads pcpu_nr_populated, is not
> protected by pcpu_lock, read/write races can easily occur.
>
> Therefore, I think it is appropriate to protect it with pcpu_lock, as the
> comment on the definition of pcpu_nr_populated says.
>

You're right that this is a race condition, but this was an intentional
choice, made because the value read here is only used to pass
information to userspace for /proc/meminfo. As Christoph mentioned, the
caller of pcpu_nr_pages() will never see an invalid value, nor does it
really matter either.

The pcpu_lock is core to the percpu allocator and isn't something we
would want to blindly expose either.

The appropriate solution here is what Shakeel proposed: just mark the
access as a data_race().

Thanks,
Dennis
On Wed, Jul 02, 2025 at 10:51:25PM -0700, Dennis Zhou wrote:
> > However, since pcpu_nr_pages(), which reads pcpu_nr_populated, is not
> > protected by pcpu_lock, read/write races can easily occur.
> >
> > Therefore, I think it is appropriate to protect it with pcpu_lock, as the
> > comment on the definition of pcpu_nr_populated says.
>
> You're right that this is a race condition, but this was an intentional
> choice, made because the value read here is only used to pass
> information to userspace for /proc/meminfo. As Christoph mentioned, the
> caller of pcpu_nr_pages() will never see an invalid value, nor does it
> really matter either.

This isn't an actual race condition. The value can be read atomically,
and an unprotected read can't lead to a result which wouldn't be possible
when reading under the lock; i.e., whether the lock is added or not, the
end result doesn't change. It's just that syzbot can't tell the
difference.

Thanks.

--
tejun
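Tejun's argument, and Christoph's earlier caveat about intermediate stores, can be checked with a small toy model (not kernel code; every name below is hypothetical). Each writer critical section is described by the values it stores, in order. A reader holding the lock can only run between critical sections, while an unlocked reader doing one untorn load can run between any two stores. The model assumes each store is a single word-sized store that the compiler does not tear, which is what Tejun's argument relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VALUES 64

/* One writer critical section: the values it stores, in program order. */
struct section {
	const unsigned long *stores;
	size_t nr_stores;
};

/* Add v to set[0..n) if absent; returns the new size. */
static size_t add_unique(unsigned long *set, size_t n, unsigned long v)
{
	for (size_t i = 0; i < n; i++)
		if (set[i] == v)
			return n;
	assert(n < MAX_VALUES);
	set[n] = v;
	return n + 1;
}

/* Locked reader: sees the initial value or whatever a completed section left. */
static size_t observable_locked(unsigned long init, const struct section *s,
				size_t nr_sections, unsigned long *out)
{
	size_t n = add_unique(out, 0, init);
	unsigned long cur = init;

	for (size_t i = 0; i < nr_sections; i++) {
		if (s[i].nr_stores)
			cur = s[i].stores[s[i].nr_stores - 1];
		n = add_unique(out, n, cur);
	}
	return n;
}

/* Unlocked single-load reader: sees the initial value or any individual store. */
static size_t observable_unlocked(unsigned long init, const struct section *s,
				  size_t nr_sections, unsigned long *out)
{
	size_t n = add_unique(out, 0, init);

	for (size_t i = 0; i < nr_sections; i++)
		for (size_t j = 0; j < s[i].nr_stores; j++)
			n = add_unique(out, n, s[i].stores[j]);
	return n;
}

/* True if the unlocked reader can observe a value the locked reader never could. */
static bool lock_changes_outcome(unsigned long init, const struct section *s,
				 size_t nr_sections)
{
	unsigned long locked[MAX_VALUES], unlocked[MAX_VALUES];
	size_t nl = observable_locked(init, s, nr_sections, locked);
	size_t nu = observable_unlocked(init, s, nr_sections, unlocked);

	for (size_t i = 0; i < nu; i++) {
		bool found = false;
		for (size_t j = 0; j < nl; j++)
			found |= (unlocked[i] == locked[j]);
		if (!found)
			return true;
	}
	return false;
}
```

Starting from 0 with two single-store sections {3} and {1}, both readers can only see 0, 3 or 1, so lock_changes_outcome() returns false. A section that stores 999 before settling on 5 returns true, because only the unlocked reader can observe the transient 999: that is exactly the "unusual" case Christoph described.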
Hello,

Dennis Zhou <dennis@kernel.org> wrote:
> [...]
> You're right that this is a race condition, but this was an intentional
> choice, made because the value read here is only used to pass
> information to userspace for /proc/meminfo. As Christoph mentioned, the
> caller of pcpu_nr_pages() will never see an invalid value, nor does it
> really matter either.
>
> The pcpu_lock is core to the percpu allocator and isn't something we
> would want to blindly expose either.
>
> The appropriate solution here is what Shakeel proposed: just mark the
> access as a data_race().
>
> Thanks,
> Dennis

If this data race is intentional, it makes sense why it was written this
way. I'll send a v2 patch with the fix Shakeel proposed.

--
Regards,
Jeongjun Park