[PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring

[PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by Dong Chenchen 6 months, 3 weeks ago
syzbot reported a use-after-free (UAF) in page_pool_recycle_in_ring:

BUG: KASAN: slab-use-after-free in lock_release+0x151/0xa30 kernel/locking/lockdep.c:5862
Read of size 8 at addr ffff8880286045a0 by task syz.0.284/6943

CPU: 0 UID: 0 PID: 6943 Comm: syz.0.284 Not tainted 6.13.0-rc3-syzkaller-gdfa94ce54f41 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:489
 kasan_report+0x143/0x180 mm/kasan/report.c:602
 lock_release+0x151/0xa30 kernel/locking/lockdep.c:5862
 __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:165 [inline]
 _raw_spin_unlock_bh+0x1b/0x40 kernel/locking/spinlock.c:210
 spin_unlock_bh include/linux/spinlock.h:396 [inline]
 ptr_ring_produce_bh include/linux/ptr_ring.h:164 [inline]
 page_pool_recycle_in_ring net/core/page_pool.c:707 [inline]
 page_pool_put_unrefed_netmem+0x748/0xb00 net/core/page_pool.c:826
 page_pool_put_netmem include/net/page_pool/helpers.h:323 [inline]
 page_pool_put_full_netmem include/net/page_pool/helpers.h:353 [inline]
 napi_pp_put_page+0x149/0x2b0 net/core/skbuff.c:1036
 skb_pp_recycle net/core/skbuff.c:1047 [inline]
 skb_free_head net/core/skbuff.c:1094 [inline]
 skb_release_data+0x6c4/0x8a0 net/core/skbuff.c:1125
 skb_release_all net/core/skbuff.c:1190 [inline]
 __kfree_skb net/core/skbuff.c:1204 [inline]
 sk_skb_reason_drop+0x1c9/0x380 net/core/skbuff.c:1242
 kfree_skb_reason include/linux/skbuff.h:1263 [inline]
 __skb_queue_purge_reason include/linux/skbuff.h:3343 [inline]

The root cause is:

page_pool_recycle_in_ring
  ptr_ring_produce
    spin_lock(&r->producer_lock);
    WRITE_ONCE(r->queue[r->producer++], ptr)
      //recycle last page to pool
				page_pool_release
				  page_pool_scrub
				    page_pool_empty_ring
				      ptr_ring_consume
				      page_pool_return_page  //release all page
				  __page_pool_destroy
				     free_percpu(pool->recycle_stats);
				     free(pool) //free

     spin_unlock(&r->producer_lock); //pool->ring uaf read
  recycle_stat_inc(pool, ring);

The page_pool can be freed while it is still recycling the last page
into the ring. Add a producer-lock barrier to page_pool_release() to
prevent the page pool from being freed before all pages have been
recycled.
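
For reference, the patch uses the page_pool_producer_lock() /
page_pool_producer_unlock() helpers that already exist in
net/core/page_pool.c but are not shown in the diff. A minimal sketch of
what they are assumed to do, mirroring the in_softirq() pattern the diff
removes (the in-tree definitions are authoritative):

/* Sketch of the existing helpers -- assumed shape, see
 * net/core/page_pool.c for the real definitions.
 */
static bool page_pool_producer_lock(struct page_pool *pool)
	__acquires(&pool->ring.producer_lock)
{
	/* BH protection not needed if current is softirq */
	bool in_softirq = in_softirq();

	if (in_softirq)
		spin_lock(&pool->ring.producer_lock);
	else
		spin_lock_bh(&pool->ring.producer_lock);

	return in_softirq;
}

static void page_pool_producer_unlock(struct page_pool *pool,
				      bool in_softirq)
	__releases(&pool->ring.producer_lock)
{
	if (in_softirq)
		spin_unlock(&pool->ring.producer_lock);
	else
		spin_unlock_bh(&pool->ring.producer_lock);
}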

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20250513083123.3514193-1-dongchenchen2@huawei.com
Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
Reported-by: syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=204a4382fcb3311f3858
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
---
 net/core/page_pool.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2..08f1b000ebc1 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -707,19 +707,16 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
 
 static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
 {
+	bool in_softirq;
 	int ret;
 	/* BH protection not needed if current is softirq */
-	if (in_softirq())
-		ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
-	else
-		ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
-
-	if (!ret) {
+	in_softirq = page_pool_producer_lock(pool);
+	ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
+	if (ret)
 		recycle_stat_inc(pool, ring);
-		return true;
-	}
+	page_pool_producer_unlock(pool, in_softirq);
 
-	return false;
+	return ret;
 }
 
 /* Only allow direct recycling in special circumstances, into the
@@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
 
 static int page_pool_release(struct page_pool *pool)
 {
+	bool in_softirq;
 	int inflight;
 
 	page_pool_scrub(pool);
 	inflight = page_pool_inflight(pool, true);
+	/* Acquire producer lock to make sure producers have exited. */
+	in_softirq = page_pool_producer_lock(pool);
+	page_pool_producer_unlock(pool, in_softirq);
 	if (!inflight)
 		__page_pool_destroy(pool);
 
-- 
2.25.1
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by kernel test robot 6 months, 3 weeks ago
Hi Dong,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Dong-Chenchen/page_pool-Fix-use-after-free-in-page_pool_recycle_in_ring/20250523-144323
base:   net/main
patch link:    https://lore.kernel.org/r/20250523064524.3035067-1-dongchenchen2%40huawei.com
patch subject: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
config: x86_64-buildonly-randconfig-004-20250524 (https://download.01.org/0day-ci/archive/20250524/202505240558.ANlvx42u-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250524/202505240558.ANlvx42u-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505240558.ANlvx42u-lkp@intel.com/

All warnings (new ones prefixed by >>):

   net/core/page_pool.c: In function 'page_pool_recycle_in_ring':
>> net/core/page_pool.c:716:45: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
     716 |                 recycle_stat_inc(pool, ring);
         |                                             ^


vim +/if +716 net/core/page_pool.c

ff7d6b27f894f1 Jesper Dangaard Brouer 2018-04-17  707  
4dec64c52e24c2 Mina Almasry           2024-06-28  708  static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
ff7d6b27f894f1 Jesper Dangaard Brouer 2018-04-17  709  {
8801e4b0622139 Dong Chenchen          2025-05-23  710  	bool in_softirq;
ff7d6b27f894f1 Jesper Dangaard Brouer 2018-04-17  711  	int ret;
542bcea4be866b Qingfang DENG          2023-02-03  712  	/* BH protection not needed if current is softirq */
8801e4b0622139 Dong Chenchen          2025-05-23  713  	in_softirq = page_pool_producer_lock(pool);
8801e4b0622139 Dong Chenchen          2025-05-23  714  	ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
8801e4b0622139 Dong Chenchen          2025-05-23  715  	if (ret)
ad6fa1e1ab1b81 Joe Damato             2022-03-01 @716  		recycle_stat_inc(pool, ring);
8801e4b0622139 Dong Chenchen          2025-05-23  717  	page_pool_producer_unlock(pool, in_softirq);
ad6fa1e1ab1b81 Joe Damato             2022-03-01  718  
8801e4b0622139 Dong Chenchen          2025-05-23  719  	return ret;
ff7d6b27f894f1 Jesper Dangaard Brouer 2018-04-17  720  }
ff7d6b27f894f1 Jesper Dangaard Brouer 2018-04-17  721  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by Paolo Abeni 6 months, 3 weeks ago
On 5/23/25 8:45 AM, Dong Chenchen wrote:
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 7745ad924ae2..08f1b000ebc1 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -707,19 +707,16 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
>  
>  static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
>  {
> +	bool in_softirq;
>  	int ret;
>  	/* BH protection not needed if current is softirq */
> -	if (in_softirq())
> -		ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> -	else
> -		ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
> -
> -	if (!ret) {
> +	in_softirq = page_pool_producer_lock(pool);
> +	ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
> +	if (ret)
>  		recycle_stat_inc(pool, ring);
> -		return true;
> -	}

Does not build in our CI:

net/core/page_pool.c: In function ‘page_pool_recycle_in_ring’:
net/core/page_pool.c:750:45: error: suggest braces around empty body in
an ‘if’ statement [-Werror=empty-body]
  750 |                 recycle_stat_inc(pool, ring);
      |                                             ^

/P

Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by dongchenchen (A) 6 months, 3 weeks ago
On 2025/5/23 21:31, Paolo Abeni wrote:
> On 5/23/25 8:45 AM, Dong Chenchen wrote:
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index 7745ad924ae2..08f1b000ebc1 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -707,19 +707,16 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
>>   
>>   static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
>>   {
>> +	bool in_softirq;
>>   	int ret;
>>   	/* BH protection not needed if current is softirq */
>> -	if (in_softirq())
>> -		ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
>> -	else
>> -		ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
>> -
>> -	if (!ret) {
>> +	in_softirq = page_pool_producer_lock(pool);
>> +	ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
>> +	if (ret)
>>   		recycle_stat_inc(pool, ring);
>> -		return true;
>> -	}
> Does not build in our CI:
>
> net/core/page_pool.c: In function ‘page_pool_recycle_in_ring’:
> net/core/page_pool.c:750:45: error: suggest braces around empty body in
> an ‘if’ statement [-Werror=empty-body]
>    750 |                 recycle_stat_inc(pool, ring);
>        |                                             ^
>
> /P
>
I am sorry for this mistake.
recycle_stat_inc() expands to nothing when CONFIG_PAGE_POOL_STATS is not
enabled. Maybe we can fix it like this:

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 08f1b000ebc1..19c1505ec40f 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -154,8 +154,8 @@ EXPORT_SYMBOL(page_pool_ethtool_stats_get);
  
  #else
  #define alloc_stat_inc(pool, __stat)
-#define recycle_stat_inc(pool, __stat)
-#define recycle_stat_add(pool, __stat, val)
+#define recycle_stat_inc(pool, __stat)         do { } while (0)
+#define recycle_stat_add(pool, __stat, val)    do { } while (0)
  #endif
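
To illustrate the warning the diff above addresses: a standalone toy
example (not kernel code) showing why -Wempty-body (enabled in W=1
builds) fires on an empty macro body, and why do { } while (0) silences
it:

/* Toy illustration, not kernel code. With an empty macro,
 * "if (ret) stat_inc_empty(...);" preprocesses to "if (ret) ;",
 * which -Wempty-body flags; the do { } while (0) form gives the
 * if-statement a real (empty) body.
 */
#include <stdio.h>

#define stat_inc_empty(pool, stat)		/* expands to nothing */
#define stat_inc_fixed(pool, stat)	do { } while (0)

int main(void)
{
	int ret = 1;

	if (ret)
		stat_inc_empty(NULL, ring);	/* "if (ret) ;" -> warning */

	if (ret)
		stat_inc_fixed(NULL, ring);	/* empty statement, no warning */

	printf("ret=%d\n", ret);
	return 0;
}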

Thanks a lot!

Dong Chenchen

Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by Toke Høiland-Jørgensen 6 months, 3 weeks ago
Dong Chenchen <dongchenchen2@huawei.com> writes:

> syzbot reported a use-after-free (UAF) in page_pool_recycle_in_ring:
>
> BUG: KASAN: slab-use-after-free in lock_release+0x151/0xa30 kernel/locking/lockdep.c:5862
> Read of size 8 at addr ffff8880286045a0 by task syz.0.284/6943
>
> CPU: 0 UID: 0 PID: 6943 Comm: syz.0.284 Not tainted 6.13.0-rc3-syzkaller-gdfa94ce54f41 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:94 [inline]
>  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
>  print_address_description mm/kasan/report.c:378 [inline]
>  print_report+0x169/0x550 mm/kasan/report.c:489
>  kasan_report+0x143/0x180 mm/kasan/report.c:602
>  lock_release+0x151/0xa30 kernel/locking/lockdep.c:5862
>  __raw_spin_unlock_bh include/linux/spinlock_api_smp.h:165 [inline]
>  _raw_spin_unlock_bh+0x1b/0x40 kernel/locking/spinlock.c:210
>  spin_unlock_bh include/linux/spinlock.h:396 [inline]
>  ptr_ring_produce_bh include/linux/ptr_ring.h:164 [inline]
>  page_pool_recycle_in_ring net/core/page_pool.c:707 [inline]
>  page_pool_put_unrefed_netmem+0x748/0xb00 net/core/page_pool.c:826
>  page_pool_put_netmem include/net/page_pool/helpers.h:323 [inline]
>  page_pool_put_full_netmem include/net/page_pool/helpers.h:353 [inline]
>  napi_pp_put_page+0x149/0x2b0 net/core/skbuff.c:1036
>  skb_pp_recycle net/core/skbuff.c:1047 [inline]
>  skb_free_head net/core/skbuff.c:1094 [inline]
>  skb_release_data+0x6c4/0x8a0 net/core/skbuff.c:1125
>  skb_release_all net/core/skbuff.c:1190 [inline]
>  __kfree_skb net/core/skbuff.c:1204 [inline]
>  sk_skb_reason_drop+0x1c9/0x380 net/core/skbuff.c:1242
>  kfree_skb_reason include/linux/skbuff.h:1263 [inline]
>  __skb_queue_purge_reason include/linux/skbuff.h:3343 [inline]
>
> The root cause is:
>
> page_pool_recycle_in_ring
>   ptr_ring_produce
>     spin_lock(&r->producer_lock);
>     WRITE_ONCE(r->queue[r->producer++], ptr)
>       //recycle last page to pool
> 				page_pool_release
> 				  page_pool_scrub
> 				    page_pool_empty_ring
> 				      ptr_ring_consume
> 				      page_pool_return_page  //release all page
> 				  __page_pool_destroy
> 				     free_percpu(pool->recycle_stats);
> 				     free(pool) //free
>
>      spin_unlock(&r->producer_lock); //pool->ring uaf read
>   recycle_stat_inc(pool, ring);
>
> The page_pool can be freed while it is still recycling the last page
> into the ring. Add a producer-lock barrier to page_pool_release() to
> prevent the page pool from being freed before all pages have been
> recycled.
>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Link: https://lore.kernel.org/netdev/20250513083123.3514193-1-dongchenchen2@huawei.com
> Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
> Reported-by: syzbot+204a4382fcb3311f3858@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=204a4382fcb3311f3858
> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by Yunsheng Lin 6 months, 3 weeks ago
On 2025/5/23 14:45, Dong Chenchen wrote:

>  
>  static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
>  {
> +	bool in_softirq;
>  	int ret;
int -> bool?

>  	/* BH protection not needed if current is softirq */
> -	if (in_softirq())
> -		ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> -	else
> -		ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
> -
> -	if (!ret) {
> +	in_softirq = page_pool_producer_lock(pool);
> +	ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
> +	if (ret)
>  		recycle_stat_inc(pool, ring);
> -		return true;
> -	}
> +	page_pool_producer_unlock(pool, in_softirq);
>  
> -	return false;
> +	return ret;
>  }
>  
>  /* Only allow direct recycling in special circumstances, into the
> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
>  
>  static int page_pool_release(struct page_pool *pool)
>  {
> +	bool in_softirq;
>  	int inflight;
>  
>  	page_pool_scrub(pool);
>  	inflight = page_pool_inflight(pool, true);
> +	/* Acquire producer lock to make sure producers have exited. */
> +	in_softirq = page_pool_producer_lock(pool);
> +	page_pool_producer_unlock(pool, in_softirq);

Is a compiler barrier needed to ensure compiler doesn't optimize away
the above code?

>  	if (!inflight)
>  		__page_pool_destroy(pool);
>
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by dongchenchen (A) 6 months, 3 weeks ago
> On 2025/5/23 14:45, Dong Chenchen wrote:
>
>>   
>>   static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
>>   {
>> +	bool in_softirq;
>>   	int ret;
> int -> bool?

Thanks for your review!

v2 will fix it.

>>   	/* BH protection not needed if current is softirq */
>> -	if (in_softirq())
>> -		ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
>> -	else
>> -		ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
>> -
>> -	if (!ret) {
>> +	in_softirq = page_pool_producer_lock(pool);
>> +	ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
>> +	if (ret)
>>   		recycle_stat_inc(pool, ring);
>> -		return true;
>> -	}
>> +	page_pool_producer_unlock(pool, in_softirq);
>>   
>> -	return false;
>> +	return ret;
>>   }
>>   
>>   /* Only allow direct recycling in special circumstances, into the
>> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
>>   
>>   static int page_pool_release(struct page_pool *pool)
>>   {
>> +	bool in_softirq;
>>   	int inflight;
>>   
>>   	page_pool_scrub(pool);
>>   	inflight = page_pool_inflight(pool, true);
>> +	/* Acquire producer lock to make sure producers have exited. */
>> +	in_softirq = page_pool_producer_lock(pool);
>> +	page_pool_producer_unlock(pool, in_softirq);
> Is a compiler barrier needed to ensure compiler doesn't optimize away
> the above code?

All of the code in this function accesses the pool. Do we still need to
add a compiler barrier for it?

Best Regards,
Dong Chenchen

>>   	if (!inflight)
>>   		__page_pool_destroy(pool);
>>
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by Mina Almasry 6 months, 3 weeks ago
On Fri, May 23, 2025 at 1:31 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2025/5/23 14:45, Dong Chenchen wrote:
>
> >
> >  static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
> >  {
> > +     bool in_softirq;
> >       int ret;
> int -> bool?
>
> >       /* BH protection not needed if current is softirq */
> > -     if (in_softirq())
> > -             ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> > -     else
> > -             ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
> > -
> > -     if (!ret) {
> > +     in_softirq = page_pool_producer_lock(pool);
> > +     ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
> > +     if (ret)
> >               recycle_stat_inc(pool, ring);
> > -             return true;
> > -     }
> > +     page_pool_producer_unlock(pool, in_softirq);
> >
> > -     return false;
> > +     return ret;
> >  }
> >
> >  /* Only allow direct recycling in special circumstances, into the
> > @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
> >
> >  static int page_pool_release(struct page_pool *pool)
> >  {
> > +     bool in_softirq;
> >       int inflight;
> >
> >       page_pool_scrub(pool);
> >       inflight = page_pool_inflight(pool, true);
> > +     /* Acquire producer lock to make sure producers have exited. */
> > +     in_softirq = page_pool_producer_lock(pool);
> > +     page_pool_producer_unlock(pool, in_softirq);
>
> Is a compiler barrier needed to ensure compiler doesn't optimize away
> the above code?
>

I don't want to derail this conversation too much, and I suggested a
similar fix to this initially, but now I'm not sure I understand why
it works.

Why is the existing barrier not working and acquiring/releasing the
producer lock fixes this issue instead? The existing barrier is the
producer thread incrementing pool->pages_state_release_cnt, and
page_pool_release() is supposed to block the freeing of the page_pool
until it sees the
`atomic_inc_return_relaxed(&pool->pages_state_release_cnt);` from the
producer thread. Any idea why this barrier is not working? AFAIU it
should do the exact same thing as acquiring/dropping the producer
lock.


-- 
Thanks,
Mina
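
For context on the hold/release counters discussed above: inflight
accounting is roughly the distance between pages_state_hold_cnt and
pages_state_release_cnt. A minimal sketch, assuming it matches the
mainline page_pool_inflight() shape (the _sketch suffix marks it as
illustrative; the in-tree version adds tracing and sanity checks):

/* Sketch only -- assumed to mirror page_pool_inflight() in
 * net/core/page_pool.c; field names are from struct page_pool.
 */
static s32 page_pool_inflight_sketch(const struct page_pool *pool)
{
	u32 release_cnt = atomic_read(&pool->pages_state_release_cnt);
	u32 hold_cnt = READ_ONCE(pool->pages_state_hold_cnt);

	/* signed distance, so counter wrap-around is handled */
	return (s32)(hold_cnt - release_cnt);
}

page_pool_release() only calls __page_pool_destroy() when this value is
zero, i.e. once every page handed out has been returned.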
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by dongchenchen (A) 6 months, 3 weeks ago
> On Fri, May 23, 2025 at 1:31 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>> On 2025/5/23 14:45, Dong Chenchen wrote:
>>
>>>   static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
>>>   {
>>> +     bool in_softirq;
>>>        int ret;
>> int -> bool?
>>
>>>        /* BH protection not needed if current is softirq */
>>> -     if (in_softirq())
>>> -             ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
>>> -     else
>>> -             ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
>>> -
>>> -     if (!ret) {
>>> +     in_softirq = page_pool_producer_lock(pool);
>>> +     ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
>>> +     if (ret)
>>>                recycle_stat_inc(pool, ring);
>>> -             return true;
>>> -     }
>>> +     page_pool_producer_unlock(pool, in_softirq);
>>>
>>> -     return false;
>>> +     return ret;
>>>   }
>>>
>>>   /* Only allow direct recycling in special circumstances, into the
>>> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
>>>
>>>   static int page_pool_release(struct page_pool *pool)
>>>   {
>>> +     bool in_softirq;
>>>        int inflight;
>>>
>>>        page_pool_scrub(pool);
>>>        inflight = page_pool_inflight(pool, true);
>>> +     /* Acquire producer lock to make sure producers have exited. */
>>> +     in_softirq = page_pool_producer_lock(pool);
>>> +     page_pool_producer_unlock(pool, in_softirq);
>> Is a compiler barrier needed to ensure compiler doesn't optimize away
>> the above code?
>>
> I don't want to derail this conversation too much, and I suggested a
> similar fix to this initially, but now I'm not sure I understand why
> it works.
>
> Why is the existing barrier not working and acquiring/releasing the
> producer lock fixes this issue instead? The existing barrier is the
> producer thread incrementing pool->pages_state_release_cnt, and
> page_pool_release() is supposed to block the freeing of the page_pool
> until it sees the
> `atomic_inc_return_relaxed(&pool->pages_state_release_cnt);` from the
> producer thread. Any idea why this barrier is not working? AFAIU it
> should do the exact same thing as acquiring/dropping the producer
> lock.

Hi, Mina
As previously mentioned:
page_pool_recycle_in_ring
   ptr_ring_produce
     spin_lock(&r->producer_lock);
     WRITE_ONCE(r->queue[r->producer++], ptr)
       //recycle last page to pool, producer + release_cnt = hold_cnt
				page_pool_release
				  page_pool_scrub
				    page_pool_empty_ring
				      ptr_ring_consume
				        page_pool_return_page
				       //release_cnt=hold_cnt
				  __page_pool_destroy //inflight=0
				     free_percpu(pool->recycle_stats);
				     free(pool) //free
      spin_unlock(&r->producer_lock); //pool->ring uaf read
   recycle_stat_inc(pool, ring);

The release_cnt check does block the freeing of the page_pool until
release_cnt catches up with hold_cnt.
However, page_pool_release() can run concurrently while a page is being
recycled (e.g. via kfree_skb). release_cnt reaches hold_cnt right after
the producer index has been written, so the pool can be freed while the
producer is still inside page_pool_recycle_in_ring(), and its subsequent
accesses to the pool trigger the UAF.
So adding a producer-lock barrier that waits for the recycle path to
complete fixes it.

Best Regards,
Dong Chenchen

>
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by Mina Almasry 6 months, 3 weeks ago

On Mon, May 26, 2025 at 7:47 AM dongchenchen (A)
<dongchenchen2@huawei.com> wrote:
>
>
> > On Fri, May 23, 2025 at 1:31 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >> On 2025/5/23 14:45, Dong Chenchen wrote:
> >>
> >>>   static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
> >>>   {
> >>> +     bool in_softirq;
> >>>        int ret;
> >> int -> bool?
> >>
> >>>        /* BH protection not needed if current is softirq */
> >>> -     if (in_softirq())
> >>> -             ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>> -     else
> >>> -             ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
> >>> -
> >>> -     if (!ret) {
> >>> +     in_softirq = page_pool_producer_lock(pool);
> >>> +     ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>> +     if (ret)
> >>>                recycle_stat_inc(pool, ring);
> >>> -             return true;
> >>> -     }
> >>> +     page_pool_producer_unlock(pool, in_softirq);
> >>>
> >>> -     return false;
> >>> +     return ret;
> >>>   }
> >>>
> >>>   /* Only allow direct recycling in special circumstances, into the
> >>> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
> >>>
> >>>   static int page_pool_release(struct page_pool *pool)
> >>>   {
> >>> +     bool in_softirq;
> >>>        int inflight;
> >>>
> >>>        page_pool_scrub(pool);
> >>>        inflight = page_pool_inflight(pool, true);
> >>> +     /* Acquire producer lock to make sure producers have exited. */
> >>> +     in_softirq = page_pool_producer_lock(pool);
> >>> +     page_pool_producer_unlock(pool, in_softirq);
> >> Is a compiler barrier needed to ensure compiler doesn't optimize away
> >> the above code?
> >>
> > I don't want to derail this conversation too much, and I suggested a
> > similar fix to this initially, but now I'm not sure I understand why
> > it works.
> >
> > Why is the existing barrier not working and acquiring/releasing the
> > producer lock fixes this issue instead? The existing barrier is the
> > producer thread incrementing pool->pages_state_release_cnt, and
> > page_pool_release() is supposed to block the freeing of the page_pool
> > until it sees the
> > `atomic_inc_return_relaxed(&pool->pages_state_release_cnt);` from the
> > producer thread. Any idea why this barrier is not working? AFAIU it
> > should do the exact same thing as acquiring/dropping the producer
> > lock.
>
> Hi, Mina
> As previously mentioned:
> page_pool_recycle_in_ring
>    ptr_ring_produce
>      spin_lock(&r->producer_lock);
>      WRITE_ONCE(r->queue[r->producer++], ptr)
>        //recycle last page to pool, producer + release_cnt = hold_cnt

This is not right. release_cnt != hold_cnt at this point.

Release_cnt is only incremented by the producer _after_ the
spin_unlock and the recycle_stat_inc have been done. The full call
stack on the producer thread:

page_pool_put_unrefed_netmem
    page_pool_recycle_in_ring
        ptr_ring_produce(&pool->ring, (__force void *)netmem);
             spin_lock(&r->producer_lock);
             __ptr_ring_produce(r, ptr);
             spin_unlock(&r->producer_lock);
        recycle_stat_inc(pool, ring);
    recycle_stat_inc(pool, ring_full);
    page_pool_return_page
        atomic_inc_return_relaxed(&pool->pages_state_release_cnt);

The atomic_inc_return_relaxed happens after all the lines that could
cause UAF are already executed. Is it because we're using the _relaxed
version of the atomic operation, that the compiler can reorder it to
happen before the spin_unlock(&r->producer_lock) and before the
recycle_stat_inc...?

-- 
Thanks,
Mina
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by dongchenchen (A) 6 months, 3 weeks ago
> )
>
> On Mon, May 26, 2025 at 7:47 AM dongchenchen (A)
> <dongchenchen2@huawei.com> wrote:
>>
>>> On Fri, May 23, 2025 at 1:31 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>> On 2025/5/23 14:45, Dong Chenchen wrote:
>>>>
>>>>>    static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
>>>>>    {
>>>>> +     bool in_softirq;
>>>>>         int ret;
>>>> int -> bool?
>>>>
>>>>>         /* BH protection not needed if current is softirq */
>>>>> -     if (in_softirq())
>>>>> -             ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
>>>>> -     else
>>>>> -             ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
>>>>> -
>>>>> -     if (!ret) {
>>>>> +     in_softirq = page_pool_producer_lock(pool);
>>>>> +     ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
>>>>> +     if (ret)
>>>>>                 recycle_stat_inc(pool, ring);
>>>>> -             return true;
>>>>> -     }
>>>>> +     page_pool_producer_unlock(pool, in_softirq);
>>>>>
>>>>> -     return false;
>>>>> +     return ret;
>>>>>    }
>>>>>
>>>>>    /* Only allow direct recycling in special circumstances, into the
>>>>> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
>>>>>
>>>>>    static int page_pool_release(struct page_pool *pool)
>>>>>    {
>>>>> +     bool in_softirq;
>>>>>         int inflight;
>>>>>
>>>>>         page_pool_scrub(pool);
>>>>>         inflight = page_pool_inflight(pool, true);
>>>>> +     /* Acquire producer lock to make sure producers have exited. */
>>>>> +     in_softirq = page_pool_producer_lock(pool);
>>>>> +     page_pool_producer_unlock(pool, in_softirq);
>>>> Is a compiler barrier needed to ensure compiler doesn't optimize away
>>>> the above code?
>>>>
>>> I don't want to derail this conversation too much, and I suggested a
>>> similar fix to this initially, but now I'm not sure I understand why
>>> it works.
>>>
>>> Why is the existing barrier not working and acquiring/releasing the
>>> producer lock fixes this issue instead? The existing barrier is the
>>> producer thread incrementing pool->pages_state_release_cnt, and
>>> page_pool_release() is supposed to block the freeing of the page_pool
>>> until it sees the
>>> `atomic_inc_return_relaxed(&pool->pages_state_release_cnt);` from the
>>> producer thread. Any idea why this barrier is not working? AFAIU it
>>> should do the exact same thing as acquiring/dropping the producer
>>> lock.
>> Hi, Mina
>> As previously mentioned:
>> page_pool_recycle_in_ring
>>     ptr_ring_produce
>>       spin_lock(&r->producer_lock);
>>       WRITE_ONCE(r->queue[r->producer++], ptr)
>>         //recycle last page to pool, producer + release_cnt = hold_cnt
> This is not right. release_cnt != hold_cnt at this point.

Hi, Mina!
Thanks for your review!
release_cnt != hold_cnt at this point: the producer increments
r->producer, and release_cnt will be incremented by
page_pool_empty_ring() in page_pool_release().

> Release_cnt is only incremented by the producer _after_ the
> spin_unlock and the recycle_stat_inc have been done. The full call
> stack on the producer thread:
>
> page_pool_put_unrefed_netmem
>      page_pool_recycle_in_ring
>          ptr_ring_produce(&pool->ring, (__force void *)netmem);
>               spin_lock(&r->producer_lock);
>               __ptr_ring_produce(r, ptr);
>               spin_unlock(&r->producer_lock);
>          recycle_stat_inc(pool, ring);

If the ring is not full, page_pool_recycle_in_ring() will return true.
release_cnt will then be incremented by page_pool_empty_ring() in
page_pool_release(), and the code below will not be executed:

page_pool_put_unrefed_netmem
   if (!page_pool_recycle_in_ring(pool, netmem)) //return true
       page_pool_return_page(pool, netmem);

Best Regards,
Dong Chenchen

>      recycle_stat_inc(pool, ring_full);
>      page_pool_return_page
>          atomic_inc_return_relaxed(&pool->pages_state_release_cnt);
>
> The atomic_inc_return_relaxed happens after all the lines that could
> cause UAF are already executed. Is it because we're using the _relaxed
> version of the atomic operation, that the compiler can reorder it to
> happen before the spin_unlock(&r->producer_lock) and before the
> recycle_stat_inc...?
>
Re: [PATCH net] page_pool: Fix use-after-free in page_pool_recycle_in_ring
Posted by Mina Almasry 6 months, 3 weeks ago
On Mon, May 26, 2025 at 6:53 PM dongchenchen (A)
<dongchenchen2@huawei.com> wrote:
>
>
> > )
> >
> > On Mon, May 26, 2025 at 7:47 AM dongchenchen (A)
> > <dongchenchen2@huawei.com> wrote:
> >>
> >>> On Fri, May 23, 2025 at 1:31 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>>> On 2025/5/23 14:45, Dong Chenchen wrote:
> >>>>
> >>>>>    static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
> >>>>>    {
> >>>>> +     bool in_softirq;
> >>>>>         int ret;
> >>>> int -> bool?
> >>>>
> >>>>>         /* BH protection not needed if current is softirq */
> >>>>> -     if (in_softirq())
> >>>>> -             ret = ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>>>> -     else
> >>>>> -             ret = ptr_ring_produce_bh(&pool->ring, (__force void *)netmem);
> >>>>> -
> >>>>> -     if (!ret) {
> >>>>> +     in_softirq = page_pool_producer_lock(pool);
> >>>>> +     ret = !__ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >>>>> +     if (ret)
> >>>>>                 recycle_stat_inc(pool, ring);
> >>>>> -             return true;
> >>>>> -     }
> >>>>> +     page_pool_producer_unlock(pool, in_softirq);
> >>>>>
> >>>>> -     return false;
> >>>>> +     return ret;
> >>>>>    }
> >>>>>
> >>>>>    /* Only allow direct recycling in special circumstances, into the
> >>>>> @@ -1091,10 +1088,14 @@ static void page_pool_scrub(struct page_pool *pool)
> >>>>>
> >>>>>    static int page_pool_release(struct page_pool *pool)
> >>>>>    {
> >>>>> +     bool in_softirq;
> >>>>>         int inflight;
> >>>>>
> >>>>>         page_pool_scrub(pool);
> >>>>>         inflight = page_pool_inflight(pool, true);
> >>>>> +     /* Acquire producer lock to make sure producers have exited. */
> >>>>> +     in_softirq = page_pool_producer_lock(pool);
> >>>>> +     page_pool_producer_unlock(pool, in_softirq);
> >>>> Is a compiler barrier needed to ensure compiler doesn't optimize away
> >>>> the above code?
> >>>>
> >>> I don't want to derail this conversation too much, and I suggested a
> >>> similar fix to this initially, but now I'm not sure I understand why
> >>> it works.
> >>>
> >>> Why is the existing barrier not working and acquiring/releasing the
> >>> producer lock fixes this issue instead? The existing barrier is the
> >>> producer thread incrementing pool->pages_state_release_cnt, and
> >>> page_pool_release() is supposed to block the freeing of the page_pool
> >>> until it sees the
> >>> `atomic_inc_return_relaxed(&pool->pages_state_release_cnt);` from the
> >>> producer thread. Any idea why this barrier is not working? AFAIU it
> >>> should do the exact same thing as acquiring/dropping the producer
> >>> lock.
> >> Hi, Mina
> >> As previously mentioned:
> >> page_pool_recycle_in_ring
> >>     ptr_ring_produce
> >>       spin_lock(&r->producer_lock);
> >>       WRITE_ONCE(r->queue[r->producer++], ptr)
> >>         //recycle last page to pool, producer + release_cnt = hold_cnt
> > This is not right. release_cnt != hold_cnt at this point.
>
> Hi, Mina!
> Thanks for your review!
> release_cnt != hold_cnt at this point: the producer increments
> r->producer, and release_cnt will be incremented by
> page_pool_empty_ring() in page_pool_release().
>
> > Release_cnt is only incremented by the producer _after_ the
> > spin_unlock and the recycle_stat_inc have been done. The full call
> > stack on the producer thread:
> >
> > page_pool_put_unrefed_netmem
> >      page_pool_recycle_in_ring
> >          ptr_ring_produce(&pool->ring, (__force void *)netmem);
> >               spin_lock(&r->producer_lock);
> >               __ptr_ring_produce(r, ptr);
> >               spin_unlock(&r->producer_lock);
> >          recycle_stat_inc(pool, ring);
>
> If the ring is not full, page_pool_recycle_in_ring() will return true.
> release_cnt will then be incremented by page_pool_empty_ring() in
> page_pool_release(), and the code below will not be executed:
>
> page_pool_put_unrefed_netmem
>    if (!page_pool_recycle_in_ring(pool, netmem)) //return true
>        page_pool_return_page(pool, netmem);
>

Oh! Thanks! I see the race now.

page_pool_recycle_in_ring in the producer can return the page to the
ring. Then the consumer will see the netmem in the ring, free it,
increment release_cnt, and free the page_pool. Then the producer
continues executing and hits a UAF. Very subtle race indeed. Thanks
for the patient explanation.

Reviewed-by: Mina Almasry <almasrymina@google.com>

-- 
Thanks,
Mina