[PATCH net-next v23 3/7] mm: page_frag: use initial zero offset for page_frag_alloc_align()

Yunsheng Lin posted 7 patches 1 year, 3 months ago
[PATCH net-next v23 3/7] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Posted by Yunsheng Lin 1 year, 3 months ago
We are about to use page_frag_alloc_*() API to not just
allocate memory for skb->data, but also use them to do
the memory allocation for skb frag too. Currently the
implementation of page_frag in mm subsystem is running
the offset as a countdown rather than count-up value,
there may have several advantages to that as mentioned
in [1], but it may have some disadvantages, for example,
it may disable skb frag coalescing and more correct cache
prefetching

We have a trade-off to make in order to have a unified
implementation and API for page_frag, so use a initial zero
offset in this patch, and the following patch will try to
make some optimization to avoid the disadvantages as much
as possible.

1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/

CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Linux-MM <linux-mm@kvack.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---
 mm/page_frag_cache.c | 46 ++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index 609a485cd02a..4c8e04379cb3 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -63,9 +63,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			      unsigned int fragsz, gfp_t gfp_mask,
 			      unsigned int align_mask)
 {
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	unsigned int size = nc->size;
+#else
 	unsigned int size = PAGE_SIZE;
+#endif
+	unsigned int offset;
 	struct page *page;
-	int offset;
 
 	if (unlikely(!nc->va)) {
 refill:
@@ -85,11 +89,24 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page_is_pfmemalloc(page);
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		nc->offset = size;
+		nc->offset = 0;
 	}
 
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
+	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
+	if (unlikely(offset + fragsz > size)) {
+		if (unlikely(fragsz > PAGE_SIZE)) {
+			/*
+			 * The caller is trying to allocate a fragment
+			 * with fragsz > PAGE_SIZE but the cache isn't big
+			 * enough to satisfy the request, this may
+			 * happen in low memory conditions.
+			 * We don't release the cache page because
+			 * it could make memory pressure worse
+			 * so we simply return NULL here.
+			 */
+			return NULL;
+		}
+
 		page = virt_to_page(nc->va);
 
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
@@ -100,33 +117,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
 			goto refill;
 		}
 
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
 		/* OK, page count is 0, we can safely set it */
 		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
 
 		/* reset page count bias and offset to start of new frag */
 		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
-		offset = size - fragsz;
-		if (unlikely(offset < 0)) {
-			/*
-			 * The caller is trying to allocate a fragment
-			 * with fragsz > PAGE_SIZE but the cache isn't big
-			 * enough to satisfy the request, this may
-			 * happen in low memory conditions.
-			 * We don't release the cache page because
-			 * it could make memory pressure worse
-			 * so we simply return NULL here.
-			 */
-			return NULL;
-		}
+		offset = 0;
 	}
 
 	nc->pagecnt_bias--;
-	offset &= align_mask;
-	nc->offset = offset;
+	nc->offset = offset + fragsz;
 
 	return nc->va + offset;
 }
-- 
2.33.0
Re: [PATCH net-next v23 3/7] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Posted by Florian Fainelli 1 year ago
Hi Yunsheng,

On 10/28/24 04:53, Yunsheng Lin wrote:
> We are about to use page_frag_alloc_*() API to not just
> allocate memory for skb->data, but also use them to do
> the memory allocation for skb frag too. Currently the
> implementation of page_frag in mm subsystem is running
> the offset as a countdown rather than count-up value,
> there may have several advantages to that as mentioned
> in [1], but it may have some disadvantages, for example,
> it may disable skb frag coalescing and more correct cache
> prefetching
> 
> We have a trade-off to make in order to have a unified
> implementation and API for page_frag, so use a initial zero
> offset in this patch, and the following patch will try to
> make some optimization to avoid the disadvantages as much
> as possible.
> 
> 1. https://lore.kernel.org/all/f4abe71b3439b39d17a6fb2d410180f367cadf5c.camel@gmail.com/
> 
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Linux-MM <linux-mm@kvack.org>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

Sorry for the late feedback, this patch causes the bgmac driver in is 
.ndo_open() function to return -ENOMEM, the call trace looks like this:

  bgmac_open
   -> bgmac_dma_init
     -> bgmac_dma_rx_skb_for_slot
       -> netdev_alloc_frag

BGMAC_RX_ALLOC_SIZE = 10048 and PAGE_FRAG_CACHE_MAX_SIZE = 32768.

Eventually we land into __page_frag_alloc_align() with the following 
parameters across multiple successive calls:

__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=0
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, 
offset=10048
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, 
offset=20096
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, 
offset=30144

So in that case we do indeed have offset + fragsz (40192) > size (32768) 
and so we would eventually return NULL.

Any idea on how to best fix that within the bgmac driver?

Thanks!

> ---
>   mm/page_frag_cache.c | 46 ++++++++++++++++++++++----------------------
>   1 file changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> index 609a485cd02a..4c8e04379cb3 100644
> --- a/mm/page_frag_cache.c
> +++ b/mm/page_frag_cache.c
> @@ -63,9 +63,13 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>   			      unsigned int fragsz, gfp_t gfp_mask,
>   			      unsigned int align_mask)
>   {
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +	unsigned int size = nc->size;
> +#else
>   	unsigned int size = PAGE_SIZE;
> +#endif
> +	unsigned int offset;
>   	struct page *page;
> -	int offset;
>   
>   	if (unlikely(!nc->va)) {
>   refill:
> @@ -85,11 +89,24 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>   		/* reset page count bias and offset to start of new frag */
>   		nc->pfmemalloc = page_is_pfmemalloc(page);
>   		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -		nc->offset = size;
> +		nc->offset = 0;
>   	}
>   
> -	offset = nc->offset - fragsz;
> -	if (unlikely(offset < 0)) {
> +	offset = __ALIGN_KERNEL_MASK(nc->offset, ~align_mask);
> +	if (unlikely(offset + fragsz > size)) {
> +		if (unlikely(fragsz > PAGE_SIZE)) {
> +			/*
> +			 * The caller is trying to allocate a fragment
> +			 * with fragsz > PAGE_SIZE but the cache isn't big
> +			 * enough to satisfy the request, this may
> +			 * happen in low memory conditions.
> +			 * We don't release the cache page because
> +			 * it could make memory pressure worse
> +			 * so we simply return NULL here.
> +			 */
> +			return NULL;
> +		}
> +
>   		page = virt_to_page(nc->va);
>   
>   		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> @@ -100,33 +117,16 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc,
>   			goto refill;
>   		}
>   
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -		/* if size can vary use size else just use PAGE_SIZE */
> -		size = nc->size;
> -#endif
>   		/* OK, page count is 0, we can safely set it */
>   		set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1);
>   
>   		/* reset page count bias and offset to start of new frag */
>   		nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> -		offset = size - fragsz;
> -		if (unlikely(offset < 0)) {
> -			/*
> -			 * The caller is trying to allocate a fragment
> -			 * with fragsz > PAGE_SIZE but the cache isn't big
> -			 * enough to satisfy the request, this may
> -			 * happen in low memory conditions.
> -			 * We don't release the cache page because
> -			 * it could make memory pressure worse
> -			 * so we simply return NULL here.
> -			 */
> -			return NULL;
> -		}
> +		offset = 0;
>   	}
>   
>   	nc->pagecnt_bias--;
> -	offset &= align_mask;
> -	nc->offset = offset;
> +	nc->offset = offset + fragsz;
>   
>   	return nc->va + offset;
>   }


-- 
Florian
Re: [PATCH net-next v23 3/7] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Posted by Yunsheng Lin 1 year ago
On 2025/1/24 3:15, Florian Fainelli wrote:
> 
> Sorry for the late feedback, this patch causes the bgmac driver in is .ndo_open() function to return -ENOMEM, the call trace looks like this:

Hi, Florian
Thanks for the report.

> 
>  bgmac_open
>   -> bgmac_dma_init
>     -> bgmac_dma_rx_skb_for_slot
>       -> netdev_alloc_frag
> 
> BGMAC_RX_ALLOC_SIZE = 10048 and PAGE_FRAG_CACHE_MAX_SIZE = 32768.

I guess BGMAC_RX_ALLOC_SIZE being bigger than PAGE_SIZE is the
problem here, as the frag API is not really supporting allocating
fragment that is bigger than PAGE_SIZE, as it will fall back to
allocate the base page when order-3 compound page allocation fails,
see __page_frag_cache_refill().

Also, it seems strange that bgmac driver seems to always use jumbo
frame size to allocate fragment, isn't more  appropriate to allocate
the fragment according to MTU?

> 
> Eventually we land into __page_frag_alloc_align() with the following parameters across multiple successive calls:
> 
> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=0
> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=10048
> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=20096
> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=30144
> 
> So in that case we do indeed have offset + fragsz (40192) > size (32768) and so we would eventually return NULL.

It seems the changing of '(unlikely(offset < 0))' checking to
"fragsz > PAGE_SIZE" causes bgmac driver to break more easily
here. But bgmac driver might likely break too if the system
memory is severely fragmented when falling back to alllocate
base page before this patch.

> 
> Any idea on how to best fix that within the bgmac driver?

Maybe use the page allocator API directly when allocating fragment
with BGMAC_RX_ALLOC_SIZE > PAGE_SIZE for a quick fix.

In the long term, maybe it makes sense to use the page_pool API
as more drivers are converting to use page_pool API already.
Re: [PATCH net-next v23 3/7] mm: page_frag: use initial zero offset for page_frag_alloc_align()
Posted by Florian Fainelli 1 year ago
On 1/24/25 01:52, Yunsheng Lin wrote:
> On 2025/1/24 3:15, Florian Fainelli wrote:
>>
>> Sorry for the late feedback, this patch causes the bgmac driver in is .ndo_open() function to return -ENOMEM, the call trace looks like this:
> 
> Hi, Florian
> Thanks for the report.
> 
>>
>>   bgmac_open
>>    -> bgmac_dma_init
>>      -> bgmac_dma_rx_skb_for_slot
>>        -> netdev_alloc_frag
>>
>> BGMAC_RX_ALLOC_SIZE = 10048 and PAGE_FRAG_CACHE_MAX_SIZE = 32768.
> 
> I guess BGMAC_RX_ALLOC_SIZE being bigger than PAGE_SIZE is the
> problem here, as the frag API is not really supporting allocating
> fragment that is bigger than PAGE_SIZE, as it will fall back to
> allocate the base page when order-3 compound page allocation fails,
> see __page_frag_cache_refill().
> 
> Also, it seems strange that bgmac driver seems to always use jumbo
> frame size to allocate fragment, isn't more  appropriate to allocate
> the fragment according to MTU?

Totally, even though my email domain would suggest otherwise, I am just 
an user of that driver here, not its maintainer, though I do have some 
familiarity with it, I don't know why that choice was made.

> 
>>
>> Eventually we land into __page_frag_alloc_align() with the following parameters across multiple successive calls:
>>
>> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=0
>> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=10048
>> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=20096
>> __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=30144
>>
>> So in that case we do indeed have offset + fragsz (40192) > size (32768) and so we would eventually return NULL.
> 
> It seems the changing of '(unlikely(offset < 0))' checking to
> "fragsz > PAGE_SIZE" causes bgmac driver to break more easily
> here. But bgmac driver might likely break too if the system
> memory is severely fragmented when falling back to alllocate
> base page before this patch.
> 
>>
>> Any idea on how to best fix that within the bgmac driver?
> 
> Maybe use the page allocator API directly when allocating fragment
> with BGMAC_RX_ALLOC_SIZE > PAGE_SIZE for a quick fix.
> 
> In the long term, maybe it makes sense to use the page_pool API
> as more drivers are converting to use page_pool API already.

Short term, I think I am going to submit a quick fix that is inspired by 
the out of tree patches carried in OpenWrt:

https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/bcm53xx/patches-6.6/700-bgmac-reduce-max-frame-size-to-support-just-MTU-1500.patch;h=3a2f4b06ed6d8cda1f3f0be23e1066267234766b;hb=HEAD

Thanks!
-- 
Florian