[PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag()

Yunsheng Lin posted 2 patches 4 months ago
There is a newer version of this series
[PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag()
Posted by Yunsheng Lin 4 months ago
There is performance penalty with using page frag support when
user requests a larger frag size and a page only supports one
frag user, see [1].

It seems like user may request different frag size depending
on the mtu and packet size, provide an option to allocate
non-frag page when a whole page is not able to hold two frags,
so that user has a unified interface for the memory allocation
with least memory utilization and performance penalty.

1. https://lore.kernel.org/netdev/ZEU+vospFdm08IeE@localhost.localdomain/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
---
 net/core/page_pool.c | 47 +++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 0868aa8f6323..e84ec6eabefd 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -699,14 +699,27 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
+	if (unlikely(size > max_size))
 		return NULL;
 
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
 	size = ALIGN(size, dma_get_cache_alignment());
-	*offset = pool->frag_offset;
 
-	if (page && *offset + size > max_size) {
+	if (page) {
+		*offset = pool->frag_offset;
+
+		if (*offset + size <= max_size) {
+			pool->frag_users++;
+			pool->frag_offset = *offset + size;
+			alloc_stat_inc(pool, fast);
+			return page;
+		}
+
+		pool->frag_page = NULL;
 		page = page_pool_drain_frag(pool, page);
 		if (page) {
 			alloc_stat_inc(pool, fast);
@@ -714,26 +727,24 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 		}
 	}
 
-	if (!page) {
-		page = page_pool_alloc_pages(pool, gfp);
-		if (unlikely(!page)) {
-			pool->frag_page = NULL;
-			return NULL;
-		}
-
-		pool->frag_page = page;
+	page = page_pool_alloc_pages(pool, gfp);
+	if (unlikely(!page))
+		return NULL;
 
 frag_reset:
-		pool->frag_users = 1;
+	/* return page as non-frag page if a page is not able to
+	 * hold two frags for the current requested size.
+	 */
+	if (unlikely(size << 1 > max_size)) {
 		*offset = 0;
-		pool->frag_offset = size;
-		page_pool_fragment_page(page, BIAS_MAX);
 		return page;
 	}
 
-	pool->frag_users++;
-	pool->frag_offset = *offset + size;
-	alloc_stat_inc(pool, fast);
+	pool->frag_page = page;
+	pool->frag_users = 1;
+	*offset = 0;
+	pool->frag_offset = size;
+	page_pool_fragment_page(page, BIAS_MAX);
 	return page;
 }
 EXPORT_SYMBOL(page_pool_alloc_frag);
-- 
2.33.0
Re: [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag()
Posted by Alexander Duyck 4 months ago
On Fri, May 26, 2023 at 2:28 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> There is performance penalty with using page frag support when
> user requests a larger frag size and a page only supports one
> frag user, see [1].
>
> It seems like user may request different frag size depending
> on the mtu and packet size, provide an option to allocate
> non-frag page when a whole page is not able to hold two frags,
> so that user has a unified interface for the memory allocation
> with least memory utilization and performance penalty.
>
> 1. https://lore.kernel.org/netdev/ZEU+vospFdm08IeE@localhost.localdomain/
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> CC: Alexander Duyck <alexander.duyck@gmail.com>

The way I see it there are several problems with this approach.

First, why not just increase the page order rather than trying to
essentially make page_pool_alloc_frag into an analog for
page_pool_alloc_pages? I know for the skb allocator we are working
with an order 3 page. You could likely do something similar here to
achieve the better performance you are looking for.

Second, I am not a fan of these changes as they seem to be wasteful
for drivers that might make use of a mix of large and small
allocations. If we aren't going to use fragments then we should
probably just wrap the call to this function in an inline wrapper that
checks the size and just automatically pulls the larger sizes off into
the non-frag allocation path. Look at something such as
__netdev_alloc_skb as an example.
Re: [PATCH net-next 2/2] page_pool: support non-frag page for page_pool_alloc_frag()
Posted by Yunsheng Lin 4 months ago
On 2023/5/26 23:16, Alexander Duyck wrote:
> On Fri, May 26, 2023 at 2:28 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> There is performance penalty with using page frag support when
>> user requests a larger frag size and a page only supports one
>> frag user, see [1].
>>
>> It seems like user may request different frag size depending
>> on the mtu and packet size, provide an option to allocate
>> non-frag page when a whole page is not able to hold two frags,
>> so that user has a unified interface for the memory allocation
>> with least memory utilization and performance penalty.
>>
>> 1. https://lore.kernel.org/netdev/ZEU+vospFdm08IeE@localhost.localdomain/
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> CC: Lorenzo Bianconi <lorenzo@kernel.org>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
> 
> The way I see it there are several problems with this approach.
> 
> First, why not just increase the page order rather than trying to
> essentially make page_pool_alloc_frag into an analog for
> page_pool_alloc_pages? I know for the skb allocator we are working
> with an order 3 page. You could likely do something similar here to
> achieve the better performance you are looking for.

I suppose the "an order 3 page" refers to __page_frag_cache_refill()
trying to allocate an order 3 page, if it fails, then fail back to
order 0 page?

As page pool alloc and recycle page in order based on pool->alloc
and pool->ring, so we can not do the above failback trick for page
pool.

We could add different pool->alloc/pool->ring for different order
page, for example, pool->alloc_order_0/pool->ring_order_0 for order
0 page, and pool->alloc_order_3/pool->ring_order_3 for order 3 page,
but then it will be complicated and possibly more memory wasteful.

We would also create page pool with pool->p.order being 3, then
the optimization in this patch also apply.

> 
> Second, I am not a fan of these changes as they seem to be wasteful
> for drivers that might make use of a mix of large and small

I suppose 'wasteful' refer to memory usage, right?
I am not sure how drivers that might make use a mix of large and
small yet, like mlx5 using page_pool_defrag_page() directly.
but if the PAGE_SIZE << pool->p.order is integral multiple of the frag,
then some waste seems to be unavoidable, does handling the frag count
in the driver save more memory than handling the frag count in page pool?
If not, handling the frag count in page pool seems more reasonable?

> allocations. If we aren't going to use fragments then we should
> probably just wrap the call to this function in an inline wrapper that
> checks the size and just automatically pulls the larger sizes off into
> the non-frag allocation path. Look at something such as
> __netdev_alloc_skb as an example.

Do you mean adding an new inline wrapper like below?

if (len > xxx)
	return page_pool_alloc_pages(pool, gfp);
else
	return page_pool_alloc_frag();

It seems the above is essentially the same as this patch does,
this patch does it by not introducing another interface and doing
some optimization. For example, for the above case with 'len > xxx'
in this patch, we still allow frag allocate if the remaining size of
the page is bigger than 'len', which I think it is a small optimization
for 64K page size or order 3 page.

Also, do you have some thing in mind above xxx here? As this patch
chooses the ((PAGE_SIZE << pool->p.order) / 2) as xxx.

> .
>