From nobody Fri Sep 20 16:30:29 2024
From: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi, Alexander Duyck, Saeed Mahameed, Leon Romanovsky,
    Eric Dumazet, Jesper Dangaard Brouer, Ilias Apalodimas
Subject: [PATCH net-next v4 1/5] page_pool: frag API support for 32-bit arch with 64-bit DMA
Date: Mon, 12 Jun 2023 21:02:52 +0800
Message-ID: <20230612130256.4572-2-linyunsheng@huawei.com>
In-Reply-To: <20230612130256.4572-1-linyunsheng@huawei.com>
References: <20230612130256.4572-1-linyunsheng@huawei.com>

Currently page_pool_alloc_frag() is not supported on a 32-bit arch with
64-bit DMA, which seems to be quite common, see [1]; this means a driver
may need to handle the unsupported case itself when using the
page_pool_alloc_frag() API.

In order to simplify the driver's work of supporting page frags, this
patch allows page_pool_alloc_frag() to call page_pool_alloc_pages() and
return one big page frag without page splitting, because of the overlap
between pp_frag_count and dma_addr_upper in 'struct page' on those
arches.

As page_pool_create() with PP_FLAG_PAGE_FRAG is now supported on a
32-bit arch with 64-bit DMA, and mlx5 calls page_pool_create() with
PP_FLAG_PAGE_FRAG and manipulates page->pp_frag_count directly using
page_pool_defrag_page(), add a check to avoid writing to
page->pp_frag_count, which may not exist on some arches.

Note that this may aggravate the truesize underestimation problem for
skbs, as there is no page splitting for those pages. If a driver needs
an accurate truesize, it may calculate it according to the frag size,
the page order and whether PAGE_POOL_DMA_USE_PP_FRAG_COUNT is true, and
we may provide a helper for that if it turns out to be helpful.

1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi
CC: Alexander Duyck
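For illustration, the truesize calculation mentioned above could be
wrapped in a driver-side helper like the sketch below;
page_pool_frag_truesize() is a hypothetical name, not part of this
series, and it assumes the frag API has already cache-line aligned
'size':

/* Hypothetical helper (not part of this series): derive an accurate
 * truesize for a frag.  On an arch where
 * PAGE_POOL_DMA_USE_PP_FRAG_COUNT is true, the pool hands out the
 * whole compound page as one big frag, so the full page size is the
 * truesize; otherwise the aligned frag size is already accurate.
 */
static inline unsigned int page_pool_frag_truesize(struct page_pool *pool,
						   unsigned int size)
{
	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
		return PAGE_SIZE << pool->p.order;

	return size;
}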
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 10 +++++
 include/net/page_pool.h                       | 44 ++++++++++++++++---
 net/core/page_pool.c                          | 18 ++------
 3 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a7c526ee5024..593cdfbfc035 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -832,6 +832,16 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		/* Create a page_pool and register it with rxq */
 		struct page_pool_params pp_params = { 0 };
 
+		/* Return error here to avoid writing to page->pp_frag_count
+		 * in mlx5e_page_release_fragmented(), as page->pp_frag_count
+		 * is not usable on an arch where
+		 * PAGE_POOL_DMA_USE_PP_FRAG_COUNT is true.
+		 */
+		if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+			err = -EINVAL;
+			goto err_free_by_rq_type;
+		}
+
 		pp_params.order = 0;
 		pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
 		pp_params.pool_size = pool_size;
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 126f9e294389..5c7f7501f300 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -33,6 +33,7 @@
 #include <linux/mm.h> /* Needed by ptr_ring */
 #include <linux/ptr_ring.h>
 #include <linux/dma-direction.h>
+#include <linux/dma-mapping.h>
 
 #define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
 					* map/unmap
@@ -50,6 +51,9 @@
 					PP_FLAG_DMA_SYNC_DEV |\
 					PP_FLAG_PAGE_FRAG)
 
+#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
+		(sizeof(dma_addr_t) > sizeof(unsigned long))
+
 /*
  * Fast allocation side cache array/stack
  *
@@ -219,8 +223,33 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 	return page_pool_alloc_pages(pool, gfp);
 }
 
-struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
-				  unsigned int size, gfp_t gfp);
+struct page *__page_pool_alloc_frag(struct page_pool *pool,
+				    unsigned int *offset, unsigned int size,
+				    gfp_t gfp);
+
+static inline struct page *page_pool_alloc_frag(struct page_pool *pool,
+						unsigned int *offset,
+						unsigned int size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+
+	size = ALIGN(size, dma_get_cache_alignment());
+
+	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
+		    size > max_size))
+		return NULL;
+
+	/* Don't allow page splitting and allocate one big frag
+	 * for 32-bit arch with 64-bit DMA, corresponding to
+	 * the check in page_pool_is_last_frag().
+	 */
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
+	return __page_pool_alloc_frag(pool, offset, size, gfp);
+}
 
 static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 						    unsigned int *offset,
@@ -322,8 +351,14 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
 static inline bool page_pool_is_last_frag(struct page_pool *pool,
 					  struct page *page)
 {
-	/* If fragments aren't enabled or count is 0 we were the last user */
+	/* We assume we are the last frag user that is still holding
+	 * on to the page if:
+	 * 1. Fragments aren't enabled.
+	 * 2. We are running in 32-bit arch with 64-bit DMA.
+	 * 3. page_pool_defrag_page() indicates we are the last user.
+	 */
 	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
+	       PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
 	       (page_pool_defrag_page(page, 1) == 0);
 }
 
@@ -357,9 +392,6 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
-#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
-		(sizeof(dma_addr_t) > sizeof(unsigned long))
-
 static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
 	dma_addr_t ret = page->dma_addr;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a3e12a61d456..9c4118c62997 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -14,7 +14,6 @@
 #include <net/xdp.h>
 
 #include <linux/dma-direction.h>
-#include <linux/dma-mapping.h>
 #include <linux/mm.h> /* for put_page() */
 #include <linux/poison.h>
@@ -200,10 +199,6 @@ static int page_pool_init(struct page_pool *pool,
 	 */
 	}
 
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
-	    pool->p.flags & PP_FLAG_PAGE_FRAG)
-		return -EINVAL;
-
 #ifdef CONFIG_PAGE_POOL_STATS
 	pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
 	if (!pool->recycle_stats)
@@ -715,18 +710,13 @@ static void page_pool_free_frag(struct page_pool *pool)
 	page_pool_return_page(pool, page);
 }
 
-struct page *page_pool_alloc_frag(struct page_pool *pool,
-				  unsigned int *offset,
-				  unsigned int size, gfp_t gfp)
+struct page *__page_pool_alloc_frag(struct page_pool *pool,
+				    unsigned int *offset,
+				    unsigned int size, gfp_t gfp)
 {
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
-		return NULL;
-
-	size = ALIGN(size, dma_get_cache_alignment());
 	*offset = pool->frag_offset;
 
 	if (page && *offset + size > max_size) {
@@ -759,7 +749,7 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 	alloc_stat_inc(pool, fast);
 	return page;
 }
-EXPORT_SYMBOL(page_pool_alloc_frag);
+EXPORT_SYMBOL(__page_pool_alloc_frag);
 
 static void page_pool_empty_ring(struct page_pool *pool)
 {
-- 
2.33.0

From nobody Fri Sep 20 16:30:29 2024
From: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi, Alexander Duyck, Jesper Dangaard Brouer,
    Ilias Apalodimas, Eric Dumazet
Subject: [PATCH net-next v4 2/5] page_pool: unify frag_count handling in page_pool_is_last_frag()
Date: Mon, 12 Jun 2023 21:02:53 +0800
Message-ID: <20230612130256.4572-3-linyunsheng@huawei.com>
In-Reply-To: <20230612130256.4572-1-linyunsheng@huawei.com>
References: <20230612130256.4572-1-linyunsheng@huawei.com>

Currently when page_pool_create() is called with the PP_FLAG_PAGE_FRAG
flag, page_pool_alloc_pages() is only allowed to be called under the
below constraints:
1. page_pool_fragment_page() needs to be called to set up
   page->pp_frag_count immediately.
2. page_pool_defrag_page() often needs to be called to drain
   page->pp_frag_count when no user is holding on to that page any
   more.

Those constraints exist in order to support splitting a page into
multiple frags, and they have some overhead because of the cache line
dirtying/bouncing and the atomic update.

Those constraints are unavoidable when we need a page to be split into
more than one frag, but there are also cases where we want to avoid the
above constraints and their overhead, when a page can't be split and
can only hold one big frag as requested by the user. Depending on the
use case:

use case 1: allocate a page without page splitting.
use case 2: allocate a page with page splitting.
use case 3: allocate a page with or without page splitting depending on
            the frag size.

Currently the page pool only provides the page_pool_alloc_pages() and
page_pool_alloc_frag() APIs to enable use cases 1 & 2 separately, so we
cannot use a combination of 1 & 2 to enable 3; that is not possible yet
because of the per-page_pool flag PP_FLAG_PAGE_FRAG.

So in order to allow allocating an unsplit page without the overhead of
a split page, while still allowing split pages to be allocated, we need
to remove the per-page_pool flag in page_pool_is_last_frag(). As best
as I can think of, there are two methods, as below:
1. Add a per-page flag/bit to indicate whether a page is split or not,
   which means we might need to update that flag/bit every time the
   page is recycled, dirtying the cache line of 'struct page' for use
   case 1.
2. Unify the page->pp_frag_count handling for both split and unsplit
   pages by assuming all pages in the page pool are initially split
   into one big frag.

As the page pool already supports use case 1 without dirtying the cache
line of 'struct page' whenever a page is recycled, we need to support
the above use case 3 with minimal overhead, especially without adding
any noticeable overhead for use case 1. Since we are already doing an
optimization by not updating pp_frag_count in page_pool_defrag_page()
for the last frag user, this patch chooses method 2 and unifies the
pp_frag_count handling to support the above use case 3.
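Concretely, the two constraints look like the sketch below from a
driver's point of view; drv_split_page() and drv_put_frag() are
hypothetical helper names, while page_pool_fragment_page(),
page_pool_defrag_page() and page_pool_put_defragged_page() are the real
APIs involved:

/* Hypothetical driver helpers illustrating the two constraints above. */
static void drv_split_page(struct page *page, int nr_frags)
{
	/* constraint 1: set up page->pp_frag_count before any frag of
	 * this page is handed out
	 */
	page_pool_fragment_page(page, nr_frags);
}

static void drv_put_frag(struct page_pool *pool, struct page *page)
{
	/* constraint 2: drain one reference; only the last frag user
	 * returns the page to the pool
	 */
	if (page_pool_defrag_page(page, 1) == 0)
		page_pool_put_defragged_page(pool, page, -1, false);
}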
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi
CC: Alexander Duyck
---
 include/net/page_pool.h | 49 ++++++++++++++++++++++++++++++-----------
 net/core/page_pool.c    |  8 +++++++
 2 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 5c7f7501f300..0b8cd2acc1d7 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -324,7 +324,8 @@ void page_pool_put_defragged_page(struct page_pool *pool, struct page *page,
  */
 static inline void page_pool_fragment_page(struct page *page, long nr)
 {
-	atomic_long_set(&page->pp_frag_count, nr);
+	if (!PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
+		atomic_long_set(&page->pp_frag_count, nr);
 }
 
 static inline long page_pool_defrag_page(struct page *page, long nr)
@@ -332,19 +333,43 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
 	long ret;
 
 	/* If nr == pp_frag_count then we have cleared all remaining
-	 * references to the page. No need to actually overwrite it, instead
-	 * we can leave this to be overwritten by the calling function.
+	 * references to the page:
+	 * 1. 'nr == 1': no need to actually overwrite it.
+	 * 2. 'nr != 1': overwrite it with one, which is the rare case
+	 *    for frag draining.
 	 *
-	 * The main advantage to doing this is that an atomic_read is
-	 * generally a much cheaper operation than an atomic update,
-	 * especially when dealing with a page that may be partitioned
-	 * into only 2 or 3 pieces.
+	 * The main advantage to doing this is that not only do we avoid
+	 * an atomic update, as an atomic_read is generally a much
+	 * cheaper operation than an atomic update, especially when
+	 * dealing with a page that may be partitioned into only 2 or 3
+	 * pieces; we also unify the frag and non-frag handling by
+	 * ensuring all pages have been split into one big frag
+	 * initially, and only overwrite it when the page is split into
+	 * more than one frag.
 	 */
-	if (atomic_long_read(&page->pp_frag_count) == nr)
+	if (atomic_long_read(&page->pp_frag_count) == nr) {
+		/* As we have ensured nr is always one for the constant
+		 * case using the BUILD_BUG_ON(), we only need to handle
+		 * the non-constant case here for frag count draining,
+		 * which is a rare case.
+		 */
+		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
+		if (!__builtin_constant_p(nr))
+			atomic_long_set(&page->pp_frag_count, 1);
+
 		return 0;
+	}
 
 	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
 	WARN_ON(ret < 0);
+
+	/* We are the last user here too, so reset the frag count back
+	 * to 1 to ensure all pages have been split into one big frag
+	 * initially; this should be the rare case where the last two
+	 * frag users call page_pool_defrag_page() concurrently.
+	 */
+	if (unlikely(!ret))
+		atomic_long_set(&page->pp_frag_count, 1);
+
 	return ret;
 }
 
@@ -353,12 +378,10 @@ static inline bool page_pool_is_last_frag(struct page_pool *pool,
 {
 	/* We assume we are the last frag user that is still holding
 	 * on to the page if:
-	 * 1. Fragments aren't enabled.
-	 * 2. We are running in 32-bit arch with 64-bit DMA.
-	 * 3. page_pool_defrag_page() indicates we are the last user.
+	 * 1. We are running in 32-bit arch with 64-bit DMA.
+	 * 2. page_pool_defrag_page() indicates we are the last user.
 	 */
-	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-	       PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
+	return PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
 	       (page_pool_defrag_page(page, 1) == 0);
 }
 
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9c4118c62997..69e3c5175236 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -352,6 +352,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
+
+	/* Ensure all pages have been split into one big frag initially:
+	 * page_pool_set_pp_info() is only called once for every page
+	 * when it is allocated from the page allocator, and
+	 * page_pool_fragment_page() dirties the same cache line as
+	 * page->pp_magic above, so the overhead is negligible.
+	 */
+	page_pool_fragment_page(page, 1);
 	if (pool->p.init_callback)
 		pool->p.init_callback(page, pool->p.init_arg);
 }
-- 
2.33.0

From nobody Fri Sep 20 16:30:29 2024
From: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi, Alexander Duyck, Jesper Dangaard Brouer,
    Ilias Apalodimas, Eric Dumazet
Subject: [PATCH net-next v4 3/5] page_pool: introduce page_pool_alloc() API
Date: Mon, 12 Jun 2023 21:02:54 +0800
Message-ID: <20230612130256.4572-4-linyunsheng@huawei.com>
In-Reply-To: <20230612130256.4572-1-linyunsheng@huawei.com>
References: <20230612130256.4572-1-linyunsheng@huawei.com>

Currently the page pool supports the below use cases:

use case 1: allocate a page without page splitting using the
            page_pool_alloc_pages() API, if the driver knows that the
            memory it needs is always bigger than half of the page
            allocated from the page pool.

use case 2: allocate a page frag with page splitting using the
            page_pool_alloc_frag() API, if the driver knows that the
            memory it needs is always smaller than or equal to half of
            the page allocated from the page pool.

There are emerging use cases [1] & [2] that are a mix of the above two
cases: the driver doesn't know the size of the memory it needs
beforehand, so it may use something like below to allocate memory with
the least memory utilization and performance penalty:

	if (size << 1 > max_size)
		page = page_pool_alloc_pages();
	else
		page = page_pool_alloc_frag();

To avoid drivers open-coding the above, add the page_pool_alloc() API
to support this use case, and report the true size of the memory that
is actually allocated by updating '*size' back to the driver, in order
to avoid the truesize underestimation problem.

1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/
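As a usage sketch (the helper and its buffer handling are illustrative,
not from any real driver), an rx path would consume the new API like
this:

static void *drv_alloc_rx_buf(struct page_pool *pool, unsigned int len,
			      unsigned int *truesize)
{
	unsigned int offset;
	struct page *page;

	/* 'len' is rounded up for cache-line alignment and, when the
	 * rest of the page cannot hold another frag, grown to cover it;
	 * on return it is the accurate truesize contribution.
	 */
	page = page_pool_dev_alloc(pool, &offset, &len);
	if (unlikely(!page))
		return NULL;

	*truesize = len;
	return page_address(page) + offset;
}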
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi
CC: Alexander Duyck
---
 include/net/page_pool.h | 43 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 0b8cd2acc1d7..c135cd157cea 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -260,6 +260,49 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 	return page_pool_alloc_frag(pool, offset, size, gfp);
 }
 
+static inline struct page *page_pool_alloc(struct page_pool *pool,
+					   unsigned int *offset,
+					   unsigned int *size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+	struct page *page;
+
+	*size = ALIGN(*size, dma_get_cache_alignment());
+
+	if (WARN_ON(*size > max_size))
+		return NULL;
+
+	if ((*size << 1) > max_size || PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		*size = max_size;
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
+	page = __page_pool_alloc_frag(pool, offset, *size, gfp);
+	if (unlikely(!page))
+		return NULL;
+
+	/* There is very likely not enough space for another frag, so
+	 * append the remaining size to the current frag to avoid the
+	 * truesize underestimation problem.
+	 */
+	if (pool->frag_offset + *size > max_size) {
+		*size = max_size - *offset;
+		pool->frag_offset = max_size;
+	}
+
+	return page;
+}
+
+static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
+					       unsigned int *offset,
+					       unsigned int *size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_alloc(pool, offset, size, gfp);
+}
+
 /* get the stored dma direction. A driver might decide to treat this locally and
  * avoid the extra cache line from page_pool to determine the direction
  */
-- 
2.33.0

From nobody Fri Sep 20 16:30:29 2024
From: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi, Alexander Duyck, Yisen Zhuang, Salil Mehta,
    Eric Dumazet, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
    hariprasad, Saeed Mahameed, Leon Romanovsky, Felix Fietkau,
    Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo, Matthias Brugger,
    AngeloGioacchino Del Regno, Jesper Dangaard Brouer, Ilias Apalodimas
Subject: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag
Date: Mon, 12 Jun 2023 21:02:55 +0800
Message-ID: <20230612130256.4572-5-linyunsheng@huawei.com>
In-Reply-To: <20230612130256.4572-1-linyunsheng@huawei.com>
References: <20230612130256.4572-1-linyunsheng@huawei.com>

PP_FLAG_PAGE_FRAG is not really needed now that the pp_frag_count
handling is unified and page_pool_alloc_frag() is supported on a 32-bit
arch with 64-bit DMA, so remove it.
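For illustration, after this patch a pool that serves
page_pool_alloc_pages(), page_pool_alloc_frag() and page_pool_alloc()
alike is created without any frag-specific flag; the parameter values
below are hypothetical, loosely modeled on the drivers touched here:

static struct page_pool *drv_create_pool(struct device *dev,
					 unsigned int pool_size)
{
	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.order		= 0,
		.pool_size	= pool_size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
		.max_len	= PAGE_SIZE,	/* required by DMA_SYNC_DEV */
		.offset		= 0,
	};

	return page_pool_create(&pp_params);
}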
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi
CC: Alexander Duyck
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c          | 3 +--
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c        | 2 +-
 drivers/net/wireless/mediatek/mt76/mac80211.c            | 2 +-
 include/net/page_pool.h                                  | 7 ++-----
 net/core/skbuff.c                                        | 2 +-
 6 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index b676496ec6d7..4e613d5bf1fd 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4925,8 +4925,7 @@ static void hns3_put_ring_config(struct hns3_nic_priv *priv)
 static void hns3_alloc_page_pool(struct hns3_enet_ring *ring)
 {
 	struct page_pool_params pp_params = {
-		.flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
-				PP_FLAG_DMA_SYNC_DEV,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.order = hns3_page_order(ring),
 		.pool_size = ring->desc_num * hns3_buf_size(ring) /
 				(PAGE_SIZE << hns3_page_order(ring)),
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index a79cb680bb23..404caec467af 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -1426,7 +1426,7 @@ int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
 		return 0;
 	}
 
-	pp_params.flags = PP_FLAG_PAGE_FRAG | PP_FLAG_DMA_MAP;
+	pp_params.flags = PP_FLAG_DMA_MAP;
 	pp_params.pool_size = numptrs;
 	pp_params.nid = NUMA_NO_NODE;
 	pp_params.dev = pfvf->dev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 593cdfbfc035..79f2f5e51ae0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -843,7 +843,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		}
 
 		pp_params.order = 0;
-		pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
+		pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
 		pp_params.pool_size = pool_size;
 		pp_params.nid = node;
 		pp_params.dev = rq->pdev;
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index 467afef98ba2..ee72869e5572 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -566,7 +566,7 @@ int mt76_create_page_pool(struct mt76_dev *dev, struct mt76_queue *q)
 {
 	struct page_pool_params pp_params = {
 		.order = 0,
-		.flags = PP_FLAG_PAGE_FRAG,
+		.flags = 0,
 		.nid = NUMA_NO_NODE,
 		.dev = dev->dma_dev,
 	};
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index c135cd157cea..f4fc339ff020 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -46,10 +46,8 @@
 					* Please note DMA-sync-for-CPU is still
 					* device driver responsibility
 					*/
-#define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
-				 PP_FLAG_DMA_SYNC_DEV |\
-				 PP_FLAG_PAGE_FRAG)
+				 PP_FLAG_DMA_SYNC_DEV)
 
 #define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
@@ -235,8 +233,7 @@ static inline struct page *page_pool_alloc_frag(struct page_pool *pool,
 
 	size = ALIGN(size, dma_get_cache_alignment());
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
+	if (WARN_ON(size > max_size))
 		return NULL;
 
 	/* Don't allow page splitting and allocate one big frag
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7c4338221b17..ca2316cc1e7e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5652,7 +5652,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	/* In general, avoid mixing page_pool and non-page_pool allocated
 	 * pages within the same SKB. Additionally avoid dealing with clones
 	 * with page_pool pages, in case the SKB is using page_pool fragment
-	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
+	 * references (page_pool_alloc_frag()). Since we only take full page
 	 * references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
 	 * In theory we could take full references if @from is cloned and
-- 
2.33.0

From nobody Fri Sep 20 16:30:29 2024
From: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi, Alexander Duyck, Jesper Dangaard Brouer,
    Ilias Apalodimas, Eric Dumazet, Jonathan Corbet, Alexei Starovoitov,
    Daniel Borkmann, John Fastabend
Subject: [PATCH net-next v4 5/5] page_pool: update document about frag API
Date: Mon, 12 Jun 2023 21:02:56 +0800
Message-ID: <20230612130256.4572-6-linyunsheng@huawei.com>
In-Reply-To: <20230612130256.4572-1-linyunsheng@huawei.com>
References: <20230612130256.4572-1-linyunsheng@huawei.com>

As more drivers begin to use the frag API, update the documentation to
help driver authors decide which allocation API to use. There is also a
similar document in page_pool.h, so remove it to avoid duplication.
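The decision rule spelled out by the updated document can be summarized
as below; drv_alloc() and the 'fixed_big'/'fixed_small' flags are
hypothetical stand-ins for what a driver knows statically about its
buffer sizes:

static struct page *drv_alloc(struct page_pool *pool, unsigned int *offset,
			      unsigned int *size, bool fixed_big,
			      bool fixed_small)
{
	/* 1. always need more than half a page: no page splitting,
	 *    cheapest recycling
	 */
	if (fixed_big) {
		*offset = 0;
		*size = PAGE_SIZE << pool->p.order;
		return page_pool_dev_alloc_pages(pool);
	}

	/* 2. always fits in half a page: split the page */
	if (fixed_small)
		return page_pool_dev_alloc_frag(pool, offset, *size);

	/* 3. size unknown beforehand: let the pool decide; '*size' is
	 *    updated to what was actually handed out
	 */
	return page_pool_dev_alloc(pool, offset, size);
}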
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi
CC: Alexander Duyck
---
 Documentation/networking/page_pool.rst | 34 +++++++++++++++++++++-----
 include/net/page_pool.h                | 22 -----------------
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 873efd97f822..df3e28728008 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -4,12 +4,28 @@
 Page Pool API
 =============
 
-The page_pool allocator is optimized for the XDP mode that uses one frame
-per-page, but it can fallback on the regular page allocator APIs.
-
-Basic use involves replacing alloc_pages() calls with the
-page_pool_alloc_pages() call.  Drivers should use page_pool_dev_alloc_pages()
-replacing dev_alloc_pages().
+The page_pool allocator is optimized for recycling the pages or page frags
+used by skb packets and xdp frames.
+
+Basic use involves replacing alloc_pages() calls with a different page pool
+allocator API depending on the use case:
+
+1. page_pool_alloc_pages(): allocate memory without page splitting when the
+   driver knows that the memory it needs is always bigger than half of the
+   page allocated from the page pool. There is no cache line dirtying for
+   'struct page' when a page is recycled back to the page pool.
+
+2. page_pool_alloc_frag(): allocate memory with page splitting when the
+   driver knows that the memory it needs is always smaller than or equal to
+   half of the page allocated from the page pool. Page splitting enables
+   memory saving and thus avoids TLB/cache misses for data access, but there
+   is also some cost to implement page splitting, mainly some cache line
+   dirtying/bouncing for 'struct page' and atomic operations for
+   page->pp_frag_count.
+
+3. page_pool_alloc(): allocate memory with or without page splitting
+   depending on the requested memory size, when the driver doesn't know the
+   size of the memory it needs beforehand. It is a mix of the above two
+   cases, so it is a wrapper of the above APIs that simplifies the driver's
+   interface for memory allocation with the least memory utilization and
+   performance penalty.
 
 API keeps track of in-flight pages, in order to let API user know
 when it is safe to free a page_pool object.  Thus, API users
@@ -93,6 +109,12 @@ a page will cause no race conditions is enough.
 
 * page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool caches.
 
+* page_pool_dev_alloc_frag(): Get a page frag from the page allocator or
+  page_pool caches.
+
+* page_pool_dev_alloc(): Get a page or page frag from the page allocator or
+  page_pool caches.
+
 * page_pool_get_dma_addr(): Retrieve the stored DMA address.
 
 * page_pool_get_dma_dir(): Retrieve the stored DMA direction.
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index f4fc339ff020..5fea37fd7767 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -5,28 +5,6 @@
  *	Copyright (C) 2016 Red Hat, Inc.
  */
 
-/**
- * DOC: page_pool allocator
- *
- * This page_pool allocator is optimized for the XDP mode that
- * uses one-frame-per-page, but have fallbacks that act like the
- * regular page allocator APIs.
- *
- * Basic use involve replacing alloc_pages() calls with the
- * page_pool_alloc_pages() call.  Drivers should likely use
- * page_pool_dev_alloc_pages() replacing dev_alloc_pages().
- *
- * API keeps track of in-flight pages, in-order to let API user know
- * when it is safe to dealloactor page_pool object. Thus, API users
- * must make sure to call page_pool_release_page() when a page is
- * "leaving" the page_pool. Or call page_pool_put_page() where
- * appropiate. For maintaining correct accounting.
- *
- * API user must only call page_pool_put_page() once on a page, as it
- * will either recycle the page, or in case of elevated refcnt, it
- * will release the DMA mapping and in-flight state accounting. We
- * hope to lift this requirement in the future.
- */
 #ifndef _NET_PAGE_POOL_H
 #define _NET_PAGE_POOL_H
 
-- 
2.33.0