From nobody Tue Dec 16 11:43:00 2025
From: David Howells 
To: netdev@vger.kernel.org
Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski ,
    Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox ,
    Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Andrew Morton 
Subject: [PATCH net-next 01/12] mm: Move the page fragment allocator from page_alloc.c into its own file
Date: Wed, 24 May 2023 16:33:00 +0100
Message-Id: <20230524153311.3625329-2-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>

Move the page fragment allocator from page_alloc.c into its own file
preparatory to changing it.

Signed-off-by: David Howells 
cc: Andrew Morton
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: linux-mm@kvack.org cc: netdev@vger.kernel.org --- mm/Makefile | 2 +- mm/page_alloc.c | 126 ----------------------------------------- mm/page_frag_alloc.c | 131 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 132 insertions(+), 127 deletions(-) create mode 100644 mm/page_frag_alloc.c diff --git a/mm/Makefile b/mm/Makefile index e29afc890cde..0daa4c6f4552 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -51,7 +51,7 @@ obj-y :=3D filemap.o mempool.o oom_kill.o fadvise.o \ readahead.o swap.o truncate.o vmscan.o shmem.o \ util.o mmzone.o vmstat.o backing-dev.o \ mm_init.o percpu.o slab_common.o \ - compaction.o \ + compaction.o page_frag_alloc.o \ interval_tree.o list_lru.o workingset.o \ debug.o gup.o mmap_lock.o $(mmu-y) =20 diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 47421bedc12b..29dc79dbeb22 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4871,132 +4871,6 @@ void free_pages(unsigned long addr, unsigned int or= der) =20 EXPORT_SYMBOL(free_pages); =20 -/* - * Page Fragment: - * An arbitrary-length arbitrary-offset area of memory which resides - * within a 0 or higher order page. Multiple fragments within that page - * are individually refcounted, in the page's reference counter. - * - * The page_frag functions below provide a simple allocation framework for - * page fragments. This is used by the network stack and network device - * drivers to provide a backing region of memory for use as either an - * sk_buff->head, or to be used in the "frags" portion of skb_shared_info. - */ -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) -{ - struct page *page =3D NULL; - gfp_t gfp =3D gfp_mask; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |=3D __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | - __GFP_NOMEMALLOC; - page =3D alloc_pages_node(NUMA_NO_NODE, gfp_mask, - PAGE_FRAG_CACHE_MAX_ORDER); - nc->size =3D page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; -#endif - if (unlikely(!page)) - page =3D alloc_pages_node(NUMA_NO_NODE, gfp, 0); - - nc->va =3D page ? page_address(page) : NULL; - - return page; -} - -void __page_frag_cache_drain(struct page *page, unsigned int count) -{ - VM_BUG_ON_PAGE(page_ref_count(page) =3D=3D 0, page); - - if (page_ref_sub_and_test(page, count)) - free_the_page(page, compound_order(page)); -} -EXPORT_SYMBOL(__page_frag_cache_drain); - -void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask) -{ - unsigned int size =3D PAGE_SIZE; - struct page *page; - int offset; - - if (unlikely(!nc->va)) { -refill: - page =3D __page_frag_cache_refill(nc, gfp_mask); - if (!page) - return NULL; - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size =3D nc->size; -#endif - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. 
- */ - page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); - - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc =3D page_is_pfmemalloc(page); - nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset =3D size; - } - - offset =3D nc->offset - fragsz; - if (unlikely(offset < 0)) { - page =3D virt_to_page(nc->va); - - if (!page_ref_sub_and_test(page, nc->pagecnt_bias)) - goto refill; - - if (unlikely(nc->pfmemalloc)) { - free_the_page(page, compound_order(page)); - goto refill; - } - -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size =3D nc->size; -#endif - /* OK, page count is 0, we can safely set it */ - set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); - - /* reset page count bias and offset to start of new frag */ - nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; - offset =3D size - fragsz; - if (unlikely(offset < 0)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. - */ - return NULL; - } - } - - nc->pagecnt_bias--; - offset &=3D align_mask; - nc->offset =3D offset; - - return nc->va + offset; -} -EXPORT_SYMBOL(page_frag_alloc_align); - -/* - * Frees a page fragment allocated out of either a compound or order 0 pag= e. - */ -void page_frag_free(void *addr) -{ - struct page *page =3D virt_to_head_page(addr); - - if (unlikely(put_page_testzero(page))) - free_the_page(page, compound_order(page)); -} -EXPORT_SYMBOL(page_frag_free); - static void *make_alloc_exact(unsigned long addr, unsigned int order, size_t size) { diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c new file mode 100644 index 000000000000..bee95824ef8f --- /dev/null +++ b/mm/page_frag_alloc.c @@ -0,0 +1,131 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Page fragment allocator + * + * Page Fragment: + * An arbitrary-length arbitrary-offset area of memory which resides with= in a + * 0 or higher order page. Multiple fragments within that page are + * individually refcounted, in the page's reference counter. + * + * The page_frag functions provide a simple allocation framework for page + * fragments. This is used by the network stack and network device driver= s to + * provide a backing region of memory for use as either an sk_buff->head, = or to + * be used in the "frags" portion of skb_shared_info. + */ + +#include +#include +#include + +static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, + gfp_t gfp_mask) +{ + struct page *page =3D NULL; + gfp_t gfp =3D gfp_mask; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + gfp_mask |=3D __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | + __GFP_NOMEMALLOC; + page =3D alloc_pages_node(NUMA_NO_NODE, gfp_mask, + PAGE_FRAG_CACHE_MAX_ORDER); + nc->size =3D page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; +#endif + if (unlikely(!page)) + page =3D alloc_pages_node(NUMA_NO_NODE, gfp, 0); + + nc->va =3D page ? 
page_address(page) : NULL; + + return page; +} + +void __page_frag_cache_drain(struct page *page, unsigned int count) +{ + VM_BUG_ON_PAGE(page_ref_count(page) =3D=3D 0, page); + + if (page_ref_sub_and_test(page, count - 1)) + __free_pages(page, compound_order(page)); +} +EXPORT_SYMBOL(__page_frag_cache_drain); + +void *page_frag_alloc_align(struct page_frag_cache *nc, + unsigned int fragsz, gfp_t gfp_mask, + unsigned int align_mask) +{ + unsigned int size =3D PAGE_SIZE; + struct page *page; + int offset; + + if (unlikely(!nc->va)) { +refill: + page =3D __page_frag_cache_refill(nc, gfp_mask); + if (!page) + return NULL; + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size =3D nc->size; +#endif + /* Even if we own the page, we do not use atomic_set(). + * This would break get_page_unless_zero() users. + */ + page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); + + /* reset page count bias and offset to start of new frag */ + nc->pfmemalloc =3D page_is_pfmemalloc(page); + nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; + nc->offset =3D size; + } + + offset =3D nc->offset - fragsz; + if (unlikely(offset < 0)) { + page =3D virt_to_page(nc->va); + + if (page_ref_count(page) !=3D nc->pagecnt_bias) + goto refill; + if (unlikely(nc->pfmemalloc)) { + page_ref_sub(page, nc->pagecnt_bias - 1); + __free_pages(page, compound_order(page)); + goto refill; + } + +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) + /* if size can vary use size else just use PAGE_SIZE */ + size =3D nc->size; +#endif + /* OK, page count is 0, we can safely set it */ + set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); + + /* reset page count bias and offset to start of new frag */ + nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset =3D size - fragsz; + if (unlikely(offset < 0)) { + /* + * The caller is trying to allocate a fragment + * with fragsz > PAGE_SIZE but the cache isn't big + * enough to satisfy the request, this may + * happen in low memory conditions. + * We don't release the cache page because + * it could make memory pressure worse + * so we simply return NULL here. + */ + return NULL; + } + } + + nc->pagecnt_bias--; + offset &=3D align_mask; + nc->offset =3D offset; + + return nc->va + offset; +} +EXPORT_SYMBOL(page_frag_alloc_align); + +/* + * Frees a page fragment allocated out of either a compound or order 0 pag= e. 
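 *
 * (Illustrative aside, not part of this patch: memory obtained from
 * page_frag_alloc() is handed back through this function once the caller
 * is done with it - the networking helper skb_free_frag() is a thin
 * wrapper around it.  A hypothetical caller, given a struct
 * page_frag_cache nc, might do:
 *
 *	void *buf = page_frag_alloc(&nc, 256, GFP_ATOMIC);
 *	if (buf) {
 *		...
 *		page_frag_free(buf);
 *	}
 * )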
+ */
+void page_frag_free(void *addr)
+{
+	struct page *page = virt_to_head_page(addr);
+
+	__free_pages(page, compound_order(page));
+}
+EXPORT_SYMBOL(page_frag_free);

From nobody Tue Dec 16 11:43:00 2025
From: David Howells 
To: netdev@vger.kernel.org
Cc: David Howells , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst , Catherine Sullivan , Shailend Chand , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org Subject: [PATCH net-next 02/12] mm: Provide a page_frag_cache allocator cleanup function Date: Wed, 24 May 2023 16:33:01 +0100 Message-Id: <20230524153311.3625329-3-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Provide a function to clean up a page_frag_cache allocator rather than doing it manually each time. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- drivers/net/ethernet/google/gve/gve_main.c | 11 ++--------- drivers/net/ethernet/mediatek/mtk_wed_wo.c | 17 ++--------------- drivers/nvme/host/tcp.c | 8 +------- drivers/nvme/target/tcp.c | 5 +---- include/linux/gfp.h | 2 ++ mm/page_frag_alloc.c | 17 +++++++++++++++++ 6 files changed, 25 insertions(+), 35 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ether= net/google/gve/gve_main.c index 8fb70db63b8b..55feab29bed9 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1251,17 +1251,10 @@ static void gve_unreg_xdp_info(struct gve_priv *pri= v) =20 static void gve_drain_page_cache(struct gve_priv *priv) { - struct page_frag_cache *nc; int i; =20 - for (i =3D 0; i < priv->rx_cfg.num_queues; i++) { - nc =3D &priv->rx[i].page_cache; - if (nc->va) { - __page_frag_cache_drain(virt_to_page(nc->va), - nc->pagecnt_bias); - nc->va =3D NULL; - } - } + for (i =3D 0; i < priv->rx_cfg.num_queues; i++) + page_frag_cache_clear(&priv->rx[i].page_cache); } =20 static int gve_open(struct net_device *dev) diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ether= net/mediatek/mtk_wed_wo.c index 69fba29055e9..d90fea2c7d04 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c @@ -286,7 +286,6 @@ mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct mtk= _wed_wo_queue *q) static void mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *= q) { - struct page *page; int i; =20 for (i =3D 0; i < q->n_desc; i++) { @@ -298,19 +297,12 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, stru= ct mtk_wed_wo_queue *q) entry->buf =3D NULL; } =20 - if (!q->cache.va) - return; - - page =3D 
virt_to_page(q->cache.va); - __page_frag_cache_drain(page, q->cache.pagecnt_bias); - memset(&q->cache, 0, sizeof(q->cache)); + page_frag_cache_clear(&q->cache); } =20 static void mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *= q) { - struct page *page; - for (;;) { void *buf =3D mtk_wed_wo_dequeue(wo, q, NULL, true); =20 @@ -320,12 +312,7 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struc= t mtk_wed_wo_queue *q) skb_free_frag(buf); } =20 - if (!q->cache.va) - return; - - page =3D virt_to_page(q->cache.va); - __page_frag_cache_drain(page, q->cache.pagecnt_bias); - memset(&q->cache, 0, sizeof(q->cache)); + page_frag_cache_clear(&q->cache); } =20 static void diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index bf0230442d57..dcc35f6bff8c 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -1315,7 +1315,6 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_c= trl *ctrl) =20 static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid) { - struct page *page; struct nvme_tcp_ctrl *ctrl =3D to_tcp_ctrl(nctrl); struct nvme_tcp_queue *queue =3D &ctrl->queues[qid]; unsigned int noreclaim_flag; @@ -1326,12 +1325,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nc= trl, int qid) if (queue->hdr_digest || queue->data_digest) nvme_tcp_free_crypto(queue); =20 - if (queue->pf_cache.va) { - page =3D virt_to_head_page(queue->pf_cache.va); - __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias); - queue->pf_cache.va =3D NULL; - } - + page_frag_cache_clear(&queue->pf_cache); noreclaim_flag =3D memalloc_noreclaim_save(); sock_release(queue->sock); memalloc_noreclaim_restore(noreclaim_flag); diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index ed98df72c76b..984e6ce85dcd 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -1464,7 +1464,6 @@ static void nvmet_tcp_free_cmd_data_in_buffers(struct= nvmet_tcp_queue *queue) =20 static void nvmet_tcp_release_queue_work(struct work_struct *w) { - struct page *page; struct nvmet_tcp_queue *queue =3D container_of(w, struct nvmet_tcp_queue, release_work); =20 @@ -1486,9 +1485,7 @@ static void nvmet_tcp_release_queue_work(struct work_= struct *w) if (queue->hdr_digest || queue->data_digest) nvmet_tcp_free_crypto(queue); ida_free(&nvmet_tcp_queue_ida, queue->idx); - - page =3D virt_to_head_page(queue->pf_cache.va); - __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias); + page_frag_cache_clear(&queue->pf_cache); kfree(queue); } =20 diff --git a/include/linux/gfp.h b/include/linux/gfp.h index ed8cb537c6a7..03504beb51e4 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -314,6 +314,8 @@ static inline void *page_frag_alloc(struct page_frag_ca= che *nc, return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); } =20 +void page_frag_cache_clear(struct page_frag_cache *nc); + extern void page_frag_free(void *addr); =20 #define __free_page(page) __free_pages((page), 0) diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index bee95824ef8f..e02b81d68dc4 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -46,6 +46,23 @@ void __page_frag_cache_drain(struct page *page, unsigned= int count) } EXPORT_SYMBOL(__page_frag_cache_drain); =20 +/** + * page_frag_cache_clear - Clear out a page fragment cache + * @nc: The cache to clear + * + * Discard any pages still cached in a page fragment cache. 
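 *
 * (Illustrative sketch, not part of this patch: a driver that embeds a
 * page_frag_cache in its queue structure - the names here are made up -
 * would call this from its teardown path instead of open-coding the
 * drain:
 *
 *	static void my_queue_free(struct my_queue *q)
 *	{
 *		page_frag_cache_clear(&q->frag_cache);
 *	}
 * )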
+ */
+void page_frag_cache_clear(struct page_frag_cache *nc)
+{
+	if (nc->va) {
+		struct page *page = virt_to_head_page(nc->va);
+
+		__page_frag_cache_drain(page, nc->pagecnt_bias);
+		nc->va = NULL;
+	}
+}
+EXPORT_SYMBOL(page_frag_cache_clear);
+
 void *page_frag_alloc_align(struct page_frag_cache *nc,
			     unsigned int fragsz, gfp_t gfp_mask,
			     unsigned int align_mask)

From nobody Tue Dec 16 11:43:00 2025
From: David Howells 
To: netdev@vger.kernel.org
Cc: David Howells , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst , Catherine Sullivan , Shailend Chand , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org Subject: [PATCH net-next 03/12] mm: Make the page_frag_cache allocator alignment param a pow-of-2 Date: Wed, 24 May 2023 16:33:02 +0100 Message-Id: <20230524153311.3625329-4-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Make the page_frag_cache allocator's alignment parameter a power of 2 rather than a mask and give a warning if it isn't. This means that it's consistent with {napi,netdec}_alloc_frag_align() and allows __{napi,netdev}_alloc_frag_align() to be removed. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- include/linux/gfp.h | 4 ++-- include/linux/skbuff.h | 22 ++++------------------ mm/page_frag_alloc.c | 8 +++++--- net/core/skbuff.c | 14 +++++++------- 4 files changed, 18 insertions(+), 30 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 03504beb51e4..fa30100f46ad 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -306,12 +306,12 @@ struct page_frag_cache; extern void __page_frag_cache_drain(struct page *page, unsigned int count); extern void *page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask); + unsigned int align); =20 static inline void *page_frag_alloc(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask) { - return page_frag_alloc_align(nc, fragsz, gfp_mask, ~0u); + return page_frag_alloc_align(nc, fragsz, gfp_mask, 1); } =20 void page_frag_cache_clear(struct page_frag_cache *nc); diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 1b2ebf6113e0..41b63e72c6c3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3158,7 +3158,7 @@ void skb_queue_purge(struct sk_buff_head *list); =20 unsigned int skb_rbtree_purge(struct rb_root *root); =20 -void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_ma= sk); +void *netdev_alloc_frag_align(unsigned int fragsz, unsigned int align); =20 /** * netdev_alloc_frag - allocate a page fragment @@ -3169,14 +3169,7 @@ void *__netdev_alloc_frag_align(unsigned int fragsz,= unsigned int align_mask); */ static inline void *netdev_alloc_frag(unsigned int fragsz) { - return 
__netdev_alloc_frag_align(fragsz, ~0u); -} - -static inline void *netdev_alloc_frag_align(unsigned int fragsz, - unsigned int align) -{ - WARN_ON_ONCE(!is_power_of_2(align)); - return __netdev_alloc_frag_align(fragsz, -align); + return netdev_alloc_frag_align(fragsz, 1); } =20 struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int le= ngth, @@ -3236,18 +3229,11 @@ static inline void skb_free_frag(void *addr) page_frag_free(addr); } =20 -void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask= ); +void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align); =20 static inline void *napi_alloc_frag(unsigned int fragsz) { - return __napi_alloc_frag_align(fragsz, ~0u); -} - -static inline void *napi_alloc_frag_align(unsigned int fragsz, - unsigned int align) -{ - WARN_ON_ONCE(!is_power_of_2(align)); - return __napi_alloc_frag_align(fragsz, -align); + return napi_alloc_frag_align(fragsz, 1); } =20 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index e02b81d68dc4..9d3f6fbd9a07 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -64,13 +64,15 @@ void page_frag_cache_clear(struct page_frag_cache *nc) EXPORT_SYMBOL(page_frag_cache_clear); =20 void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align_mask) + unsigned int fragsz, gfp_t gfp_mask, + unsigned int align) { unsigned int size =3D PAGE_SIZE; struct page *page; int offset; =20 + WARN_ON_ONCE(!is_power_of_2(align)); + if (unlikely(!nc->va)) { refill: page =3D __page_frag_cache_refill(nc, gfp_mask); @@ -129,7 +131,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, } =20 nc->pagecnt_bias--; - offset &=3D align_mask; + offset &=3D ~(align - 1); nc->offset =3D offset; =20 return nc->va + offset; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f4a5b51aed22..cc507433b357 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -289,17 +289,17 @@ void napi_get_frags_check(struct napi_struct *napi) local_bh_enable(); } =20 -void *__napi_alloc_frag_align(unsigned int fragsz, unsigned int align_mask) +void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align) { struct napi_alloc_cache *nc =3D this_cpu_ptr(&napi_alloc_cache); =20 fragsz =3D SKB_DATA_ALIGN(fragsz); =20 - return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask); + return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align); } -EXPORT_SYMBOL(__napi_alloc_frag_align); +EXPORT_SYMBOL(napi_alloc_frag_align); =20 -void *__netdev_alloc_frag_align(unsigned int fragsz, unsigned int align_ma= sk) +void *netdev_alloc_frag_align(unsigned int fragsz, unsigned int align) { void *data; =20 @@ -307,18 +307,18 @@ void *__netdev_alloc_frag_align(unsigned int fragsz, = unsigned int align_mask) if (in_hardirq() || irqs_disabled()) { struct page_frag_cache *nc =3D this_cpu_ptr(&netdev_alloc_cache); =20 - data =3D page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align_mask); + data =3D page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align); } else { struct napi_alloc_cache *nc; =20 local_bh_disable(); nc =3D this_cpu_ptr(&napi_alloc_cache); - data =3D page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align_mask= ); + data =3D page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align); local_bh_enable(); } return data; } -EXPORT_SYMBOL(__netdev_alloc_frag_align); +EXPORT_SYMBOL(netdev_alloc_frag_align); =20 static struct sk_buff *napi_skb_cache_get(void) { From nobody Tue Dec 16 
11:43:00 2025
From: David Howells 
To: netdev@vger.kernel.org
Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski ,
    Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox ,
    Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Jeroen de Borst , Catherine Sullivan , Shailend Chand ,
    Felix Fietkau , John Crispin , Sean Wang , Mark Lee ,
    Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno ,
    Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg ,
    Chaitanya Kulkarni , Andrew Morton ,
    linux-arm-kernel@lists.infradead.org,
    linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org
Subject: [PATCH net-next 04/12] mm: Make the page_frag_cache allocator use multipage folios
Date: Wed, 24 May 2023 16:33:03 +0100
Message-Id: <20230524153311.3625329-5-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>

Change the page_frag_cache allocator to use multipage folios rather than
groups of pages.  This reduces page_frag_free to just a folio_put() or
put_page().

Signed-off-by: David Howells 
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- include/linux/mm_types.h | 13 ++---- mm/page_frag_alloc.c | 99 +++++++++++++++++++--------------------- 2 files changed, 52 insertions(+), 60 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 306a3d1a0fa6..d7c52a5979cc 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -420,18 +420,13 @@ static inline void *folio_get_private(struct folio *f= olio) } =20 struct page_frag_cache { - void * va; -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - __u16 offset; - __u16 size; -#else - __u32 offset; -#endif + struct folio *folio; + unsigned int offset; /* we maintain a pagecount bias, so that we dont dirty cache line * containing page->_refcount every time we allocate a fragment. */ - unsigned int pagecnt_bias; - bool pfmemalloc; + unsigned int pagecnt_bias; + bool pfmemalloc; }; =20 typedef unsigned long vm_flags_t; diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index 9d3f6fbd9a07..ffd68bfb677d 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -16,33 +16,34 @@ #include #include =20 -static struct page *__page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) +/* + * Allocate a new folio for the frag cache. + */ +static struct folio *page_frag_cache_refill(struct page_frag_cache *nc, + gfp_t gfp_mask) { - struct page *page =3D NULL; + struct folio *folio =3D NULL; gfp_t gfp =3D gfp_mask; =20 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |=3D __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY | - __GFP_NOMEMALLOC; - page =3D alloc_pages_node(NUMA_NO_NODE, gfp_mask, - PAGE_FRAG_CACHE_MAX_ORDER); - nc->size =3D page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE; + gfp_mask |=3D __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; + folio =3D folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER); #endif - if (unlikely(!page)) - page =3D alloc_pages_node(NUMA_NO_NODE, gfp, 0); + if (unlikely(!folio)) + folio =3D folio_alloc(gfp, 0); =20 - nc->va =3D page ? 
page_address(page) : NULL; - - return page; + if (folio) + nc->folio =3D folio; + return folio; } =20 void __page_frag_cache_drain(struct page *page, unsigned int count) { - VM_BUG_ON_PAGE(page_ref_count(page) =3D=3D 0, page); + struct folio *folio =3D page_folio(page); + + VM_BUG_ON_FOLIO(folio_ref_count(folio) =3D=3D 0, folio); =20 - if (page_ref_sub_and_test(page, count - 1)) - __free_pages(page, compound_order(page)); + folio_put_refs(folio, count); } EXPORT_SYMBOL(__page_frag_cache_drain); =20 @@ -54,11 +55,12 @@ EXPORT_SYMBOL(__page_frag_cache_drain); */ void page_frag_cache_clear(struct page_frag_cache *nc) { - if (nc->va) { - struct page *page =3D virt_to_head_page(nc->va); + struct folio *folio =3D nc->folio; =20 - __page_frag_cache_drain(page, nc->pagecnt_bias); - nc->va =3D NULL; + if (folio) { + VM_BUG_ON_FOLIO(folio_ref_count(folio) =3D=3D 0, folio); + folio_put_refs(folio, nc->pagecnt_bias); + nc->folio =3D NULL; } } EXPORT_SYMBOL(page_frag_cache_clear); @@ -67,56 +69,51 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, unsigned int align) { - unsigned int size =3D PAGE_SIZE; - struct page *page; - int offset; + struct folio *folio =3D nc->folio; + size_t offset; =20 WARN_ON_ONCE(!is_power_of_2(align)); =20 - if (unlikely(!nc->va)) { + if (unlikely(!folio)) { refill: - page =3D __page_frag_cache_refill(nc, gfp_mask); - if (!page) + folio =3D page_frag_cache_refill(nc, gfp_mask); + if (!folio) return NULL; =20 -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size =3D nc->size; -#endif /* Even if we own the page, we do not use atomic_set(). * This would break get_page_unless_zero() users. */ - page_ref_add(page, PAGE_FRAG_CACHE_MAX_SIZE); + folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); =20 /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc =3D page_is_pfmemalloc(page); + nc->pfmemalloc =3D folio_is_pfmemalloc(folio); nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset =3D size; + nc->offset =3D folio_size(folio); } =20 - offset =3D nc->offset - fragsz; - if (unlikely(offset < 0)) { - page =3D virt_to_page(nc->va); - - if (page_ref_count(page) !=3D nc->pagecnt_bias) + offset =3D nc->offset; + if (unlikely(fragsz > offset)) { + /* Reuse the folio if everyone we gave it to has finished with + * it. + */ + if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) { + nc->folio =3D NULL; goto refill; + } + if (unlikely(nc->pfmemalloc)) { - page_ref_sub(page, nc->pagecnt_bias - 1); - __free_pages(page, compound_order(page)); + __folio_put(folio); + nc->folio =3D NULL; goto refill; } =20 -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - /* if size can vary use size else just use PAGE_SIZE */ - size =3D nc->size; -#endif /* OK, page count is 0, we can safely set it */ - set_page_count(page, PAGE_FRAG_CACHE_MAX_SIZE + 1); + folio_set_count(folio, PAGE_FRAG_CACHE_MAX_SIZE + 1); =20 /* reset page count bias and offset to start of new frag */ nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; - offset =3D size - fragsz; - if (unlikely(offset < 0)) { + offset =3D folio_size(folio); + if (unlikely(fragsz > offset)) { /* * The caller is trying to allocate a fragment * with fragsz > PAGE_SIZE but the cache isn't big @@ -126,15 +123,17 @@ void *page_frag_alloc_align(struct page_frag_cache *n= c, * it could make memory pressure worse * so we simply return NULL here. 
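 *
 * (Worked example, not part of the original comment: in the normal
 * path below, and assuming the usual 32KiB cache folio, a freshly
 * refilled cache has offset == 32768.  Allocating fragsz = 1500 with
 * align = 64 gives 32768 - 1500 = 31268, rounded down to 31232; the
 * next such allocation lands at 31232 - 1500 = 29732, rounded down to
 * 29696.  Fragments are therefore carved off the end of the folio,
 * working back towards offset 0.)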
 */
+	nc->offset = offset;
+	return NULL;
+	}
+ }

 	nc->pagecnt_bias--;
+	offset -= fragsz;
 	offset &= ~(align - 1);
 	nc->offset = offset;

-	return nc->va + offset;
+	return folio_address(folio) + offset;
 }
 EXPORT_SYMBOL(page_frag_alloc_align);

@@ -143,8 +142,6 @@ EXPORT_SYMBOL(page_frag_alloc_align);
  */
 void page_frag_free(void *addr)
 {
-	struct page *page = virt_to_head_page(addr);
-
-	__free_pages(page, compound_order(page));
+	folio_put(virt_to_folio(addr));
 }
 EXPORT_SYMBOL(page_frag_free);

From nobody Tue Dec 16 11:43:00 2025
From: David Howells 
To: netdev@vger.kernel.org
Cc: David Howells , "David S.
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jeroen de Borst , Catherine Sullivan , Shailend Chand , Felix Fietkau , John Crispin , Sean Wang , Mark Lee , Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno , Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg , Chaitanya Kulkarni , Andrew Morton , linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org Subject: [PATCH net-next 05/12] mm: Make the page_frag_cache allocator handle __GFP_ZERO itself Date: Wed, 24 May 2023 16:33:04 +0100 Message-Id: <20230524153311.3625329-6-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Make the page_frag_cache allocator handle __GFP_ZERO itself rather than passing it off to the page allocator. There may be a mix of callers, some specifying __GFP_ZERO and some not - and even if all specify __GFP_ZERO, we might refurbish the page, in which case the returned memory doesn't get cleared. This is a potential bug in the nvme over TCP driver. Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- mm/page_frag_alloc.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index ffd68bfb677d..2b73c7f5d9a9 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -23,7 +23,10 @@ static struct folio *page_frag_cache_refill(struct page_= frag_cache *nc, gfp_t gfp_mask) { struct folio *folio =3D NULL; - gfp_t gfp =3D gfp_mask; + gfp_t gfp; + + gfp_mask &=3D ~__GFP_ZERO; + gfp =3D gfp_mask; =20 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) gfp_mask |=3D __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; @@ -71,6 +74,7 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, { struct folio *folio =3D nc->folio; size_t offset; + void *p; =20 WARN_ON_ONCE(!is_power_of_2(align)); =20 @@ -133,7 +137,10 @@ void *page_frag_alloc_align(struct page_frag_cache *nc, offset &=3D ~(align - 1); nc->offset =3D offset; =20 - return folio_address(folio) + offset; + p =3D folio_address(folio) + offset; + if (gfp_mask & __GFP_ZERO) + return memset(p, 0, fragsz); + return p; } EXPORT_SYMBOL(page_frag_alloc_align); From nobody Tue Dec 16 11:43:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1ECE6C77B73 for ; Wed, 24 May 2023 15:36:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237171AbjEXPgc (ORCPT 
);
From: David Howells 
To: netdev@vger.kernel.org
Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski ,
    Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox ,
    Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Jeroen de Borst , Catherine Sullivan , Shailend Chand ,
    Felix Fietkau , John Crispin , Sean Wang , Mark Lee ,
    Lorenzo Bianconi , Matthias Brugger , AngeloGioacchino Del Regno ,
    Keith Busch , Jens Axboe , Christoph Hellwig , Sagi Grimberg ,
    Chaitanya Kulkarni , Andrew Morton ,
    linux-arm-kernel@lists.infradead.org,
    linux-mediatek@lists.infradead.org, linux-nvme@lists.infradead.org
Subject: [PATCH net-next 06/12] mm: Make the page_frag_cache allocator use per-cpu
Date: Wed, 24 May 2023 16:33:05 +0100
Message-Id: <20230524153311.3625329-7-dhowells@redhat.com>
In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com>
References: <20230524153311.3625329-1-dhowells@redhat.com>

Make the page_frag_cache allocator have a separate allocation bucket for
each cpu to avoid racing.  This means that no lock is required, other than
preempt disablement, to allocate from it, though if a softirq wants to
access it, then softirq disablement will need to be added.

Make the NVMe, mediatek and GVE drivers pass in NULL to page_frag_alloc()
and use the default allocation buckets rather than defining their own.

Signed-off-by: David Howells 
cc: "David S.
Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Jeroen de Borst cc: Catherine Sullivan cc: Shailend Chand cc: Felix Fietkau cc: John Crispin cc: Sean Wang cc: Mark Lee cc: Lorenzo Bianconi cc: Matthias Brugger cc: AngeloGioacchino Del Regno cc: Keith Busch cc: Jens Axboe cc: Christoph Hellwig cc: Sagi Grimberg cc: Chaitanya Kulkarni cc: Andrew Morton cc: Matthew Wilcox cc: netdev@vger.kernel.org cc: linux-arm-kernel@lists.infradead.org cc: linux-mediatek@lists.infradead.org cc: linux-nvme@lists.infradead.org cc: linux-mm@kvack.org --- drivers/net/ethernet/google/gve/gve.h | 1 - drivers/net/ethernet/google/gve/gve_main.c | 9 - drivers/net/ethernet/google/gve/gve_rx.c | 2 +- drivers/net/ethernet/mediatek/mtk_wed_wo.c | 6 +- drivers/net/ethernet/mediatek/mtk_wed_wo.h | 2 - drivers/nvme/host/tcp.c | 13 +- drivers/nvme/target/tcp.c | 19 +- include/linux/gfp.h | 19 +- mm/page_frag_alloc.c | 202 +++++++++++++-------- net/core/skbuff.c | 32 ++-- 10 files changed, 163 insertions(+), 142 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/g= oogle/gve/gve.h index 98eb78d98e9f..87244ab911bd 100644 --- a/drivers/net/ethernet/google/gve/gve.h +++ b/drivers/net/ethernet/google/gve/gve.h @@ -250,7 +250,6 @@ struct gve_rx_ring { struct xdp_rxq_info xdp_rxq; struct xdp_rxq_info xsk_rxq; struct xsk_buff_pool *xsk_pool; - struct page_frag_cache page_cache; /* Page cache to allocate XDP frames */ }; =20 /* A TX desc ring entry */ diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ether= net/google/gve/gve_main.c index 55feab29bed9..9f0fb986d61e 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1249,14 +1249,6 @@ static void gve_unreg_xdp_info(struct gve_priv *priv) } } =20 -static void gve_drain_page_cache(struct gve_priv *priv) -{ - int i; - - for (i =3D 0; i < priv->rx_cfg.num_queues; i++) - page_frag_cache_clear(&priv->rx[i].page_cache); -} - static int gve_open(struct net_device *dev) { struct gve_priv *priv =3D netdev_priv(dev); @@ -1340,7 +1332,6 @@ static int gve_close(struct net_device *dev) netif_carrier_off(dev); if (gve_get_device_rings_ok(priv)) { gve_turndown(priv); - gve_drain_page_cache(priv); err =3D gve_destroy_rings(priv); if (err) goto err; diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/etherne= t/google/gve/gve_rx.c index d1da7413dc4d..7ae8377c394f 100644 --- a/drivers/net/ethernet/google/gve/gve_rx.c +++ b/drivers/net/ethernet/google/gve/gve_rx.c @@ -634,7 +634,7 @@ static int gve_xdp_redirect(struct net_device *dev, str= uct gve_rx_ring *rx, =20 total_len =3D headroom + SKB_DATA_ALIGN(len) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); - frame =3D page_frag_alloc(&rx->page_cache, total_len, GFP_ATOMIC); + frame =3D page_frag_alloc(NULL, total_len, GFP_ATOMIC); if (!frame) { u64_stats_update_begin(&rx->statss); rx->xdp_alloc_fails++; diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ether= net/mediatek/mtk_wed_wo.c index d90fea2c7d04..859f34447f2f 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c @@ -143,7 +143,7 @@ mtk_wed_wo_queue_refill(struct mtk_wed_wo *wo, struct m= tk_wed_wo_queue *q, dma_addr_t addr; void *buf; =20 - buf =3D page_frag_alloc(&q->cache, q->buf_size, GFP_ATOMIC); + buf =3D page_frag_alloc(NULL, q->buf_size, GFP_ATOMIC); if (!buf) break; =20 @@ -296,8 +296,6 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct= 
mtk_wed_wo_queue *q) skb_free_frag(entry->buf); entry->buf =3D NULL; } - - page_frag_cache_clear(&q->cache); } =20 static void @@ -311,8 +309,6 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct= mtk_wed_wo_queue *q) =20 skb_free_frag(buf); } - - page_frag_cache_clear(&q->cache); } =20 static void diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.h b/drivers/net/ether= net/mediatek/mtk_wed_wo.h index 7a1a2a28f1ac..f69bd83dc486 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed_wo.h +++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.h @@ -211,8 +211,6 @@ struct mtk_wed_wo_queue_entry { struct mtk_wed_wo_queue { struct mtk_wed_wo_queue_regs regs; =20 - struct page_frag_cache cache; - struct mtk_wed_wo_queue_desc *desc; dma_addr_t desc_dma; =20 diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index dcc35f6bff8c..145cf6186509 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -147,8 +147,6 @@ struct nvme_tcp_queue { __le32 exp_ddgst; __le32 recv_ddgst; =20 - struct page_frag_cache pf_cache; - void (*state_change)(struct sock *); void (*data_ready)(struct sock *); void (*write_space)(struct sock *); @@ -482,9 +480,8 @@ static int nvme_tcp_init_request(struct blk_mq_tag_set = *set, struct nvme_tcp_queue *queue =3D &ctrl->queues[queue_idx]; u8 hdgst =3D nvme_tcp_hdgst_len(queue); =20 - req->pdu =3D page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + req->pdu =3D page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hdgs= t, + GFP_KERNEL | __GFP_ZERO); if (!req->pdu) return -ENOMEM; =20 @@ -1303,9 +1300,8 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_c= trl *ctrl) struct nvme_tcp_request *async =3D &ctrl->async_req; u8 hdgst =3D nvme_tcp_hdgst_len(queue); =20 - async->pdu =3D page_frag_alloc(&queue->pf_cache, - sizeof(struct nvme_tcp_cmd_pdu) + hdgst, - GFP_KERNEL | __GFP_ZERO); + async->pdu =3D page_frag_alloc(NULL, sizeof(struct nvme_tcp_cmd_pdu) + hd= gst, + GFP_KERNEL | __GFP_ZERO); if (!async->pdu) return -ENOMEM; =20 @@ -1325,7 +1321,6 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nct= rl, int qid) if (queue->hdr_digest || queue->data_digest) nvme_tcp_free_crypto(queue); =20 - page_frag_cache_clear(&queue->pf_cache); noreclaim_flag =3D memalloc_noreclaim_save(); sock_release(queue->sock); memalloc_noreclaim_restore(noreclaim_flag); diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 984e6ce85dcd..cb352f5d2bbf 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -169,8 +169,6 @@ struct nvmet_tcp_queue { =20 struct nvmet_tcp_cmd connect; =20 - struct page_frag_cache pf_cache; - void (*data_ready)(struct sock *); void (*state_change)(struct sock *); void (*write_space)(struct sock *); @@ -1338,25 +1336,25 @@ static int nvmet_tcp_alloc_cmd(struct nvmet_tcp_que= ue *queue, c->queue =3D queue; c->req.port =3D queue->port->nport; =20 - c->cmd_pdu =3D page_frag_alloc(&queue->pf_cache, - sizeof(*c->cmd_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->cmd_pdu =3D page_frag_alloc(NULL, sizeof(*c->cmd_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->cmd_pdu) return -ENOMEM; c->req.cmd =3D &c->cmd_pdu->cmd; =20 - c->rsp_pdu =3D page_frag_alloc(&queue->pf_cache, - sizeof(*c->rsp_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->rsp_pdu =3D page_frag_alloc(NULL, sizeof(*c->rsp_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->rsp_pdu) goto out_free_cmd; c->req.cqe =3D &c->rsp_pdu->cqe; =20 - c->data_pdu =3D page_frag_alloc(&queue->pf_cache, - 
sizeof(*c->data_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->data_pdu =3D page_frag_alloc(NULL, sizeof(*c->data_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->data_pdu) goto out_free_rsp; =20 - c->r2t_pdu =3D page_frag_alloc(&queue->pf_cache, - sizeof(*c->r2t_pdu) + hdgst, GFP_KERNEL | __GFP_ZERO); + c->r2t_pdu =3D page_frag_alloc(NULL, sizeof(*c->r2t_pdu) + hdgst, + GFP_KERNEL | __GFP_ZERO); if (!c->r2t_pdu) goto out_free_data; =20 @@ -1485,7 +1483,6 @@ static void nvmet_tcp_release_queue_work(struct work_= struct *w) if (queue->hdr_digest || queue->data_digest) nvmet_tcp_free_crypto(queue); ida_free(&nvmet_tcp_queue_ida, queue->idx); - page_frag_cache_clear(&queue->pf_cache); kfree(queue); } =20 diff --git a/include/linux/gfp.h b/include/linux/gfp.h index fa30100f46ad..baa25a00d9e3 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -304,18 +304,19 @@ extern void free_pages(unsigned long addr, unsigned i= nt order); =20 struct page_frag_cache; extern void __page_frag_cache_drain(struct page *page, unsigned int count); -extern void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align); - -static inline void *page_frag_alloc(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask) +extern void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_c= ache, + size_t fragsz, gfp_t gfp, + unsigned long align); +extern void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align); + +static inline void *page_frag_alloc(struct page_frag_cache __percpu *frag_= cache, + size_t fragsz, gfp_t gfp) { - return page_frag_alloc_align(nc, fragsz, gfp_mask, 1); + return page_frag_alloc_align(frag_cache, fragsz, gfp, 1); } =20 -void page_frag_cache_clear(struct page_frag_cache *nc); - extern void page_frag_free(void *addr); =20 #define __free_page(page) __free_pages((page), 0) diff --git a/mm/page_frag_alloc.c b/mm/page_frag_alloc.c index 2b73c7f5d9a9..b035bbb34fac 100644 --- a/mm/page_frag_alloc.c +++ b/mm/page_frag_alloc.c @@ -16,28 +16,25 @@ #include #include =20 +static DEFINE_PER_CPU(struct page_frag_cache, page_frag_default_allocator); + /* * Allocate a new folio for the frag cache. 
*/ -static struct folio *page_frag_cache_refill(struct page_frag_cache *nc, - gfp_t gfp_mask) +static struct folio *page_frag_cache_refill(gfp_t gfp) { - struct folio *folio =3D NULL; - gfp_t gfp; + struct folio *folio; =20 - gfp_mask &=3D ~__GFP_ZERO; - gfp =3D gfp_mask; + gfp &=3D ~__GFP_ZERO; =20 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) - gfp_mask |=3D __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC; - folio =3D folio_alloc(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER); + folio =3D folio_alloc(gfp | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALL= OC, + PAGE_FRAG_CACHE_MAX_ORDER); + if (folio) + return folio; #endif - if (unlikely(!folio)) - folio =3D folio_alloc(gfp, 0); =20 - if (folio) - nc->folio =3D folio; - return folio; + return folio_alloc(gfp, 0); } =20 void __page_frag_cache_drain(struct page *page, unsigned int count) @@ -51,63 +48,70 @@ void __page_frag_cache_drain(struct page *page, unsigne= d int count) EXPORT_SYMBOL(__page_frag_cache_drain); =20 /** - * page_frag_cache_clear - Clear out a page fragment cache - * @nc: The cache to clear + * page_frag_alloc_align - Allocate some memory for use in zerocopy + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The size of the fragment desired + * @gfp: Allocation flags under which to make an allocation + * @align: The required alignment + * + * Allocate some memory for use with zerocopy where protocol bits have to = be + * mixed in with spliced/zerocopied data. Unlike memory allocated from the + * slab, this memory's lifetime is purely dependent on the folio's refcoun= t. + * + * The way it works is that a folio is allocated and fragments are broken = off + * sequentially and returned to the caller with a ref until the folio no l= onger + * has enough spare space - at which point the allocator's ref is dropped = and a + * new folio is allocated. The folio remains in existence until the last = ref + * held by, say, an sk_buff is discarded and then the page is returned to = the + * page allocator. * - * Discard any pages still cached in a page fragment cache. + * Returns a pointer to the memory on success and -ENOMEM on allocation + * failure. + * + * The allocated memory should be disposed of with folio_put(). */ -void page_frag_cache_clear(struct page_frag_cache *nc) -{ - struct folio *folio =3D nc->folio; - - if (folio) { - VM_BUG_ON_FOLIO(folio_ref_count(folio) =3D=3D 0, folio); - folio_put_refs(folio, nc->pagecnt_bias); - nc->folio =3D NULL; - } -} -EXPORT_SYMBOL(page_frag_cache_clear); - -void *page_frag_alloc_align(struct page_frag_cache *nc, - unsigned int fragsz, gfp_t gfp_mask, - unsigned int align) +void *page_frag_alloc_align(struct page_frag_cache __percpu *frag_cache, + size_t fragsz, gfp_t gfp, unsigned long align) { - struct folio *folio =3D nc->folio; + struct page_frag_cache *nc; + struct folio *folio, *spare =3D NULL; size_t offset; void *p; =20 WARN_ON_ONCE(!is_power_of_2(align)); =20 - if (unlikely(!folio)) { -refill: - folio =3D page_frag_cache_refill(nc, gfp_mask); - if (!folio) - return NULL; - - /* Even if we own the page, we do not use atomic_set(). - * This would break get_page_unless_zero() users. 
- */ - folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); + if (!frag_cache) + frag_cache =3D &page_frag_default_allocator; + if (WARN_ON_ONCE(fragsz =3D=3D 0)) + fragsz =3D 1; =20 - /* reset page count bias and offset to start of new frag */ - nc->pfmemalloc =3D folio_is_pfmemalloc(folio); - nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; - nc->offset =3D folio_size(folio); + nc =3D get_cpu_ptr(frag_cache); +reload: + folio =3D nc->folio; + offset =3D nc->offset; +try_again: + + /* Make the allocation if there's sufficient space. */ + if (fragsz <=3D offset) { + nc->pagecnt_bias--; + offset =3D (offset - fragsz) & ~(align - 1); + nc->offset =3D offset; + p =3D folio_address(folio) + offset; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + if (gfp & __GFP_ZERO) + return memset(p, 0, fragsz); + return p; } =20 - offset =3D nc->offset; - if (unlikely(fragsz > offset)) { - /* Reuse the folio if everyone we gave it to has finished with - * it. - */ - if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) { - nc->folio =3D NULL; + /* Insufficient space - see if we can refurbish the current folio. */ + if (folio) { + if (!folio_ref_sub_and_test(folio, nc->pagecnt_bias)) goto refill; - } =20 if (unlikely(nc->pfmemalloc)) { __folio_put(folio); - nc->folio =3D NULL; goto refill; } =20 @@ -117,30 +121,56 @@ void *page_frag_alloc_align(struct page_frag_cache *n= c, /* reset page count bias and offset to start of new frag */ nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; offset =3D folio_size(folio); - if (unlikely(fragsz > offset)) { - /* - * The caller is trying to allocate a fragment - * with fragsz > PAGE_SIZE but the cache isn't big - * enough to satisfy the request, this may - * happen in low memory conditions. - * We don't release the cache page because - * it could make memory pressure worse - * so we simply return NULL here. - */ - nc->offset =3D offset; + if (unlikely(fragsz > offset)) + goto frag_too_big; + goto try_again; + } + +refill: + if (!spare) { + nc->folio =3D NULL; + put_cpu_ptr(frag_cache); + + spare =3D page_frag_cache_refill(gfp); + if (!spare) return NULL; - } + + nc =3D get_cpu_ptr(frag_cache); + /* We may now be on a different cpu and/or someone else may + * have refilled it + */ + nc->pfmemalloc =3D folio_is_pfmemalloc(spare); + if (nc->folio) + goto reload; } =20 - nc->pagecnt_bias--; - offset -=3D fragsz; - offset &=3D ~(align - 1); + nc->folio =3D spare; + folio =3D spare; + spare =3D NULL; + + /* Even if we own the page, we do not use atomic_set(). This would + * break get_page_unless_zero() users. + */ + folio_ref_add(folio, PAGE_FRAG_CACHE_MAX_SIZE); + + /* Reset page count bias and offset to start of new frag */ + nc->pagecnt_bias =3D PAGE_FRAG_CACHE_MAX_SIZE + 1; + offset =3D folio_size(folio); + goto try_again; + +frag_too_big: + /* + * The caller is trying to allocate a fragment with fragsz > PAGE_SIZE + * but the cache isn't big enough to satisfy the request, this may + * happen in low memory conditions. We don't release the cache page + * because it could make memory pressure worse so we simply return NULL + * here. 
+ */ nc->offset =3D offset; - - p =3D folio_address(folio) + offset; - if (gfp_mask & __GFP_ZERO) - return memset(p, 0, fragsz); - return p; + put_cpu_ptr(frag_cache); + if (spare) + folio_put(spare); + return NULL; } EXPORT_SYMBOL(page_frag_alloc_align); =20 @@ -152,3 +182,25 @@ void page_frag_free(void *addr) folio_put(virt_to_folio(addr)); } EXPORT_SYMBOL(page_frag_free); + +/** + * page_frag_memdup - Allocate a page fragment and duplicate some data int= o it + * @frag_cache: The frag cache to use (or NULL for the default) + * @fragsz: The amount of memory to copy (maximum 1/2 page). + * @p: The source data to copy + * @gfp: Allocation flags under which to make an allocation + * @align_mask: The required alignment + */ +void *page_frag_memdup(struct page_frag_cache __percpu *frag_cache, + const void *p, size_t fragsz, gfp_t gfp, + unsigned long align_mask) +{ + void *q; + + q =3D page_frag_alloc_align(frag_cache, fragsz, gfp, align_mask); + if (!q) + return q; + + return memcpy(q, p, fragsz); +} +EXPORT_SYMBOL(page_frag_memdup); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index cc507433b357..225a16f3713f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -263,13 +263,13 @@ static void *page_frag_alloc_1k(struct page_frag_1k *= nc, gfp_t gfp_mask) #endif =20 struct napi_alloc_cache { - struct page_frag_cache page; struct page_frag_1k page_small; unsigned int skb_count; void *skb_cache[NAPI_SKB_CACHE_SIZE]; }; =20 static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache); +static DEFINE_PER_CPU(struct page_frag_cache, napi_frag_cache); static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache); =20 /* Double check that napi_get_frags() allocates skbs with @@ -291,11 +291,9 @@ void napi_get_frags_check(struct napi_struct *napi) =20 void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align) { - struct napi_alloc_cache *nc =3D this_cpu_ptr(&napi_alloc_cache); - fragsz =3D SKB_DATA_ALIGN(fragsz); =20 - return page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align); + return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align); } EXPORT_SYMBOL(napi_alloc_frag_align); =20 @@ -305,15 +303,12 @@ void *netdev_alloc_frag_align(unsigned int fragsz, un= signed int align) =20 fragsz =3D SKB_DATA_ALIGN(fragsz); if (in_hardirq() || irqs_disabled()) { - struct page_frag_cache *nc =3D this_cpu_ptr(&netdev_alloc_cache); - - data =3D page_frag_alloc_align(nc, fragsz, GFP_ATOMIC, align); + data =3D page_frag_alloc_align(&netdev_alloc_cache, + fragsz, GFP_ATOMIC, align); } else { - struct napi_alloc_cache *nc; - local_bh_disable(); - nc =3D this_cpu_ptr(&napi_alloc_cache); - data =3D page_frag_alloc_align(&nc->page, fragsz, GFP_ATOMIC, align); + data =3D page_frag_alloc_align(&napi_frag_cache, + fragsz, GFP_ATOMIC, align); local_bh_enable(); } return data; @@ -691,7 +686,6 @@ EXPORT_SYMBOL(__alloc_skb); struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int le= n, gfp_t gfp_mask) { - struct page_frag_cache *nc; struct sk_buff *skb; bool pfmemalloc; void *data; @@ -716,14 +710,12 @@ struct sk_buff *__netdev_alloc_skb(struct net_device = *dev, unsigned int len, gfp_mask |=3D __GFP_MEMALLOC; =20 if (in_hardirq() || irqs_disabled()) { - nc =3D this_cpu_ptr(&netdev_alloc_cache); - data =3D page_frag_alloc(nc, len, gfp_mask); - pfmemalloc =3D nc->pfmemalloc; + data =3D page_frag_alloc(&netdev_alloc_cache, len, gfp_mask); + pfmemalloc =3D folio_is_pfmemalloc(virt_to_folio(data)); } else { local_bh_disable(); - nc =3D 
this_cpu_ptr(&napi_alloc_cache.page); - data =3D page_frag_alloc(nc, len, gfp_mask); - pfmemalloc =3D nc->pfmemalloc; + data =3D page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc =3D folio_is_pfmemalloc(virt_to_folio(data)); local_bh_enable(); } =20 @@ -811,8 +803,8 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *na= pi, unsigned int len, } else { len =3D SKB_HEAD_ALIGN(len); =20 - data =3D page_frag_alloc(&nc->page, len, gfp_mask); - pfmemalloc =3D nc->page.pfmemalloc; + data =3D page_frag_alloc(&napi_frag_cache, len, gfp_mask); + pfmemalloc =3D folio_is_pfmemalloc(virt_to_folio(data)); } =20 if (unlikely(!data)) From nobody Tue Dec 16 11:43:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0283C77B7A for ; Wed, 24 May 2023 15:36:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237132AbjEXPgY (ORCPT ); Wed, 24 May 2023 11:36:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236698AbjEXPfM (ORCPT ); Wed, 24 May 2023 11:35:12 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E5EC5E7A for ; Wed, 24 May 2023 08:34:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684942428; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hgbi0fncQFlNqYJvJIncCzqzDTHwL/0fY7UO/lvli4s=; b=hTPgvlWoInLvn++tgGEbbo5CQe2zyOKlXGZJnFqICvHNGvvqFT0wLnTwNb+rhRHkGgTUFY 6XG3CLTf74Ae1W1e8TJFHzj2e4rSRboLSPjU5M2W5O/WgDHbRAMg3Eq+Mc0RMLFQPDDIxx INSLprQJTn1RfnGkk4xLRKKwy8kuS8w= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-170-IhsFsMjsP6yLutFsBhfOSA-1; Wed, 24 May 2023 11:33:44 -0400 X-MC-Unique: IhsFsMjsP6yLutFsBhfOSA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6EDD280027F; Wed, 24 May 2023 15:33:43 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.39.192.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id C2DC4C1ED99; Wed, 24 May 2023 15:33:41 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 07/12] net: Clean up users of netdev_alloc_cache and napi_frag_cache Date: Wed, 24 May 2023 16:33:06 +0100 Message-Id: <20230524153311.3625329-8-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The users of netdev_alloc_cache and napi_frag_cache don't need to take the bh lock around access to these fragment caches any more as the percpu handling is now done in page_frag_alloc_align(). Signed-off-by: David Howells cc: "David S. Miller" cc: Eric Dumazet cc: Jakub Kicinski cc: Paolo Abeni cc: linux-mm@kvack.org --- include/linux/skbuff.h | 3 ++- net/core/skbuff.c | 29 +++++++++-------------------- 2 files changed, 11 insertions(+), 21 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 41b63e72c6c3..e11a765fe7fa 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -252,7 +252,8 @@ /* Maximum value in skb->csum_level */ #define SKB_MAX_CSUM_LEVEL 3 =20 -#define SKB_DATA_ALIGN(X) ALIGN(X, SMP_CACHE_BYTES) +#define SKB_DATA_ALIGNMENT SMP_CACHE_BYTES +#define SKB_DATA_ALIGN(X) ALIGN(X, SKB_DATA_ALIGNMENT) #define SKB_WITH_OVERHEAD(X) \ ((X) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) =20 diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 225a16f3713f..c2840b0dcad9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -291,27 +291,20 @@ void napi_get_frags_check(struct napi_struct *napi) =20 void *napi_alloc_frag_align(unsigned int fragsz, unsigned int align) { - fragsz =3D SKB_DATA_ALIGN(fragsz); - + align =3D min_t(unsigned int, align, SKB_DATA_ALIGNMENT); return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align); } EXPORT_SYMBOL(napi_alloc_frag_align); =20 void *netdev_alloc_frag_align(unsigned int fragsz, unsigned int align) { - void *data; - - fragsz =3D SKB_DATA_ALIGN(fragsz); - if (in_hardirq() || irqs_disabled()) { - data =3D page_frag_alloc_align(&netdev_alloc_cache, + align =3D min_t(unsigned int, align, SKB_DATA_ALIGNMENT); + if (in_hardirq() || irqs_disabled()) + return page_frag_alloc_align(&netdev_alloc_cache, fragsz, GFP_ATOMIC, align); - } else { - local_bh_disable(); - data =3D page_frag_alloc_align(&napi_frag_cache, + else + return page_frag_alloc_align(&napi_frag_cache, fragsz, GFP_ATOMIC, align); - local_bh_enable(); - } - return data; } EXPORT_SYMBOL(netdev_alloc_frag_align); =20 @@ -709,15 +702,11 @@ struct sk_buff *__netdev_alloc_skb(struct net_device = *dev, unsigned int len, if (sk_memalloc_socks()) gfp_mask |=3D __GFP_MEMALLOC; =20 - if (in_hardirq() || irqs_disabled()) { + if (in_hardirq() || irqs_disabled()) data =3D page_frag_alloc(&netdev_alloc_cache, len, gfp_mask); - pfmemalloc =3D folio_is_pfmemalloc(virt_to_folio(data)); - } else { - local_bh_disable(); + else data =3D page_frag_alloc(&napi_frag_cache, len, gfp_mask); - pfmemalloc =3D folio_is_pfmemalloc(virt_to_folio(data)); - local_bh_enable(); - } + pfmemalloc =3D folio_is_pfmemalloc(virt_to_folio(data)); =20 if (unlikely(!data)) return NULL; From nobody Tue Dec 16 11:43:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 
(2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F3DCEC77B7C for ; Wed, 24 May 2023 15:36:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237121AbjEXPgU (ORCPT ); Wed, 24 May 2023 11:36:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236323AbjEXPfM (ORCPT ); Wed, 24 May 2023 11:35:12 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 17F5E123 for ; Wed, 24 May 2023 08:34:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684942430; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PZf0byHPp2E2QKGUTDTD+TlMjPhCnr7dpojTO1v28qY=; b=jWzDHD7qRb45gP5eAEdveVXJvNr8sK15Oep6MRqqVvJ2Sovbdc5Sd7u6eODXgpIfZzpYO+ SlxXksB1FMqy7MtnvE8BFdOQN8hwzWz5zmTP/DyJITyMqPBNZzRK3je9LvwakkG2VZAwIg qS8l8D5yMR71WoeA6oF+EpBDVcXcgog= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-26-wF7-Cb2KPtKxfSJZIAw0Dg-1; Wed, 24 May 2023 11:33:46 -0400 X-MC-Unique: wF7-Cb2KPtKxfSJZIAw0Dg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id E46EB811E85; Wed, 24 May 2023 15:33:45 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.39.192.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4289240C6EC4; Wed, 24 May 2023 15:33:44 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH net-next 08/12] net: Copy slab data for sendmsg(MSG_SPLICE_PAGES) Date: Wed, 24 May 2023 16:33:07 +0100 Message-Id: <20230524153311.3625329-9-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" If sendmsg() is passed MSG_SPLICE_PAGES and is given a buffer that contains some data that's resident in the slab, copy/coalesce it rather than returning EIO. Signed-off-by: David Howells cc: Eric Dumazet cc: "David S. 
Miller" cc: David Ahern cc: Jakub Kicinski cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- include/linux/skbuff.h | 3 +++ net/core/skbuff.c | 33 ++++++++++++++++++++++++++++++--- 2 files changed, 33 insertions(+), 3 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index e11a765fe7fa..11d98990f5f1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -5084,6 +5084,9 @@ static inline void skb_mark_for_recycle(struct sk_buf= f *skb) #endif } =20 +ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter, + ssize_t maxsize, gfp_t gfp); + ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter, ssize_t maxsize, gfp_t gfp); =20 diff --git a/net/core/skbuff.c b/net/core/skbuff.c index c2840b0dcad9..a16499b9942b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -6927,17 +6927,44 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, s= truct iov_iter *iter, break; } =20 + if (space =3D=3D 0 && + !skb_can_coalesce(skb, skb_shinfo(skb)->nr_frags, + pages[0], off)) { + iov_iter_revert(iter, len); + break; + } + i =3D 0; do { struct page *page =3D pages[i++]; size_t part =3D min_t(size_t, PAGE_SIZE - off, len); - - ret =3D -EIO; - if (WARN_ON_ONCE(!sendpage_ok(page))) + bool put =3D false; + + if (PageSlab(page)) { + const void *p; + void *q; + + p =3D kmap_local_page(page); + q =3D page_frag_memdup(NULL, p + off, part, gfp, + ULONG_MAX); + kunmap_local(p); + if (!q) { + iov_iter_revert(iter, len); + ret =3D -ENOMEM; + goto out; + } + page =3D virt_to_page(q); + off =3D offset_in_page(q); + put =3D true; + } else if (WARN_ON_ONCE(!sendpage_ok(page))) { + ret =3D -EIO; goto out; + } =20 ret =3D skb_append_pagefrags(skb, page, off, part, frag_limit); + if (put) + put_page(page); if (ret < 0) { iov_iter_revert(iter, len); goto out; From nobody Tue Dec 16 11:43:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10718C77B73 for ; Wed, 24 May 2023 15:36:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237229AbjEXPgn (ORCPT ); Wed, 24 May 2023 11:36:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236398AbjEXPfZ (ORCPT ); Wed, 24 May 2023 11:35:25 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16A2413E for ; Wed, 24 May 2023 08:34:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684942436; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hDpyQQ1JedFG6lTxY3chHyZphdbNgdYt0GxXfK9/WHk=; b=MsotCOIcMt265KW/WcHD8SryM+iPuTMZYn5frv12XWfn7YbGysExOJQtOMlSN8/geXaFrL 4qqJXcmmepvd4TavLJS/xNEDPwpTA9Y18+C0CRaUFOnkPssoYgdVLaD4eTTobLcoatSiFs wIhmQZUp9fpRh18CwQlwmPhuUHTfMSI= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-164-4pleeuJrPNyai-iLRApvog-1; Wed, 24 May 2023 11:33:49 -0400 X-MC-Unique: 
4pleeuJrPNyai-iLRApvog-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9A7B73849528; Wed, 24 May 2023 15:33:48 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.39.192.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9B045C164ED; Wed, 24 May 2023 15:33:46 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chuck Lever , Boris Pismenny , John Fastabend Subject: [PATCH net-next 09/12] tls/sw: Support MSG_SPLICE_PAGES Date: Wed, 24 May 2023 16:33:08 +0100 Message-Id: <20230524153311.3625329-10-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible and copied the data if not. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Chuck Lever cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: Eric Dumazet cc: "David S. Miller" cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/tls/tls_sw.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 635b8bf6b937..0ccef8aa9951 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -929,6 +929,49 @@ static int tls_sw_push_pending_record(struct sock *sk,= int flags) &copied, flags); } =20 +static int rls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg, + struct sk_msg *msg_pl, size_t try_to_copy, + ssize_t *copied) +{ + struct page *page =3D NULL, **pages =3D &page; + + do { + ssize_t part; + size_t off; + bool put =3D false; + + part =3D iov_iter_extract_pages(&msg->msg_iter, &pages, + try_to_copy, 1, 0, &off); + if (part <=3D 0) + return part ?: -EIO; + + if (!sendpage_ok(page)) { + const void *p =3D kmap_local_page(page); + void *q; + + q =3D page_frag_memdup(NULL, p + off, part, + sk->sk_allocation, ULONG_MAX); + kunmap_local(p); + if (!q) { + iov_iter_revert(&msg->msg_iter, part); + return -ENOMEM; + } + page =3D virt_to_page(q); + off =3D offset_in_page(q); + put =3D true; + } + + sk_msg_page_add(msg_pl, page, part, off); + sk_mem_charge(sk, part); + if (put) + put_page(page); + *copied +=3D part; + try_to_copy -=3D part; + } while (try_to_copy && !sk_msg_full(msg_pl)); + + return 0; +} + int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) { long timeo =3D sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); @@ -1018,6 +1061,17 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *m= sg, size_t size) full_record =3D true; } =20 + if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) { + ret =3D rls_sw_sendmsg_splice(sk, msg, msg_pl, + try_to_copy, &copied); + if (ret < 0) + goto send_end; + tls_ctx->pending_open_record_frags =3D true; + if (full_record || eor || 
sk_msg_full(msg_pl)) + goto copied; + continue; + } + if (!is_kvec && (full_record || eor) && !async_capable) { u32 first =3D msg_pl->sg.end; =20 @@ -1080,8 +1134,9 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *ms= g, size_t size) /* Open records defined only if successfully copied, otherwise * we would trim the sg but not reset the open record frags. */ - tls_ctx->pending_open_record_frags =3D true; copied +=3D try_to_copy; +copied: + tls_ctx->pending_open_record_frags =3D true; if (full_record || eor) { ret =3D bpf_exec_tx_verdict(msg_pl, sk, full_record, record_type, &copied, From nobody Tue Dec 16 11:43:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C2A3C77B73 for ; Wed, 24 May 2023 15:36:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237212AbjEXPgj (ORCPT ); Wed, 24 May 2023 11:36:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230443AbjEXPfZ (ORCPT ); Wed, 24 May 2023 11:35:25 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9DE512B for ; Wed, 24 May 2023 08:34:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684942435; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=/e3etHO5quwZ5TcFINBOwsTOV20jtXx9aa4v5VXGvJw=; b=CFqr511ZS72WA1Ald2VELwastPSRF0Pj6qsMLrKecrxT8tl8/ALvKHG8GU6mYE5w+pMAH9 kg7d4Tsj6fh06+TCCGy3QEDrl6sAPmcPNxK1JR2J/K1OojDlhzF+g56uuH1np2/6AeuoBF 1HNDjdOuQQ48yoXNcyaAhJFUo+VdXmw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-519-Sq90ewIjPbyQ2MCsFbryuA-1; Wed, 24 May 2023 11:33:52 -0400 X-MC-Unique: Sq90ewIjPbyQ2MCsFbryuA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 74E1D8032F5; Wed, 24 May 2023 15:33:51 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.39.192.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 74180C164ED; Wed, 24 May 2023 15:33:49 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. 
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chuck Lever , Boris Pismenny , John Fastabend Subject: [PATCH net-next 10/12] tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES Date: Wed, 24 May 2023 16:33:09 +0100 Message-Id: <20230524153311.3625329-11-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. [!] Note that tls_sw_sendpage_locked() appears to have the wrong locking upstream. I think the caller will only hold the socket lock, but it should hold tls_ctx->tx_lock too. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Chuck Lever cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: Eric Dumazet cc: "David S. Miller" cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/tls/tls_sw.c | 164 +++++++++-------------------------------------- 1 file changed, 30 insertions(+), 134 deletions(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 0ccef8aa9951..1a5926cc3e84 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -972,7 +972,7 @@ static int rls_sw_sendmsg_splice(struct sock *sk, struc= t msghdr *msg, return 0; } =20 -int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) +static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg, size= _t size) { long timeo =3D sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); struct tls_context *tls_ctx =3D tls_get_ctx(sk); @@ -995,15 +995,6 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg= , size_t size) int ret =3D 0; int pending; =20 - if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | - MSG_CMSG_COMPAT)) - return -EOPNOTSUPP; - - ret =3D mutex_lock_interruptible(&tls_ctx->tx_lock); - if (ret) - return ret; - lock_sock(sk); - if (unlikely(msg->msg_controllen)) { ret =3D tls_process_cmsg(sk, msg, &record_type); if (ret) { @@ -1204,157 +1195,62 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr = *msg, size_t size) =20 send_end: ret =3D sk_stream_error(sk, msg->msg_flags, ret); - - release_sock(sk); - mutex_unlock(&tls_ctx->tx_lock); return copied > 0 ? copied : ret; } =20 -static int tls_sw_do_sendpage(struct sock *sk, struct page *page, - int offset, size_t size, int flags) +int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) { - long timeo =3D sock_sndtimeo(sk, flags & MSG_DONTWAIT); struct tls_context *tls_ctx =3D tls_get_ctx(sk); - struct tls_sw_context_tx *ctx =3D tls_sw_ctx_tx(tls_ctx); - struct tls_prot_info *prot =3D &tls_ctx->prot_info; - unsigned char record_type =3D TLS_RECORD_TYPE_DATA; - struct sk_msg *msg_pl; - struct tls_rec *rec; - int num_async =3D 0; - ssize_t copied =3D 0; - bool full_record; - int record_room; - int ret =3D 0; - bool eor; - - eor =3D !(flags & MSG_SENDPAGE_NOTLAST); - sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); - - /* Call the sk_stream functions to manage the sndbuf mem. 
*/ - while (size > 0) { - size_t copy, required_size; - - if (sk->sk_err) { - ret =3D -sk->sk_err; - goto sendpage_end; - } - - if (ctx->open_rec) - rec =3D ctx->open_rec; - else - rec =3D ctx->open_rec =3D tls_get_rec(sk); - if (!rec) { - ret =3D -ENOMEM; - goto sendpage_end; - } - - msg_pl =3D &rec->msg_plaintext; - - full_record =3D false; - record_room =3D TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size; - copy =3D size; - if (copy >=3D record_room) { - copy =3D record_room; - full_record =3D true; - } - - required_size =3D msg_pl->sg.size + copy + prot->overhead_size; - - if (!sk_stream_memory_free(sk)) - goto wait_for_sndbuf; -alloc_payload: - ret =3D tls_alloc_encrypted_msg(sk, required_size); - if (ret) { - if (ret !=3D -ENOSPC) - goto wait_for_memory; - - /* Adjust copy according to the amount that was - * actually allocated. The difference is due - * to max sg elements limit - */ - copy -=3D required_size - msg_pl->sg.size; - full_record =3D true; - } - - sk_msg_page_add(msg_pl, page, copy, offset); - sk_mem_charge(sk, copy); - - offset +=3D copy; - size -=3D copy; - copied +=3D copy; - - tls_ctx->pending_open_record_frags =3D true; - if (full_record || eor || sk_msg_full(msg_pl)) { - ret =3D bpf_exec_tx_verdict(msg_pl, sk, full_record, - record_type, &copied, flags); - if (ret) { - if (ret =3D=3D -EINPROGRESS) - num_async++; - else if (ret =3D=3D -ENOMEM) - goto wait_for_memory; - else if (ret !=3D -EAGAIN) { - if (ret =3D=3D -ENOSPC) - ret =3D 0; - goto sendpage_end; - } - } - } - continue; -wait_for_sndbuf: - set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); -wait_for_memory: - ret =3D sk_stream_wait_memory(sk, &timeo); - if (ret) { - if (ctx->open_rec) - tls_trim_both_msgs(sk, msg_pl->sg.size); - goto sendpage_end; - } + int ret; =20 - if (ctx->open_rec) - goto alloc_payload; - } + if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | + MSG_CMSG_COMPAT | MSG_SPLICE_PAGES | + MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY)) + return -EOPNOTSUPP; =20 - if (num_async) { - /* Transmit if any encryptions have completed */ - if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) { - cancel_delayed_work(&ctx->tx_work.work); - tls_tx_records(sk, flags); - } - } -sendpage_end: - ret =3D sk_stream_error(sk, flags, ret); - return copied > 0 ? 
copied : ret; + ret =3D mutex_lock_interruptible(&tls_ctx->tx_lock); + if (ret) + return ret; + lock_sock(sk); + ret =3D tls_sw_sendmsg_locked(sk, msg, size); + release_sock(sk); + mutex_unlock(&tls_ctx->tx_lock); + return ret; } =20 int tls_sw_sendpage_locked(struct sock *sk, struct page *page, int offset, size_t size, int flags) { + struct bio_vec bvec; + struct msghdr msg =3D { .msg_flags =3D flags | MSG_SPLICE_PAGES, }; + if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY | MSG_NO_SHARED_FRAGS)) return -EOPNOTSUPP; + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |=3D MSG_MORE; =20 - return tls_sw_do_sendpage(sk, page, offset, size, flags); + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + return tls_sw_sendmsg_locked(sk, &msg, size); } =20 int tls_sw_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { - struct tls_context *tls_ctx =3D tls_get_ctx(sk); - int ret; + struct bio_vec bvec; + struct msghdr msg =3D { .msg_flags =3D flags | MSG_SPLICE_PAGES, }; =20 if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY)) return -EOPNOTSUPP; + if (flags & MSG_SENDPAGE_NOTLAST) + msg.msg_flags |=3D MSG_MORE; =20 - ret =3D mutex_lock_interruptible(&tls_ctx->tx_lock); - if (ret) - return ret; - lock_sock(sk); - ret =3D tls_sw_do_sendpage(sk, page, offset, size, flags); - release_sock(sk); - mutex_unlock(&tls_ctx->tx_lock); - return ret; + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + return tls_sw_sendmsg(sk, &msg, size); } =20 static int From nobody Tue Dec 16 11:43:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65AC4C77B73 for ; Wed, 24 May 2023 15:36:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237242AbjEXPgr (ORCPT ); Wed, 24 May 2023 11:36:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236623AbjEXPf3 (ORCPT ); Wed, 24 May 2023 11:35:29 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 557DEE6C for ; Wed, 24 May 2023 08:34:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684942461; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E1PF2dIIOP/VptjcBGEUgdHYPyQQ9EA3G7wUORmRH+w=; b=WvT7Nf13rQbO8W26d93jhsMwEGqyTG4lh6BXOYOPMKXn8QaRhkj3VLpNAt+cioRAxJuojO ZL9zJODHjSBCf7B+ov+WBOgGaQOz1C4CdRnQU+wbQzfKHjqEmjLUvUwSQ9ODltB9RaFgaC YJH6DxXIxOUCxL1JOIT0XCi37qAxvl4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-237-qbT634KSMquY-3XAUHekDg-1; Wed, 24 May 2023 11:34:16 -0400 X-MC-Unique: qbT634KSMquY-3XAUHekDg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA 
(256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 18F5B3C0F24B; Wed, 24 May 2023 15:34:11 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.39.192.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2D91C2166B25; Wed, 24 May 2023 15:33:52 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chuck Lever , Boris Pismenny , John Fastabend Subject: [PATCH net-next 11/12] tls/device: Support MSG_SPLICE_PAGES Date: Wed, 24 May 2023 16:33:10 +0100 Message-Id: <20230524153311.3625329-12-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to be spliced from the source iterator if possible and copied the data if not. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Chuck Lever cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: Eric Dumazet cc: "David S. Miller" cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/tls/tls_device.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index daeff54bdbfa..ee07f6e67d52 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -508,7 +508,30 @@ static int tls_push_data(struct sock *sk, tls_append_frag(record, &zc_pfrag, copy); =20 iter_offset.offset +=3D copy; + } else if (copy && (flags & MSG_SPLICE_PAGES)) { + struct page_frag zc_pfrag; + struct page **pages =3D &zc_pfrag.page; + size_t off; + + rc =3D iov_iter_extract_pages(iter_offset.msg_iter, &pages, + copy, 1, 0, &off); + if (rc <=3D 0) { + if (rc =3D=3D 0) + rc =3D -EIO; + goto handle_error; + } + copy =3D rc; + + if (!sendpage_ok(zc_pfrag.page)) { + iov_iter_revert(iter_offset.msg_iter, copy); + goto no_zcopy_this_page; + } + + zc_pfrag.offset =3D off; + zc_pfrag.size =3D copy; + tls_append_frag(record, &zc_pfrag, copy); } else if (copy) { +no_zcopy_this_page: copy =3D min_t(size_t, copy, pfrag->size - pfrag->offset); =20 rc =3D tls_device_copy_data(page_address(pfrag->page) + @@ -571,6 +594,9 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *= msg, size_t size) union tls_iter_offset iter; int rc; =20 + if (!tls_ctx->zerocopy_sendfile) + msg->msg_flags &=3D ~MSG_SPLICE_PAGES; + mutex_lock(&tls_ctx->tx_lock); lock_sock(sk); From nobody Tue Dec 16 11:43:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8129DC77B73 for ; Wed, 24 May 2023 15:36:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237244AbjEXPgv (ORCPT ); Wed, 24 May 2023 11:36:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33968 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S236389AbjEXPfg (ORCPT ); Wed, 24 May 2023 11:35:36 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CF9691A8 for ; Wed, 24 May 2023 08:34:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684942464; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=k9cQP4HmvFAjJY7J8On2WvrNLYKoX68FiRQGhNJuMdw=; b=VkNKzs7FQGBCMESNKhArybTK35MjEh0U0XX5ISzqpif58pqJigtr1S/+6fByGkyzeK6nsI fVlSSNKsV9h3h8jAkRjYlZraYrHSC5RZKj1VxXjNfwPkGtjM/SyHKN9G2Hj1UghcT/e5Me ga1aKUno3vcSCLqM++SHX/GbS64nctE= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-121-TJrS7NBWNQOPag9a_0ZUCA-1; Wed, 24 May 2023 11:34:18 -0400 X-MC-Unique: TJrS7NBWNQOPag9a_0ZUCA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 712A928078D3; Wed, 24 May 2023 15:34:17 +0000 (UTC) Received: from warthog.procyon.org.uk (unknown [10.39.192.68]) by smtp.corp.redhat.com (Postfix) with ESMTP id DE8F140CFD45; Wed, 24 May 2023 15:34:14 +0000 (UTC) From: David Howells To: netdev@vger.kernel.org Cc: David Howells , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Willem de Bruijn , David Ahern , Matthew Wilcox , Jens Axboe , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chuck Lever , Boris Pismenny , John Fastabend Subject: [PATCH net-next 12/12] tls/device: Convert tls_device_sendpage() to use MSG_SPLICE_PAGES Date: Wed, 24 May 2023 16:33:11 +0100 Message-Id: <20230524153311.3625329-13-dhowells@redhat.com> In-Reply-To: <20230524153311.3625329-1-dhowells@redhat.com> References: <20230524153311.3625329-1-dhowells@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than directly splicing in the pages itself. With that, the tls_iter_offset union is no longer necessary and can be replaced with an iov_iter pointer and the zc_page argument to tls_push_data() can also be removed. This allows ->sendpage() to be replaced by something that can handle multiple multipage folios in a single transaction. Signed-off-by: David Howells cc: Chuck Lever cc: Boris Pismenny cc: John Fastabend cc: Jakub Kicinski cc: Eric Dumazet cc: "David S. 
Miller" cc: Paolo Abeni cc: Jens Axboe cc: Matthew Wilcox cc: netdev@vger.kernel.org --- net/tls/tls_device.c | 81 ++++++++++---------------------------------- 1 file changed, 18 insertions(+), 63 deletions(-) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index ee07f6e67d52..f2c895009314 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -422,16 +422,10 @@ static int tls_device_copy_data(void *addr, size_t by= tes, struct iov_iter *i) return 0; } =20 -union tls_iter_offset { - struct iov_iter *msg_iter; - int offset; -}; - static int tls_push_data(struct sock *sk, - union tls_iter_offset iter_offset, + struct iov_iter *iter, size_t size, int flags, - unsigned char record_type, - struct page *zc_page) + unsigned char record_type) { struct tls_context *tls_ctx =3D tls_get_ctx(sk); struct tls_prot_info *prot =3D &tls_ctx->prot_info; @@ -499,21 +493,12 @@ static int tls_push_data(struct sock *sk, record =3D ctx->open_record; =20 copy =3D min_t(size_t, size, max_open_record_len - record->len); - if (copy && zc_page) { - struct page_frag zc_pfrag; - - zc_pfrag.page =3D zc_page; - zc_pfrag.offset =3D iter_offset.offset; - zc_pfrag.size =3D copy; - tls_append_frag(record, &zc_pfrag, copy); - - iter_offset.offset +=3D copy; - } else if (copy && (flags & MSG_SPLICE_PAGES)) { + if (copy && (flags & MSG_SPLICE_PAGES)) { struct page_frag zc_pfrag; struct page **pages =3D &zc_pfrag.page; size_t off; =20 - rc =3D iov_iter_extract_pages(iter_offset.msg_iter, &pages, + rc =3D iov_iter_extract_pages(iter, &pages, copy, 1, 0, &off); if (rc <=3D 0) { if (rc =3D=3D 0) @@ -523,7 +508,7 @@ static int tls_push_data(struct sock *sk, copy =3D rc; =20 if (!sendpage_ok(zc_pfrag.page)) { - iov_iter_revert(iter_offset.msg_iter, copy); + iov_iter_revert(iter, copy); goto no_zcopy_this_page; } =20 @@ -536,7 +521,7 @@ static int tls_push_data(struct sock *sk, =20 rc =3D tls_device_copy_data(page_address(pfrag->page) + pfrag->offset, copy, - iter_offset.msg_iter); + iter); if (rc) goto handle_error; tls_append_frag(record, pfrag, copy); @@ -591,7 +576,6 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *= msg, size_t size) { unsigned char record_type =3D TLS_RECORD_TYPE_DATA; struct tls_context *tls_ctx =3D tls_get_ctx(sk); - union tls_iter_offset iter; int rc; =20 if (!tls_ctx->zerocopy_sendfile) @@ -606,8 +590,7 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *= msg, size_t size) goto out; } =20 - iter.msg_iter =3D &msg->msg_iter; - rc =3D tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL); + rc =3D tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags, record_typ= e); =20 out: release_sock(sk); @@ -618,44 +601,18 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr= *msg, size_t size) int tls_device_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { - struct tls_context *tls_ctx =3D tls_get_ctx(sk); - union tls_iter_offset iter_offset; - struct iov_iter msg_iter; - char *kaddr; - struct kvec iov; - int rc; + struct bio_vec bvec; + struct msghdr msg =3D { .msg_flags =3D flags | MSG_SPLICE_PAGES, }; =20 if (flags & MSG_SENDPAGE_NOTLAST) - flags |=3D MSG_MORE; - - mutex_lock(&tls_ctx->tx_lock); - lock_sock(sk); + msg.msg_flags |=3D MSG_MORE; =20 - if (flags & MSG_OOB) { - rc =3D -EOPNOTSUPP; - goto out; - } - - if (tls_ctx->zerocopy_sendfile) { - iter_offset.offset =3D offset; - rc =3D tls_push_data(sk, iter_offset, size, - flags, TLS_RECORD_TYPE_DATA, page); - goto out; - } - - kaddr =3D kmap(page); - iov.iov_base =3D kaddr + 
offset; - iov.iov_len =3D size; - iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size); - iter_offset.msg_iter =3D &msg_iter; - rc =3D tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA, - NULL); - kunmap(page); + if (flags & MSG_OOB) + return -EOPNOTSUPP; =20 -out: - release_sock(sk); - mutex_unlock(&tls_ctx->tx_lock); - return rc; + bvec_set_page(&bvec, page, size, offset); + iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); + return tls_device_sendmsg(sk, &msg, size); } =20 struct tls_record_info *tls_get_record(struct tls_offload_context_tx *cont= ext, @@ -720,12 +677,10 @@ EXPORT_SYMBOL(tls_get_record); =20 static int tls_device_push_pending_record(struct sock *sk, int flags) { - union tls_iter_offset iter; - struct iov_iter msg_iter; + struct iov_iter iter; =20 - iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0); - iter.msg_iter =3D &msg_iter; - return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL); + iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0); + return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA); } =20 void tls_device_write_space(struct sock *sk, struct tls_context *ctx)