From nobody Sat Feb 7 06:13:48 2026
From: Vishwanath Seshagiri <vishs@meta.com>
To: "Michael S. Tsirkin", Jason Wang
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Wei , Matteo Croce , Ilias Apalodimas , , , , Subject: [PATCH net-next v4 1/2] virtio_net: add page_pool support for buffer allocation Date: Wed, 4 Feb 2026 11:36:16 -0800 Message-ID: <20260204193617.1200752-2-vishs@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260204193617.1200752-1-vishs@meta.com> References: <20260204193617.1200752-1-vishs@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: 5MP5MS8d5GHrgO7Xqtpi4_1OHWEdrX01 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjA0MDE1MCBTYWx0ZWRfXwnEekY2xGkuZ rxMMLtz7XRDG6rY7HKw+kEctrawGjlfx/Kp8D1+TxVLGXS0K9gwpy8Oni+0AVQEGvkVzdqOFNtG mn3ONyOhIipaTGokC/v/gfDMZMcQx9SROIhrtaAiQ/hpwo1I2mik/41pBFuMMbhfb7jPeWV4glY A69JvOqJCtoxzcj1CU8v2PtzMrrWqn22MLGzH6cKxzXkKzvzDOQHGaS9gbJoMvjPMHwtQ7SkEng Vxjhg7+KzZKk0UeKBB/le60pT1Kc1JgsjCELHC1/PUpCMcPgTttmTdiiay1IZn/WV6EiKPO8vZq rE17LPKLmYK2ozVKc5vMMfLHWaqMHPApN06yhjUo9yFNOKxEdiWOtDBxH5uC0xf/75Elp00DGWZ eKroXoKpUDcmZtTRub6NlWFzbGr+zwFHPeAoMlIzuBDOvE3t21rXA8BIEMxNsnoeKGoNQpCJ403 Ngsb8tnAGUfVaWPDzoA== X-Proofpoint-ORIG-GUID: 5MP5MS8d5GHrgO7Xqtpi4_1OHWEdrX01 X-Authority-Analysis: v=2.4 cv=C6/kCAP+ c=1 sm=1 tr=0 ts=69839fb4 cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=VabnemYjAAAA:8 a=qdrvl2vHfO7bKPKSEp8A:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-04_06,2026-02-04_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Use page_pool for RX buffer allocation in mergeable and small buffer modes to enable page recycling and avoid repeated page allocator calls. skb_mark_for_recycle() enables page reuse in the network stack. Big packets mode is unchanged because it uses page->private for linked list chaining of multiple pages per buffer, which conflicts with page_pool's internal use of page->private. Implement conditional DMA premapping using virtqueue_dma_dev(): - When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool handling DMA mapping, submit via virtqueue_add_inbuf_premapped() - When NULL (VDUSE, direct physical): page_pool handles allocation only, submit via virtqueue_add_inbuf_ctx() This preserves the DMA premapping optimization from commit 31f3cd4e5756b ("virtio-net: rq submits premapped per-buffer") while adding page_pool support as a prerequisite for future zero-copy features (devmem TCP, io_uring ZCRX). Page pools are created in probe and destroyed in remove (not open/close), following existing driver behavior where RX buffers remain in virtqueues across interface state changes. Signed-off-by: Vishwanath Seshagiri --- drivers/net/Kconfig | 1 + drivers/net/virtio_net.c | 351 ++++++++++++++++++++++----------------- 2 files changed, 201 insertions(+), 151 deletions(-) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index ac12eaf11755..f1e6b6b0a86f 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -450,6 +450,7 @@ config VIRTIO_NET depends on VIRTIO select NET_FAILOVER select DIMLIB + select PAGE_POOL help This is the virtual network driver for virtio. It can be used with QEMU based VMMs (like KVM or Xen). Say Y or M. 
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index ac12eaf11755..f1e6b6b0a86f 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -450,6 +450,7 @@ config VIRTIO_NET
 	depends on VIRTIO
 	select NET_FAILOVER
 	select DIMLIB
+	select PAGE_POOL
 	help
 	  This is the virtual network driver for virtio.  It can be used with
 	  QEMU based VMMs (like KVM or Xen).  Say Y or M.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index db88dcaefb20..74c51e597c3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include <net/page_pool/helpers.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -359,6 +360,11 @@ struct receive_queue {
 	/* Page frag for packet buffer allocation. */
 	struct page_frag alloc_frag;
 
+	struct page_pool *page_pool;
+
+	/* True if page_pool handles DMA mapping via PP_FLAG_DMA_MAP */
+	bool use_page_pool_dma;
+
 	/* RX: fragments + linear part + virtio header */
 	struct scatterlist sg[MAX_SKB_FRAGS + 2];
 
@@ -521,11 +527,13 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
 			       struct virtnet_rq_stats *stats);
 static void virtnet_receive_done(struct virtnet_info *vi, struct receive_queue *rq,
 				 struct sk_buff *skb, u8 flags);
-static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+static struct sk_buff *virtnet_skb_append_frag(struct receive_queue *rq,
+					       struct sk_buff *head_skb,
 					       struct sk_buff *curr_skb,
 					       struct page *page, void *buf,
 					       int len, int truesize);
 static void virtnet_xsk_completed(struct send_queue *sq, int num);
+static void free_unused_bufs(struct virtnet_info *vi);
 
 enum virtnet_xmit_type {
 	VIRTNET_XMIT_TYPE_SKB,
@@ -706,15 +714,24 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static void virtnet_put_page(struct receive_queue *rq, struct page *page,
+			     bool allow_direct)
+{
+	if (page_pool_page_is_pp(page))
+		page_pool_put_page(rq->page_pool, page, -1, allow_direct);
+	else
+		put_page(page);
+}
+
 static void virtnet_rq_free_buf(struct virtnet_info *vi,
 				struct receive_queue *rq, void *buf)
 {
 	if (vi->mergeable_rx_bufs)
-		put_page(virt_to_head_page(buf));
+		virtnet_put_page(rq, virt_to_head_page(buf), false);
 	else if (vi->big_packets)
 		give_pages(rq, buf);
 	else
-		put_page(virt_to_head_page(buf));
+		virtnet_put_page(rq, virt_to_head_page(buf), false);
 }
 
 static void enable_rx_mode_work(struct virtnet_info *vi)
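A note on the page_pool_page_is_pp() dispatch above: pages reaching the
driver's free paths can come either from the RX page_pool
(small/mergeable modes) or straight from the page allocator (big-packets
chains, XSK copies), so the release helper has to check the page's
origin. A minimal usage sketch, with rx_release_example() as a
hypothetical stand-in for virtnet_put_page():

/* Sketch: release a page whose origin is unknown. Not part of the
 * patch; mirrors virtnet_put_page() above.
 */
static void rx_release_example(struct receive_queue *rq, struct page *page,
			       bool napi_context)
{
	if (page_pool_page_is_pp(page))
		/* -1: sync the whole buffer if the pool syncs for device;
		 * napi_context permits lockless direct recycling.
		 */
		page_pool_put_page(rq->page_pool, page, -1, napi_context);
	else
		put_page(page);		/* ordinary allocator page */
}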
len); -} - -static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gf= p) -{ - struct page_frag *alloc_frag =3D &rq->alloc_frag; - struct virtnet_info *vi =3D rq->vq->vdev->priv; - struct virtnet_rq_dma *dma; - void *buf, *head; - dma_addr_t addr; =20 BUG_ON(vi->big_packets && !vi->mergeable_rx_bufs); =20 - head =3D page_address(alloc_frag->page); - - dma =3D head; - - /* new pages */ - if (!alloc_frag->offset) { - if (rq->last_dma) { - /* Now, the new page is allocated, the last dma - * will not be used. So the dma can be unmapped - * if the ref is 0. - */ - virtnet_rq_unmap(rq, rq->last_dma, 0); - rq->last_dma =3D NULL; - } - - dma->len =3D alloc_frag->size - sizeof(*dma); - - addr =3D virtqueue_map_single_attrs(rq->vq, dma + 1, - dma->len, DMA_FROM_DEVICE, 0); - if (virtqueue_map_mapping_error(rq->vq, addr)) - return NULL; - - dma->addr =3D addr; - dma->need_sync =3D virtqueue_map_need_sync(rq->vq, addr); - - /* Add a reference to dma to prevent the entire dma from - * being released during error handling. This reference - * will be freed after the pages are no longer used. - */ - get_page(alloc_frag->page); - dma->ref =3D 1; - alloc_frag->offset =3D sizeof(*dma); - - rq->last_dma =3D dma; - } - - ++dma->ref; - - buf =3D head + alloc_frag->offset; - - get_page(alloc_frag->page); - alloc_frag->offset +=3D size; - - return buf; + return virtqueue_get_buf_ctx(rq->vq, len, ctx); } =20 static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf) @@ -1067,9 +1004,6 @@ static void virtnet_rq_unmap_free_buf(struct virtqueu= e *vq, void *buf) return; } =20 - if (!vi->big_packets || vi->mergeable_rx_bufs) - virtnet_rq_unmap(rq, buf, 0); - virtnet_rq_free_buf(vi, rq, buf); } =20 @@ -1335,7 +1269,7 @@ static int xsk_append_merge_buffer(struct virtnet_inf= o *vi, =20 truesize =3D len; =20 - curr_skb =3D virtnet_skb_append_frag(head_skb, curr_skb, page, + curr_skb =3D virtnet_skb_append_frag(rq, head_skb, curr_skb, page, buf, len, truesize); if (!curr_skb) { put_page(page); @@ -1771,7 +1705,7 @@ static int virtnet_xdp_xmit(struct net_device *dev, return ret; } =20 -static void put_xdp_frags(struct xdp_buff *xdp) +static void put_xdp_frags(struct receive_queue *rq, struct xdp_buff *xdp) { struct skb_shared_info *shinfo; struct page *xdp_page; @@ -1781,7 +1715,7 @@ static void put_xdp_frags(struct xdp_buff *xdp) shinfo =3D xdp_get_shared_info_from_buff(xdp); for (i =3D 0; i < shinfo->nr_frags; i++) { xdp_page =3D skb_frag_page(&shinfo->frags[i]); - put_page(xdp_page); + virtnet_put_page(rq, xdp_page, true); } } } @@ -1873,7 +1807,7 @@ static struct page *xdp_linearize_page(struct net_dev= ice *dev, if (page_off + *len + tailroom > PAGE_SIZE) return NULL; =20 - page =3D alloc_page(GFP_ATOMIC); + page =3D page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC); if (!page) return NULL; =20 @@ -1897,7 +1831,7 @@ static struct page *xdp_linearize_page(struct net_dev= ice *dev, off =3D buf - page_address(p); =20 if (check_mergeable_len(dev, ctx, buflen)) { - put_page(p); + virtnet_put_page(rq, p, true); goto err_buf; } =20 @@ -1905,21 +1839,21 @@ static struct page *xdp_linearize_page(struct net_d= evice *dev, * is sending packet larger than the MTU. 
 		 */
 		if ((page_off + buflen + tailroom) > PAGE_SIZE) {
-			put_page(p);
+			virtnet_put_page(rq, p, true);
 			goto err_buf;
 		}
 
 		memcpy(page_address(page) + page_off,
 		       page_address(p) + off, buflen);
 		page_off += buflen;
-		put_page(p);
+		virtnet_put_page(rq, p, true);
 	}
 
 	/* Headroom does not contribute to packet length */
 	*len = page_off - XDP_PACKET_HEADROOM;
 	return page;
 err_buf:
-	__free_pages(page, 0);
+	page_pool_put_page(rq->page_pool, page, -1, true);
 	return NULL;
 }
 
@@ -1996,7 +1930,7 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,
 			goto err_xdp;
 
 		buf = page_address(xdp_page);
-		put_page(page);
+		virtnet_put_page(rq, page, true);
 		page = xdp_page;
 	}
 
@@ -2028,13 +1962,15 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,
 	if (metasize)
 		skb_metadata_set(skb, metasize);
 
+	skb_mark_for_recycle(skb);
+
 	return skb;
 
 err_xdp:
 	u64_stats_inc(&stats->xdp_drops);
 err:
 	u64_stats_inc(&stats->drops);
-	put_page(page);
+	virtnet_put_page(rq, page, true);
 xdp_xmit:
 	return NULL;
 }
@@ -2082,12 +2018,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	}
 
 	skb = receive_small_build_skb(vi, xdp_headroom, buf, len);
-	if (likely(skb))
+	if (likely(skb)) {
+		skb_mark_for_recycle(skb);
 		return skb;
+	}
 
 err:
 	u64_stats_inc(&stats->drops);
-	put_page(page);
+	virtnet_put_page(rq, page, true);
 	return NULL;
 }
 
@@ -2142,7 +2080,7 @@ static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
 		}
 		u64_stats_add(&stats->bytes, len);
 		page = virt_to_head_page(buf);
-		put_page(page);
+		virtnet_put_page(rq, page, true);
 	}
 }
 
@@ -2253,7 +2191,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 		offset = buf - page_address(page);
 
 		if (check_mergeable_len(dev, ctx, len)) {
-			put_page(page);
+			virtnet_put_page(rq, page, true);
 			goto err;
 		}
 
@@ -2272,7 +2210,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 	return 0;
 
 err:
-	put_xdp_frags(xdp);
+	put_xdp_frags(rq, xdp);
 	return -EINVAL;
 }
 
@@ -2337,7 +2275,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
 		if (*len + xdp_room > PAGE_SIZE)
 			return NULL;
 
-		xdp_page = alloc_page(GFP_ATOMIC);
+		xdp_page = page_pool_alloc_pages(rq->page_pool, GFP_ATOMIC);
 		if (!xdp_page)
 			return NULL;
 
@@ -2347,7 +2285,7 @@ static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
 
 		*frame_sz = PAGE_SIZE;
 
-		put_page(*page);
+		virtnet_put_page(rq, *page, true);
 
 		*page = xdp_page;
 
@@ -2393,6 +2331,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
 		head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
 		if (unlikely(!head_skb))
 			break;
+
+		skb_mark_for_recycle(head_skb);
 		return head_skb;
 
 	case XDP_TX:
@@ -2403,10 +2343,10 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
 		break;
 	}
 
-	put_xdp_frags(&xdp);
+	put_xdp_frags(rq, &xdp);
 
 err_xdp:
-	put_page(page);
+	virtnet_put_page(rq, page, true);
 	mergeable_buf_free(rq, num_buf, dev, stats);
 
 	u64_stats_inc(&stats->xdp_drops);
@@ -2414,7 +2354,8 @@ static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
 	return NULL;
 }
 
-static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
+static struct sk_buff *virtnet_skb_append_frag(struct receive_queue *rq,
+					       struct sk_buff *head_skb,
 					       struct sk_buff *curr_skb,
 					       struct page *page, void *buf,
 					       int len, int truesize)
@@ -2446,7 +2387,7 @@ static struct sk_buff *virtnet_skb_append_frag(struct sk_buff *head_skb,
 
 	offset = buf - page_address(page);
 	if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
-		put_page(page);
+		virtnet_put_page(rq, page, true);
 		skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
 				     len, truesize);
 	} else {
@@ -2499,6 +2440,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 	if (unlikely(!curr_skb))
 		goto err_skb;
+
+	skb_mark_for_recycle(head_skb);
 	while (--num_buf) {
 		buf = virtnet_rq_get_buf(rq, &len, &ctx);
 		if (unlikely(!buf)) {
@@ -2517,7 +2460,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			goto err_skb;
 
 		truesize = mergeable_ctx_to_truesize(ctx);
-		curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page,
+		curr_skb = virtnet_skb_append_frag(rq, head_skb, curr_skb, page,
 						   buf, len, truesize);
 		if (!curr_skb)
 			goto err_skb;
@@ -2527,7 +2470,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	return head_skb;
 
 err_skb:
-	put_page(page);
+	virtnet_put_page(rq, page, true);
 	mergeable_buf_free(rq, num_buf, dev, stats);
 
 err_buf:
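The skb_mark_for_recycle() calls added in the receive paths above are
what close the recycling loop; unmarked skbs would free pool pages back
to the allocator and force the pool to refill from scratch. A hedged
sketch of the pattern, with rx_build_skb_example() as an illustrative
helper rather than code from this patch:

/* Sketch: once an skb's memory comes from a page_pool, marking it lets
 * kfree_skb() route the pages back to the pool instead of the page
 * allocator.
 */
static struct sk_buff *rx_build_skb_example(void *va, unsigned int len,
					    unsigned int truesize)
{
	struct sk_buff *skb = napi_build_skb(va, truesize);

	if (unlikely(!skb))
		return NULL;

	skb_put(skb, len);
	skb_mark_for_recycle(skb);	/* frees recycle into the pool */
	return skb;
}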
@@ -2666,32 +2609,42 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 			     gfp_t gfp)
 {
-	char *buf;
 	unsigned int xdp_headroom = virtnet_get_headroom(vi);
 	void *ctx = (void *)(unsigned long)xdp_headroom;
 	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
+	unsigned int offset;
+	struct page *page;
+	dma_addr_t addr;
+	char *buf;
 	int err;
 
 	len = SKB_DATA_ALIGN(len) +
 	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
-		return -ENOMEM;
-
-	buf = virtnet_rq_alloc(rq, len, gfp);
-	if (unlikely(!buf))
+	page = page_pool_alloc_frag(rq->page_pool, &offset, len, gfp);
+	if (unlikely(!page))
 		return -ENOMEM;
 
+	buf = page_address(page) + offset;
 	buf += VIRTNET_RX_PAD + xdp_headroom;
 
-	virtnet_rq_init_one_sg(rq, buf, vi->hdr_len + GOOD_PACKET_LEN);
+	if (rq->use_page_pool_dma) {
+		addr = page_pool_get_dma_addr(page) + offset;
+		addr += VIRTNET_RX_PAD + xdp_headroom;
 
-	err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1, buf, ctx, gfp);
-	if (err < 0) {
-		virtnet_rq_unmap(rq, buf, 0);
-		put_page(virt_to_head_page(buf));
+		sg_init_table(rq->sg, 1);
+		sg_fill_dma(rq->sg, addr, vi->hdr_len + GOOD_PACKET_LEN);
+		err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1,
+						    buf, ctx, gfp);
+	} else {
+		sg_init_one(rq->sg, buf, vi->hdr_len + GOOD_PACKET_LEN);
+		err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1,
+					      buf, ctx, gfp);
 	}
 
+	if (err < 0)
+		page_pool_put_page(rq->page_pool, virt_to_head_page(buf),
+				   -1, false);
 	return err;
 }
 
@@ -2764,13 +2717,15 @@ static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
 static int add_recvbuf_mergeable(struct virtnet_info *vi,
 				 struct receive_queue *rq, gfp_t gfp)
 {
-	struct page_frag *alloc_frag = &rq->alloc_frag;
 	unsigned int headroom = virtnet_get_headroom(vi);
 	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
 	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
 	unsigned int len, hole;
-	void *ctx;
+	unsigned int offset;
+	struct page *page;
+	dma_addr_t addr;
 	char *buf;
+	void *ctx;
 	int err;
 
 	/* Extra tailroom is needed to satisfy XDP's assumption. This
@@ -2779,18 +2734,14 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	 */
 	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
 
-	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
-		return -ENOMEM;
-
-	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
-		len -= sizeof(struct virtnet_rq_dma);
-
-	buf = virtnet_rq_alloc(rq, len + room, gfp);
-	if (unlikely(!buf))
+	page = page_pool_alloc_frag(rq->page_pool, &offset, len + room, gfp);
+	if (unlikely(!page))
 		return -ENOMEM;
 
+	buf = page_address(page) + offset;
 	buf += headroom; /* advance address leaving hole at front of pkt */
-	hole = alloc_frag->size - alloc_frag->offset;
+
+	hole = PAGE_SIZE - (offset + len + room);
 	if (hole < len + room) {
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
@@ -2800,18 +2751,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 		 */
 		if (!headroom)
 			len += hole;
-		alloc_frag->offset += hole;
 	}
 
-	virtnet_rq_init_one_sg(rq, buf, len);
 	ctx = mergeable_len_to_ctx(len + room, headroom);
-	err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1, buf, ctx, gfp);
-	if (err < 0) {
-		virtnet_rq_unmap(rq, buf, 0);
-		put_page(virt_to_head_page(buf));
+
+	if (rq->use_page_pool_dma) {
+		addr = page_pool_get_dma_addr(page) + offset;
+		addr += headroom;
+
+		sg_init_table(rq->sg, 1);
+		sg_fill_dma(rq->sg, addr, len);
+		err = virtqueue_add_inbuf_premapped(rq->vq, rq->sg, 1,
+						    buf, ctx, gfp);
+	} else {
+		sg_init_one(rq->sg, buf, len);
+		err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1,
+					      buf, ctx, gfp);
 	}
 
+	if (err < 0)
+		page_pool_put_page(rq->page_pool, virt_to_head_page(buf),
+				   -1, false);
 	return err;
 }
 
@@ -3128,7 +3088,10 @@ static int virtnet_enable_queue_pair(struct virtnet_info *vi, int qp_index)
 		return err;
 
 	err = xdp_rxq_info_reg_mem_model(&vi->rq[qp_index].xdp_rxq,
-					 MEM_TYPE_PAGE_SHARED, NULL);
+					 vi->rq[qp_index].page_pool ?
+					 MEM_TYPE_PAGE_POOL :
+					 MEM_TYPE_PAGE_SHARED,
+					 vi->rq[qp_index].page_pool);
 	if (err < 0)
 		goto err_xdp_reg_mem_model;
 
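The memory-model switch above is what makes XDP cooperate with the
pool: completion paths consult the registered mem model to decide where
frames go when dropped or redirected. In outline
(rx_reg_mem_model_example() is illustrative; the patch open-codes this
in virtnet_enable_queue_pair(), and xdp_rxq_info_reg() is assumed to
have run already):

/* Sketch: register the XDP memory model matching the queue's allocator
 * so XDP_DROP and redirect completions recycle into the right place.
 */
static int rx_reg_mem_model_example(struct xdp_rxq_info *rxq,
				    struct page_pool *pool)
{
	if (pool)
		return xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL,
						  pool);

	/* No pool on this queue (e.g. XSK or big-packets mode) */
	return xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_SHARED, NULL);
}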
@@ -3168,6 +3131,81 @@ static void virtnet_update_settings(struct virtnet_info *vi)
 		vi->duplex = duplex;
 }
 
+static int virtnet_create_page_pools(struct virtnet_info *vi)
+{
+	int i, err;
+
+	if (!vi->mergeable_rx_bufs && vi->big_packets)
+		return 0;
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		struct receive_queue *rq = &vi->rq[i];
+		struct page_pool_params pp_params = { 0 };
+		struct device *dma_dev;
+
+		if (rq->page_pool)
+			continue;
+
+		if (rq->xsk_pool)
+			continue;
+
+		pp_params.order = 0;
+		pp_params.pool_size = virtqueue_get_vring_size(rq->vq);
+		pp_params.nid = dev_to_node(vi->vdev->dev.parent);
+		pp_params.netdev = vi->dev;
+		pp_params.napi = &rq->napi;
+
+		/* Check if backend supports DMA API (e.g., vhost, virtio-pci).
+		 * If so, use page_pool's DMA mapping for premapped buffers.
+		 * Otherwise (e.g., VDUSE), page_pool only handles allocation.
+		 */
+		dma_dev = virtqueue_dma_dev(rq->vq);
+		if (dma_dev) {
+			pp_params.dev = dma_dev;
+			pp_params.flags = PP_FLAG_DMA_MAP;
+			pp_params.dma_dir = DMA_FROM_DEVICE;
+			rq->use_page_pool_dma = true;
+		} else {
+			pp_params.dev = vi->vdev->dev.parent;
+			pp_params.flags = 0;
+			rq->use_page_pool_dma = false;
+		}
+
+		rq->page_pool = page_pool_create(&pp_params);
+		if (IS_ERR(rq->page_pool)) {
+			err = PTR_ERR(rq->page_pool);
+			rq->page_pool = NULL;
+			goto err_cleanup;
+		}
+	}
+	return 0;
+
+err_cleanup:
+	while (--i >= 0) {
+		struct receive_queue *rq = &vi->rq[i];
+
+		if (rq->page_pool) {
+			page_pool_destroy(rq->page_pool);
+			rq->page_pool = NULL;
+		}
+	}
+	return err;
+}
+
+static void virtnet_destroy_page_pools(struct virtnet_info *vi)
+{
+	int i;
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		struct receive_queue *rq = &vi->rq[i];
+
+		if (rq->page_pool) {
+			page_pool_destroy(rq->page_pool);
+			rq->page_pool = NULL;
+		}
+	}
+}
+
 static int virtnet_open(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -6441,10 +6479,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
 		vi->sq[i].vq = vqs[txq2vq(i)];
 	}
-	/* run here: ret == 0. */
 
-
 err_find:
 	kfree(ctx);
 err_ctx:
@@ -6945,6 +6981,14 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free;
 	}
 
+	/* Create page pools for receive queues.
+	 * Page pools are created at probe time so they can be used
+	 * with premapped DMA addresses throughout the device lifetime.
+	 */
+	err = virtnet_create_page_pools(vi);
+	if (err)
+		goto free_irq_moder;
+
 #ifdef CONFIG_SYSFS
 	if (vi->mergeable_rx_bufs)
 		dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
@@ -6958,7 +7002,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vi->failover = net_failover_create(vi->dev);
 		if (IS_ERR(vi->failover)) {
 			err = PTR_ERR(vi->failover);
-			goto free_vqs;
+			goto free_page_pools;
 		}
 	}
 
@@ -7075,7 +7119,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 	unregister_netdev(dev);
 free_failover:
 	net_failover_destroy(vi->failover);
-free_vqs:
+free_page_pools:
+	virtnet_destroy_page_pools(vi);
+free_irq_moder:
+	virtnet_free_irq_moder(vi);
 	virtio_reset_device(vdev);
 	free_receive_page_frags(vi);
 	virtnet_del_vqs(vi);
@@ -7104,6 +7151,8 @@ static void remove_vq_common(struct virtnet_info *vi)
 
 	free_receive_page_frags(vi);
 
+	virtnet_destroy_page_pools(vi);
+
 	virtnet_del_vqs(vi);
}
-- 
2.47.3

From nobody Sat Feb 7 06:13:48 2026
From: Vishwanath Seshagiri <vishs@meta.com>
To: "Michael S. Tsirkin", Jason Wang
Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , David Wei , Matteo Croce , Ilias Apalodimas , , , , Subject: [PATCH net-next v4 2/2] selftests: virtio_net: add buffer circulation test Date: Wed, 4 Feb 2026 11:36:17 -0800 Message-ID: <20260204193617.1200752-3-vishs@meta.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20260204193617.1200752-1-vishs@meta.com> References: <20260204193617.1200752-1-vishs@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: FSM9xJ4skG2UokK36IoBzWH_hiFMZWaG X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMjA0MDE1MCBTYWx0ZWRfX18WPQ0843Mzw 3yYBIc3YQXZ2n9qBRU1WQ35Y51CsxHG3Egyozg3UDlo0A+g5fQ+3zIY1eTEJwVWu7MyJ6wOqH7x 9Qo4fCffG2sXuYC28RMv4PX8BbPG7qjE28R7VRzvUb7bZ7l4cMetvy0Tg3RWp9EzzFIgBEpCXaP xCBNP4bDQp3b1rhUj7LXMhziY9lMC4tHr64Hm57bnRTxJzk1t2b6gyAHFZ5Tx3LUCL0y2kvok/o r0HrZvGg0QSMNdKfq2ZWWhsRgG1Z4fn79iZP9sPXj61TEc1dKCrF5rbECQYl2Ihivo9N1I73gcT 0kFE5eKivtSgyBQeFBlw0cUM2hIDXgM9jSf5TCRIUZPT9I0oNf7uFl9/IA1Uu1w/IECNUNgPjyF XIKsGjdT8zyUeo+Gd6URH+JEUgEm3RGgcIhfh2VduWUS6oC5pZP6FZdahuULS6mJiI6icT2Ei9V AU3hBvs0I7KqM2i1U7w== X-Proofpoint-ORIG-GUID: FSM9xJ4skG2UokK36IoBzWH_hiFMZWaG X-Authority-Analysis: v=2.4 cv=C6/kCAP+ c=1 sm=1 tr=0 ts=69839fb5 cx=c_pps a=CB4LiSf2rd0gKozIdrpkBw==:117 a=CB4LiSf2rd0gKozIdrpkBw==:17 a=HzLeVaNsDn8A:10 a=VkNPw1HP01LnGYTKEx00:22 a=VabnemYjAAAA:8 a=LsufUKy5yQhhIYM6NHsA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-02-04_06,2026-02-04_01,2025-10-01_01 Content-Type: text/plain; charset="utf-8" Add iperf3-based test to verify RX buffer handling under load. Optionally logs page_pool tracepoints when available. Signed-off-by: Vishwanath Seshagiri --- .../drivers/net/virtio_net/basic_features.sh | 86 +++++++++++++++++++ 1 file changed, 86 insertions(+) diff --git a/tools/testing/selftests/drivers/net/virtio_net/basic_features.= sh b/tools/testing/selftests/drivers/net/virtio_net/basic_features.sh index cf8cf816ed48..fa98505c4674 100755 --- a/tools/testing/selftests/drivers/net/virtio_net/basic_features.sh +++ b/tools/testing/selftests/drivers/net/virtio_net/basic_features.sh @@ -6,6 +6,7 @@ ALL_TESTS=3D" initial_ping_test f_mac_test + buffer_circulation_test " =20 source virtio_net_common.sh @@ -16,6 +17,8 @@ source "$lib_dir"/../../../net/forwarding/lib.sh h1=3D${NETIFS[p1]} h2=3D${NETIFS[p2]} =20 +IPERF_SERVER_PID=3D"" + h1_create() { simple_if_init $h1 $H1_IPV4/24 $H1_IPV6/64 @@ -83,6 +86,84 @@ f_mac_test() log_test "$test_name" } =20 +buffer_circulation_test() +{ + RET=3D0 + local test_name=3D"buffer circulation" + local tracefs=3D"/sys/kernel/tracing" + + if ! check_command iperf3; then + log_test_skip "$test_name" "iperf3 not installed" + return 0 + fi + + setup_cleanup + setup_prepare + + ping -c 1 -I "$h1" "$H2_IPV4" >/dev/null + if [ $? -ne 0 ]; then + check_err 1 "Ping failed" + log_test "$test_name" + return + fi + + local rx_start=3D$(cat /sys/class/net/"$h2"/statistics/rx_packets) + local tx_start=3D$(cat /sys/class/net/"$h1"/statistics/tx_packets) + + if [ -d "$tracefs/events/page_pool" ]; then + echo > "$tracefs/trace" + echo 1 > "$tracefs/events/page_pool/enable" + fi + + local port=3D$(shuf -i 49152-65535 -n 1) + + iperf3 -s -1 --bind-dev "$h2" -p "$port" &>/dev/null & + IPERF_SERVER_PID=3D$! + sleep 1 + + if ! 
kill -0 "$IPERF_SERVER_PID" 2>/dev/null; then + IPERF_SERVER_PID=3D"" + if [ -d "$tracefs/events/page_pool" ]; then + echo 0 > "$tracefs/events/page_pool/enable" + fi + check_err 1 "iperf3 server died" + log_test "$test_name" + return + fi + + iperf3 -c "$H2_IPV4" --bind-dev "$h1" -p "$port" -t 5 >/dev/null 2>&1 + local iperf_ret=3D$? + + if [ -n "$IPERF_SERVER_PID" ]; then + kill "$IPERF_SERVER_PID" 2>/dev/null || true + wait "$IPERF_SERVER_PID" 2>/dev/null || true + IPERF_SERVER_PID=3D"" + fi + + if [ -d "$tracefs/events/page_pool" ]; then + echo 0 > "$tracefs/events/page_pool/enable" + local trace=3D"$tracefs/trace" + local hold=3D$(grep -c "page_pool_state_hold" "$trace" 2>/dev/null) + local release=3D$(grep -c "page_pool_state_release" "$trace" 2>/dev/null) + log_info "page_pool events: hold=3D${hold:-0}, release=3D${release:-0}" + fi + + local rx_end=3D$(cat /sys/class/net/"$h2"/statistics/rx_packets) + local tx_end=3D$(cat /sys/class/net/"$h1"/statistics/tx_packets) + local rx_delta=3D$((rx_end - rx_start)) + local tx_delta=3D$((tx_end - tx_start)) + + log_info "Circulated TX:$tx_delta RX:$rx_delta" + + if [ "$iperf_ret" -ne 0 ]; then + check_err 1 "iperf3 failed" + elif [ "$rx_delta" -lt 10000 ]; then + check_err 1 "Too few packets: $rx_delta" + fi + + log_test "$test_name" +} + setup_prepare() { virtio_device_rebind $h1 @@ -113,6 +194,11 @@ setup_cleanup() =20 cleanup() { + if [ -n "$IPERF_SERVER_PID" ]; then + kill "$IPERF_SERVER_PID" 2>/dev/null || true + wait "$IPERF_SERVER_PID" 2>/dev/null || true + fi + pre_cleanup setup_cleanup } --=20 2.47.3