From: Bui Quang Minh
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez, Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, virtualization@lists.linux.dev, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Bui Quang Minh
Subject: [RFC PATCH net-next v2 1/2] virtio-net: support zerocopy multi buffer XDP in mergeable
Date: Tue, 27 May 2025 23:19:03 +0700
Message-ID: <20250527161904.75259-2-minhquangbui99@gmail.com>
In-Reply-To: <20250527161904.75259-1-minhquangbui99@gmail.com>
References: <20250527161904.75259-1-minhquangbui99@gmail.com>

Currently, in zerocopy mode with mergeable receive buffers, virtio-net
supports only single-buffer packets, not packets spread over several
buffers. This commit adds support for multi-buffer packets built from
several mergeable receive buffers in the zerocopy XDP path, using an XDP
buffer with frags.
Signed-off-by: Bui Quang Minh
Tested-by: Lei Yang
---
 drivers/net/virtio_net.c | 123 +++++++++++++++++++++------------------
 1 file changed, 66 insertions(+), 57 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e53ba600605a..a9558650f205 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -45,6 +45,8 @@ module_param(napi_tx, bool, 0644);
 #define VIRTIO_XDP_TX		BIT(0)
 #define VIRTIO_XDP_REDIR	BIT(1)
 
+#define VIRTNET_MAX_ZC_SEGS	8
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -1232,65 +1234,53 @@ static void xsk_drop_follow_bufs(struct net_device *dev,
 	}
 }
 
-static int xsk_append_merge_buffer(struct virtnet_info *vi,
-				   struct receive_queue *rq,
-				   struct sk_buff *head_skb,
-				   u32 num_buf,
-				   struct virtio_net_hdr_mrg_rxbuf *hdr,
-				   struct virtnet_rq_stats *stats)
+static int virtnet_build_xsk_buff_mrg(struct virtnet_info *vi,
+				      struct receive_queue *rq,
+				      u32 num_buf,
+				      struct xdp_buff *xdp,
+				      struct virtnet_rq_stats *stats)
 {
-	struct sk_buff *curr_skb;
-	struct xdp_buff *xdp;
-	u32 len, truesize;
-	struct page *page;
+	unsigned int len;
 	void *buf;
 
-	curr_skb = head_skb;
+	if (num_buf < 2)
+		return 0;
+
+	while (num_buf > 1) {
+		struct xdp_buff *new_xdp;
 
-	while (--num_buf) {
 		buf = virtqueue_get_buf(rq->vq, &len);
-		if (unlikely(!buf)) {
-			pr_debug("%s: rx error: %d buffers out of %d missing\n",
-				 vi->dev->name, num_buf,
-				 virtio16_to_cpu(vi->vdev,
-						 hdr->num_buffers));
+		if (unlikely(!buf)) {
+			pr_debug("%s: rx error: %d buffers missing\n",
+				 vi->dev->name, num_buf);
 			DEV_STATS_INC(vi->dev, rx_length_errors);
-			return -EINVAL;
-		}
-
-		u64_stats_add(&stats->bytes, len);
-
-		xdp = buf_to_xdp(vi, rq, buf, len);
-		if (!xdp)
-			goto err;
-
-		buf = napi_alloc_frag(len);
-		if (!buf) {
-			xsk_buff_free(xdp);
-			goto err;
+			return -1;
 		}
 
-		memcpy(buf, xdp->data - vi->hdr_len, len);
-
-		xsk_buff_free(xdp);
+		new_xdp = buf_to_xdp(vi, rq, buf, len);
+		if (!new_xdp)
+			goto drop_bufs;
 
-		page = virt_to_page(buf);
+		/* In virtnet_add_recvbuf_xsk(), we ask the host to fill from
+		 * xdp->data - vi->hdr_len with both virtio_net_hdr and data.
+		 * However, only the first packet has the virtio_net_hdr, the
+		 * following ones do not. So we need to adjust the following
+		 * packets' data pointer to the correct place.
+		 */
+		new_xdp->data -= vi->hdr_len;
+		new_xdp->data_end = new_xdp->data + len;
 
-		truesize = len;
+		if (!xsk_buff_add_frag(xdp, new_xdp))
+			goto drop_bufs;
 
-		curr_skb = virtnet_skb_append_frag(head_skb, curr_skb, page,
-						   buf, len, truesize);
-		if (!curr_skb) {
-			put_page(page);
-			goto err;
-		}
+		num_buf--;
 	}
 
 	return 0;
 
-err:
+drop_bufs:
 	xsk_drop_follow_bufs(vi->dev, rq, num_buf, stats);
-	return -EINVAL;
+	return -1;
 }
 
 static struct sk_buff *virtnet_receive_xsk_merge(struct net_device *dev, struct virtnet_info *vi,
@@ -1307,23 +1297,42 @@ static struct sk_buff *virtnet_receive_xsk_merge(struct net_device *dev, struct
 	num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
 
 	ret = XDP_PASS;
+	if (virtnet_build_xsk_buff_mrg(vi, rq, num_buf, xdp, stats))
+		goto drop;
+
 	rcu_read_lock();
 	prog = rcu_dereference(rq->xdp_prog);
-	/* TODO: support multi buffer. */
-	if (prog && num_buf == 1)
-		ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit, stats);
+	if (prog) {
+		/* We are in zerocopy mode so we cannot copy the multi-buffer
+		 * xdp buff to a single linear xdp buff. If we do so, in case
+		 * the BPF program decides to redirect to a XDP socket (XSK),
+		 * it will trigger the zerocopy receive logic in XDP socket.
+		 * The receive logic thinks it receives zerocopy buffer while
+		 * in fact, it is the copy one and everything is messed up.
+		 * So just drop the packet here if we have a multi-buffer xdp
+		 * buff and the BPF program does not support it.
+		 */
+		if (xdp_buff_has_frags(xdp) && !prog->aux->xdp_has_frags)
+			ret = XDP_DROP;
+		else
+			ret = virtnet_xdp_handler(prog, xdp, dev, xdp_xmit,
+						  stats);
+	}
 	rcu_read_unlock();
 
 	switch (ret) {
 	case XDP_PASS:
-		skb = xsk_construct_skb(rq, xdp);
+		skb = xdp_build_skb_from_zc(xdp);
 		if (!skb)
-			goto drop_bufs;
+			break;
 
-		if (xsk_append_merge_buffer(vi, rq, skb, num_buf, hdr, stats)) {
-			dev_kfree_skb(skb);
-			goto drop;
-		}
+		/* Later, in virtnet_receive_done(), eth_type_trans()
+		 * is called. However, in xdp_build_skb_from_zc(), it is called
+		 * already. As a result, we need to reset the data to before
+		 * the mac header so that the later call in
+		 * virtnet_receive_done() works correctly.
+		 */
+		skb_push(skb, ETH_HLEN);
 
 		return skb;
 
@@ -1332,14 +1341,11 @@ static struct sk_buff *virtnet_receive_xsk_merge(struct net_device *dev, struct
 		return NULL;
 
 	default:
-		/* drop packet */
-		xsk_buff_free(xdp);
+		break;
 	}
 
-drop_bufs:
-	xsk_drop_follow_bufs(dev, rq, num_buf, stats);
-
 drop:
+	xsk_buff_free(xdp);
 	u64_stats_inc(&stats->drops);
 	return NULL;
 }
@@ -1396,6 +1402,8 @@ static int virtnet_add_recvbuf_xsk(struct virtnet_info *vi, struct receive_queue
 		return -ENOMEM;
 
 	len = xsk_pool_get_rx_frame_size(pool) + vi->hdr_len;
+	/* Reserve some space for skb_shared_info */
+	len -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	for (i = 0; i < num; ++i) {
 		/* Use the part of XDP_PACKET_HEADROOM as the virtnet hdr space.
@@ -6734,6 +6742,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 	dev->netdev_ops = &virtnet_netdev;
 	dev->stat_ops = &virtnet_stat_ops;
 	dev->features = NETIF_F_HIGHDMA;
+	dev->xdp_zc_max_segs = VIRTNET_MAX_ZC_SEGS;
 
 	dev->ethtool_ops = &virtnet_ethtool_ops;
 	SET_NETDEV_DEV(dev, &vdev->dev);
-- 
2.43.0

From: Bui Quang Minh
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez, Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, virtualization@lists.linux.dev, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Bui Quang Minh
Subject: [RFC PATCH net-next v2 2/2] selftests: net: add XDP socket tests for virtio-net
Date: Tue, 27 May 2025 23:19:04 +0700
Message-ID: <20250527161904.75259-3-minhquangbui99@gmail.com>
In-Reply-To: <20250527161904.75259-1-minhquangbui99@gmail.com>
References: <20250527161904.75259-1-minhquangbui99@gmail.com>

This adds a test for virtio-net receive when an XDP socket is bound to
the device. There are tests for both copy mode and zerocopy mode, each
covering the case where the XDP program returns XDP_PASS and the case
where it returns XDP_REDIRECT to an XDP socket.
Signed-off-by: Bui Quang Minh
---
 .../selftests/drivers/net/hw/.gitignore       |   3 +
 .../testing/selftests/drivers/net/hw/Makefile |  12 +-
 .../drivers/net/hw/xsk_receive.bpf.c          |  43 ++
 .../selftests/drivers/net/hw/xsk_receive.c    | 398 ++++++++++++++++++
 .../selftests/drivers/net/hw/xsk_receive.py   |  75 ++++
 5 files changed, 530 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/drivers/net/hw/xsk_receive.bpf.c
 create mode 100644 tools/testing/selftests/drivers/net/hw/xsk_receive.c
 create mode 100755 tools/testing/selftests/drivers/net/hw/xsk_receive.py

diff --git a/tools/testing/selftests/drivers/net/hw/.gitignore b/tools/testing/selftests/drivers/net/hw/.gitignore
index 6942bf575497..c32271faecff 100644
--- a/tools/testing/selftests/drivers/net/hw/.gitignore
+++ b/tools/testing/selftests/drivers/net/hw/.gitignore
@@ -1,3 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 iou-zcrx
 ncdevmem
+xsk_receive.skel.h
+xsk_receive
+tools
diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index df2c047ffa90..964edbb3b79f 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -1,6 +1,9 @@
 # SPDX-License-Identifier: GPL-2.0+ OR MIT
 
-TEST_GEN_FILES = iou-zcrx
+TEST_GEN_FILES = \
+	iou-zcrx \
+	xsk_receive \
+	#
 
 TEST_PROGS = \
 	csum.py \
@@ -20,6 +23,7 @@ TEST_PROGS = \
 	rss_input_xfrm.py \
 	tso.py \
 	xsk_reconfig.py \
+	xsk_receive.py \
 	#
 
 TEST_FILES := \
@@ -48,3 +52,9 @@ include ../../../net/ynl.mk
 include ../../../net/bpf.mk
 
 $(OUTPUT)/iou-zcrx: LDLIBS += -luring
+
+$(OUTPUT)/xsk_receive.skel.h: xsk_receive.bpf.o
+	bpftool gen skeleton xsk_receive.bpf.o > xsk_receive.skel.h
+
+$(OUTPUT)/xsk_receive: xsk_receive.skel.h
+$(OUTPUT)/xsk_receive: LDLIBS += -lbpf
diff --git a/tools/testing/selftests/drivers/net/hw/xsk_receive.bpf.c b/tools/testing/selftests/drivers/net/hw/xsk_receive.bpf.c
new file mode 100644
index 000000000000..462046d95bfe
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/xsk_receive.bpf.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct {
+	__uint(type, BPF_MAP_TYPE_XSKMAP);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} xsk_map SEC(".maps");
+
+SEC("xdp.frags")
+int dummy_prog(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+SEC("xdp.frags")
+int redirect_xsk_prog(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	struct ethhdr *eth = data;
+	struct iphdr *iph;
+
+	if (data + sizeof(*eth) + sizeof(*iph) > data_end)
+		return XDP_PASS;
+
+	if (bpf_ntohs(eth->h_proto) != ETH_P_IP)
+		return XDP_PASS;
+
+	iph = data + sizeof(*eth);
+	if (iph->protocol != IPPROTO_UDP)
+		return XDP_PASS;
+
+	return bpf_redirect_map(&xsk_map, 0, XDP_DROP);
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/drivers/net/hw/xsk_receive.c b/tools/testing/selftests/drivers/net/hw/xsk_receive.c
new file mode 100644
index 000000000000..96213ceeda5c
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/xsk_receive.c
@@ -0,0 +1,398 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "xsk_receive.skel.h"
+
+#define load_acquire(p) \
+	atomic_load_explicit((_Atomic typeof(*(p)) *)(p), memory_order_acquire)
+
+#define store_release(p, v) \
+	atomic_store_explicit((_Atomic typeof(*(p)) *)(p), v, \
+			      memory_order_release)
+
+#define UMEM_CHUNK_SIZE 0x1000
+#define BUFFER_SIZE 0x2000
+
+#define SERVER_PORT 8888
+#define CLIENT_PORT 9999
+
+const int num_entries = 256;
+const char pass_msg[] = "PASS";
+
+int cfg_client;
+int cfg_server;
+char *cfg_server_ip;
+char *cfg_client_ip;
+int cfg_ifindex;
+int cfg_redirect;
+int cfg_zerocopy;
+
+struct xdp_sock_context {
+	int xdp_sock;
+	void *umem_region;
+	void *rx_ring;
+	void *fill_ring;
+	struct xdp_mmap_offsets off;
+};
+
+struct xdp_sock_context *setup_xdp_socket(int ifindex)
+{
+	struct xdp_mmap_offsets off;
+	void *rx_ring, *fill_ring;
+	struct xdp_umem_reg umem_reg = {};
+	socklen_t optlen = sizeof(off);
+	int umem_len, sock, ret, i;
+	void *umem_region;
+	uint32_t *fr_producer;
+	uint64_t *addr;
+	struct sockaddr_xdp sxdp = {
+		.sxdp_family = AF_XDP,
+		.sxdp_ifindex = ifindex,
+		.sxdp_queue_id = 0,
+		.sxdp_flags = XDP_USE_SG,
+	};
+	struct xdp_sock_context *ctx;
+
+	ctx = malloc(sizeof(*ctx));
+	if (!ctx)
+		error(1, 0, "malloc()");
+
+	if (cfg_zerocopy)
+		sxdp.sxdp_flags |= XDP_ZEROCOPY;
+	else
+		sxdp.sxdp_flags |= XDP_COPY;
+
+	umem_len = UMEM_CHUNK_SIZE * num_entries;
+	umem_region = mmap(0, umem_len, PROT_READ | PROT_WRITE,
+			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (umem_region == MAP_FAILED)
+		error(1, errno, "mmap() umem");
+	ctx->umem_region = umem_region;
+
+	sock = socket(AF_XDP, SOCK_RAW, 0);
+	if (sock < 0)
+		error(1, errno, "socket() XDP");
+	ctx->xdp_sock = sock;
+
+	ret = setsockopt(sock, SOL_XDP, XDP_RX_RING, &num_entries,
+			 sizeof(num_entries));
+	if (ret < 0)
+		error(1, errno, "setsockopt() XDP_RX_RING");
+
+	ret = setsockopt(sock, SOL_XDP, XDP_UMEM_COMPLETION_RING, &num_entries,
+			 sizeof(num_entries));
+	if (ret < 0)
+		error(1, errno, "setsockopt() XDP_UMEM_COMPLETION_RING");
+
+	ret = setsockopt(sock, SOL_XDP, XDP_UMEM_FILL_RING, &num_entries,
+			 sizeof(num_entries));
+	if (ret < 0)
+		error(1, errno, "setsockopt() XDP_UMEM_FILL_RING");
+
+	ret = getsockopt(sock, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (ret < 0)
+		error(1, errno, "getsockopt()");
+	ctx->off = off;
+
+	rx_ring = mmap(0, off.rx.desc + num_entries * sizeof(struct xdp_desc),
+		       PROT_READ | PROT_WRITE, MAP_SHARED, sock,
+		       XDP_PGOFF_RX_RING);
+	if (rx_ring == (void *)-1)
+		error(1, errno, "mmap() rx-ring");
+	ctx->rx_ring = rx_ring;
+
+	fill_ring = mmap(0, off.fr.desc + num_entries * sizeof(uint64_t),
+			 PROT_READ | PROT_WRITE, MAP_SHARED, sock,
+			 XDP_UMEM_PGOFF_FILL_RING);
+	if (fill_ring == (void *)-1)
+		error(1, errno, "mmap() fill-ring");
+	ctx->fill_ring = fill_ring;
+
+	umem_reg.addr = (unsigned long long)ctx->umem_region;
+	umem_reg.len = umem_len;
+	umem_reg.chunk_size = UMEM_CHUNK_SIZE;
+	ret = setsockopt(sock, SOL_XDP, XDP_UMEM_REG, &umem_reg,
+			 sizeof(umem_reg));
+	if (ret < 0)
+		error(1, errno, "setsockopt() XDP_UMEM_REG");
+
+	i = 0;
+	while (1) {
+		ret = bind(sock, (const struct sockaddr *)&sxdp, sizeof(sxdp));
+		if (!ret)
+			break;
+
+		if (errno == EBUSY && i < 3) {
+			i++;
+			sleep(1);
+		} else {
+			error(1, errno, "bind() XDP");
+		}
+	}
+
+	/* Submit all umem entries to fill ring */
+	addr = fill_ring + off.fr.desc;
+	for (i = 0; i < umem_len; i += UMEM_CHUNK_SIZE) {
+		*addr = i;
+		addr++;
+	}
+	fr_producer = fill_ring + off.fr.producer;
+	store_release(fr_producer, num_entries);
+
+	return ctx;
+}
+
+void setup_xdp_prog(int sock, int ifindex, int redirect)
+{
+	struct xsk_receive_bpf *bpf;
+	int key, ret;
+
+	bpf = xsk_receive_bpf__open_and_load();
+	if (!bpf)
+		error(1, 0, "open eBPF");
+
+	key = 0;
+	ret = bpf_map__update_elem(bpf->maps.xsk_map, &key, sizeof(key),
+				   &sock, sizeof(sock), 0);
+	if (ret < 0)
+		error(1, errno, "eBPF map update");
+
+	if (redirect) {
+		ret = bpf_xdp_attach(ifindex,
+				     bpf_program__fd(bpf->progs.redirect_xsk_prog),
+				     0, NULL);
+		if (ret < 0)
+			error(1, errno, "attach eBPF");
+	} else {
+		ret = bpf_xdp_attach(ifindex,
+				     bpf_program__fd(bpf->progs.dummy_prog),
+				     0, NULL);
+		if (ret < 0)
+			error(1, errno, "attach eBPF");
+	}
+}
+
+void send_pass_msg(int sock)
+{
+	int ret;
+	struct sockaddr_in addr = {
+		.sin_family = AF_INET,
+		.sin_addr.s_addr = inet_addr(cfg_client_ip),
+		.sin_port = htons(CLIENT_PORT),
+	};
+
+	ret = sendto(sock, pass_msg, sizeof(pass_msg), 0,
+		     (const struct sockaddr *)&addr, sizeof(addr));
+	if (ret < 0)
+		error(1, errno, "sendto()");
+}
+
+void server_recv_xdp(struct xdp_sock_context *ctx, int udp_sock)
+{
+	int ret;
+	struct pollfd fds = {
+		.fd = ctx->xdp_sock,
+		.events = POLLIN,
+	};
+
+	ret = poll(&fds, 1, -1);
+	if (ret < 0)
+		error(1, errno, "poll()");
+
+	if (fds.revents & POLLIN) {
+		uint32_t *producer_ptr = ctx->rx_ring + ctx->off.rx.producer;
+		uint32_t *consumer_ptr = ctx->rx_ring + ctx->off.rx.consumer;
+		uint32_t producer, consumer;
+		struct xdp_desc *desc;
+
+		producer = load_acquire(producer_ptr);
+		consumer = load_acquire(consumer_ptr);
+
+		printf("Receive %u XDP buffers\n", producer - consumer);
+
+		store_release(consumer_ptr, producer);
+	} else {
+		error(1, 0, "unexpected poll event: %d", fds.revents);
+	}
+
+	send_pass_msg(udp_sock);
+}
+
+void server_recv_udp(int sock)
+{
+	char *buffer;
+	int i, ret;
+
+	buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE,
+		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buffer == MAP_FAILED)
+		error(1, errno, "mmap() receive buffer");
+
+	ret = recv(sock, buffer, BUFFER_SIZE, 0);
+	if (ret < 0)
+		error(1, errno, "recv()");
+
+	if (ret != BUFFER_SIZE)
+		error(1, 0, "message is truncated, expected: %d, got: %d",
+		      BUFFER_SIZE, ret);
+
+	for (i = 0; i < BUFFER_SIZE; i++)
+		if (buffer[i] != 'a' + (i % 26))
+			error(1, 0, "message mismatches at %d", i);
+
+	send_pass_msg(sock);
+}
+
+int setup_udp_sock(const char *addr, int port)
+{
+	int sock, ret;
+	struct sockaddr_in saddr = {
+		.sin_family = AF_INET,
+		.sin_addr.s_addr = inet_addr(addr),
+		.sin_port = htons(port),
+	};
+
+	sock = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock < 0)
+		error(1, errno, "socket() UDP");
+
+	ret = bind(sock, (const struct sockaddr *)&saddr, sizeof(saddr));
+	if (ret < 0)
+		error(1, errno, "bind() UDP");
+
+	return sock;
+}
+
+void run_server(void)
+{
+	int udp_sock;
+	struct xdp_sock_context *ctx;
+
+	ctx = setup_xdp_socket(cfg_ifindex);
+	setup_xdp_prog(ctx->xdp_sock, cfg_ifindex, cfg_redirect);
+	udp_sock = setup_udp_sock(cfg_server_ip, SERVER_PORT);
+
+	if (cfg_redirect)
+		server_recv_xdp(ctx, udp_sock);
+	else
+		server_recv_udp(udp_sock);
+}
+
+void run_client(void)
+{
+	char *buffer;
+	int sock, ret, i;
+	struct sockaddr_in addr = {
+		.sin_family = AF_INET,
+		.sin_addr.s_addr = inet_addr(cfg_server_ip),
+		.sin_port = htons(SERVER_PORT),
+	};
+
+	buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE,
+		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buffer == MAP_FAILED)
+		error(1, errno, "mmap() send buffer");
+
+	for (i = 0; i < BUFFER_SIZE; i++)
+		buffer[i] = 'a' + (i % 26);
+
+	sock = setup_udp_sock(cfg_client_ip, CLIENT_PORT);
+
+	ret = sendto(sock, buffer, BUFFER_SIZE, 0,
+		     (const struct sockaddr *)&addr, sizeof(addr));
+	if (ret < 0)
+		error(1, errno, "sendto()");
+
+	if (ret != BUFFER_SIZE)
+		error(1, 0, "sent buffer is truncated, expected: %d got: %d",
+		      BUFFER_SIZE, ret);
+
+	ret = recv(sock, buffer, BUFFER_SIZE, 0);
+	if (ret < 0)
+		error(1, errno, "recv()");
+
+	if ((ret != sizeof(pass_msg)) || strcmp(buffer, pass_msg))
+		error(1, 0, "message mismatches, expected: %s, got: %s",
+		      pass_msg, buffer);
+}
+
+void print_usage(char *prog)
+{
+	fprintf(stderr, "Usage: %s (-c|-s) -r -l"
+		" -i [-d] [-z]\n", prog);
+}
+
+void parse_opts(int argc, char **argv)
+{
+	int opt;
+	char *ifname = NULL;
+
+	while ((opt = getopt(argc, argv, "hcsr:l:i:dz")) != -1) {
+		switch (opt) {
+		case 'c':
+			if (cfg_server)
+				error(1, 0, "Pass one of -s or -c");
+
+			cfg_client = 1;
+			break;
+		case 's':
+			if (cfg_client)
+				error(1, 0, "Pass one of -s or -c");
+
+			cfg_server = 1;
+			break;
+		case 'r':
+			cfg_server_ip = optarg;
+			break;
+		case 'l':
+			cfg_client_ip = optarg;
+			break;
+		case 'i':
+			ifname = optarg;
+			break;
+		case 'd':
+			cfg_redirect = 1;
+			break;
+		case 'z':
+			cfg_zerocopy = 1;
+			break;
+		case 'h':
+		default:
+			print_usage(argv[0]);
+			exit(1);
+		}
+	}
+
+	if (!cfg_client && !cfg_server)
+		error(1, 0, "Pass one of -s or -c");
+
+	if (ifname) {
+		cfg_ifindex = if_nametoindex(ifname);
+		if (!cfg_ifindex)
+			error(1, errno, "Invalid interface %s", ifname);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	parse_opts(argc, argv);
+	if (cfg_client)
+		run_client();
+	else if (cfg_server)
+		run_server();
+
+	return 0;
+}
diff --git a/tools/testing/selftests/drivers/net/hw/xsk_receive.py b/tools/testing/selftests/drivers/net/hw/xsk_receive.py
new file mode 100755
index 000000000000..f32cb4477b75
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/xsk_receive.py
@@ -0,0 +1,75 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+# This is a test for virtio-net rx when there is an XDP socket bound to it. The test
+# is expected to be run on the host side.
+#
+# Example run:
+#
+# export NETIF=tap0
+# export LOCAL_V4=192.168.31.1
+# export REMOTE_V4=192.168.31.3
+# export REMOTE_TYPE=ssh
+# export REMOTE_ARGS='root@192.168.31.3'
+# ./ksft-net-drv/run_kselftest.sh -t drivers/net/hw:xsk_receive.py
+#
+# where:
+# - 192.168.31.1 is the IP of the tap device on the host
+# - 192.168.31.3 is the IP of the virtio-net device in the guest
+#
+# The QEMU options to set up virtio-net:
+# -netdev tap,id=hostnet1,vhost=on,script=no,downscript=no
+# -device virtio-net-pci,netdev=hostnet1,iommu_platform=on,disable-legacy=on
+#
+# The MTU of the tap device can be adjusted to test more cases:
+# - 1500: single buffer XDP
+# - 9000: multi-buffer XDP
+
+from lib.py import ksft_exit, ksft_run
+from lib.py import KsftSkipEx, KsftFailEx
+from lib.py import NetDrvEpEnv
+from lib.py import bkg, cmd, wait_port_listen
+from os import path
+
+SERVER_PORT = 8888
+CLIENT_PORT = 9999
+
+def test_xdp_pass(cfg, server_cmd, client_cmd):
+    with bkg(server_cmd, host=cfg.remote, exit_wait=True):
+        wait_port_listen(SERVER_PORT, proto="udp", host=cfg.remote)
+        cmd(client_cmd)
+
+def test_xdp_pass_zc(cfg, server_cmd, client_cmd):
+    server_cmd += " -z"
+    with bkg(server_cmd, host=cfg.remote, exit_wait=True):
+        wait_port_listen(SERVER_PORT, proto="udp", host=cfg.remote)
+        cmd(client_cmd)
+
+def test_xdp_redirect(cfg, server_cmd, client_cmd):
+    server_cmd += " -d"
+    with bkg(server_cmd, host=cfg.remote, exit_wait=True):
+        wait_port_listen(SERVER_PORT, proto="udp", host=cfg.remote)
+        cmd(client_cmd)
+
+def test_xdp_redirect_zc(cfg, server_cmd, client_cmd):
+    server_cmd += " -d -z"
+    with bkg(server_cmd, host=cfg.remote, exit_wait=True):
+        wait_port_listen(SERVER_PORT, proto="udp", host=cfg.remote)
+        cmd(client_cmd)
+
+def main():
+    with NetDrvEpEnv(__file__, nsim_test=False) as cfg:
+        cfg.bin_local = path.abspath(path.dirname(__file__) +
+                                     "/../../../drivers/net/hw/xsk_receive")
+        cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
+
+        server_cmd = f"{cfg.bin_remote} -s -i {cfg.remote_ifname} "
+        server_cmd += f"-r {cfg.remote_addr_v['4']} -l {cfg.addr_v['4']}"
+        client_cmd = f"{cfg.bin_local} -c -r {cfg.remote_addr_v['4']} "
+        client_cmd += f"-l {cfg.addr_v['4']}"
+
+        ksft_run(globs=globals(), case_pfx={"test_"}, args=(cfg, server_cmd, client_cmd))
+    ksft_exit()
+
+if __name__ == "__main__":
+    main()
-- 
2.43.0