From nobody Sun Feb 8 22:18:36 2026
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert, "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 1/9] veth: Implement ethtool's get_ringparam() callback
Date: Tue, 8 Aug 2023 11:19:05 +0800
Message-Id: <20230808031913.46965-2-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>
Some xsk libraries call the get_ringparam() API to get the queue length
in order to initialize the xsk umem. Implement the callback in veth so
that those scenarios can work properly.

Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..77e12d52ca2b 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -255,6 +255,17 @@ static void veth_get_channels(struct net_device *dev,
 static int veth_set_channels(struct net_device *dev,
			     struct ethtool_channels *ch);
 
+static void veth_get_ringparam(struct net_device *dev,
+			       struct ethtool_ringparam *ring,
+			       struct kernel_ethtool_ringparam *kernel_ring,
+			       struct netlink_ext_ack *extack)
+{
+	ring->rx_max_pending = VETH_RING_SIZE;
+	ring->tx_max_pending = VETH_RING_SIZE;
+	ring->rx_pending = VETH_RING_SIZE;
+	ring->tx_pending = VETH_RING_SIZE;
+}
+
 static const struct ethtool_ops veth_ethtool_ops = {
 	.get_drvinfo		= veth_get_drvinfo,
 	.get_link		= ethtool_op_get_link,
@@ -265,6 +276,7 @@ static const struct ethtool_ops veth_ethtool_ops = {
 	.get_ts_info		= ethtool_op_get_ts_info,
 	.get_channels		= veth_get_channels,
 	.set_channels		= veth_set_channels,
+	.get_ringparam		= veth_get_ringparam,
 };
 
 /* general routines */
-- 
2.20.1
From nobody Sun Feb 8 22:18:36 2026
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert, "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 2/9] xsk: add dma_check_skip for skipping dma check
Date: Tue, 8 Aug 2023 11:19:06 +0800
Message-Id: <20230808031913.46965-3-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>

For a virtual net device such as veth, there is no need to do the DMA
check if we support zero copy. Add this flag after 'unaligned', because
there is a 4-byte hole there.

pahole -V ./net/xdp/xsk_buff_pool.o:
-----------
	...
	/* --- cacheline 3 boundary (192 bytes) --- */
	u32                        chunk_size;           /*   192     4 */
	u32                        frame_len;            /*   196     4 */
	u8                         cached_need_wakeup;   /*   200     1 */
	bool                       uses_need_wakeup;     /*   201     1 */
	bool                       dma_need_sync;        /*   202     1 */
	bool                       unaligned;            /*   203     1 */

	/* XXX 4 bytes hole, try to pack */

	void *                     addrs;                /*   208     8 */
	spinlock_t                 cq_lock;              /*   216     4 */
	...
-----------

Signed-off-by: Albert Huang
---
 include/net/xsk_buff_pool.h | 1 +
 net/xdp/xsk_buff_pool.c     | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index b0bdff26fc88..fe31097dc11b 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -81,6 +81,7 @@ struct xsk_buff_pool {
 	bool uses_need_wakeup;
 	bool dma_need_sync;
 	bool unaligned;
+	bool dma_check_skip;
 	void *addrs;
 	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
 	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index b3f7b310811e..ed251b8e8773 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -85,6 +85,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 		XDP_PACKET_HEADROOM;
 	pool->umem = umem;
 	pool->addrs = umem->addrs;
+	pool->dma_check_skip = false;
 	INIT_LIST_HEAD(&pool->free_list);
 	INIT_LIST_HEAD(&pool->xskb_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
@@ -202,7 +203,7 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 	if (err)
 		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (!pool->dma_pages && !pool->dma_check_skip) {
 		WARN(1, "Driver did not DMA map zero-copy buffers");
 		err = -EINVAL;
 		goto err_unreg_xsk;
-- 
2.20.1
From nobody Sun Feb 8 22:18:36 2026
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert, "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 3/9] veth: add support for send queue
Date: Tue, 8 Aug 2023 11:19:07 +0800
Message-Id: <20230808031913.46965-4-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>

In order to support native AF_XDP for veth, we need send queue support
for NAPI TX. The upcoming patches will make use of it.
Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 77e12d52ca2b..25faba879505 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -56,6 +56,11 @@ struct veth_rq_stats {
 	struct u64_stats_sync syncp;
 };
 
+struct veth_sq_stats {
+	struct veth_stats vs;
+	struct u64_stats_sync syncp;
+};
+
 struct veth_rq {
 	struct napi_struct xdp_napi;
 	struct napi_struct __rcu *napi; /* points to xdp_napi when the latter is initialized */
@@ -69,11 +74,25 @@ struct veth_rq {
 	struct page_pool *page_pool;
 };
 
+struct veth_sq {
+	struct napi_struct xdp_napi;
+	struct net_device *dev;
+	struct xdp_mem_info xdp_mem;
+	struct veth_sq_stats stats;
+	u32 queue_index;
+	/* for xsk */
+	struct {
+		struct xsk_buff_pool __rcu *pool;
+		u32 last_cpu;
+	} xsk;
+};
+
 struct veth_priv {
 	struct net_device __rcu *peer;
 	atomic64_t dropped;
 	struct bpf_prog *_xdp_prog;
 	struct veth_rq *rq;
+	struct veth_sq *sq;
 	unsigned int requested_headroom;
 };
 
@@ -1495,6 +1514,15 @@ static int veth_alloc_queues(struct net_device *dev)
 		u64_stats_init(&priv->rq[i].stats.syncp);
 	}
 
+	priv->sq = kcalloc(dev->num_tx_queues, sizeof(*priv->sq), GFP_KERNEL);
+	if (!priv->sq)
+		return -ENOMEM;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		priv->sq[i].dev = dev;
+		u64_stats_init(&priv->sq[i].stats.syncp);
+	}
+
 	return 0;
 }
 
@@ -1503,6 +1531,7 @@ static void veth_free_queues(struct net_device *dev)
 	struct veth_priv *priv = netdev_priv(dev);
 
 	kfree(priv->rq);
+	kfree(priv->sq);
 }
 
 static int veth_dev_init(struct net_device *dev)
-- 
2.20.1
From nobody Sun Feb 8 22:18:36 2026
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert, "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 4/9] xsk: add xsk_tx_completed_addr function
Date: Tue, 8 Aug 2023 11:19:08 +0800
Message-Id: <20230808031913.46965-5-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>

Return a desc to the cq by using the descriptor address.
Signed-off-by: Albert Huang
---
 include/net/xdp_sock_drv.h |  5 +++++
 net/xdp/xsk.c              |  6 ++++++
 net/xdp/xsk_queue.h        | 10 ++++++++++
 3 files changed, 21 insertions(+)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 1f6fc8c7a84c..de82c596e48f 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -15,6 +15,7 @@
 #ifdef CONFIG_XDP_SOCKETS
 
 void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries);
+void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr);
 bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc);
 u32 xsk_tx_peek_release_desc_batch(struct xsk_buff_pool *pool, u32 max);
 void xsk_tx_release(struct xsk_buff_pool *pool);
@@ -188,6 +189,10 @@ static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 {
 }
 
+static inline void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr)
+{
+}
+
 static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool,
				    struct xdp_desc *desc)
 {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 4f1e0599146e..b2b8aa7b0bcf 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -396,6 +396,12 @@ void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries)
 }
 EXPORT_SYMBOL(xsk_tx_completed);
 
+void xsk_tx_completed_addr(struct xsk_buff_pool *pool, u64 addr)
+{
+	xskq_prod_submit_addr(pool->cq, addr);
+}
+EXPORT_SYMBOL(xsk_tx_completed_addr);
+
 void xsk_tx_release(struct xsk_buff_pool *pool)
 {
 	struct xdp_sock *xs;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 13354a1e4280..3a5e26a81dc2 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -428,6 +428,16 @@ static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
 	smp_store_release(&q->ring->producer, idx); /* B, matches C */
 }
 
+static inline void xskq_prod_submit_addr(struct xsk_queue *q, u64 addr)
+{
+	struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
+	u32 idx = q->ring->producer;
+
+	ring->desc[idx++ & q->ring_mask] = addr;
+
+	__xskq_prod_submit(q, idx);
+}
+
 static inline void xskq_prod_submit(struct xsk_queue *q)
 {
 	__xskq_prod_submit(q, q->cached_prod);
-- 
2.20.1
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tzvycUSp6fj3lhy6v0s47NdlhPCC8jzqygliYQ4HaF8=; b=jE6sKLnFQJ2QArLBT4BsmxgYKKAtM7Y1p1hs4AqUjSUC6bu1h2uVJdR9L412w/fr+J YxrwQ/WL/MKlZlu65eM7WwwfpjrzKgBDZakoGhy8eeI/99RjoRyeml/M8NoTPDDO1LTp FVddDnI6fMBue5UWT6FFkX+oTL3amY9tFTrHK44dgGkJdgmpFHPT7uwE/lyMyZRz+926 vbVm2irHwMgxo6ebAThxI1PeYE5zehi14Otl/xhna+b9uIChAhMifvMCntpWB+ujRHRq Ij1saahAf3hH0tGo+cWkIaB+wyszS6G/Ffr7GGdq4p0lAGT+WlfddNYJ+5GvQ6aHFUDQ jiiw== X-Gm-Message-State: AOJu0YxsJkho3ATmRro6VjH5Q16SzlmXWG9lVfmbijsYKrDOYdabtiKi RxlNa2OpJuQW3JDRDmDkbzITmw== X-Google-Smtp-Source: AGHT+IGczbp6+PgAEbO1oFEDSNGLeIH6sbO9lcpT91LZiAk8F7bvXkch/TPiWwysUhcrgcO4FE55Mg== X-Received: by 2002:a17:902:6bc1:b0:1b6:6a14:3734 with SMTP id m1-20020a1709026bc100b001b66a143734mr8363404plt.29.1691464857325; Mon, 07 Aug 2023 20:20:57 -0700 (PDT) Received: from C02FG34NMD6R.bytedance.net ([2408:8656:30f8:e020::b]) by smtp.gmail.com with ESMTPSA id 13-20020a170902c10d00b001b896686c78sm7675800pli.66.2023.08.07.20.20.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Aug 2023 20:20:56 -0700 (PDT) From: Albert Huang To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Cc: Albert Huang , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= , Magnus Karlsson , Maciej Fijalkowski , Jonathan Lemon , Pavel Begunkov , Yunsheng Lin , Kees Cook , Richard Gobert , "open list:NETWORKING DRIVERS" , open list , "open list:XDP (eXpress Data Path)" Subject: [RFC v3 Optimizing veth xsk performance 5/9] veth: use send queue tx napi to xmit xsk tx desc Date: Tue, 8 Aug 2023 11:19:09 +0800 Message-Id: <20230808031913.46965-6-huangjie.albert@bytedance.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com> References: 
<20230808031913.46965-1-huangjie.albert@bytedance.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org use send queue tx napi to xmit xsk tx desc Signed-off-by: Albert Huang --- drivers/net/veth.c | 230 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 229 insertions(+), 1 deletion(-) diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 25faba879505..28b891dd8dc9 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -27,6 +27,8 @@ #include #include #include +#include +#include =20 #define DRV_NAME "veth" #define DRV_VERSION "1.0" @@ -1061,6 +1063,141 @@ static int veth_poll(struct napi_struct *napi, int = budget) return done; } =20 +static struct sk_buff *veth_build_skb(void *head, int headroom, int len, + int buflen) +{ + struct sk_buff *skb; + + skb =3D build_skb(head, buflen); + if (!skb) + return NULL; + + skb_reserve(skb, headroom); + skb_put(skb, len); + + return skb; +} + +static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_= pool, int budget) +{ + struct veth_priv *priv, *peer_priv; + struct net_device *dev, *peer_dev; + struct veth_stats stats =3D {}; + struct sk_buff *skb =3D NULL; + struct veth_rq *peer_rq; + struct xdp_desc desc; + int done =3D 0; + + dev =3D sq->dev; + priv =3D netdev_priv(dev); + peer_dev =3D priv->peer; + peer_priv =3D netdev_priv(peer_dev); + + /* todo: queue index must set before this */ + peer_rq =3D &peer_priv->rq[sq->queue_index]; + + /* set xsk wake up flag, to do: where to disable */ + if (xsk_uses_need_wakeup(xsk_pool)) + xsk_set_tx_need_wakeup(xsk_pool); + + while (budget-- > 0) { + unsigned int truesize =3D 0; + struct page *page; + void *vaddr; + void *addr; + + if (!xsk_tx_peek_desc(xsk_pool, &desc)) + break; + + addr =3D xsk_buff_raw_get_data(xsk_pool, desc.addr); + + /* can not hold all data in a page */ + truesize =3D SKB_DATA_ALIGN(sizeof(struct 
skb_shared_info)); + truesize +=3D desc.len + xsk_pool->headroom; + if (truesize > PAGE_SIZE) { + xsk_tx_completed_addr(xsk_pool, desc.addr); + stats.xdp_drops++; + break; + } + + page =3D dev_alloc_page(); + if (!page) { + xsk_tx_completed_addr(xsk_pool, desc.addr); + stats.xdp_drops++; + break; + } + vaddr =3D page_to_virt(page); + + memcpy(vaddr + xsk_pool->headroom, addr, desc.len); + xsk_tx_completed_addr(xsk_pool, desc.addr); + + skb =3D veth_build_skb(vaddr, xsk_pool->headroom, desc.len, PAGE_SIZE); + if (!skb) { + put_page(page); + stats.xdp_drops++; + break; + } + skb->protocol =3D eth_type_trans(skb, peer_dev); + napi_gro_receive(&peer_rq->xdp_napi, skb); + + stats.xdp_bytes +=3D desc.len; + done++; + } + + /* release, move consumer=EF=BC=8Cand wakeup the producer */ + if (done) { + napi_schedule(&peer_rq->xdp_napi); + xsk_tx_release(xsk_pool); + } + + u64_stats_update_begin(&sq->stats.syncp); + sq->stats.vs.xdp_packets +=3D done; + sq->stats.vs.xdp_bytes +=3D stats.xdp_bytes; + sq->stats.vs.xdp_drops +=3D stats.xdp_drops; + u64_stats_update_end(&sq->stats.syncp); + + return done; +} + +static int veth_poll_tx(struct napi_struct *napi, int budget) +{ + struct veth_sq *sq =3D container_of(napi, struct veth_sq, xdp_napi); + struct xsk_buff_pool *pool; + int done =3D 0; + + sq->xsk.last_cpu =3D smp_processor_id(); + + /* xmit for tx queue */ + rcu_read_lock(); + pool =3D rcu_dereference(sq->xsk.pool); + if (pool) + done =3D veth_xsk_tx_xmit(sq, pool, budget); + + rcu_read_unlock(); + + if (done < budget) { + /* if done < budget, the tx ring is no buffer */ + napi_complete_done(napi, done); + } + + return done; +} + +static int veth_napi_add_tx(struct net_device *dev) +{ + struct veth_priv *priv =3D netdev_priv(dev); + int i; + + for (i =3D 0; i < dev->real_num_rx_queues; i++) { + struct veth_sq *sq =3D &priv->sq[i]; + + netif_napi_add(dev, &sq->xdp_napi, veth_poll_tx); + napi_enable(&sq->xdp_napi); + } + + return 0; +} + static int 
veth_create_page_pool(struct veth_rq *rq) { struct page_pool_params pp_params =3D { @@ -1153,6 +1290,19 @@ static void veth_napi_del_range(struct net_device *d= ev, int start, int end) } } =20 +static void veth_napi_del_tx(struct net_device *dev) +{ + struct veth_priv *priv =3D netdev_priv(dev); + int i; + + for (i =3D 0; i < dev->real_num_rx_queues; i++) { + struct veth_sq *sq =3D &priv->sq[i]; + + napi_disable(&sq->xdp_napi); + __netif_napi_del(&sq->xdp_napi); + } +} + static void veth_napi_del(struct net_device *dev) { veth_napi_del_range(dev, 0, dev->real_num_rx_queues); @@ -1360,7 +1510,7 @@ static void veth_set_xdp_features(struct net_device *= dev) struct veth_priv *priv_peer =3D netdev_priv(peer); xdp_features_t val =3D NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | - NETDEV_XDP_ACT_RX_SG; + NETDEV_XDP_ACT_RX_SG | NETDEV_XDP_ACT_XSK_ZEROCOPY; =20 if (priv_peer->_xdp_prog || veth_gro_requested(peer)) val |=3D NETDEV_XDP_ACT_NDO_XMIT | @@ -1737,11 +1887,89 @@ static int veth_xdp_set(struct net_device *dev, str= uct bpf_prog *prog, return err; } =20 +static int veth_xsk_pool_enable(struct net_device *dev, struct xsk_buff_po= ol *pool, u16 qid) +{ + struct veth_priv *peer_priv; + struct veth_priv *priv =3D netdev_priv(dev); + struct net_device *peer_dev =3D priv->peer; + int err =3D 0; + + if (qid >=3D dev->real_num_tx_queues) + return -EINVAL; + + if (!peer_dev) + return -EINVAL; + + /* no dma, so we just skip dma skip in xsk zero copy */ + pool->dma_check_skip =3D true; + + peer_priv =3D netdev_priv(peer_dev); + + /* enable peer tx xdp here, this side + * xdp is enable by veth_xdp_set + * to do: we need to check whther this side is already enable xdp + * maybe it do not have xdp prog + */ + if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) { + /* peer should enable napi*/ + err =3D veth_napi_enable(peer_dev); + if (err) + return err; + } + + /* Here is already protected by rtnl_lock, so rcu_assign_pointer + * is safe. 
+	 */
+	rcu_assign_pointer(priv->sq[qid].xsk.pool, pool);
+
+	veth_napi_add_tx(dev);
+
+	return err;
+}
+
+static int veth_xsk_pool_disable(struct net_device *dev, u16 qid)
+{
+	struct veth_priv *peer_priv;
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer_dev = priv->peer;
+	int err = 0;
+
+	if (qid >= dev->real_num_tx_queues)
+		return -EINVAL;
+
+	if (!peer_dev)
+		return -EINVAL;
+
+	peer_priv = netdev_priv(peer_dev);
+
+	/* to do: this may fail */
+	if (!(peer_priv->_xdp_prog) && (!veth_gro_requested(peer_dev))) {
+		/* disable the peer's napi */
+		veth_napi_del(peer_dev);
+	}
+
+	veth_napi_del_tx(dev);
+
+	rcu_assign_pointer(priv->sq[qid].xsk.pool, NULL);
+	return err;
+}
+
+/* ndo_bpf entry point for xsk pool setup */
+static int veth_xsk_pool_setup(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	if (xdp->xsk.pool)
+		return veth_xsk_pool_enable(dev, xdp->xsk.pool, xdp->xsk.queue_id);
+	else
+		return veth_xsk_pool_disable(dev, xdp->xsk.queue_id);
+}
+
 static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
 		return veth_xdp_set(dev, xdp->prog, xdp->extack);
+	case XDP_SETUP_XSK_POOL:
		return veth_xsk_pool_setup(dev, xdp);
 	default:
 		return -EINVAL;
 	}
-- 
2.20.1
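The copy-mode fallback above drops any descriptor whose headroom plus payload plus `skb_shared_info` overhead does not fit in a single freshly allocated page. A minimal standalone sketch of that bound check; the constants are illustrative stand-ins, not the kernel's exact per-arch values:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins (assumptions, not kernel constants on every
 * arch): a 4 KiB page, a cache-aligned skb_shared_info footprint, and
 * the pool headroom configured by userspace at bind time. */
#define DEMO_PAGE_SIZE        4096u
#define DEMO_SHINFO_OVERHEAD   320u  /* SKB_DATA_ALIGN(sizeof(skb_shared_info)) stand-in */
#define DEMO_POOL_HEADROOM     256u

/* Mirror of the truesize test in veth_xsk_tx_xmit(): the descriptor is
 * memcpy'd into one page, so headroom + payload + shinfo must all fit,
 * otherwise the descriptor is completed back to the pool and dropped. */
static bool demo_desc_fits_one_page(uint32_t desc_len)
{
	uint32_t truesize = DEMO_SHINFO_OVERHEAD + desc_len + DEMO_POOL_HEADROOM;

	return truesize <= DEMO_PAGE_SIZE;
}
```

With these stand-in values a typical 1500-byte MTU frame fits comfortably, while anything near the page size is rejected before allocation.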
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, Björn Töpel,
    Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov,
    Yunsheng Lin, Kees Cook, Richard Gobert,
    "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 6/9] veth: add ndo_xsk_wakeup callback for veth
Date: Tue, 8 Aug 2023 11:19:10 +0800
Message-Id: <20230808031913.46965-7-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>

Add an ndo_xsk_wakeup callback for veth; it is used to wake up the tx napi.
Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 28b891dd8dc9..ac78d6a87416 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1805,6 +1805,44 @@ static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 	rcu_read_unlock();
 }
 
+static void veth_xsk_remote_trigger_napi(void *info)
+{
+	struct veth_sq *sq = info;
+
+	napi_schedule(&sq->xdp_napi);
+}
+
+static int veth_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+	struct veth_priv *priv;
+	struct veth_sq *sq;
+	u32 last_cpu, cur_cpu;
+
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
+	if (qid >= dev->real_num_rx_queues)
+		return -EINVAL;
+
+	priv = netdev_priv(dev);
+	sq = &priv->sq[qid];
+
+	if (napi_if_scheduled_mark_missed(&sq->xdp_napi))
+		return 0;
+
+	last_cpu = sq->xsk.last_cpu;
+	cur_cpu = get_cpu();
+
+	/* raise the napi, on the cpu that last polled it if possible */
+	if (last_cpu == cur_cpu)
+		napi_schedule(&sq->xdp_napi);
+	else
+		smp_call_function_single(last_cpu, veth_xsk_remote_trigger_napi, sq, true);
+
+	put_cpu();
+	return 0;
+}
+
 static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 			struct netlink_ext_ack *extack)
 {
@@ -2019,6 +2057,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_set_rx_headroom	= veth_set_rx_headroom,
 	.ndo_bpf		= veth_xdp,
 	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
+	.ndo_xsk_wakeup		= veth_xsk_wakeup,
 	.ndo_get_peer_dev	= veth_peer_dev,
 };
 
-- 
2.20.1
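The wakeup path above schedules the napi directly when the waking CPU is the one that last polled the queue, and otherwise sends an IPI via smp_call_function_single() so the schedule lands on that CPU. A userspace sketch of just that decision (the enum and function names are my own, not kernel API):

```c
#include <stdint.h>

/* Assumed names for illustration only. */
enum demo_wakeup_action {
	DEMO_WAKEUP_LOCAL,	/* napi_schedule() on the current cpu */
	DEMO_WAKEUP_IPI		/* smp_call_function_single() to last_cpu */
};

/* Mirror of the last_cpu/cur_cpu comparison in veth_xsk_wakeup():
 * keeping the poll on the cpu that last ran it preserves cache
 * locality for the queue state and the umem descriptors. */
static enum demo_wakeup_action demo_pick_wakeup(uint32_t last_cpu,
						uint32_t cur_cpu)
{
	return last_cpu == cur_cpu ? DEMO_WAKEUP_LOCAL : DEMO_WAKEUP_IPI;
}
```

The design choice here is the same one physical drivers make: an IPI to the owning CPU is cheaper than bouncing the napi context (and its cache lines) between cores on every wakeup.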
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, Björn Töpel,
    Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov,
    Yunsheng Lin, Kees Cook, Richard Gobert,
    "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 7/9] sk_buff: add destructor_arg_xsk_pool for zero copy
Date: Tue, 8 Aug 2023 11:19:11 +0800
Message-Id: <20230808031913.46965-8-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>

This member is added for the dummy device to support zero copy.

Signed-off-by: Albert Huang
---
 include/linux/skbuff.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 16a49ba534e4..db999056022e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -592,6 +592,8 @@ struct skb_shared_info {
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
 	void
		*destructor_arg;
+	/* only for the dummy device xsk zero copy */
+	void		*destructor_arg_xsk_pool;
 
 	/* must be last field, see pskb_expand_head() */
 	skb_frag_t	frags[MAX_SKB_FRAGS];
-- 
2.20.1
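The new field lets the skb destructor find the xsk pool while `destructor_arg` keeps the list of umem descriptor addresses to complete. A userspace model of that bookkeeping; the struct and names are illustrative stand-ins, not kernel types:

```c
#include <stdint.h>

/* Stand-in for veth_seg_info: the addresses of every umem descriptor
 * folded into one batched skb, to be completed when the skb dies.
 * The 8-entry bound plays the role of MAX_SKB_FRAGS here. */
struct demo_seg_info {
	uint32_t segs;
	uint64_t desc[8];
};

/* Walk the stored addresses the way veth_xsk_destruct_skb() does and
 * copy each one out as a would-be completion-queue entry; returns the
 * number of completions posted. */
static uint32_t demo_complete_segs(const struct demo_seg_info *seg_info,
				   uint64_t *completed)
{
	uint32_t i;

	for (i = 0; i < seg_info->segs; i++)
		completed[i] = seg_info->desc[i];

	return seg_info->segs;
}
```

The point of the extra pointer is exactly this pairing: one skb can own several tx descriptors, and all of them must be returned to the same pool's completion queue in one destructor call.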
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, Björn Töpel,
    Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov,
    Yunsheng Lin, Kees Cook, Richard Gobert,
    "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 8/9] veth: af_xdp tx batch support for ipv4 udp
Date: Tue, 8 Aug 2023 11:19:12 +0800
Message-Id: <20230808031913.46965-9-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References:
 <20230808031913.46965-1-huangjie.albert@bytedance.com>

A typical topology is shown below:

    veth <-------- veth-peer
     1|               |2
      |               |
    bridge <------> eth0 (such as an mlx5 NIC)

If AF_XDP is used to send packets from veth to a physical NIC, the
packets have to traverse several software paths, so we can borrow from
the implementation of kernel GSO: when AF_XDP sends packets out of
veth, aggregate them and send one large packet from the veth virtual
NIC to the physical NIC.

Performance (tested with the libxdp library):
AF_XDP without batch: 480 Kpps (ksoftirqd at 100% cpu)
AF_XDP with batch:    1.5 Mpps (ksoftirqd at 15% cpu)

With af_xdp batching, the libxdp user-space program becomes the
bottleneck, so the softirq does not reach its limit.

Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 408 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 387 insertions(+), 21 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index ac78d6a87416..70489d017b51 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 
 #define DRV_NAME	"veth"
 #define DRV_VERSION	"1.0"
@@ -103,6 +104,23 @@ struct veth_xdp_tx_bq {
 	unsigned int count;
 };
 
+struct veth_batch_tuple {
+	__u8 protocol;
+	__be32 saddr;
+	__be32 daddr;
+	__be16 source;
+	__be16 dest;
+	__be16 batch_size;
+	__be16 batch_segs;
+	bool batch_enable;
+	bool batch_flush;
+};
+
+struct veth_seg_info {
+	u32 segs;
+	u64 desc[] ____cacheline_aligned_in_smp;
+};
+
 /*
  * ethtool interface
  */
@@ -1078,11 +1096,340 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 	return skb;
 }
 
+static void veth_xsk_destruct_skb(struct sk_buff *skb)
+{
+	struct skb_shared_info *si = skb_shinfo(skb);
+	struct xsk_buff_pool *pool = (struct xsk_buff_pool *)si->destructor_arg_xsk_pool;
+	struct veth_seg_info *seg_info = (struct veth_seg_info *)si->destructor_arg;
+	unsigned long flags;
+	u32 index = 0;
+	u64 addr;
+
+	/* release the cq entries */
+	spin_lock_irqsave(&pool->cq_lock, flags);
+	for (index = 0; index < seg_info->segs; index++) {
+		addr = (u64)(long)seg_info->desc[index];
+		xsk_tx_completed_addr(pool, addr);
+	}
+	spin_unlock_irqrestore(&pool->cq_lock, flags);
+
+	kfree(seg_info);
+	si->destructor_arg = NULL;
+	si->destructor_arg_xsk_pool = NULL;
+}
+
+static struct sk_buff *veth_build_gso_head_skb(struct net_device *dev,
+					       char *buff, u32 tot_len,
+					       u32 headroom, u32 iph_len,
+					       u32 th_len)
+{
+	struct sk_buff *skb = NULL;
+	int err = 0;
+
+	skb = alloc_skb(tot_len, GFP_KERNEL);
+	if (unlikely(!skb))
+		return NULL;
+
+	/* the headroom contains the eth header */
+	skb_reserve(skb, headroom - ETH_HLEN);
+	skb_put(skb, ETH_HLEN + iph_len + th_len);
+	skb_shinfo(skb)->gso_segs = 0;
+
+	err = skb_store_bits(skb, 0, buff, ETH_HLEN + iph_len + th_len);
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return NULL;
+	}
+
+	skb->protocol = eth_type_trans(skb, dev);
+	skb->network_header = skb->mac_header + ETH_HLEN;
+	skb->transport_header = skb->network_header + iph_len;
+	skb->ip_summed = CHECKSUM_PARTIAL;
+
+	return skb;
+}
+
+/* only IPv4 UDP is matched
+ * to do: TCP and IPv6
+ */
+static inline bool veth_segment_match(struct veth_batch_tuple *tuple,
+				      struct iphdr *iph, struct udphdr *udph)
+{
+	if (tuple->protocol == iph->protocol &&
+	    tuple->saddr == iph->saddr &&
+	    tuple->daddr == iph->daddr &&
+	    tuple->source == udph->source &&
+	    tuple->dest == udph->dest &&
+	    tuple->batch_size == ntohs(udph->len)) {
+		tuple->batch_flush = false;
+		return true;
+	}
+
+	tuple->batch_flush = true;
+	return false;
+}
+
+static inline void veth_tuple_init(struct veth_batch_tuple *tuple,
+				   struct iphdr *iph, struct udphdr *udph)
+{
+	tuple->protocol = iph->protocol;
+	tuple->saddr = iph->saddr;
+	tuple->daddr = iph->daddr;
+	tuple->source = udph->source;
+	tuple->dest = udph->dest;
+	tuple->batch_flush = false;
+	tuple->batch_size = ntohs(udph->len);
+	tuple->batch_segs = 0;
+}
+
+static inline bool veth_batch_ip_check_v4(struct iphdr *iph, u32 len)
+{
+	if (len <= (ETH_HLEN + sizeof(*iph)))
+		return false;
+
+	if (iph->ihl < 5 || iph->version != 4 || len < (iph->ihl * 4 + ETH_HLEN))
+		return false;
+
+	return true;
+}
+
+static struct sk_buff *veth_build_skb_batch_udp(struct net_device *dev,
+						struct xsk_buff_pool *pool,
+						struct xdp_desc *desc,
+						struct veth_batch_tuple *tuple,
+						struct sk_buff *prev_skb)
+{
+	u32 hr, len, ts, index, iph_len, th_len, data_offset, data_len, tot_len;
+	struct veth_seg_info *seg_info;
+	void *buffer;
+	struct udphdr *udph;
+	struct iphdr *iph;
+	struct sk_buff *skb;
+	struct page *page;
+	u32 seg_len = 0;
+	int hh_len = 0;
+	u64 addr;
+
+	addr = desc->addr;
+	len = desc->len;
+
+	/* l2 reserved len */
+	hh_len = LL_RESERVED_SPACE(dev);
+	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(hh_len));
+
+	/* data points to the eth header */
+	buffer = (unsigned char *)xsk_buff_raw_get_data(pool, addr);
+
+	iph = (struct iphdr *)(buffer + ETH_HLEN);
+	iph_len = iph->ihl * 4;
+
+	udph = (struct udphdr *)(buffer + ETH_HLEN + iph_len);
+	th_len = sizeof(struct udphdr);
+
+	if (tuple->batch_flush)
+		veth_tuple_init(tuple, iph, udph);
+
+	ts = pool->unaligned ?
+	     len : pool->chunk_size;
+
+	data_offset = offset_in_page(buffer) + ETH_HLEN + iph_len + th_len;
+	data_len = len - (ETH_HLEN + iph_len + th_len);
+
+	/* the head is NULL, or this is a new 5-tuple */
+	if (!prev_skb || !veth_segment_match(tuple, iph, udph)) {
+		tot_len = hr + iph_len + th_len;
+		skb = veth_build_gso_head_skb(dev, buffer, tot_len, hr, iph_len, th_len);
+		if (!skb) {
+			/* to do: handle the failed skb here */
+			return NULL;
+		}
+
+		/* store information for gso */
+		seg_len = struct_size(seg_info, desc, MAX_SKB_FRAGS);
+		seg_info = kmalloc(seg_len, GFP_KERNEL);
+		if (!seg_info) {
+			/* to do */
+			kfree_skb(skb);
+			return NULL;
+		}
+	} else {
+		skb = prev_skb;
+		skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4 | SKB_GSO_PARTIAL;
+		skb_shinfo(skb)->gso_size = data_len;
+		skb->ip_summed = CHECKSUM_PARTIAL;
+
+		/* the maximum number of segments is MAX_SKB_FRAGS */
+		if (skb_shinfo(skb)->gso_segs >= MAX_SKB_FRAGS - 1)
+			tuple->batch_flush = true;
+
+		seg_info = (struct veth_seg_info *)skb_shinfo(skb)->destructor_arg;
+	}
+
+	/* offset in the umem pool buffer */
+	addr = buffer - pool->addrs;
+
+	/* get the page of the desc */
+	page = pool->umem->pgs[addr >> PAGE_SHIFT];
+
+	/* hold a reference so the page is not freed by kfree_skb */
+	get_page(page);
+
+	/* the desc data must not span two pages */
+	skb_fill_page_desc(skb, skb_shinfo(skb)->gso_segs, page, data_offset, data_len);
+
+	skb->len += data_len;
+	skb->data_len += data_len;
+	skb->truesize += ts;
+	skb->dev = dev;
+
+	/* record the desc for completion; gso support comes later */
+	index = skb_shinfo(skb)->gso_segs;
+	seg_info->desc[index] = desc->addr;
+	seg_info->segs = ++index;
+	skb_shinfo(skb)->gso_segs++;
+
+	skb_shinfo(skb)->destructor_arg = (void *)(long)seg_info;
+	skb_shinfo(skb)->destructor_arg_xsk_pool = (void *)(long)pool;
+	skb->destructor = veth_xsk_destruct_skb;
+
+	/* to do:
+	 * add the skb to a sock.
+	 * Maybe there is no need to do this, and multiple xsk sockets
+	 * might be involved, so it is difficult to determine which
+	 * socket is sending the data.
+	 * refcount_add(ts, &xs->sk.sk_wmem_alloc);
+	 */
+	return skb;
+}
+
+static inline struct sk_buff *veth_build_skb_def(struct net_device *dev,
+						 struct xsk_buff_pool *pool, struct xdp_desc *desc)
+{
+	struct sk_buff *skb = NULL;
+	struct page *page;
+	void *buffer;
+	void *vaddr;
+
+	page = dev_alloc_page();
+	if (!page)
+		return NULL;
+
+	buffer = (unsigned char *)xsk_buff_raw_get_data(pool, desc->addr);
+
+	vaddr = page_to_virt(page);
+	memcpy(vaddr + pool->headroom, buffer, desc->len);
+	skb = veth_build_skb(vaddr, pool->headroom, desc->len, PAGE_SIZE);
+	if (!skb) {
+		put_page(page);
+		return NULL;
+	}
+
+	skb->protocol = eth_type_trans(skb, dev);
+
+	return skb;
+}
+
+/* To call the following function, these conditions must be met:
+ * 1. The packet must be a standard Ethernet packet.
+ * 2. The packet must support batch sending.
+ */
+static inline struct sk_buff *veth_build_skb_batch_v4(struct net_device *dev,
+						      struct xsk_buff_pool *pool,
+						      struct xdp_desc *desc,
+						      struct veth_batch_tuple *tuple,
+						      struct sk_buff *prev_skb)
+{
+	struct iphdr *iph;
+	void *buffer;
+	u64 addr;
+
+	addr = desc->addr;
+	buffer = (unsigned char *)xsk_buff_raw_get_data(pool, addr);
+	iph = (struct iphdr *)(buffer + ETH_HLEN);
+	if (!veth_batch_ip_check_v4(iph, desc->len))
+		goto normal;
+
+	switch (iph->protocol) {
+	case IPPROTO_UDP:
+		return veth_build_skb_batch_udp(dev, pool, desc, tuple, prev_skb);
+	default:
+		break;
+	}
+normal:
+	tuple->batch_enable = false;
+	return veth_build_skb_def(dev, pool, desc);
+}
+
+/* Zero copy needs to meet the following conditions:
+ * 1.
+ *    The data content of the tx desc must lie within one page.
+ * 2. The tx desc must support batch xmit, which is set by userspace.
+ */
+static inline bool veth_batch_desc_check(void *buff, u32 len)
+{
+	u32 offset;
+
+	offset = offset_in_page(buff);
+	if (PAGE_SIZE - offset < len)
+		return false;
+
+	return true;
+}
+
+/* the packet here must be IPv4 or IPv6 */
+static inline struct sk_buff *veth_build_skb_batch(struct net_device *dev,
+						   struct xsk_buff_pool *pool,
+						   struct xdp_desc *desc,
+						   struct veth_batch_tuple *tuple,
+						   struct sk_buff *prev_skb)
+{
+	const struct ethhdr *eth;
+	void *buffer;
+
+	buffer = xsk_buff_raw_get_data(pool, desc->addr);
+	if (!veth_batch_desc_check(buffer, desc->len))
+		goto normal;
+
+	eth = (struct ethhdr *)buffer;
+	switch (ntohs(eth->h_proto)) {
+	case ETH_P_IP:
+		tuple->batch_enable = true;
+		return veth_build_skb_batch_v4(dev, pool, desc, tuple, prev_skb);
+	/* to do: not supported yet, just build the skb, no batching */
+	case ETH_P_IPV6:
+		fallthrough;
+	default:
+		break;
+	}
+
+normal:
+	tuple->batch_flush = false;
+	tuple->batch_enable = false;
+	return veth_build_skb_def(dev, pool, desc);
+}
+
+/* only IPv4 UDP batches are supported
+ * to do: IPv4 TCP and IPv6
+ */
+static inline void veth_skb_batch_checksum(struct sk_buff *skb)
+{
+	struct iphdr *iph = ip_hdr(skb);
+	struct udphdr *uh = udp_hdr(skb);
+	int ip_tot_len = skb->len;
+	int udp_len = skb->len - (skb->transport_header - skb->network_header);
+
+	iph->tot_len = htons(ip_tot_len);
+	ip_send_check(iph);
+	uh->len = htons(udp_len);
+	uh->check = 0;
+
+	udp4_hwcsum(skb, iph->saddr, iph->daddr);
+}
+
 static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 			    int budget)
 {
 	struct veth_priv *priv, *peer_priv;
 	struct net_device *dev, *peer_dev;
+	struct veth_batch_tuple tuple;
 	struct veth_stats stats = {};
+	struct sk_buff *prev_skb = NULL;
 	struct sk_buff *skb = NULL;
 	struct veth_rq *peer_rq;
 	struct xdp_desc desc;
@@ -1093,24 +1440,23 @@
 static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 	peer_dev = priv->peer;
 	peer_priv = netdev_priv(peer_dev);
 
-	/* todo: the queue index must be set before this */
+	/* queue_index is set at napi enable time
+	 * to do: maybe select the rq by 5-tuple or hash
+	 */
 	peer_rq = &peer_priv->rq[sq->queue_index];
 
+	memset(&tuple, 0, sizeof(tuple));
+
 	/* set the xsk wakeup flag; to do: where to disable it */
 	if (xsk_uses_need_wakeup(xsk_pool))
 		xsk_set_tx_need_wakeup(xsk_pool);
 
 	while (budget-- > 0) {
 		unsigned int truesize = 0;
-		struct page *page;
-		void *vaddr;
-		void *addr;
 
 		if (!xsk_tx_peek_desc(xsk_pool, &desc))
 			break;
 
-		addr = xsk_buff_raw_get_data(xsk_pool, desc.addr);
-
 		/* cannot hold all the data in one page */
 		truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 		truesize += desc.len + xsk_pool->headroom;
@@ -1120,30 +1466,50 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 			break;
 		}
 
-		page = dev_alloc_page();
-		if (!page) {
+		skb = veth_build_skb_batch(peer_dev, xsk_pool, &desc, &tuple, prev_skb);
+		if (!skb) {
+			stats.rx_drops++;
 			xsk_tx_completed_addr(xsk_pool, desc.addr);
-			stats.xdp_drops++;
-			break;
+			if (prev_skb != skb) {
+				napi_gro_receive(&peer_rq->xdp_napi, prev_skb);
+				prev_skb = NULL;
+			}
+			continue;
 		}
-		vaddr = page_to_virt(page);
-
-		memcpy(vaddr + xsk_pool->headroom, addr, desc.len);
-		xsk_tx_completed_addr(xsk_pool, desc.addr);
 
-		skb = veth_build_skb(vaddr, xsk_pool->headroom, desc.len, PAGE_SIZE);
-		if (!skb) {
-			put_page(page);
-			stats.xdp_drops++;
-			break;
+		if (!tuple.batch_enable) {
+			xsk_tx_completed_addr(xsk_pool, desc.addr);
+			/* flush the previous skb first to avoid reordering */
+			if (prev_skb != skb && prev_skb) {
+				veth_skb_batch_checksum(prev_skb);
+				napi_gro_receive(&peer_rq->xdp_napi, prev_skb);
+				prev_skb = NULL;
+			}
+			napi_gro_receive(&peer_rq->xdp_napi, skb);
+			skb = NULL;
+		} else {
+			if (prev_skb && tuple.batch_flush) {
+				veth_skb_batch_checksum(prev_skb);
+				napi_gro_receive(&peer_rq->xdp_napi, prev_skb);
+				if (prev_skb == skb)
+					prev_skb = skb = NULL;
+				else
+					prev_skb = skb;
+			} else {
+				prev_skb = skb;
+			}
 		}
-		skb->protocol = eth_type_trans(skb, peer_dev);
-		napi_gro_receive(&peer_rq->xdp_napi, skb);
 
 		stats.xdp_bytes += desc.len;
 		done++;
 	}
 
+	/* there may still be a batched skb to send to peer_rq */
+	if (skb) {
+		veth_skb_batch_checksum(skb);
+		napi_gro_receive(&peer_rq->xdp_napi, skb);
+	}
+
 	/* release, move the consumer, and wake up the producer */
 	if (done) {
 		napi_schedule(&peer_rq->xdp_napi);
-- 
2.20.1
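The batching above merges consecutive descriptors into one large skb only while the 5-tuple and the UDP payload length stay identical; the first mismatch flushes the batch. A userspace model of the match predicate from `veth_segment_match()`; the struct and names are illustrative stand-ins, with byte-order conversions omitted:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the fields veth_batch_tuple tracks per batch. */
struct demo_tuple {
	uint8_t  protocol;
	uint32_t saddr, daddr;
	uint16_t source, dest;
	uint16_t udp_len;	/* equal lengths are required for UDP GSO */
};

/* A packet may join the current batch only if every field matches;
 * the caller flushes the accumulated skb on the first false result. */
static bool demo_segment_match(const struct demo_tuple *batch,
			       const struct demo_tuple *pkt)
{
	return batch->protocol == pkt->protocol &&
	       batch->saddr == pkt->saddr &&
	       batch->daddr == pkt->daddr &&
	       batch->source == pkt->source &&
	       batch->dest == pkt->dest &&
	       batch->udp_len == pkt->udp_len;
}
```

Requiring equal UDP lengths is what makes the batch expressible as `SKB_GSO_UDP_L4` later: every segment of a UDP GSO skb must have the same `gso_size`.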
From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann,
    Jesper Dangaard Brouer, John Fastabend, Björn Töpel,
    Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov,
    Yunsheng Lin, Kees Cook, Richard Gobert,
    "open list:NETWORKING DRIVERS", open list, "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 9/9] veth: add support for the AF_XDP tx need_wakeup feature
Date: Tue, 8 Aug 2023 11:19:13 +0800
Message-Id: <20230808031913.46965-10-huangjie.albert@bytedance.com>
In-Reply-To: <20230808031913.46965-1-huangjie.albert@bytedance.com>
References: <20230808031913.46965-1-huangjie.albert@bytedance.com>

This patch supports only the tx need_wakeup feature.

Signed-off-by: Albert Huang
---
 drivers/net/veth.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 70489d017b51..7c60c64ef10b 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1447,9 +1447,9 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 
 	memset(&tuple, 0, sizeof(tuple));
 
-	/* set the xsk wakeup flag; to do: where to disable it */
+	/* clear the xsk wakeup flag */
 	if (xsk_uses_need_wakeup(xsk_pool))
-		xsk_set_tx_need_wakeup(xsk_pool);
+		xsk_clear_tx_need_wakeup(xsk_pool);
 
 	while (budget-- > 0) {
 		unsigned int truesize = 0;
@@ -1539,12 +1539,15 @@ static int veth_poll_tx(struct napi_struct *napi, int budget)
 	if (pool)
 		done = veth_xsk_tx_xmit(sq, pool, budget);
 
-	rcu_read_unlock();
-
 	if (done < budget) {
+		/* set the xsk wakeup flag again before going idle */
+		if (pool && xsk_uses_need_wakeup(pool))
+			xsk_set_tx_need_wakeup(pool);
+
 		/* done < budget means the tx ring has no more descriptors */
 		napi_complete_done(napi, done);
 	}
+	rcu_read_unlock();
 
 	return done;
 }
-- 
2.20.1
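The need_wakeup dance above (the driver clears the flag while its napi is active and sets it again before `napi_complete_done()`) is what lets the application skip the wakeup syscall while the driver is still polling. A toy model of the flag protocol, with a plain bool standing in for the real tx-ring flag; the names are illustrative, not the libxdp API:

```c
#include <stdbool.h>

/* Toy model: 'need_wakeup' plays the role of the need_wakeup flag on
 * the AF_XDP tx ring. Names are assumptions for illustration. */
struct demo_tx_ring {
	bool need_wakeup;
};

/* driver side: clear on poll start, set again when going idle */
static void demo_driver_poll_start(struct demo_tx_ring *r)
{
	r->need_wakeup = false;
}

static void demo_driver_go_idle(struct demo_tx_ring *r)
{
	r->need_wakeup = true;
}

/* app side: only issue the wakeup syscall when the driver asked for it */
static bool demo_app_should_syscall(const struct demo_tx_ring *r)
{
	return r->need_wakeup;
}
```

This mirrors the usual userspace pattern of checking the flag before issuing a `sendto()` on the xsk fd: under load the driver keeps polling, the flag stays clear, and the application avoids a syscall per batch.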