From: Bobby Eshleman
Date: Mon, 16 Mar 2026 15:29:12 -0700
Subject: [PATCH net-next 1/6] net: devmem: support TX through netkit leased queues
Message-Id: <20260316-scratch-bobbyeshleman-tcp-dm-netkit5-v1-1-b53c8cd72b23@meta.com>
References: <20260316-scratch-bobbyeshleman-tcp-dm-netkit5-v1-0-b53c8cd72b23@meta.com>
In-Reply-To: <20260316-scratch-bobbyeshleman-tcp-dm-netkit5-v1-0-b53c8cd72b23@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Andrew Lunn, Shuah Khan
Cc: Stanislav Fomichev, Mina Almasry, Wei Wang, David Wei, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, bpf@vger.kernel.org, Bobby Eshleman
X-Mailer: b4 0.14.3

When a netkit virtual device leases queues from a physical NIC, devmem
TX bindings created on the netkit device should use the physical NIC for
DMA operations rather than the virtual device, which has no DMA
capability.

In bind_tx_doit, walk the device's leased rx queues to discover the
underlying physical device that supports netmem_tx. Use this device for
DMA device lookup and pass it as the real_tx_dev in the binding. When
real_tx_dev is set, it is also used for NUMA-local allocations.
Extend validate_xmit_unreadable_skb() to support the netkit case, where
the skb is validated twice: once on the netkit guest device and again on
the physical NIC after BPF redirect or IP forwarding. Both invocations
must pass for the skb to be transmitted.

Signed-off-by: Bobby Eshleman
---
 net/core/dev.c         | 26 +++++++++++++++++++-------
 net/core/devmem.c      | 16 ++++++++++------
 net/core/devmem.h      |  6 ++++--
 net/core/netdev-genl.c | 38 +++++++++++++++++++++++++++++++++-----
 4 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ca4b26dfb1bd..105bd27be024 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3981,24 +3981,36 @@ static struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb,
 static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
                                                     struct net_device *dev)
 {
+        struct net_devmem_dmabuf_binding *binding;
         struct skb_shared_info *shinfo;
+        struct net_device *real_tx_dev;
         struct net_iov *niov;
 
         if (likely(skb_frags_readable(skb)))
                 goto out;
 
-        if (!dev->netmem_tx)
-                goto out_free;
-
         shinfo = skb_shinfo(skb);
+        if (shinfo->nr_frags == 0)
+                goto out;
 
-        if (shinfo->nr_frags > 0) {
-                niov = netmem_to_net_iov(skb_frag_netmem(&shinfo->frags[0]));
-                if (net_is_devmem_iov(niov) &&
-                    READ_ONCE(net_devmem_iov_binding(niov)->dev) != dev)
+        niov = netmem_to_net_iov(skb_frag_netmem(&shinfo->frags[0]));
+        if (!net_is_devmem_iov(niov))
+                goto out;
+
+        binding = net_devmem_iov_binding(niov);
+        real_tx_dev = READ_ONCE(binding->real_tx_dev);
+
+        if (real_tx_dev) {
+                if (!real_tx_dev->netmem_tx)
+                        goto out_free;
+                if (READ_ONCE(binding->dev) != dev && real_tx_dev != dev)
                         goto out_free;
+                goto out;
         }
 
+        if (READ_ONCE(binding->dev) != dev || !dev->netmem_tx)
+                goto out_free;
+
 out:
         return skb;
 
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 7ede81509968..a4148cba5b5f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -181,12 +181,13 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 }
 
 struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *real_tx_dev,
                        struct device *dma_dev,
                        enum dma_data_direction direction,
                        unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
                        struct netlink_ext_ack *extack)
 {
+        struct net_device *node_dev = real_tx_dev ?: dev;
         struct net_devmem_dmabuf_binding *binding;
         static u32 id_alloc_next;
         struct scatterlist *sg;
@@ -205,13 +206,14 @@ net_devmem_bind_dmabuf(struct net_device *dev,
                 return ERR_CAST(dmabuf);
 
         binding = kzalloc_node(sizeof(*binding), GFP_KERNEL,
-                               dev_to_node(&dev->dev));
+                               dev_to_node(&node_dev->dev));
         if (!binding) {
                 err = -ENOMEM;
                 goto err_put_dmabuf;
         }
 
         binding->dev = dev;
+        binding->real_tx_dev = real_tx_dev;
         xa_init_flags(&binding->bound_rxqs, XA_FLAGS_ALLOC);
 
         err = percpu_ref_init(&binding->ref,
@@ -254,7 +256,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
          * allocate MTU sized chunks here. Leave that for future work...
          */
         binding->chunk_pool = gen_pool_create(PAGE_SHIFT,
-                                              dev_to_node(&dev->dev));
+                                              dev_to_node(&node_dev->dev));
         if (!binding->chunk_pool) {
                 err = -ENOMEM;
                 goto err_tx_vec;
@@ -268,7 +270,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
                 struct net_iov *niov;
 
                 owner = kzalloc_node(sizeof(*owner), GFP_KERNEL,
-                                     dev_to_node(&dev->dev));
+                                     dev_to_node(&node_dev->dev));
                 if (!owner) {
                         err = -ENOMEM;
                         goto err_free_chunks;
@@ -280,7 +282,8 @@ net_devmem_bind_dmabuf(struct net_device *dev,
                 owner->binding = binding;
 
                 err = gen_pool_add_owner(binding->chunk_pool, dma_addr,
-                                         dma_addr, len, dev_to_node(&dev->dev),
+                                         dma_addr, len,
+                                         dev_to_node(&node_dev->dev),
                                          owner);
                 if (err) {
                         kfree(owner);
@@ -397,7 +400,8 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
          */
         dst_dev = dst_dev_rcu(dst);
         if (unlikely(!dst_dev) ||
-            unlikely(dst_dev != READ_ONCE(binding->dev))) {
+            unlikely(dst_dev != READ_ONCE(binding->dev) &&
+                     dst_dev != READ_ONCE(binding->real_tx_dev))) {
                 err = -ENODEV;
                 goto out_unlock;
         }
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 1c5c18581fcb..ffcf97a33633 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -20,6 +20,8 @@ struct net_devmem_dmabuf_binding {
         struct dma_buf_attachment *attachment;
         struct sg_table *sgt;
         struct net_device *dev;
+        /* Phys dev behind a virtual dev (e.g. netkit) with a queue lease. */
+        struct net_device *real_tx_dev;
         struct gen_pool *chunk_pool;
         /* Protect dev */
         struct mutex lock;
@@ -84,7 +86,7 @@ struct dmabuf_genpool_chunk_owner {
 
 void __net_devmem_dmabuf_binding_free(struct work_struct *wq);
 struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *real_tx_dev,
                        struct device *dma_dev,
                        enum dma_data_direction direction,
                        unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
@@ -165,7 +167,7 @@ static inline void net_devmem_put_net_iov(struct net_iov *niov)
 }
 
 static inline struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *real_tx_dev,
                        struct device *dma_dev,
                        enum dma_data_direction direction,
                        unsigned int dmabuf_fd,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 7d073894ca74..2b34924dc30f 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -1037,7 +1037,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
                 goto err_rxq_bitmap;
         }
 
-        binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
+        binding = net_devmem_bind_dmabuf(netdev, NULL, dma_dev, DMA_FROM_DEVICE,
                                          dmabuf_fd, priv, info->extack);
         if (IS_ERR(binding)) {
                 err = PTR_ERR(binding);
@@ -1082,6 +1082,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 {
         struct net_devmem_dmabuf_binding *binding;
+        struct net_device *real_tx_dev = NULL;
+        struct netdev_rx_queue *lease_rxq;
         struct netdev_nl_sock *priv;
         struct net_device *netdev;
         struct device *dma_dev;
@@ -1089,6 +1091,7 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
         struct sk_buff *rsp;
         int err = 0;
         void *hdr;
+        int i;
 
         if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_DEV_IFINDEX) ||
             GENL_REQ_ATTR_CHECK(info, NETDEV_A_DMABUF_FD))
@@ -1124,16 +1127,41 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
                 goto err_unlock_netdev;
         }
 
-        if (!netdev->netmem_tx) {
+        for (i = 0; i < netdev->real_num_rx_queues; i++) {
+                lease_rxq = READ_ONCE(__netif_get_rx_queue(netdev, i)->lease);
+
+                if (!lease_rxq)
+                        continue;
+
+                real_tx_dev = lease_rxq->dev;
+                break;
+        }
+
+        if (real_tx_dev) {
+                if (!netif_device_present(real_tx_dev)) {
+                        err = -ENODEV;
+                        goto err_unlock_netdev;
+                }
+
+                if (!real_tx_dev->netmem_tx) {
+                        err = -EOPNOTSUPP;
+                        NL_SET_ERR_MSG(info->extack,
+                                       "Driver for queue lease device does not support netmem TX");
+                        goto err_unlock_netdev;
+                }
+        }
+
+        if (!real_tx_dev && !netdev->netmem_tx) {
                 err = -EOPNOTSUPP;
                 NL_SET_ERR_MSG(info->extack,
                                "Driver does not support netmem TX");
                 goto err_unlock_netdev;
         }
 
-        dma_dev = netdev_queue_get_dma_dev(netdev, 0);
-        binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE,
-                                         dmabuf_fd, priv, info->extack);
+        dma_dev = netdev_queue_get_dma_dev(real_tx_dev ?: netdev, 0);
+        binding = net_devmem_bind_dmabuf(netdev, real_tx_dev, dma_dev,
+                                         DMA_TO_DEVICE, dmabuf_fd, priv,
+                                         info->extack);
         if (IS_ERR(binding)) {
                 err = PTR_ERR(binding);
                 goto err_unlock_netdev;
-- 
2.52.0