From nobody Sun Feb 8 00:03:26 2026
Date: Sat, 22 Feb 2025 19:15:09 +0000
In-Reply-To: <20250222191517.743530-1-almasrymina@google.com>
References: <20250222191517.743530-1-almasrymina@google.com>
Message-ID: <20250222191517.743530-2-almasrymina@google.com>
Subject: [PATCH net-next v5 1/9] net: add get_netmem/put_netmem support
From: Mina Almasry
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Donald Hunter, Jakub Kicinski, "David S. Miller", Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn, Jeroen de Borst, Harshitha Ramamurthy, Kuniyuki Iwashima, Willem de Bruijn, David Ahern, Neal Cardwell, Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez, Shuah Khan, sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim, Victor Nogueira, Pedro Tammela, Samiullah Khawaja

Currently net_iovs support only pp ref counts and do not support a page
ref equivalent. This is fine for the RX path, where net_iovs are used
exclusively with the pp and only pp refcounting is needed. The TX path,
however, does not use pp ref counts, so a get_page/put_page equivalent is
needed for netmem.

Add get_netmem/put_netmem support. Check the type of the netmem before
passing it to page- or net_iov-specific code to obtain a page ref
equivalent.

For dmabuf net_iovs, we obtain a ref on the underlying binding. This
ensures the entire binding doesn't disappear until all the net_iovs have
been put_netmem'ed. We do not need to track the refcount of individual
dmabuf net_iovs, as we don't allocate/free them from a pool the way the
buddy allocator does for pages.

This code is written to be extensible by other net_iov implementers.
get_netmem/put_netmem check the type of the netmem and route it to the
correct helper:

  pages           -> [get|put]_page()
  dmabuf net_iovs -> net_devmem_[get|put]_net_iov()
  new net_iovs    -> new helpers

Signed-off-by: Mina Almasry
Acked-by: Stanislav Fomichev

---

v2:
- Add comment on top of refcount_t ref explaining the usage in the TX
  path.
- Fix missing definition of net_devmem_dmabuf_binding_put in this patch.
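As a usage illustration (a minimal sketch, not part of this patch): a
caller that previously paired get_page()/put_page() on a frag's page can
hold and drop a reference through the new type-checking helpers instead.
The example_* wrapper names are hypothetical; skb_frag_netmem(),
get_netmem() and put_netmem() are the helpers this patch adds or relies on.

#include <linux/skbuff.h>
#include <net/netmem.h>

/* Hypothetical wrappers, illustration only: take and drop a reference on
 * a frag's backing memory whether it is a page or a (dmabuf) net_iov.
 */
static inline void example_frag_hold(const skb_frag_t *frag)
{
	/* routes to get_page() or net_devmem_get_net_iov() internally */
	get_netmem(skb_frag_netmem(frag));
}

static inline void example_frag_drop(const skb_frag_t *frag)
{
	put_netmem(skb_frag_netmem(frag));
}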
--- include/linux/skbuff_ref.h | 4 ++-- include/net/netmem.h | 3 +++ net/core/devmem.c | 10 ++++++++++ net/core/devmem.h | 20 ++++++++++++++++++++ net/core/skbuff.c | 30 ++++++++++++++++++++++++++++++ 5 files changed, 65 insertions(+), 2 deletions(-) diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h index 0f3c58007488..9e49372ef1a0 100644 --- a/include/linux/skbuff_ref.h +++ b/include/linux/skbuff_ref.h @@ -17,7 +17,7 @@ */ static inline void __skb_frag_ref(skb_frag_t *frag) { - get_page(skb_frag_page(frag)); + get_netmem(skb_frag_netmem(frag)); } =20 /** @@ -40,7 +40,7 @@ static inline void skb_page_unref(netmem_ref netmem, bool= recycle) if (recycle && napi_pp_put_page(netmem)) return; #endif - put_page(netmem_to_page(netmem)); + put_netmem(netmem); } =20 /** diff --git a/include/net/netmem.h b/include/net/netmem.h index c61d5b21e7b4..a2148ffb203d 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -264,4 +264,7 @@ static inline unsigned long netmem_get_dma_addr(netmem_= ref netmem) return __netmem_clear_lsb(netmem)->dma_addr; } =20 +void get_netmem(netmem_ref netmem); +void put_netmem(netmem_ref netmem); + #endif /* _NET_NETMEM_H */ diff --git a/net/core/devmem.c b/net/core/devmem.c index 7c6e0b5b6acb..b1aafc66ebb7 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -325,6 +325,16 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigne= d int dmabuf_fd, return ERR_PTR(err); } =20 +void net_devmem_get_net_iov(struct net_iov *niov) +{ + net_devmem_dmabuf_binding_get(net_devmem_iov_binding(niov)); +} + +void net_devmem_put_net_iov(struct net_iov *niov) +{ + net_devmem_dmabuf_binding_put(net_devmem_iov_binding(niov)); +} + /*** "Dmabuf devmem memory provider" ***/ =20 int mp_dmabuf_devmem_init(struct page_pool *pool) diff --git a/net/core/devmem.h b/net/core/devmem.h index 7fc158d52729..946f2e015746 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -29,6 +29,10 @@ struct net_devmem_dmabuf_binding { * The binding undos itself and unmaps the underlying dmabuf once all * those refs are dropped and the binding is no longer desired or in * use. + * + * net_devmem_get_net_iov() on dmabuf net_iovs will increment this + * reference, making sure that the binding remains alive until all the + * net_iovs are no longer used. 
*/ refcount_t ref; =20 @@ -111,6 +115,9 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_= binding *binding) __net_devmem_dmabuf_binding_free(binding); } =20 +void net_devmem_get_net_iov(struct net_iov *niov); +void net_devmem_put_net_iov(struct net_iov *niov); + struct net_iov * net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding); void net_devmem_free_dmabuf(struct net_iov *ppiov); @@ -120,6 +127,19 @@ bool net_is_devmem_iov(struct net_iov *niov); #else struct net_devmem_dmabuf_binding; =20 +static inline void +net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding) +{ +} + +static inline void net_devmem_get_net_iov(struct net_iov *niov) +{ +} + +static inline void net_devmem_put_net_iov(struct net_iov *niov) +{ +} + static inline void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding) { diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 5b241c9e6f38..6e853d55a3e8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -89,6 +89,7 @@ #include =20 #include "dev.h" +#include "devmem.h" #include "netmem_priv.h" #include "sock_destructor.h" =20 @@ -7253,3 +7254,32 @@ bool csum_and_copy_from_iter_full(void *addr, size_t= bytes, return false; } EXPORT_SYMBOL(csum_and_copy_from_iter_full); + +void get_netmem(netmem_ref netmem) +{ + if (netmem_is_net_iov(netmem)) { + /* Assume any net_iov is devmem and route it to + * net_devmem_get_net_iov. As new net_iov types are added they + * need to be checked here. + */ + net_devmem_get_net_iov(netmem_to_net_iov(netmem)); + return; + } + get_page(netmem_to_page(netmem)); +} +EXPORT_SYMBOL(get_netmem); + +void put_netmem(netmem_ref netmem) +{ + if (netmem_is_net_iov(netmem)) { + /* Assume any net_iov is devmem and route it to + * net_devmem_put_net_iov. As new net_iov types are added they + * need to be checked here. 
+ */ + net_devmem_put_net_iov(netmem_to_net_iov(netmem)); + return; + } + + put_page(netmem_to_page(netmem)); +} +EXPORT_SYMBOL(put_netmem); -- 2.48.1.601.g30ceb7b040-goog

From nobody Sun Feb 8 00:03:26 2026
Date: Sat, 22 Feb 2025 19:15:10 +0000
In-Reply-To: <20250222191517.743530-1-almasrymina@google.com>
References: <20250222191517.743530-1-almasrymina@google.com>
Message-ID: <20250222191517.743530-3-almasrymina@google.com>
Subject: [PATCH net-next v5 2/9] net: devmem: TCP tx netlink api
From: Mina Almasry
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Donald Hunter, Jakub Kicinski, "David S. Miller", Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn, Jeroen de Borst, Harshitha Ramamurthy, Kuniyuki Iwashima, Willem de Bruijn, David Ahern, Neal Cardwell, Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez, Shuah Khan, sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim, Victor Nogueira, Pedro Tammela, Samiullah Khawaja

From: Stanislav Fomichev

Add a bind-tx netlink call to attach a dmabuf for TX; a queue is not
required, only the ifindex and the dmabuf fd are needed for attachment.

Signed-off-by: Stanislav Fomichev
Signed-off-by: Mina Almasry

---

v3:
- Fix ynl-regen.sh error (Simon).
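As a usage illustration (a sketch only, not part of this patch): once the
doit is implemented in a later patch of the series, the new op can be
exercised from userspace, for example with the in-tree YNL CLI. The
ifindex and fd values below are placeholders, and the exact invocation is
an assumption modeled on how the existing netdev ops are driven.

  # Illustrative only; placeholder values.
  ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
          --do bind-tx --json '{"ifindex": 2, "fd": 7}'
  # Per the spec above, the reply carries the dmabuf binding id: {'id': ...}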
--- Documentation/netlink/specs/netdev.yaml | 12 ++++++++++++ include/uapi/linux/netdev.h | 1 + net/core/netdev-genl-gen.c | 13 +++++++++++++ net/core/netdev-genl-gen.h | 1 + net/core/netdev-genl.c | 6 ++++++ tools/include/uapi/linux/netdev.h | 1 + 6 files changed, 34 insertions(+) diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlin= k/specs/netdev.yaml index 36f1152bfac3..e560b05eb528 100644 --- a/Documentation/netlink/specs/netdev.yaml +++ b/Documentation/netlink/specs/netdev.yaml @@ -743,6 +743,18 @@ operations: - defer-hard-irqs - gro-flush-timeout - irq-suspend-timeout + - + name: bind-tx + doc: Bind dmabuf to netdev for TX + attribute-set: dmabuf + do: + request: + attributes: + - ifindex + - fd + reply: + attributes: + - id =20 kernel-family: headers: [ "linux/list.h"] diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index 7600bf62dbdf..7eb9571786b8 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -219,6 +219,7 @@ enum { NETDEV_CMD_QSTATS_GET, NETDEV_CMD_BIND_RX, NETDEV_CMD_NAPI_SET, + NETDEV_CMD_BIND_TX, =20 __NETDEV_CMD_MAX, NETDEV_CMD_MAX =3D (__NETDEV_CMD_MAX - 1) diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c index 996ac6a449eb..f27608d6301c 100644 --- a/net/core/netdev-genl-gen.c +++ b/net/core/netdev-genl-gen.c @@ -99,6 +99,12 @@ static const struct nla_policy netdev_napi_set_nl_policy= [NETDEV_A_NAPI_IRQ_SUSPE [NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT] =3D { .type =3D NLA_UINT, }, }; =20 +/* NETDEV_CMD_BIND_TX - do */ +static const struct nla_policy netdev_bind_tx_nl_policy[NETDEV_A_DMABUF_FD= + 1] =3D { + [NETDEV_A_DMABUF_IFINDEX] =3D NLA_POLICY_MIN(NLA_U32, 1), + [NETDEV_A_DMABUF_FD] =3D { .type =3D NLA_U32, }, +}; + /* Ops table for netdev */ static const struct genl_split_ops netdev_nl_ops[] =3D { { @@ -190,6 +196,13 @@ static const struct genl_split_ops netdev_nl_ops[] =3D= { .maxattr =3D NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT, .flags =3D GENL_ADMIN_PERM | GENL_CMD_CAP_DO, }, + { + .cmd =3D NETDEV_CMD_BIND_TX, + .doit =3D netdev_nl_bind_tx_doit, + .policy =3D netdev_bind_tx_nl_policy, + .maxattr =3D NETDEV_A_DMABUF_FD, + .flags =3D GENL_CMD_CAP_DO, + }, }; =20 static const struct genl_multicast_group netdev_nl_mcgrps[] =3D { diff --git a/net/core/netdev-genl-gen.h b/net/core/netdev-genl-gen.h index e09dd7539ff2..c1fed66e92b9 100644 --- a/net/core/netdev-genl-gen.h +++ b/net/core/netdev-genl-gen.h @@ -34,6 +34,7 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb); int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info); int netdev_nl_napi_set_doit(struct sk_buff *skb, struct genl_info *info); +int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info); =20 enum { NETDEV_NLGRP_MGMT, diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 2b774183d31c..6e5f2de4d947 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -931,6 +931,12 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct= genl_info *info) return err; } =20 +/* stub */ +int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info) +{ + return 0; +} + void netdev_nl_sock_priv_init(struct list_head *priv) { INIT_LIST_HEAD(priv); diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/n= etdev.h index 7600bf62dbdf..7eb9571786b8 100644 --- a/tools/include/uapi/linux/netdev.h +++ b/tools/include/uapi/linux/netdev.h @@ -219,6 +219,7 @@ enum { NETDEV_CMD_QSTATS_GET, NETDEV_CMD_BIND_RX, NETDEV_CMD_NAPI_SET, + 
NETDEV_CMD_BIND_TX, __NETDEV_CMD_MAX, NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1) -- 2.48.1.601.g30ceb7b040-goog

From nobody Sun Feb 8 00:03:27 2026
Date: Sat, 22 Feb 2025 19:15:11 +0000
In-Reply-To: <20250222191517.743530-1-almasrymina@google.com>
References: <20250222191517.743530-1-almasrymina@google.com>
Message-ID: <20250222191517.743530-4-almasrymina@google.com>
Subject: [PATCH net-next v5 3/9] net: devmem: Implement TX path
From: Mina Almasry
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Donald Hunter, Jakub Kicinski, "David S. Miller", Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn, Jeroen de Borst, Harshitha Ramamurthy, Kuniyuki Iwashima, Willem de Bruijn, David Ahern, Neal Cardwell, Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez, Shuah Khan, sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim, Victor Nogueira, Pedro Tammela, Samiullah Khawaja, Kaiyuan Zhang

Augment the dmabuf binding to be able to handle TX. In addition to all the
RX binding work, we also create the tx_vec needed for the TX path.

Provide an API for sendmsg to be able to send dmabufs bound to this
device:

- Provide a new dmabuf_tx_cmsg which includes the dmabuf to send from.
- MSG_ZEROCOPY with the SCM_DEVMEM_DMABUF cmsg indicates a send from the
  dma-buf.

Devmem is uncopyable, so piggyback off the existing MSG_ZEROCOPY
implementation, while disabling the cases where MSG_ZEROCOPY falls back to
copying.

We additionally pipe the binding down to the new
zerocopy_fill_skb_from_devmem, which fills a TX skb with net_iov netmems
instead of the traditional page netmems.

We also special-case skb_frag_dma_map to return the dma-address of these
dmabuf net_iovs instead of attempting to map pages.

Based on work by Stanislav Fomichev. A lot of the meat of the
implementation came from devmem TCP RFC v1[1], which included the TX path,
but Stan did all the rebasing on top of netmem/net_iov.

Cc: Stanislav Fomichev
Signed-off-by: Kaiyuan Zhang
Signed-off-by: Mina Almasry
Acked-by: Stanislav Fomichev

---

v5:
- Return -EFAULT from zerocopy_fill_skb_from_devmem (Stan).
- Don't null check before kvfree (Stan).

v4:
- Remove dmabuf_tx_cmsg definition and just use __u32 for the dma-buf id
  (Willem).
- Check that iov_iter_type() is ITER_IOVEC in
  zerocopy_fill_skb_from_iter() (Pavel).
- Fix binding->tx_vec not being freed on error paths (Paolo).
- Make devmem patch mutually exclusive with msg->ubuf_info path (Pavel).
- Check that MSG_ZEROCOPY and SOCK_ZEROCOPY are provided when
  sockc.dmabuf_id is provided.
- Don't mm_account_pinned_pages() on devmem TX (Pavel).

v3:
- Use kvmalloc_array instead of kcalloc (Stan).
- Fix unreachable code warning (Simon).
v2: - Remove dmabuf_offset from the dmabuf cmsg. - Update zerocopy_fill_skb_from_devmem to interpret the iov_base/iter_iov_addr as the offset into the dmabuf to send from (Stan). - Remove the confusing binding->tx_iter which is not needed if we interpret the iov_base/iter_iov_addr as offset into the dmabuf (Stan). - Remove check for binding->sgt and binding->sgt->nents in dmabuf binding. - Simplify the calculation of binding->tx_vec. - Check in net_devmem_get_binding that the binding we're returning has ifindex matching the sending socket (Willem). --- include/linux/skbuff.h | 17 ++++- include/net/sock.h | 1 + net/core/datagram.c | 48 +++++++++++- net/core/devmem.c | 99 +++++++++++++++++++++++-- net/core/devmem.h | 41 +++++++++- net/core/netdev-genl.c | 64 +++++++++++++++- net/core/skbuff.c | 18 +++-- net/core/sock.c | 6 ++ net/ipv4/ip_output.c | 3 +- net/ipv4/tcp.c | 46 +++++++++--- net/ipv6/ip6_output.c | 3 +- net/vmw_vsock/virtio_transport_common.c | 5 +- 12 files changed, 308 insertions(+), 43 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0b4f1889500d..6cb4ada2c01f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1713,13 +1713,16 @@ static inline void skb_set_end_offset(struct sk_buf= f *skb, unsigned int offset) extern const struct ubuf_info_ops msg_zerocopy_ubuf_ops; =20 struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, - struct ubuf_info *uarg); + struct ubuf_info *uarg, bool devmem); =20 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); =20 +struct net_devmem_dmabuf_binding; + int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, struct sk_buff *skb, struct iov_iter *from, - size_t length); + size_t length, + struct net_devmem_dmabuf_binding *binding); =20 int zerocopy_fill_skb_from_iter(struct sk_buff *skb, struct iov_iter *from, size_t length); @@ -1727,12 +1730,14 @@ int zerocopy_fill_skb_from_iter(struct sk_buff *skb, static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len) { - return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len); + return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len, + NULL); } =20 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, struct msghdr *msg, int len, - struct ubuf_info *uarg); + struct ubuf_info *uarg, + struct net_devmem_dmabuf_binding *binding); =20 /* Internal */ #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) @@ -3703,6 +3708,10 @@ static inline dma_addr_t __skb_frag_dma_map(struct d= evice *dev, size_t offset, size_t size, enum dma_data_direction dir) { + if (skb_frag_is_net_iov(frag)) { + return netmem_to_net_iov(frag->netmem)->dma_addr + offset + + frag->offset; + } return dma_map_page(dev, skb_frag_page(frag), skb_frag_off(frag) + offset, size, dir); } diff --git a/include/net/sock.h b/include/net/sock.h index efc031163c33..5ce5e57b5ac5 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1813,6 +1813,7 @@ struct sockcm_cookie { u32 tsflags; u32 ts_opt_id; u32 priority; + u32 dmabuf_id; }; =20 static inline void sockcm_init(struct sockcm_cookie *sockc, diff --git a/net/core/datagram.c b/net/core/datagram.c index f0693707aece..09c74a1d836b 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -63,6 +63,8 @@ #include #include =20 +#include "devmem.h" + /* * Is a socket 'connection oriented' ? 
*/ @@ -692,9 +694,49 @@ int zerocopy_fill_skb_from_iter(struct sk_buff *skb, return 0; } =20 +static int +zerocopy_fill_skb_from_devmem(struct sk_buff *skb, struct iov_iter *from, + int length, + struct net_devmem_dmabuf_binding *binding) +{ + int i =3D skb_shinfo(skb)->nr_frags; + size_t virt_addr, size, off; + struct net_iov *niov; + + /* Devmem filling works by taking an IOVEC from the user where the + * iov_addrs are interpreted as an offset in bytes into the dma-buf to + * send from. We do not support other iter types. + */ + if (iov_iter_type(from) !=3D ITER_IOVEC) + return -EFAULT; + + while (length && iov_iter_count(from)) { + if (i =3D=3D MAX_SKB_FRAGS) + return -EMSGSIZE; + + virt_addr =3D (size_t)iter_iov_addr(from); + niov =3D net_devmem_get_niov_at(binding, virt_addr, &off, &size); + if (!niov) + return -EFAULT; + + size =3D min_t(size_t, size, length); + size =3D min_t(size_t, size, iter_iov_len(from)); + + get_netmem(net_iov_to_netmem(niov)); + skb_add_rx_frag_netmem(skb, i, net_iov_to_netmem(niov), off, + size, PAGE_SIZE); + iov_iter_advance(from, size); + length -=3D size; + i++; + } + + return 0; +} + int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, struct sk_buff *skb, struct iov_iter *from, - size_t length) + size_t length, + struct net_devmem_dmabuf_binding *binding) { unsigned long orig_size =3D skb->truesize; unsigned long truesize; @@ -702,6 +744,8 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct = sock *sk, =20 if (msg && msg->msg_ubuf && msg->sg_from_iter) ret =3D msg->sg_from_iter(skb, from, length); + else if (unlikely(binding)) + ret =3D zerocopy_fill_skb_from_devmem(skb, from, length, binding); else ret =3D zerocopy_fill_skb_from_iter(skb, from, length); =20 @@ -735,7 +779,7 @@ int zerocopy_sg_from_iter(struct sk_buff *skb, struct i= ov_iter *from) if (skb_copy_datagram_from_iter(skb, 0, from, copy)) return -EFAULT; =20 - return __zerocopy_sg_from_iter(NULL, NULL, skb, from, ~0U); + return __zerocopy_sg_from_iter(NULL, NULL, skb, from, ~0U, NULL); } EXPORT_SYMBOL(zerocopy_sg_from_iter); =20 diff --git a/net/core/devmem.c b/net/core/devmem.c index b1aafc66ebb7..e5941f8e29df 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -17,6 +17,7 @@ #include #include #include +#include #include =20 #include "devmem.h" @@ -73,8 +74,10 @@ void __net_devmem_dmabuf_binding_free(struct net_devmem_= dmabuf_binding *binding) dma_buf_detach(binding->dmabuf, binding->attachment); dma_buf_put(binding->dmabuf); xa_destroy(&binding->bound_rxqs); + kvfree(binding->tx_vec); kfree(binding); } +EXPORT_SYMBOL(__net_devmem_dmabuf_binding_free); =20 struct net_iov * net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding) @@ -119,6 +122,13 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf= _binding *binding) unsigned long xa_idx; unsigned int rxq_idx; =20 + xa_erase(&net_devmem_dmabuf_bindings, binding->id); + + /* Ensure no tx net_devmem_lookup_dmabuf() are in flight after the + * erase. 
+ */ + synchronize_net(); + if (binding->list.next) list_del(&binding->list); =20 @@ -133,8 +143,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_= binding *binding) WARN_ON(netdev_rx_queue_restart(binding->dev, rxq_idx)); } =20 - xa_erase(&net_devmem_dmabuf_bindings, binding->id); - net_devmem_dmabuf_binding_put(binding); } =20 @@ -197,8 +205,9 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *= dev, u32 rxq_idx, } =20 struct net_devmem_dmabuf_binding * -net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, - struct netlink_ext_ack *extack) +net_devmem_bind_dmabuf(struct net_device *dev, + enum dma_data_direction direction, + unsigned int dmabuf_fd, struct netlink_ext_ack *extack) { struct net_devmem_dmabuf_binding *binding; static u32 id_alloc_next; @@ -241,7 +250,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned= int dmabuf_fd, } =20 binding->sgt =3D dma_buf_map_attachment_unlocked(binding->attachment, - DMA_FROM_DEVICE); + direction); if (IS_ERR(binding->sgt)) { err =3D PTR_ERR(binding->sgt); NL_SET_ERR_MSG(extack, "Failed to map dmabuf attachment"); @@ -252,13 +261,23 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsign= ed int dmabuf_fd, * binding can be much more flexible than that. We may be able to * allocate MTU sized chunks here. Leave that for future work... */ - binding->chunk_pool =3D - gen_pool_create(PAGE_SHIFT, dev_to_node(&dev->dev)); + binding->chunk_pool =3D gen_pool_create(PAGE_SHIFT, + dev_to_node(&dev->dev)); if (!binding->chunk_pool) { err =3D -ENOMEM; goto err_unmap; } =20 + if (direction =3D=3D DMA_TO_DEVICE) { + binding->tx_vec =3D kvmalloc_array(dmabuf->size / PAGE_SIZE, + sizeof(struct net_iov *), + GFP_KERNEL); + if (!binding->tx_vec) { + err =3D -ENOMEM; + goto err_free_chunks; + } + } + virtual =3D 0; for_each_sgtable_dma_sg(binding->sgt, sg, sg_idx) { dma_addr_t dma_addr =3D sg_dma_address(sg); @@ -300,6 +319,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned= int dmabuf_fd, niov->owner =3D &owner->area; page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), net_devmem_get_dma_addr(niov)); + if (direction =3D=3D DMA_TO_DEVICE) + binding->tx_vec[owner->area.base_virtual / PAGE_SIZE + i] =3D niov; } =20 virtual +=3D len; @@ -311,6 +332,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned= int dmabuf_fd, gen_pool_for_each_chunk(binding->chunk_pool, net_devmem_dmabuf_free_chunk_owner, NULL); gen_pool_destroy(binding->chunk_pool); + + kvfree(binding->tx_vec); err_unmap: dma_buf_unmap_attachment_unlocked(binding->attachment, binding->sgt, DMA_FROM_DEVICE); @@ -325,6 +348,21 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigne= d int dmabuf_fd, return ERR_PTR(err); } =20 +struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 id) +{ + struct net_devmem_dmabuf_binding *binding; + + rcu_read_lock(); + binding =3D xa_load(&net_devmem_dmabuf_bindings, id); + if (binding) { + if (!net_devmem_dmabuf_binding_get(binding)) + binding =3D NULL; + } + rcu_read_unlock(); + + return binding; +} + void net_devmem_get_net_iov(struct net_iov *niov) { net_devmem_dmabuf_binding_get(net_devmem_iov_binding(niov)); @@ -335,6 +373,53 @@ void net_devmem_put_net_iov(struct net_iov *niov) net_devmem_dmabuf_binding_put(net_devmem_iov_binding(niov)); } =20 +struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk, + unsigned int dmabuf_id) +{ + struct net_devmem_dmabuf_binding *binding; + struct dst_entry *dst =3D __sk_dst_get(sk); + int err =3D 0; + + binding =3D 
net_devmem_lookup_dmabuf(dmabuf_id); + if (!binding || !binding->tx_vec) { + err =3D -EINVAL; + goto out_err; + } + + /* The dma-addrs in this binding are only reachable to the corresponding + * net_device. + */ + if (!dst || !dst->dev || dst->dev->ifindex !=3D binding->dev->ifindex) { + err =3D -ENODEV; + goto out_err; + } + + return binding; + +out_err: + if (binding) + net_devmem_dmabuf_binding_put(binding); + + return ERR_PTR(err); +} + +struct net_iov * +net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding, + size_t virt_addr, size_t *off, size_t *size) +{ + size_t idx; + + if (virt_addr >=3D binding->dmabuf->size) + return NULL; + + idx =3D virt_addr / PAGE_SIZE; + + *off =3D virt_addr % PAGE_SIZE; + *size =3D PAGE_SIZE - *off; + + return binding->tx_vec[idx]; +} + /*** "Dmabuf devmem memory provider" ***/ =20 int mp_dmabuf_devmem_init(struct page_pool *pool) diff --git a/net/core/devmem.h b/net/core/devmem.h index 946f2e015746..a8b79c0e01b3 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -48,6 +48,12 @@ struct net_devmem_dmabuf_binding { * active. */ u32 id; + + /* Array of net_iov pointers for this binding, sorted by virtual + * address. This array is convenient to map the virtual addresses to + * net_iovs in the TX path. + */ + struct net_iov **tx_vec; }; =20 #if defined(CONFIG_NET_DEVMEM) @@ -66,12 +72,15 @@ struct dmabuf_genpool_chunk_owner { =20 void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *bi= nding); struct net_devmem_dmabuf_binding * -net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, - struct netlink_ext_ack *extack); +net_devmem_bind_dmabuf(struct net_device *dev, + enum dma_data_direction direction, + unsigned int dmabuf_fd, struct netlink_ext_ack *extack); +struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 id); void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding); int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, struct net_devmem_dmabuf_binding *binding, struct netlink_ext_ack *extack); +void net_devmem_bind_tx_release(struct sock *sk); =20 static inline struct dmabuf_genpool_chunk_owner * net_devmem_iov_to_chunk_owner(const struct net_iov *niov) @@ -100,10 +109,10 @@ static inline unsigned long net_iov_virtual_addr(cons= t struct net_iov *niov) ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT); } =20 -static inline void +static inline bool net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding) { - refcount_inc(&binding->ref); + return refcount_inc_not_zero(&binding->ref); } =20 static inline void @@ -123,6 +132,11 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_bindi= ng *binding); void net_devmem_free_dmabuf(struct net_iov *ppiov); =20 bool net_is_devmem_iov(struct net_iov *niov); +struct net_devmem_dmabuf_binding * +net_devmem_get_binding(struct sock *sk, unsigned int dmabuf_id); +struct net_iov * +net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding, size_t a= ddr, + size_t *off, size_t *size); =20 #else struct net_devmem_dmabuf_binding; @@ -147,11 +161,17 @@ __net_devmem_dmabuf_binding_free(struct net_devmem_dm= abuf_binding *binding) =20 static inline struct net_devmem_dmabuf_binding * net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, + enum dma_data_direction direction, struct netlink_ext_ack *extack) { return ERR_PTR(-EOPNOTSUPP); } =20 +static inline struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u= 32 id) +{ + return NULL; +} + static inline void 
net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding) { @@ -190,6 +210,19 @@ static inline bool net_is_devmem_iov(struct net_iov *n= iov) { return false; } + +static inline struct net_devmem_dmabuf_binding * +net_devmem_get_binding(struct sock *sk, unsigned int dmabuf_id) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline struct net_iov * +net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding, size_t a= ddr, + size_t *off, size_t *size) +{ + return NULL; +} #endif =20 #endif /* _NET_DEVMEM_H */ diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 6e5f2de4d947..6e7cd6a5c177 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -874,7 +874,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct = genl_info *info) goto err_unlock; } =20 - binding =3D net_devmem_bind_dmabuf(netdev, dmabuf_fd, info->extack); + binding =3D net_devmem_bind_dmabuf(netdev, DMA_FROM_DEVICE, dmabuf_fd, + info->extack); if (IS_ERR(binding)) { err =3D PTR_ERR(binding); goto err_unlock; @@ -931,10 +932,67 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struc= t genl_info *info) return err; } =20 -/* stub */ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info) { - return 0; + struct net_devmem_dmabuf_binding *binding; + struct list_head *sock_binding_list; + struct net_device *netdev; + u32 ifindex, dmabuf_fd; + struct sk_buff *rsp; + int err =3D 0; + void *hdr; + + if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_DEV_IFINDEX) || + GENL_REQ_ATTR_CHECK(info, NETDEV_A_DMABUF_FD)) + return -EINVAL; + + ifindex =3D nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]); + dmabuf_fd =3D nla_get_u32(info->attrs[NETDEV_A_DMABUF_FD]); + + sock_binding_list =3D genl_sk_priv_get(&netdev_nl_family, + NETLINK_CB(skb).sk); + if (IS_ERR(sock_binding_list)) + return PTR_ERR(sock_binding_list); + + rsp =3D genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL); + if (!rsp) + return -ENOMEM; + + hdr =3D genlmsg_iput(rsp, info); + if (!hdr) { + err =3D -EMSGSIZE; + goto err_genlmsg_free; + } + + rtnl_lock(); + + netdev =3D __dev_get_by_index(genl_info_net(info), ifindex); + if (!netdev || !netif_device_present(netdev)) { + err =3D -ENODEV; + goto err_unlock; + } + + binding =3D net_devmem_bind_dmabuf(netdev, DMA_TO_DEVICE, dmabuf_fd, + info->extack); + if (IS_ERR(binding)) { + err =3D PTR_ERR(binding); + goto err_unlock; + } + + list_add(&binding->list, sock_binding_list); + + nla_put_u32(rsp, NETDEV_A_DMABUF_ID, binding->id); + genlmsg_end(rsp, hdr); + + rtnl_unlock(); + + return genlmsg_reply(rsp, info); + +err_unlock: + rtnl_unlock(); +err_genlmsg_free: + nlmsg_free(rsp); + return err; } =20 void netdev_nl_sock_priv_init(struct list_head *priv) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 6e853d55a3e8..14bf4596da58 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1605,7 +1605,8 @@ void mm_unaccount_pinned_pages(struct mmpin *mmp) } EXPORT_SYMBOL_GPL(mm_unaccount_pinned_pages); =20 -static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size) +static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size, + bool devmem) { struct ubuf_info_msgzc *uarg; struct sk_buff *skb; @@ -1620,7 +1621,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct so= ck *sk, size_t size) uarg =3D (void *)skb->cb; uarg->mmp.user =3D NULL; =20 - if (mm_account_pinned_pages(&uarg->mmp, size)) { + if (likely(!devmem) && mm_account_pinned_pages(&uarg->mmp, size)) { kfree_skb(skb); return NULL; } @@ -1643,7 +1644,7 @@ static inline struct sk_buff 
*skb_from_uarg(struct ub= uf_info_msgzc *uarg) } =20 struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, - struct ubuf_info *uarg) + struct ubuf_info *uarg, bool devmem) { if (uarg) { struct ubuf_info_msgzc *uarg_zc; @@ -1673,7 +1674,8 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *s= k, size_t size, =20 next =3D (u32)atomic_read(&sk->sk_zckey); if ((u32)(uarg_zc->id + uarg_zc->len) =3D=3D next) { - if (mm_account_pinned_pages(&uarg_zc->mmp, size)) + if (likely(!devmem) && + mm_account_pinned_pages(&uarg_zc->mmp, size)) return NULL; uarg_zc->len++; uarg_zc->bytelen =3D bytelen; @@ -1688,7 +1690,7 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *s= k, size_t size, } =20 new_alloc: - return msg_zerocopy_alloc(sk, size); + return msg_zerocopy_alloc(sk, size, devmem); } EXPORT_SYMBOL_GPL(msg_zerocopy_realloc); =20 @@ -1792,7 +1794,8 @@ EXPORT_SYMBOL_GPL(msg_zerocopy_ubuf_ops); =20 int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, struct msghdr *msg, int len, - struct ubuf_info *uarg) + struct ubuf_info *uarg, + struct net_devmem_dmabuf_binding *binding) { int err, orig_len =3D skb->len; =20 @@ -1811,7 +1814,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct = sk_buff *skb, return -EEXIST; } =20 - err =3D __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len); + err =3D __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len, + binding); if (err =3D=3D -EFAULT || (err =3D=3D -EMSGSIZE && skb->len =3D=3D orig_l= en)) { struct sock *save_sk =3D skb->sk; =20 diff --git a/net/core/sock.c b/net/core/sock.c index 5ac445f8244b..5cd181578395 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2979,6 +2979,12 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr= *cmsg, if (!sk_set_prio_allowed(sk, *(u32 *)CMSG_DATA(cmsg))) return -EPERM; sockc->priority =3D *(u32 *)CMSG_DATA(cmsg); + break; + case SCM_DEVMEM_DMABUF: + if (cmsg->cmsg_len !=3D CMSG_LEN(sizeof(u32))) + return -EINVAL; + sockc->dmabuf_id =3D *(u32 *)CMSG_DATA(cmsg); + break; default: return -EINVAL; diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index ea7a260bec8a..7d8a5f3fae9b 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1015,7 +1015,8 @@ static int __ip_append_data(struct sock *sk, uarg =3D msg->msg_ubuf; } } else if (sock_flag(sk, SOCK_ZEROCOPY)) { - uarg =3D msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); + uarg =3D msg_zerocopy_realloc(sk, length, skb_zcopy(skb), + false); if (!uarg) return -ENOBUFS; extra_uref =3D !skb_zcopy(skb); /* only ref on new uarg */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 08d73f17e816..1180f0486714 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1059,6 +1059,7 @@ int tcp_sendmsg_fastopen(struct sock *sk, struct msgh= dr *msg, int *copied, =20 int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) { + struct net_devmem_dmabuf_binding *binding =3D NULL; struct tcp_sock *tp =3D tcp_sk(sk); struct ubuf_info *uarg =3D NULL; struct sk_buff *skb; @@ -1071,6 +1072,16 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghd= r *msg, size_t size) =20 flags =3D msg->msg_flags; =20 + sockc =3D (struct sockcm_cookie){ .tsflags =3D READ_ONCE(sk->sk_tsflags), + .dmabuf_id =3D 0 }; + if (msg->msg_controllen) { + err =3D sock_cmsg_send(sk, msg, &sockc); + if (unlikely(err)) { + err =3D -EINVAL; + goto out_err; + } + } + if ((flags & MSG_ZEROCOPY) && size) { if (msg->msg_ubuf) { uarg =3D msg->msg_ubuf; @@ -1078,7 +1089,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr= *msg, size_t size) 
zc =3D MSG_ZEROCOPY; } else if (sock_flag(sk, SOCK_ZEROCOPY)) { skb =3D tcp_write_queue_tail(sk); - uarg =3D msg_zerocopy_realloc(sk, size, skb_zcopy(skb)); + uarg =3D msg_zerocopy_realloc(sk, size, skb_zcopy(skb), + !!sockc.dmabuf_id); if (!uarg) { err =3D -ENOBUFS; goto out_err; @@ -1087,12 +1099,27 @@ int tcp_sendmsg_locked(struct sock *sk, struct msgh= dr *msg, size_t size) zc =3D MSG_ZEROCOPY; else uarg_to_msgzc(uarg)->zerocopy =3D 0; + + if (sockc.dmabuf_id) { + binding =3D net_devmem_get_binding(sk, sockc.dmabuf_id); + if (IS_ERR(binding)) { + err =3D PTR_ERR(binding); + binding =3D NULL; + goto out_err; + } + } } } else if (unlikely(msg->msg_flags & MSG_SPLICE_PAGES) && size) { if (sk->sk_route_caps & NETIF_F_SG) zc =3D MSG_SPLICE_PAGES; } =20 + if (sockc.dmabuf_id && + (!(flags & MSG_ZEROCOPY) || !sock_flag(sk, SOCK_ZEROCOPY))) { + err =3D -EINVAL; + goto out_err; + } + if (unlikely(flags & MSG_FASTOPEN || inet_test_bit(DEFER_CONNECT, sk)) && !tp->repair) { @@ -1131,15 +1158,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghd= r *msg, size_t size) /* 'common' sending to sendq */ } =20 - sockc =3D (struct sockcm_cookie) { .tsflags =3D READ_ONCE(sk->sk_tsflags)= }; - if (msg->msg_controllen) { - err =3D sock_cmsg_send(sk, msg, &sockc); - if (unlikely(err)) { - err =3D -EINVAL; - goto out_err; - } - } - /* This should be in poll */ sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk); =20 @@ -1256,7 +1274,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr= *msg, size_t size) goto wait_for_space; } =20 - err =3D skb_zerocopy_iter_stream(sk, skb, msg, copy, uarg); + err =3D skb_zerocopy_iter_stream(sk, skb, msg, copy, uarg, + binding); if (err =3D=3D -EMSGSIZE || err =3D=3D -EEXIST) { tcp_mark_push(tp, skb); goto new_segment; @@ -1337,6 +1356,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr= *msg, size_t size) /* msg->msg_ubuf is pinned by the caller so we don't take extra refs */ if (uarg && !msg->msg_ubuf) net_zcopy_put(uarg); + if (binding) + net_devmem_dmabuf_binding_put(binding); return copied + copied_syn; =20 do_error: @@ -1354,6 +1375,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr= *msg, size_t size) sk->sk_write_space(sk); tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); } + if (binding) + net_devmem_dmabuf_binding_put(binding); + return err; } EXPORT_SYMBOL_GPL(tcp_sendmsg_locked); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index d577bf2f3053..e9e752f08f87 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1523,7 +1523,8 @@ static int __ip6_append_data(struct sock *sk, uarg =3D msg->msg_ubuf; } } else if (sock_flag(sk, SOCK_ZEROCOPY)) { - uarg =3D msg_zerocopy_realloc(sk, length, skb_zcopy(skb)); + uarg =3D msg_zerocopy_realloc(sk, length, skb_zcopy(skb), + false); if (!uarg) return -ENOBUFS; extra_uref =3D !skb_zcopy(skb); /* only ref on new uarg */ diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio= _transport_common.c index 7f7de6d88096..6e7b727c781c 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -87,7 +87,7 @@ static int virtio_transport_init_zcopy_skb(struct vsock_s= ock *vsk, =20 uarg =3D msg_zerocopy_realloc(sk_vsock(vsk), iter->count, - NULL); + NULL, false); if (!uarg) return -1; =20 @@ -107,8 +107,7 @@ static int virtio_transport_fill_skb(struct sk_buff *sk= b, { if (zcopy) return __zerocopy_sg_from_iter(info->msg, NULL, skb, - &info->msg->msg_iter, - len); + &info->msg->msg_iter, len, NULL); =20 return memcpy_from_msg(skb_put(skb, 
len), info->msg, len); } -- 2.48.1.601.g30ceb7b040-goog

From nobody Sun Feb 8 00:03:27 2026
Date: Sat, 22 Feb 2025 19:15:12 +0000
In-Reply-To: <20250222191517.743530-1-almasrymina@google.com>
References: <20250222191517.743530-1-almasrymina@google.com>
Message-ID: <20250222191517.743530-5-almasrymina@google.com>
Subject: [PATCH net-next v5 4/9] net: devmem: make dmabuf unbinding scheduled work
From: Mina Almasry
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org
Cc: Mina Almasry, Donald Hunter, Jakub Kicinski, "David S. Miller", Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet, Andrew Lunn, Jeroen de Borst, Harshitha Ramamurthy, Kuniyuki Iwashima, Willem de Bruijn, David Ahern, Neal Cardwell, Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez, Shuah Khan, sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim, Victor Nogueira, Pedro Tammela, Samiullah Khawaja

The TX path may release the dmabuf in a context where we cannot wait.
This happens when the user unbinds a TX dmabuf while there are still
references to its netmems in the TX path. In that case, the netmems will
be put_netmem'd from a context where we can't unmap the dmabuf, resulting
in a BUG like the one seen by Stan:

[ 1.548495] BUG: sleeping function called from invalid context at drivers/dma-buf/dma-buf.c:1255
[ 1.548741] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 149, name: ncdevmem
[ 1.548926] preempt_count: 201, expected: 0
[ 1.549026] RCU nest depth: 0, expected: 0
[ 1.549197]
[ 1.549237] =============================
[ 1.549331] [ BUG: Invalid wait context ]
[ 1.549425] 6.13.0-rc3-00770-gbc9ef9606dc9-dirty #15 Tainted: G W
[ 1.549609] -----------------------------
[ 1.549704] ncdevmem/149 is trying to lock:
[ 1.549801] ffff8880066701c0 (reservation_ww_class_mutex){+.+.}-{4:4}, at: dma_buf_unmap_attachment_unlocked+0x4b/0x90
[ 1.550051] other info that might help us debug this:
[ 1.550167] context-{5:5}
[ 1.550229] 3 locks held by ncdevmem/149:
[ 1.550322] #0: ffff888005730208 (&sb->s_type->i_mutex_key#11){+.+.}-{4:4}, at: sock_close+0x40/0xf0
[ 1.550530] #1: ffff88800b148f98 (sk_lock-AF_INET6){+.+.}-{0:0}, at: tcp_close+0x19/0x80
[ 1.550731] #2: ffff88800b148f18 (slock-AF_INET6){+.-.}-{3:3}, at: __tcp_close+0x185/0x4b0
[ 1.550921] stack backtrace:
[ 1.550990] CPU: 0 UID: 0 PID: 149 Comm: ncdevmem Tainted: G W 6.13.0-rc3-00770-gbc9ef9606dc9-dirty #15
[ 1.551233] Tainted: [W]=WARN
[ 1.551304] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 1.551518] Call Trace:
[ 1.551584]
[ 1.551636] dump_stack_lvl+0x86/0xc0
[ 1.551723] __lock_acquire+0xb0f/0xc30
[ 1.551814] ?
dma_buf_unmap_attachment_unlocked+0x4b/0x90 [ 1.551941] lock_acquire+0xf1/0x2a0 [ 1.552026] ? dma_buf_unmap_attachment_unlocked+0x4b/0x90 [ 1.552152] ? dma_buf_unmap_attachment_unlocked+0x4b/0x90 [ 1.552281] ? dma_buf_unmap_attachment_unlocked+0x4b/0x90 [ 1.552408] __ww_mutex_lock+0x121/0x1060 [ 1.552503] ? dma_buf_unmap_attachment_unlocked+0x4b/0x90 [ 1.552648] ww_mutex_lock+0x3d/0xa0 [ 1.552733] dma_buf_unmap_attachment_unlocked+0x4b/0x90 [ 1.552857] __net_devmem_dmabuf_binding_free+0x56/0xb0 [ 1.552979] skb_release_data+0x120/0x1f0 [ 1.553074] __kfree_skb+0x29/0xa0 [ 1.553156] tcp_write_queue_purge+0x41/0x310 [ 1.553259] tcp_v4_destroy_sock+0x127/0x320 [ 1.553363] ? __tcp_close+0x169/0x4b0 [ 1.553452] inet_csk_destroy_sock+0x53/0x130 [ 1.553560] __tcp_close+0x421/0x4b0 [ 1.553646] tcp_close+0x24/0x80 [ 1.553724] inet_release+0x5d/0x90 [ 1.553806] sock_close+0x4a/0xf0 [ 1.553886] __fput+0x9c/0x2b0 [ 1.553960] task_work_run+0x89/0xc0 [ 1.554046] do_exit+0x27f/0x980 [ 1.554125] do_group_exit+0xa4/0xb0 [ 1.554211] __x64_sys_exit_group+0x17/0x20 [ 1.554309] x64_sys_call+0x21a0/0x21a0 [ 1.554400] do_syscall_64+0xec/0x1d0 [ 1.554487] ? exc_page_fault+0x8a/0xf0 [ 1.554585] entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 1.554703] RIP: 0033:0x7f2f8a27abcd Resolve this by making __net_devmem_dmabuf_binding_free schedule_work'd. Suggested-by: Stanislav Fomichev Signed-off-by: Mina Almasry Acked-by: Stanislav Fomichev --- net/core/devmem.c | 4 +++- net/core/devmem.h | 10 ++++++---- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/net/core/devmem.c b/net/core/devmem.c index e5941f8e29df..7a0ce705a703 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -55,8 +55,10 @@ static dma_addr_t net_devmem_get_dma_addr(const struct n= et_iov *niov) ((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT); } =20 -void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *bi= nding) +void __net_devmem_dmabuf_binding_free(struct work_struct *wq) { + struct net_devmem_dmabuf_binding *binding =3D container_of(wq, typeof(*bi= nding), unbind_w); + size_t size, avail; =20 gen_pool_for_each_chunk(binding->chunk_pool, diff --git a/net/core/devmem.h b/net/core/devmem.h index a8b79c0e01b3..861150349825 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -54,6 +54,8 @@ struct net_devmem_dmabuf_binding { * net_iovs in the TX path. 
*/ struct net_iov **tx_vec; + + struct work_struct unbind_w; }; =20 #if defined(CONFIG_NET_DEVMEM) @@ -70,7 +72,7 @@ struct dmabuf_genpool_chunk_owner { dma_addr_t base_dma_addr; }; =20 -void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *bi= nding); +void __net_devmem_dmabuf_binding_free(struct work_struct *wq); struct net_devmem_dmabuf_binding * net_devmem_bind_dmabuf(struct net_device *dev, enum dma_data_direction direction, @@ -121,7 +123,8 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_= binding *binding) if (!refcount_dec_and_test(&binding->ref)) return; =20 - __net_devmem_dmabuf_binding_free(binding); + INIT_WORK(&binding->unbind_w, __net_devmem_dmabuf_binding_free); + schedule_work(&binding->unbind_w); } =20 void net_devmem_get_net_iov(struct net_iov *niov); @@ -154,8 +157,7 @@ static inline void net_devmem_put_net_iov(struct net_io= v *niov) { } =20 -static inline void -__net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding) +static inline void __net_devmem_dmabuf_binding_free(struct work_struct *wq) { } =20 --=20 2.48.1.601.g30ceb7b040-goog From nobody Sun Feb 8 00:03:27 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D07542147FB for ; Sat, 22 Feb 2025 19:15:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251732; cv=none; b=cqBAonBEv3lvFKueSefoGy7RPyTwSsV97vkvWAlmcBZVaaDBXNmVeCUZNnsfCCKqNo7BM3mQHbxeok+eATG4mX64Nmo8G+q1J/nROB0vIy8+HSeEYggWQLn/cXRTZXOUDwHg8eP/uDB/ZniLwgdW1rvCd0ntWyadTnA6oOJONdU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251732; c=relaxed/simple; bh=s68TTpwP7QKDMDjkbUq2S8mCekWk8x6v47OZqGbmT5g=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Nwg7Iy/FP0PQ8o3uYKv+d3UC16iTNjBAij0jofLpTXL9MBaY2wZrcEj/S4BRWaj2jpy52Ri10ELSlknPfYyYyRHi1p9nY3SnO5RFu4qjy4QP0Xhe5/lOy7GAHI3II6DTU2lcGyUinyWQPKx1iZYggF3u4fA49PXlkXBjktYUGMw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=F7LxmK8Y; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="F7LxmK8Y" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-2fc0bc05bb5so6876583a91.2 for ; Sat, 22 Feb 2025 11:15:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1740251729; x=1740856529; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=6gJV9EBgmaTTrfk8tvzzMOqrL8tQk1rORdPb1Sc+sNM=; b=F7LxmK8YiiAL1FNanCjZKgvFKmIfOHsqMgLHH1doTkYKa7nd889kvmUxRZxCSAht0i BHydokZetmIHN+SYLS+f31UMzo4gdtTI1jK48dOZsW3pntT6JIML92JyUIKD7dot6sBR NvVMkbBQM/ZL/ER1rYfBspvP9JT17RvMAsNTTvy2NIW5JcC5hvdW9l7/Hi2j36snlXTn 
UezirKAlktFGDZJyZ4RNhj7oWSwRCOvd87sb7e+FpYKE3ZPdpOx6lwcejvP7trwafJ26 U3JWzLrUf/s75NZKwFaoPCEs4pRrYCQQjlohAYcAaWDTc/YVa5gATQx2Z0BCnrwajdfb EJeQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740251729; x=1740856529; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=6gJV9EBgmaTTrfk8tvzzMOqrL8tQk1rORdPb1Sc+sNM=; b=SMGJe4SUHodoR8mmbP7VWPwn0QGxpAw+PZGDFIsxscmilhq/aziW5eGpjSMhgg1occ Bf11/4tN+BxXhevfNtAljASliE9roUVQtix8w93vhD4lvUqjzmp3lmwDeT1YaocuEQNj 1L1kjB23gBExCEWoJQ2+OG5yD1CwewxuBLeROyX4wwt2Ba7kfG0N0u6RgE6qX2o8tqX4 jdtvikiaxKuQZS07OVl9pQeacdRxxry6rxp0x89PoBincu9TEKAiT0qHYv9zDoV9B0Ck mavRtdZj6/2H+rkivIa4puHE6cCGhgufNkLr5uupLdMLQmkVtEvAgaY+wZctm+30dV92 qY3g== X-Forwarded-Encrypted: i=1; AJvYcCXLaEKaB5IenZqhJCIO9KAx8fEpAkhHlX9+2Co1dtDa1v9DTsYeXKftvtPvQlcKuXyLwUsXsqpCmuockeA=@vger.kernel.org X-Gm-Message-State: AOJu0Yx9hmswY1nuDRlTLVLG58J0dioww/hpzKqNTIaRlqRmXQBO4AUk rxawMBDo0qVh1r4jE99hAMUaYwtBAUI+OiCNpekS7O4YHgLPyafQsPbfDYllBPUgOXZPTLvgrwy OAvOMoLc6Q1O0eIKiTKtzEw== X-Google-Smtp-Source: AGHT+IE1PIfztiZwPW21h1n8rfUQNigUxL9cKJJasAHDKTyCvGRQgfKcO/CfYPAk3liqfg0igsEGvAm9aKKhG08NOA== X-Received: from pjbnb15.prod.google.com ([2002:a17:90b:35cf:b0:2fc:b544:749e]) (user=almasrymina job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:2b8e:b0:2f5:747:cbd with SMTP id 98e67ed59e1d1-2fce78da5d0mr15018799a91.18.1740251729147; Sat, 22 Feb 2025 11:15:29 -0800 (PST) Date: Sat, 22 Feb 2025 19:15:13 +0000 In-Reply-To: <20250222191517.743530-1-almasrymina@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250222191517.743530-1-almasrymina@google.com> X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog Message-ID: <20250222191517.743530-6-almasrymina@google.com> Subject: [PATCH net-next v5 5/9] net: add devmem TCP TX documentation From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Simon Horman , Jonathan Corbet , Andrew Lunn , Jeroen de Borst , Harshitha Ramamurthy , Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Neal Cardwell , Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , Xuan Zhuo , "=?UTF-8?q?Eugenio=20P=C3=A9rez?=" , Shuah Khan , sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim , Victor Nogueira , Pedro Tammela , Samiullah Khawaja Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add documentation outlining the usage and details of the devmem TCP TX API. Signed-off-by: Mina Almasry Acked-by: Stanislav Fomichev --- v5: - Address comments from Stan and Bagas v4: - Mention SO_BINDTODEVICE is recommended (me/Pavel). 
v2: - Update documentation for iov_base is the dmabuf offset (Stan) --- Documentation/networking/devmem.rst | 150 +++++++++++++++++++++++++++- 1 file changed, 146 insertions(+), 4 deletions(-) diff --git a/Documentation/networking/devmem.rst b/Documentation/networking= /devmem.rst index d95363645331..1c476522d6f5 100644 --- a/Documentation/networking/devmem.rst +++ b/Documentation/networking/devmem.rst @@ -62,15 +62,15 @@ More Info https://lore.kernel.org/netdev/20240831004313.3713467-1-almasrymina@go= ogle.com/ =20 =20 -Interface -=3D=3D=3D=3D=3D=3D=3D=3D=3D +RX Interface +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 =20 Example ------- =20 -tools/testing/selftests/net/ncdevmem.c:do_server shows an example of setti= ng up -the RX path of this API. +./tools/testing/selftests/drivers/net/hw/ncdevmem:do_server shows an examp= le of +setting up the RX path of this API. =20 =20 NIC Setup @@ -235,6 +235,148 @@ can be less than the tokens provided by the user in c= ase of: (a) an internal kernel leak bug. (b) the user passed more than 1024 frags. =20 +TX Interface +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + + +Example +------- + +./tools/testing/selftests/drivers/net/hw/ncdevmem:do_client shows an examp= le of +setting up the TX path of this API. + + +NIC Setup +--------- + +The user must bind a TX dmabuf to a given NIC using the netlink API:: + + struct netdev_bind_tx_req *req =3D NULL; + struct netdev_bind_tx_rsp *rsp =3D NULL; + struct ynl_error yerr; + + *ys =3D ynl_sock_create(&ynl_netdev_family, &yerr); + + req =3D netdev_bind_tx_req_alloc(); + netdev_bind_tx_req_set_ifindex(req, ifindex); + netdev_bind_tx_req_set_fd(req, dmabuf_fd); + + rsp =3D netdev_bind_tx(*ys, req); + + tx_dmabuf_id =3D rsp->id; + + +The netlink API returns a dmabuf_id: a unique ID that refers to this dmabuf +that has been bound. + +The user can unbind the dmabuf from the netdevice by closing the netlink s= ocket +that established the binding. We do this so that the binding is automatica= lly +unbound even if the userspace process crashes. + +Note that any reasonably well-behaved dmabuf from any exporter should work= with +devmem TCP, even if the dmabuf is not actually backed by devmem. An exampl= e of +this is udmabuf, which wraps user memory (non-devmem) in a dmabuf. + +Socket Setup +------------ + +The user application must use MSG_ZEROCOPY flag when sending devmem TCP. D= evmem +cannot be copied by the kernel, so the semantics of the devmem TX are simi= lar +to the semantics of MSG_ZEROCOPY:: + + setsockopt(socket_fd, SOL_SOCKET, SO_ZEROCOPY, &opt, sizeof(opt)); + +It is also recommended that the user binds the TX socket to the same inter= face +the dma-buf has been bound to via SO_BINDTODEVICE:: + + setsockopt(socket_fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, strlen(ifname)= + 1); + + +Sending data +------------ + +Devmem data is sent using the SCM_DEVMEM_DMABUF cmsg. + +The user should create a msghdr where, + +* iov_base is set to the offset into the dmabuf to start sending from +* iov_len is set to the number of bytes to be sent from the dmabuf + +The user passes the dma-buf id to send from via the dmabuf_tx_cmsg.dmabuf_= id. + +The example below sends 1024 bytes from offset 100 into the dmabuf, and 20= 48 +from offset 2000 into the dmabuf. 
The dmabuf to send from is tx_dmabuf_id:: + + char ctrl_data[CMSG_SPACE(sizeof(struct dmabuf_tx_cmsg))]; + struct dmabuf_tx_cmsg ddmabuf; + struct msghdr msg =3D {}; + struct cmsghdr *cmsg; + struct iovec iov[2]; + + iov[0].iov_base =3D (void*)100; + iov[0].iov_len =3D 1024; + iov[1].iov_base =3D (void*)2000; + iov[1].iov_len =3D 2048; + + msg.msg_iov =3D iov; + msg.msg_iovlen =3D 2; + + msg.msg_control =3D ctrl_data; + msg.msg_controllen =3D sizeof(ctrl_data); + + cmsg =3D CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level =3D SOL_SOCKET; + cmsg->cmsg_type =3D SCM_DEVMEM_DMABUF; + cmsg->cmsg_len =3D CMSG_LEN(sizeof(struct dmabuf_tx_cmsg)); + + ddmabuf.dmabuf_id =3D tx_dmabuf_id; + + *((struct dmabuf_tx_cmsg *)CMSG_DATA(cmsg)) =3D ddmabuf; + + sendmsg(socket_fd, &msg, MSG_ZEROCOPY); + + +Reusing TX dmabufs +------------------ + +Similar to MSG_ZEROCOPY with regular memory, the user should not modify the +contents of the dma-buf while a send operation is in progress. This is bec= ause +the kernel does not keep a copy of the dmabuf contents. Instead, the kernel +will pin and send data from the buffer available to the userspace. + +Just as in MSG_ZEROCOPY, the kernel notifies the userspace of send complet= ions +using MSG_ERRQUEUE:: + + int64_t tstop =3D gettimeofday_ms() + waittime_ms; + char control[CMSG_SPACE(100)] =3D {}; + struct sock_extended_err *serr; + struct msghdr msg =3D {}; + struct cmsghdr *cm; + int retries =3D 10; + __u32 hi, lo; + + msg.msg_control =3D control; + msg.msg_controllen =3D sizeof(control); + + while (gettimeofday_ms() < tstop) { + if (!do_poll(fd)) continue; + + ret =3D recvmsg(fd, &msg, MSG_ERRQUEUE); + + for (cm =3D CMSG_FIRSTHDR(&msg); cm; cm =3D CMSG_NXTHDR(&m= sg, cm)) { + serr =3D (void *)CMSG_DATA(cm); + + hi =3D serr->ee_data; + lo =3D serr->ee_info; + + fprintf(stdout, "tx complete [%d,%d]\n", lo, hi); + } + } + +After the associated sendmsg has been completed, the dmabuf can be reused = by +the userspace. 
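For illustration only (this block is not part of the patch itself): the completions above follow the usual MSG_ZEROCOPY numbering, where each sendmsg(..., MSG_ZEROCOPY) on a socket is assigned the next 32-bit id and a single MSG_ERRQUEUE notification covers an inclusive [lo, hi] range in ee_info/ee_data. A sender that keeps several sends in flight can map completions back to dmabuf regions roughly as sketched below; the names note_send(), handle_completion(), mark_region_free() and MAX_INFLIGHT are hypothetical, not part of any kernel or ncdevmem API::

    #include <stdint.h>
    #include <stddef.h>
    #include <linux/errqueue.h>

    #define MAX_INFLIGHT 64

    struct inflight_region {
            size_t dmabuf_off;      /* offset of this send's bytes in the dmabuf */
            size_t len;
    };

    static struct inflight_region inflight[MAX_INFLIGHT];
    static uint32_t next_send_id;   /* id the next sendmsg() will be assigned */

    /* hypothetical: however the caller returns a region to its allocator */
    void mark_region_free(size_t off, size_t len);

    /* call once per successful sendmsg(), before issuing the next send */
    static void note_send(size_t off, size_t len)
    {
            inflight[next_send_id % MAX_INFLIGHT] =
                    (struct inflight_region){ .dmabuf_off = off, .len = len };
            next_send_id++;
    }

    /* call for each sock_extended_err read from the error queue */
    static void handle_completion(const struct sock_extended_err *serr)
    {
            uint32_t lo = serr->ee_info, hi = serr->ee_data, id;

            for (id = lo; id != hi + 1; id++) {
                    struct inflight_region *r = &inflight[id % MAX_INFLIGHT];

                    /* these bytes of the dmabuf may now be rewritten */
                    mark_region_free(r->dmabuf_off, r->len);
            }
    }

With at most MAX_INFLIGHT sends outstanding at a time, the ring above cannot wrap onto an entry whose completion has not yet been reported.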
+ + Implementation & Caveats =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 --=20 2.48.1.601.g30ceb7b040-goog From nobody Sun Feb 8 00:03:27 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 88D04215175 for ; Sat, 22 Feb 2025 19:15:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251733; cv=none; b=u/stqtx75HAoqlOGBhutv56sFvEdcff76zHLP/dN+iCCHAi465Kq+R0g0GikkxfLMNeNlOmJJMNGfNFOY8yMt9uYgDpwPXCuAj0w4YxooFU3KnNsqzM4PCNt3x9RQs+ptQ3gftLhiehFWNTvGYDhrmAKSN4EeLMvmQfSRqZzlLI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251733; c=relaxed/simple; bh=Z8XZh144o1xsxD6YeKrPk1DFMZATUyG50T5LU7nvHVE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=mGLs7XAkJL6ljTMTq6t+yWHHgi5l13czwg3yNOndVgdVypyGDxNqUfC/YvWtTHDoP362HAvDDoysWpuPV3m1YZQQMBaBAEhUGW9ig0TA5fMeSoRU8+IGjRFUsur+A1SwFiRR/PdDPw1jVnr3hTr5bXuGY5vEJYbOBIorvf99yjc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=YLpWOnlo; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="YLpWOnlo" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-220cd43c75aso97570045ad.3 for ; Sat, 22 Feb 2025 11:15:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1740251731; x=1740856531; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=mQstzXvy/+ei25gmEQDC5JMLZS6iH36WD8mnGccBiB0=; b=YLpWOnloljy0bZ6GVd7X/l1j8eL4f5BgvW/OXl+nzmU9yKovA70gSMfmRdGPeNarEK nijqrd+Zhej99wDh7PMNMtRLS6rATXdy0b95M2zo1xrjHpToXcFSSPRnpNTn9GF8Za6N pvJvdaU6J3uLOMBE9ad7xsyqmOrMrRMY0BSW5WfaBCVr7lVo75tkwYZ0lEnWPR8oCzEB shIccDA5UZFrACG4PaHtEhNjStbQtGGhLDkgH7vUvjC8CfZlpdJ2Kc+RxJCOvwLinStL N0ygY2xits2WeabYZGKALD/HnpdRmspM8ZFEe7dzGRaKZY8BfNaWXtj6VBcCNGwtxJRl zlsw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740251731; x=1740856531; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=mQstzXvy/+ei25gmEQDC5JMLZS6iH36WD8mnGccBiB0=; b=Q7TSfhsRJGYFjGQkipgjOHaoeTwmlOwB2R3t0XPnU0NKlNGJ58UtJg3AXOOow6bB8h HGLceTXDqyS1NLsG7sRpcBxTppu41QcxgSapj1kGVE0Z6bYsI5lBMtXuPamZ9pGDxfL+ XhZkL5MNhAXUkC+9CYGqepXnyL+bt5dNZrnf4xWs/UYH7HNeLjdcDUJyW55I0ihJeGuM Bfrg6Zc/G/xyNnRVhoKv1YR82sy0E8OYy1qn3nPy6TG24PPmgt9v6qFB9Zr76j24oVJQ Cqs+aaLhUPBpjnWmnJr5DXv+1GcXv8e3JRyTFBCq0oY5WzWYdTw1be2Sc7styKhH+GI9 qmqQ== X-Forwarded-Encrypted: i=1; AJvYcCVgzYucGrgHEguPHZtbdZdmxPPADIGiDlmbg9Mf4/vMWaCx27OoXKFpZCABjcDCE30KLH2RnAU2S/R5MyA=@vger.kernel.org X-Gm-Message-State: AOJu0YzdU4MFf9zuAFiIuZmAQ0Mud4+4Jdbs/F+OlvA30JmP2BMIOwfM 
wJCwvGrkiwf0+w06zzuWP3FpEWCZQGUst440XuZqrqxKuytUL3zQsn4t3J6OkXPDY3asg+op4B7 TF9xvwCLTfHgDvGxtUDfSYw== X-Google-Smtp-Source: AGHT+IGzohye702pra9xAOfuA8QqfT7yOlA9WIo2DniYe4LBDR8mwbdq2/IMdBRMBotzAJAXvPnQ2LHGT+xy2L2BlA== X-Received: from ploo10.prod.google.com ([2002:a17:902:e00a:b0:212:4557:e89b]) (user=almasrymina job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:191:b0:216:4853:4c0b with SMTP id d9443c01a7336-2219ffc491dmr125988365ad.33.1740251730917; Sat, 22 Feb 2025 11:15:30 -0800 (PST) Date: Sat, 22 Feb 2025 19:15:14 +0000 In-Reply-To: <20250222191517.743530-1-almasrymina@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250222191517.743530-1-almasrymina@google.com> X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog Message-ID: <20250222191517.743530-7-almasrymina@google.com> Subject: [PATCH net-next v5 6/9] net: enable driver support for netmem TX From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Simon Horman , Jonathan Corbet , Andrew Lunn , Jeroen de Borst , Harshitha Ramamurthy , Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Neal Cardwell , Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , Xuan Zhuo , "=?UTF-8?q?Eugenio=20P=C3=A9rez?=" , Shuah Khan , sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim , Victor Nogueira , Pedro Tammela , Samiullah Khawaja Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Drivers need to make sure not to pass netmem dma-addrs to the dma-mapping API in order to support netmem TX. Add helpers and netmem_dma_*() helpers that enables special handling of netmem dma-addrs that drivers can use. Document in netmem.rst what drivers need to do to support netmem TX. Signed-off-by: Mina Almasry Acked-by: Stanislav Fomichev --- v5: - Fix netmet TX documentation (Stan). 
v4: - New patch --- .../networking/net_cachelines/net_device.rst | 1 + Documentation/networking/netdev-features.rst | 5 ++++ Documentation/networking/netmem.rst | 23 +++++++++++++++++-- include/linux/netdevice.h | 2 ++ include/net/netmem.h | 20 ++++++++++++++++ 5 files changed, 49 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Docum= entation/networking/net_cachelines/net_device.rst index 15e31ece675f..e3043b033647 100644 --- a/Documentation/networking/net_cachelines/net_device.rst +++ b/Documentation/networking/net_cachelines/net_device.rst @@ -10,6 +10,7 @@ Type Name = fastpath_tx_acce =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D unsigned_long:32 priv_flags read_mostl= y __dev_queue_xmit(tx) unsigned_long:1 lltx read_mostl= y HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(t= x) +unsigned long:1 netmem_tx:1; read_mostly char name[16] struct netdev_name_node* name_node struct dev_ifalias* ifalias diff --git a/Documentation/networking/netdev-features.rst b/Documentation/n= etworking/netdev-features.rst index 5014f7cc1398..02bd7536fc0c 100644 --- a/Documentation/networking/netdev-features.rst +++ b/Documentation/networking/netdev-features.rst @@ -188,3 +188,8 @@ Redundancy) frames from one port to another in hardware. This should be set for devices which duplicate outgoing HSR (High-availabi= lity Seamless Redundancy) or PRP (Parallel Redundancy Protocol) tags automatica= lly frames in hardware. + +* netmem-tx + +This should be set for devices which support netmem TX. See +Documentation/networking/netmem.rst diff --git a/Documentation/networking/netmem.rst b/Documentation/networking= /netmem.rst index 7de21ddb5412..b63aded46337 100644 --- a/Documentation/networking/netmem.rst +++ b/Documentation/networking/netmem.rst @@ -19,8 +19,8 @@ Benefits of Netmem : * Simplified Development: Drivers interact with a consistent API, regardless of the underlying memory implementation. =20 -Driver Requirements -=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Driver RX Requirements +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 1. The driver must support page_pool. =20 @@ -77,3 +77,22 @@ Driver Requirements that purpose, but be mindful that some netmem types might have longer circulation times, such as when userspace holds a reference in zerocopy scenarios. + +Driver TX Requirements +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +1. The Driver must not pass the netmem dma_addr to any of the dma-mapping = APIs + directly. This is because netmem dma_addrs may come from a source like + dma-buf that is not compatible with the dma-mapping APIs. + + Helpers like netmem_dma_unmap_page_attrs() & netmem_dma_unmap_addr_set() + should be used in lieu of dma_unmap_page[_attrs](), dma_unmap_addr_set(= ). + The netmem variants will handle netmem dma_addrs correctly regardless o= f the + source, delegating to the dma-mapping APIs when appropriate. 
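As a rough sketch (not part of this patch) of what that looks like on a driver's TX unmap path, mirroring the gve conversion later in this series: struct my_tx_buf, my_tx_record_frag() and my_tx_unmap_frag() are hypothetical stand-ins for the driver's real per-buffer bookkeeping, and addr is assumed to come from the driver's usual mapping path::

    #include <linux/dma-mapping.h>
    #include <linux/skbuff.h>
    #include <net/netmem.h>

    struct my_tx_buf {
            DEFINE_DMA_UNMAP_ADDR(dma);
            DEFINE_DMA_UNMAP_LEN(len);
    };

    static void my_tx_record_frag(struct my_tx_buf *buf, const skb_frag_t *frag,
                                  dma_addr_t addr)
    {
            /* For net_iov-backed frags this records 0, so the unmap below
             * degrades to a no-op instead of handing a dma-buf address to
             * the dma-mapping API.
             */
            netmem_dma_unmap_addr_set(skb_frag_netmem(frag), buf, dma, addr);
            dma_unmap_len_set(buf, len, skb_frag_size(frag));
    }

    static void my_tx_unmap_frag(struct device *dev, struct my_tx_buf *buf)
    {
            /* Skips dma_unmap_page_attrs() when the recorded address is 0. */
            netmem_dma_unmap_page_attrs(dev, dma_unmap_addr(buf, dma),
                                        dma_unmap_len(buf, len),
                                        DMA_TO_DEVICE, 0);
    }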
+ + Not all dma-mapping APIs have netmem equivalents at the moment. If your + driver relies on a missing netmem API, feel free to add and propose to + netdev@, or reach out to the maintainers and/or almasrymina@google.com = for + help adding the netmem API. + +2. Driver should declare support by setting `netdev->netmem_tx =3D true` diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 9a387d456592..22d9621633a0 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1753,6 +1753,7 @@ enum netdev_reg_state { * @lltx: device supports lockless Tx. Deprecated for real HW * drivers. Mainly used by logical interfaces, such as * bonding and tunnels + * @netmem_tx: device support netmem_tx. * * @name: This is the first field of the "visible" part of this structure * (i.e. as seen by users in the "Space.c" file). It is the name @@ -2061,6 +2062,7 @@ struct net_device { struct_group(priv_flags_fast, unsigned long priv_flags:32; unsigned long lltx:1; + unsigned long netmem_tx:1; ); const struct net_device_ops *netdev_ops; const struct header_ops *header_ops; diff --git a/include/net/netmem.h b/include/net/netmem.h index a2148ffb203d..1fb39ad63290 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -8,6 +8,7 @@ #ifndef _NET_NETMEM_H #define _NET_NETMEM_H =20 +#include #include #include =20 @@ -267,4 +268,23 @@ static inline unsigned long netmem_get_dma_addr(netmem= _ref netmem) void get_netmem(netmem_ref netmem); void put_netmem(netmem_ref netmem); =20 +#define netmem_dma_unmap_addr_set(NETMEM, PTR, ADDR_NAME, VAL) \ + do { \ + if (!netmem_is_net_iov(NETMEM)) \ + dma_unmap_addr_set(PTR, ADDR_NAME, VAL); \ + else \ + dma_unmap_addr_set(PTR, ADDR_NAME, 0); \ + } while (0) + +static inline void netmem_dma_unmap_page_attrs(struct device *dev, + dma_addr_t addr, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + if (!addr) + return; + + dma_unmap_page_attrs(dev, addr, size, dir, attrs); +} + #endif /* _NET_NETMEM_H */ --=20 2.48.1.601.g30ceb7b040-goog From nobody Sun Feb 8 00:03:27 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C70E52153EB for ; Sat, 22 Feb 2025 19:15:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251734; cv=none; b=sjTnivhLaqa93B9gNo5/QF5OEZJf0pIL6qLzHBodYSpj40NEtS/HPSVpfAYeG984/5hHQ2aE5i9l55KssHu+Pp2dYQ8DYgvbtdjxnrUk0vPQ0X1uDPua2stLEc+Lj5ySfHndJOQynK8CxUj5lVTxQLBWLW8U13vIlsRYbhcD8D4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251734; c=relaxed/simple; bh=S4WIdcIymEnObCKW8hFtjmYPsUwr3C9vEUbedDCBePY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=sM5cUzdlCxKHsLDDa7sq/Kqh0vMZiUSLWxIxDuz9kDzWxH5IUdNPtjSW9AUoNNaN8y1Any3Msg+wqOu5mZuAy9WfIzWKTuQH7DGGg7w659FYjVXYFh5cgtshFQ1hJKi3LeXuVZnwE2LJom4Bj/BDJe9LOJ1uCcdLqSwwEpq+7r8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=3vhCkRvu; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="3vhCkRvu" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-2fc1c7c8396so6788446a91.0 for ; Sat, 22 Feb 2025 11:15:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1740251732; x=1740856532; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=biWDWCpFQb1l1Z/yT0zgxnecEVUj+cBj8CVtcecqD6k=; b=3vhCkRvu9ODATAHeq172b5LYMXdSmSzPyrHBeonxWdVs0zR0wvYWtpdn05MjErEjrb U3iAxI/fqMS0Be3gPk7aMcwgRQ28JmeDnf5JcZ1Z4XhneCdTcRMpLU/JmMje/2zkS7nW D+nUiFYiXJcrjMNc/J/Ufda9gynSdCBMArhnTQ/WHB3+xig5pnxtfMav9t+yfZogPHpx vw3Y96nrkYccbLoEEN5Lr8e2w62doHQngW2LXparGfL/ETIOkQRqEanKZl8JtPyR2VUw 1YTeASmCrCtpHzxPCZmkf+ZSoKGiLD2RtnktUqM1aZ//mFZ1TEq4q5kb9S8A+wGr/Lt+ MwlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740251732; x=1740856532; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=biWDWCpFQb1l1Z/yT0zgxnecEVUj+cBj8CVtcecqD6k=; b=gY8u3oQN3dvyOojTLZ/0GV7ZyAt1X2ULhElPi7rXdKx25mBpeJXPJJEFUz9XAckXws q3jx9UURQSA8u4564Kve3oAKD5SSymncrS+tccSSjQN/ReIxKF6RL9JTTbbBLbRaEPXH TUbOIXIZsjzAlXPskihTQc8wXPnhm5ldMlj5uel2FX80q4EG4/htn45mikk6QtQDw8QA W6edap2vISw8snpr7a/8WitfiCQ710lchX628ACKnT4G/Ri+upAsGULbNP0kdDv1M2xu c1r/gY0CVniIfuIqBbPLK3m8mZ2AVz64dgX8KbB5/7ftHoYeceUgMGxTfnIdA9TWQ4kY JCQg== X-Forwarded-Encrypted: i=1; AJvYcCUGKcujiRRGlYb+OqUL2pds0CA5JaR6IvfKLQ5uH52FhpKVfUZbkCxGnHSPSESO6/vRC4jpprSMUXP5VII=@vger.kernel.org X-Gm-Message-State: AOJu0Yw0Skixx12M7bU1XZtW8fNHZRuVvimfMdJOldI6ZQxwZq2+SsTK oiJoc7YGTpjBFmSgag/ftHe5qfmhFM20WnDE/YnD2FGqTqEVSAseSAcRk07gULF9q7EbNjF9Hbv cYvGIpaqlITOdG153URXcYg== X-Google-Smtp-Source: AGHT+IF3RH6VzHVCehHRbpJJLJjV+rVcxv8rDd/2+CtDFFrrVHzDHLPS8NaJht7KEp7lWFwHwxhOI4KSFp5QbSD+AA== X-Received: from pjbdj7.prod.google.com ([2002:a17:90a:d2c7:b0:2fc:2b96:2d4b]) (user=almasrymina job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:3b4a:b0:2fa:1a8a:cffc with SMTP id 98e67ed59e1d1-2fce875d403mr10219694a91.34.1740251732332; Sat, 22 Feb 2025 11:15:32 -0800 (PST) Date: Sat, 22 Feb 2025 19:15:15 +0000 In-Reply-To: <20250222191517.743530-1-almasrymina@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250222191517.743530-1-almasrymina@google.com> X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog Message-ID: <20250222191517.743530-8-almasrymina@google.com> Subject: [PATCH net-next v5 7/9] gve: add netmem TX support to GVE DQO-RDA mode From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Simon Horman , Jonathan Corbet , Andrew Lunn , Jeroen de Borst , Harshitha Ramamurthy , Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Neal Cardwell , Stefan Hajnoczi , Stefano Garzarella , "Michael S. 
Tsirkin" , Jason Wang , Xuan Zhuo , "=?UTF-8?q?Eugenio=20P=C3=A9rez?=" , Shuah Khan , sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim , Victor Nogueira , Pedro Tammela , Samiullah Khawaja Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Use netmem_dma_*() helpers in gve_tx_dqo.c DQO-RDA paths to enable netmem TX support in that mode. Declare support for netmem TX in GVE DQO-RDA mode. Signed-off-by: Mina Almasry --- v4: - New patch --- drivers/net/ethernet/google/gve/gve_main.c | 4 ++++ drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +++++--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ether= net/google/gve/gve_main.c index 029be8342b7b..0f11c8455149 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -2817,6 +2817,10 @@ static int gve_probe(struct pci_dev *pdev, const str= uct pci_device_id *ent) =20 dev_info(&pdev->dev, "GVE version %s\n", gve_version_str); dev_info(&pdev->dev, "GVE queue format %d\n", (int)priv->queue_format); + + if (!gve_is_gqi(priv) && !gve_is_qpl(priv)) + dev->netmem_tx =3D true; + gve_clear_probe_in_progress(priv); queue_work(priv->gve_wq, &priv->service_task); return 0; diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/eth= ernet/google/gve/gve_tx_dqo.c index 394debc62268..e74580dc7ebe 100644 --- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c @@ -667,7 +667,8 @@ static int gve_tx_add_skb_no_copy_dqo(struct gve_tx_rin= g *tx, goto err; =20 dma_unmap_len_set(pkt, len[pkt->num_bufs], len); - dma_unmap_addr_set(pkt, dma[pkt->num_bufs], addr); + netmem_dma_unmap_addr_set(skb_frag_netmem(frag), pkt, + dma[pkt->num_bufs], addr); ++pkt->num_bufs; =20 gve_tx_fill_pkt_desc_dqo(tx, desc_idx, skb, len, addr, @@ -1045,8 +1046,9 @@ static void gve_unmap_packet(struct device *dev, dma_unmap_single(dev, dma_unmap_addr(pkt, dma[0]), dma_unmap_len(pkt, len[0]), DMA_TO_DEVICE); for (i =3D 1; i < pkt->num_bufs; i++) { - dma_unmap_page(dev, dma_unmap_addr(pkt, dma[i]), - dma_unmap_len(pkt, len[i]), DMA_TO_DEVICE); + netmem_dma_unmap_page_attrs(dev, dma_unmap_addr(pkt, dma[i]), + dma_unmap_len(pkt, len[i]), + DMA_TO_DEVICE, 0); } pkt->num_bufs =3D 0; } --=20 2.48.1.601.g30ceb7b040-goog From nobody Sun Feb 8 00:03:27 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6CFA6217641 for ; Sat, 22 Feb 2025 19:15:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251736; cv=none; b=etN+aXPwlC1/2pISOuo3vq7oW7EP1BLzfZtXIHSt5UGhDhFJnrVf8ppckX7vYnLKhyRhL32lFW6eDtIblfq9n5eFtgtvOgVdminckat1iQN0QHrPzx3ZahZv0r1QzxlG+gsXfeeCCjEWMqVpsE8mlPsjwfAcQCpLZX3h0pqnds0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251736; c=relaxed/simple; bh=8Y2ZMgshy3DSser0k2/0xpJbIFztqzldqJ0EgMTN7TA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=q2qoJxfZuuIUN2y/BW7JvN6yBvHlhzfdZgEET3Grbfz/d9Dj+qBkNYqwnvtr2FRZuOvSCZ3HcylVzYo28i7G9XDMsDs3E1hoLdr5O71H3/hAWk1Bg3OZvPxAMAxGi3H2UzKjMmS643BJ2P8DLPiWsxZaCc6cnDzMFaatfuzl7nk= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=QaY3TEe/; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="QaY3TEe/" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-2210305535bso102565775ad.1 for ; Sat, 22 Feb 2025 11:15:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1740251734; x=1740856534; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Bokme88KuMNdn1f3MKxdjhr0HIEvbFywaglPVkH9x/0=; b=QaY3TEe/dhjGqEuEEc/5BX0kQkxYluD5PyWZHJKQxf1d6M8M/8Y/7dLHRUmAY9uSBg V6EgIOaklEuorPEupYLmWrTOyWISyxHTn01obRMNOW74atG5exqzX91lfujwh2OUMiBZ S66WdFdQlcxwQOTs5bzetH3HWsobzFfUFpQww1aX/rMYt7bzPpMFnLqNemckwDk5Lr24 v1ZWqT9tIzshOZU0nTgtS2EX4YQnxzRshkIFRKKNXL/8gYjjCw6wfDkeZQtQATq1P8q1 v8/72sqgjCPUocV++K0qOnwHfDZdvRQQVknJjNvbv5Reo7jynnDThhtNPpl/6AcztieU Uu+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740251734; x=1740856534; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Bokme88KuMNdn1f3MKxdjhr0HIEvbFywaglPVkH9x/0=; b=n8NYqghdKqSmInZ9EM5ZOt/lj9kPlYkgk9iU5Dw3ABTR16Gyn7BZlKGbmmn5k9RXyn dSTrDU/HNlxYt+gyNzvGY0Hrv3rXf9wWhTwhkBt5uODon/5mOircNCovLyoksZsCZ09n V6MDggkNaITS/IcDO3xT4phDSBoQYUNOmZjjli3qKZJp81ugQ7RhG9KT/eJi3K8DPb4Y G9qvy/bd+q34GTj6iiTkSHKV1xCzDsUkrpSwFMsmG+wb+HTKpPxvu+92e5o+IP1kIPZ/ DmNHnoN26LndV680ZqYg5bEGMvnlbbjvE1w2s7qnAXDJa/U84ycpUXKsKa0fglcBJXrV cxAg== X-Forwarded-Encrypted: i=1; AJvYcCXEY7M296S4+bBTaKJPnRDiUtETpegT/b0X5cWsGPRzbHI7rJ4NlYRLD74OKOfj49fXd0nED+G1tapQYNQ=@vger.kernel.org X-Gm-Message-State: AOJu0YwfWtY4iA59JW1WX39T3LGZIdPqlbV0EWlk7o2VBYYhMAUQIJ1z BVgyjS5b0InAjz9LgBQhRuPxJDFgte5CSIriUbqPm/69QKgIVnks3Gx2AtOj1NNG2De1MzsqZeL l1GBZy+5cpmtL+560IBD8GA== X-Google-Smtp-Source: AGHT+IEFy9MOMqoUL+PCw3IyeJmCuFhQ91qsxJxZ5c9JuF+IqC2mtPSjn0iWhVp1Rk9Pf+OhOvRwARWSRX9mbFDPmw== X-Received: from pful4.prod.google.com ([2002:a05:6a00:1404:b0:732:3440:ffcc]) (user=almasrymina job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:1746:b0:730:8d0c:1066 with SMTP id d2e1a72fcca58-73426da62camr15650719b3a.24.1740251733766; Sat, 22 Feb 2025 11:15:33 -0800 (PST) Date: Sat, 22 Feb 2025 19:15:16 +0000 In-Reply-To: <20250222191517.743530-1-almasrymina@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250222191517.743530-1-almasrymina@google.com> X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog Message-ID: <20250222191517.743530-9-almasrymina@google.com> Subject: [PATCH net-next v5 8/9] net: check for driver support in netmem TX From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. 
Miller" , Eric Dumazet , Paolo Abeni , Simon Horman , Jonathan Corbet , Andrew Lunn , Jeroen de Borst , Harshitha Ramamurthy , Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Neal Cardwell , Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , Xuan Zhuo , "=?UTF-8?q?Eugenio=20P=C3=A9rez?=" , Shuah Khan , sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim , Victor Nogueira , Pedro Tammela , Samiullah Khawaja Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" We should not enable netmem TX for drivers that don't declare support. Check for driver netmem TX support during devmem TX binding and fail if the driver does not have the functionality. Check for driver support in validate_xmit_skb as well. Signed-off-by: Mina Almasry Acked-by: Stanislav Fomichev --- v4: - New patch --- net/core/dev.c | 3 +++ net/core/netdev-genl.c | 7 +++++++ 2 files changed, 10 insertions(+) diff --git a/net/core/dev.c b/net/core/dev.c index 8c7ee7ada6a3..ba83b18fa703 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3921,6 +3921,9 @@ static struct sk_buff *validate_xmit_skb(struct sk_bu= ff *skb, struct net_device =20 skb =3D validate_xmit_xfrm(skb, features, again); =20 + if (!skb_frags_readable(skb) && !dev->netmem_tx) + goto out_kfree_skb; + return skb; =20 out_kfree_skb: diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 6e7cd6a5c177..6c5d62df0d65 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -972,6 +972,13 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct= genl_info *info) goto err_unlock; } =20 + if (!netdev->netmem_tx) { + err =3D -EOPNOTSUPP; + NL_SET_ERR_MSG(info->extack, + "Driver does not support netmem TX"); + goto err_unlock; + } + binding =3D net_devmem_bind_dmabuf(netdev, DMA_TO_DEVICE, dmabuf_fd, info->extack); if (IS_ERR(binding)) { --=20 2.48.1.601.g30ceb7b040-goog From nobody Sun Feb 8 00:03:27 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2510C218ADF for ; Sat, 22 Feb 2025 19:15:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251738; cv=none; b=W1n2Sj4yq/bIPmpJOaoxAJIVb/tbI12Bn5E2CdSbVKdOHZW6KQ/0hpDPpK6npykifnO/7pfiBQbmAaqrlGgThosZVaCnuVlZ/zJf0q0f/vi5mv8wvFu1j2NDkTax3rWzlFfO/0HkNpbYqAo/M8mW/rdOddCSJWgfXvcdKYB4aCo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1740251738; c=relaxed/simple; bh=A8O9VMPSFft2Yk57BZXy6GFeusyN7VqfqbGG2xw78qI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=emvRXKtSJvfTSFS/+EGW35/UtEE07Sv5NCSPsOIn4D2s2eF5EbyeE5Sk8V/IRUSWUvVsGc8sf/974MqkLuhNTLvjIhZk0BGxTSRq90W/QCUTwHvHzpr/i6W7Qeb27v2H2+EguLho46zPfF6VM94ql+q+UNsOk3QI2Ig3FQMJT3Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=0qIo3yNU; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--almasrymina.bounces.google.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="0qIo3yNU" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-2fc318bd470so6261190a91.0 for ; Sat, 22 Feb 2025 11:15:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1740251736; x=1740856536; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=QWC5md8FOa5+2bC+P1m5axEsbDYPxze1SR5J82UpDqE=; b=0qIo3yNUwLbxsswPRCyRhYKSLJ+JRQgUivMgEGUkholTJuAYRbhKmfpMVYrmqNv8hS i9EE0kLzII4U0eTRW5LdZ9wPJFpGFjN1KW+s9lThV8YIH4Ks2I83Y2vEBbSacfkmttM+ /cKqOruXiFdmEPfLpiZ0lkf7ZH6Yyzob83gsbKRiBBJUiTJzCoI5cymIFQawarzjDTHV K59Iwh0YJEjkYZ6a3S2QU2fJe07r8ZxMnsVRViCieSM8XLmc83mzKY7WSBjz2IIyEm5n q+yyLFEMs+sC/FEPBsEDDL3JRqi2JDoPIa+YwJtiekzYLS+v3yk88FxE3zrXhNlgkKny WmtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740251736; x=1740856536; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=QWC5md8FOa5+2bC+P1m5axEsbDYPxze1SR5J82UpDqE=; b=rgsN7aPydXU7O4tHhxrUvgu63q82hvg+gDV30ItbhGZAXQcdlAGhhAEN9P7BUvI0Qs JmpRVVgM+lBT9IEZUPTHYrNLWzkj4QymaxcdwIJAysT4wyMCbKU8YNjz9xh6bE6zmKTa 7TeOwP5l6VBiyBc+LnYy5zK45kjMMiwN18OyOD/FHW4GtyYM3RtObPg65HCHcX6iYWkp Lffn/xH8Xhqglx6LKy0NKuD+TZpQYkOI8iYEe0BUM7tNKngVGJrrNX6ixdYDLkwYbFSa FUTL/ojyTA7Em7pwJZx3VuHPHCPvzhi5HozrdaZG8mhMhCU313WB96aLh1IVkLEuVQSr U3jQ== X-Forwarded-Encrypted: i=1; AJvYcCWeeltmhxwvsgOacU+JZhrL59anqbD83+neNj4ykHjM6i4NoX15h5jaESryIhtOW48Nvli2VClKvDmflOU=@vger.kernel.org X-Gm-Message-State: AOJu0YzmciUUZPldC2QkAt3adGYkC97rTiE2BOBZpF1Gppuwrb5spPOm 8np021NR7XrcvqvQuSL2Hb0zT+JOPi02G0/MS3PvZbC1s6ATqPuzs+X71MVUde3iFxfnTFOGveU azx5TVLFbSQmrp44VRQgJBQ== X-Google-Smtp-Source: AGHT+IG5kiipYfOcBQft6tFGC1DPG0FCtaH58DOxEl8iVj5VZ8YYESsHJgPbqsEFzVAIPWLmO/0WlLshNha2oWKHPA== X-Received: from pjbhl3.prod.google.com ([2002:a17:90b:1343:b0:2fc:11a0:c549]) (user=almasrymina job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:37c7:b0:2fa:157e:c78e with SMTP id 98e67ed59e1d1-2fce76a26cfmr13983966a91.7.1740251735690; Sat, 22 Feb 2025 11:15:35 -0800 (PST) Date: Sat, 22 Feb 2025 19:15:17 +0000 In-Reply-To: <20250222191517.743530-1-almasrymina@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250222191517.743530-1-almasrymina@google.com> X-Mailer: git-send-email 2.48.1.601.g30ceb7b040-goog Message-ID: <20250222191517.743530-10-almasrymina@google.com> Subject: [PATCH net-next v5 9/9] selftests: ncdevmem: Implement devmem TCP TX From: Mina Almasry To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, virtualization@lists.linux.dev, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Mina Almasry , Donald Hunter , Jakub Kicinski , "David S. Miller" , Eric Dumazet , Paolo Abeni , Simon Horman , Jonathan Corbet , Andrew Lunn , Jeroen de Borst , Harshitha Ramamurthy , Kuniyuki Iwashima , Willem de Bruijn , David Ahern , Neal Cardwell , Stefan Hajnoczi , Stefano Garzarella , "Michael S. 
Tsirkin" , Jason Wang , Xuan Zhuo , "=?UTF-8?q?Eugenio=20P=C3=A9rez?=" , Shuah Khan , sdf@fomichev.me, asml.silence@gmail.com, dw@davidwei.uk, Jamal Hadi Salim , Victor Nogueira , Pedro Tammela , Samiullah Khawaja Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add support for devmem TX in ncdevmem. This is a combination of the ncdevmem from the devmem TCP series RFCv1 which included the TX path, and work by Stan to include the netlink API and refactored on top of his generic memory_provider support. Signed-off-by: Mina Almasry Signed-off-by: Stanislav Fomichev Acked-by: Stanislav Fomichev --- v5: - Remove unnecassyr socat bindings (Stan). - Add exit_wait=3DTrue (Stan) - Remove unnecessary -c arg to ncdevmem in check_tx. v4: - Add TX test to devmem.py (Paolo). v3: - Update ncdevmem docs to run validation with RX-only and RX-with-TX. - Fix build warnings (Stan). - Make the validation expect new lines in the pattern so we can have the TX path behave like netcat (Stan). - Change ret to errno in error() calls (Stan). - Handle the case where client_ip is not provided (Stan). - Don't assume mid is <=3D 2000 (Stan). v2: - make errors a static variable so that we catch instances where there are less than 20 errors across different buffers. - Fix the issue where the seed is reset to 0 instead of its starting value 1. - Use 1000ULL instead of 1000 to guard against overflow (Willem). - Do not set POLLERR (Willem). - Update the test to use the new interface where iov_base is the dmabuf_offset. - Update the test to send 2 iov instead of 1, so we get some test coverage over sending multiple iovs at once. - Print the ifindex the test is using, useful for debugging issues where maybe the test may fail because the ifindex of the socket is different from the dmabuf binding. 
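For readers of the diff below: the central convention do_client() relies on is that, with the SCM_DEVMEM_DMABUF cmsg, iov_base carries a byte offset into the bound dmabuf rather than a pointer, so the payload must be copied into the dmabuf at that offset before sendmsg(). Reduced to a sketch (the offsets, lengths and the payload buffer here are arbitrary placeholders; provider and mem are ncdevmem's memory-provider objects):

    size_t off0 = 0, off1 = 4096;      /* byte offsets inside the dmabuf */
    struct iovec iov[2] = {
            { .iov_base = (void *)off0, .iov_len = first_len  },
            { .iov_base = (void *)off1, .iov_len = second_len },
    };

    provider->memcpy_to_device(mem, off0, payload, first_len);
    provider->memcpy_to_device(mem, off1, payload + first_len, second_len);
    /* then attach the SCM_DEVMEM_DMABUF cmsg carrying tx_dmabuf_id and
     * call sendmsg(fd, &msg, MSG_ZEROCOPY)
     */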
--- .../selftests/drivers/net/hw/devmem.py | 26 +- .../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++- 2 files changed, 311 insertions(+), 15 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testi= ng/selftests/drivers/net/hw/devmem.py index 3947e9157115..7fc686cf47a2 100755 --- a/tools/testing/selftests/drivers/net/hw/devmem.py +++ b/tools/testing/selftests/drivers/net/hw/devmem.py @@ -1,6 +1,7 @@ #!/usr/bin/env python3 # SPDX-License-Identifier: GPL-2.0 =20 +from os import path from lib.py import ksft_run, ksft_exit from lib.py import ksft_eq, KsftSkipEx from lib.py import NetDrvEpEnv @@ -10,8 +11,7 @@ from lib.py import ksft_disruptive =20 def require_devmem(cfg): if not hasattr(cfg, "_devmem_probed"): - port =3D rand_port() - probe_command =3D f"./ncdevmem -f {cfg.ifname}" + probe_command =3D f"{cfg.bin_local} -f {cfg.ifname}" cfg._devmem_supported =3D cmd(probe_command, fail=3DFalse, shell= =3DTrue).ret =3D=3D 0 cfg._devmem_probed =3D True =20 @@ -25,7 +25,7 @@ def check_rx(cfg) -> None: require_devmem(cfg) =20 port =3D rand_port() - listen_cmd =3D f"./ncdevmem -l -f {cfg.ifname} -s {cfg.addr_v['6']} -p= {port}" + listen_cmd =3D f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr_v['6'= ]} -p {port}" =20 with bkg(listen_cmd) as socat: wait_port_listen(port) @@ -34,9 +34,27 @@ def check_rx(cfg) -> None: ksft_eq(socat.stdout.strip(), "hello\nworld") =20 =20 +@ksft_disruptive +def check_tx(cfg) -> None: + cfg.require_ipver("6") + require_devmem(cfg) + + port =3D rand_port() + listen_cmd =3D f"socat -U - TCP6-LISTEN:{port}" + + with bkg(listen_cmd, exit_wait=3DTrue) as socat: + wait_port_listen(port) + cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_remote} -f {cfg.ifname} = -s {cfg.addr_v['6']} -p {port}", host=3Dcfg.remote, shell=3DTrue) + + ksft_eq(socat.stdout.strip(), "hello\nworld") + + def main() -> None: with NetDrvEpEnv(__file__) as cfg: - ksft_run([check_rx], + cfg.bin_local =3D path.abspath(path.dirname(__file__) + "/ncdevmem= ") + cfg.bin_remote =3D cfg.remote.deploy(cfg.bin_local) + + ksft_run([check_rx, check_tx], args=3D(cfg, )) ksft_exit() =20 diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/test= ing/selftests/drivers/net/hw/ncdevmem.c index 2bf14ac2b8c6..f801a1b3545f 100644 --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c @@ -9,22 +9,31 @@ * ncdevmem -s [-c ] -f eth1 -l -p 5201 * * On client: - * echo -n "hello\nworld" | nc -s 5201 -p 5201 + * echo -n "hello\nworld" | \ + * ncdevmem -s [-c ] -p 5201 -f eth1 * - * Test data validation: + * Note this is compatible with regular netcat. i.e. the sender or receive= r can + * be replaced with regular netcat to test the RX or TX path in isolation. + * + * Test data validation (devmem TCP on RX only): * * On server: * ncdevmem -s [-c ] -f eth1 -l -p 5201 -v 7 * * On client: * yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | \ - * tr \\n \\0 | \ - * head -c 5G | \ + * head -c 1G | \ * nc 5201 -p 5201 * + * Test data validation (devmem TCP on RX and TX, validation happens on RX= ): * - * Note this is compatible with regular netcat. i.e. the sender or receive= r can - * be replaced with regular netcat to test the RX or TX path in isolation. 
+ * On server: + * ncdevmem -s [-c ] -l -p 5201 -v 8 -f eth1 + * + * On client: + * yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06\\x07) | \ + * head -c 1M | \ + * ncdevmem -s [-c ] -p 5201 -f eth1 */ #define _GNU_SOURCE #define __EXPORTED_HEADERS__ @@ -40,15 +49,18 @@ #include #include #include +#include =20 #include #include #include #include #include +#include =20 #include #include +#include #include #include #include @@ -79,6 +91,8 @@ static int num_queues =3D -1; static char *ifname; static unsigned int ifindex; static unsigned int dmabuf_id; +static uint32_t tx_dmabuf_id; +static int waittime_ms =3D 500; =20 struct memory_buffer { int fd; @@ -92,6 +106,8 @@ struct memory_buffer { struct memory_provider { struct memory_buffer *(*alloc)(size_t size); void (*free)(struct memory_buffer *ctx); + void (*memcpy_to_device)(struct memory_buffer *dst, size_t off, + void *src, int n); void (*memcpy_from_device)(void *dst, struct memory_buffer *src, size_t off, int n); }; @@ -152,6 +168,20 @@ static void udmabuf_free(struct memory_buffer *ctx) free(ctx); } =20 +static void udmabuf_memcpy_to_device(struct memory_buffer *dst, size_t off, + void *src, int n) +{ + struct dma_buf_sync sync =3D {}; + + sync.flags =3D DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE; + ioctl(dst->fd, DMA_BUF_IOCTL_SYNC, &sync); + + memcpy(dst->buf_mem + off, src, n); + + sync.flags =3D DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE; + ioctl(dst->fd, DMA_BUF_IOCTL_SYNC, &sync); +} + static void udmabuf_memcpy_from_device(void *dst, struct memory_buffer *sr= c, size_t off, int n) { @@ -169,6 +199,7 @@ static void udmabuf_memcpy_from_device(void *dst, struc= t memory_buffer *src, static struct memory_provider udmabuf_memory_provider =3D { .alloc =3D udmabuf_alloc, .free =3D udmabuf_free, + .memcpy_to_device =3D udmabuf_memcpy_to_device, .memcpy_from_device =3D udmabuf_memcpy_from_device, }; =20 @@ -187,14 +218,16 @@ void validate_buffer(void *line, size_t size) { static unsigned char seed =3D 1; unsigned char *ptr =3D line; - int errors =3D 0; + unsigned char expected; + static int errors; size_t i; =20 for (i =3D 0; i < size; i++) { - if (ptr[i] !=3D seed) { + expected =3D seed ? 
seed : '\n'; + if (ptr[i] !=3D expected) { fprintf(stderr, "Failed validation: expected=3D%u, actual=3D%u, index=3D%lu\n", - seed, ptr[i], i); + expected, ptr[i], i); errors++; if (errors > 20) error(1, 0, "validation failed."); @@ -393,6 +426,49 @@ static int bind_rx_queue(unsigned int ifindex, unsigne= d int dmabuf_fd, return -1; } =20 +static int bind_tx_queue(unsigned int ifindex, unsigned int dmabuf_fd, + struct ynl_sock **ys) +{ + struct netdev_bind_tx_req *req =3D NULL; + struct netdev_bind_tx_rsp *rsp =3D NULL; + struct ynl_error yerr; + + *ys =3D ynl_sock_create(&ynl_netdev_family, &yerr); + if (!*ys) { + fprintf(stderr, "YNL: %s\n", yerr.msg); + return -1; + } + + req =3D netdev_bind_tx_req_alloc(); + netdev_bind_tx_req_set_ifindex(req, ifindex); + netdev_bind_tx_req_set_fd(req, dmabuf_fd); + + rsp =3D netdev_bind_tx(*ys, req); + if (!rsp) { + perror("netdev_bind_tx"); + goto err_close; + } + + if (!rsp->_present.id) { + perror("id not present"); + goto err_close; + } + + fprintf(stderr, "got tx dmabuf id=3D%d\n", rsp->id); + tx_dmabuf_id =3D rsp->id; + + netdev_bind_tx_req_free(req); + netdev_bind_tx_rsp_free(rsp); + + return 0; + +err_close: + fprintf(stderr, "YNL failed: %s\n", (*ys)->err.msg); + netdev_bind_tx_req_free(req); + ynl_sock_destroy(*ys); + return -1; +} + static void enable_reuseaddr(int fd) { int opt =3D 1; @@ -431,7 +507,7 @@ static int parse_address(const char *str, int port, str= uct sockaddr_in6 *sin6) return 0; } =20 -int do_server(struct memory_buffer *mem) +static int do_server(struct memory_buffer *mem) { char ctrl_data[sizeof(int) * 20000]; struct netdev_queue_id *queues; @@ -685,6 +761,206 @@ void run_devmem_tests(void) provider->free(mem); } =20 +static uint64_t gettimeofday_ms(void) +{ + struct timeval tv; + + gettimeofday(&tv, NULL); + return (tv.tv_sec * 1000ULL) + (tv.tv_usec / 1000ULL); +} + +static int do_poll(int fd) +{ + struct pollfd pfd; + int ret; + + pfd.revents =3D 0; + pfd.fd =3D fd; + + ret =3D poll(&pfd, 1, waittime_ms); + if (ret =3D=3D -1) + error(1, errno, "poll"); + + return ret && (pfd.revents & POLLERR); +} + +static void wait_compl(int fd) +{ + int64_t tstop =3D gettimeofday_ms() + waittime_ms; + char control[CMSG_SPACE(100)] =3D {}; + struct sock_extended_err *serr; + struct msghdr msg =3D {}; + struct cmsghdr *cm; + __u32 hi, lo; + int ret; + + msg.msg_control =3D control; + msg.msg_controllen =3D sizeof(control); + + while (gettimeofday_ms() < tstop) { + if (!do_poll(fd)) + continue; + + ret =3D recvmsg(fd, &msg, MSG_ERRQUEUE); + if (ret < 0) { + if (errno =3D=3D EAGAIN) + continue; + error(1, errno, "recvmsg(MSG_ERRQUEUE)"); + return; + } + if (msg.msg_flags & MSG_CTRUNC) + error(1, 0, "MSG_CTRUNC\n"); + + for (cm =3D CMSG_FIRSTHDR(&msg); cm; cm =3D CMSG_NXTHDR(&msg, cm)) { + if (cm->cmsg_level !=3D SOL_IP && + cm->cmsg_level !=3D SOL_IPV6) + continue; + if (cm->cmsg_level =3D=3D SOL_IP && + cm->cmsg_type !=3D IP_RECVERR) + continue; + if (cm->cmsg_level =3D=3D SOL_IPV6 && + cm->cmsg_type !=3D IPV6_RECVERR) + continue; + + serr =3D (void *)CMSG_DATA(cm); + if (serr->ee_origin !=3D SO_EE_ORIGIN_ZEROCOPY) + error(1, 0, "wrong origin %u", serr->ee_origin); + if (serr->ee_errno !=3D 0) + error(1, 0, "wrong errno %d", serr->ee_errno); + + hi =3D serr->ee_data; + lo =3D serr->ee_info; + + fprintf(stderr, "tx complete [%d,%d]\n", lo, hi); + return; + } + } + + error(1, 0, "did not receive tx completion"); +} + +static int do_client(struct memory_buffer *mem) +{ + char ctrl_data[CMSG_SPACE(sizeof(__u32))]; + struct sockaddr_in6 
server_sin; + struct sockaddr_in6 client_sin; + struct ynl_sock *ys =3D NULL; + struct msghdr msg =3D {}; + ssize_t line_size =3D 0; + struct cmsghdr *cmsg; + struct iovec iov[2]; + char *line =3D NULL; + unsigned long mid; + size_t len =3D 0; + int socket_fd; + __u32 ddmabuf; + int opt =3D 1; + int ret; + + ret =3D parse_address(server_ip, atoi(port), &server_sin); + if (ret < 0) + error(1, 0, "parse server address"); + + socket_fd =3D socket(AF_INET6, SOCK_STREAM, 0); + if (socket_fd < 0) + error(1, socket_fd, "create socket"); + + enable_reuseaddr(socket_fd); + + ret =3D setsockopt(socket_fd, SOL_SOCKET, SO_BINDTODEVICE, ifname, + strlen(ifname) + 1); + if (ret) + error(1, errno, "bindtodevice"); + + if (bind_tx_queue(ifindex, mem->fd, &ys)) + error(1, 0, "Failed to bind\n"); + + if (client_ip) { + ret =3D parse_address(client_ip, atoi(port), &client_sin); + if (ret < 0) + error(1, 0, "parse client address"); + + ret =3D bind(socket_fd, &client_sin, sizeof(client_sin)); + if (ret) + error(1, errno, "bind"); + } + + ret =3D setsockopt(socket_fd, SOL_SOCKET, SO_ZEROCOPY, &opt, sizeof(opt)); + if (ret) + error(1, errno, "set sock opt"); + + fprintf(stderr, "Connect to %s %d (via %s)\n", server_ip, + ntohs(server_sin.sin6_port), ifname); + + ret =3D connect(socket_fd, &server_sin, sizeof(server_sin)); + if (ret) + error(1, errno, "connect"); + + while (1) { + free(line); + line =3D NULL; + line_size =3D getline(&line, &len, stdin); + + if (line_size < 0) + break; + + mid =3D (line_size / 2) + 1; + + iov[0].iov_base =3D (void *)1; + iov[0].iov_len =3D mid; + iov[1].iov_base =3D (void *)(mid + 2); + iov[1].iov_len =3D line_size - mid; + + provider->memcpy_to_device(mem, (size_t)iov[0].iov_base, line, + iov[0].iov_len); + provider->memcpy_to_device(mem, (size_t)iov[1].iov_base, + line + iov[0].iov_len, + iov[1].iov_len); + + fprintf(stderr, + "read line_size=3D%ld iov[0].iov_base=3D%lu, iov[0].iov_len=3D%lu, iov[= 1].iov_base=3D%lu, iov[1].iov_len=3D%lu\n", + line_size, (unsigned long)iov[0].iov_base, + iov[0].iov_len, (unsigned long)iov[1].iov_base, + iov[1].iov_len); + + msg.msg_iov =3D iov; + msg.msg_iovlen =3D 2; + + msg.msg_control =3D ctrl_data; + msg.msg_controllen =3D sizeof(ctrl_data); + + cmsg =3D CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level =3D SOL_SOCKET; + cmsg->cmsg_type =3D SCM_DEVMEM_DMABUF; + cmsg->cmsg_len =3D CMSG_LEN(sizeof(__u32)); + + ddmabuf =3D tx_dmabuf_id; + + *((__u32 *)CMSG_DATA(cmsg)) =3D ddmabuf; + + ret =3D sendmsg(socket_fd, &msg, MSG_ZEROCOPY); + if (ret < 0) + error(1, errno, "Failed sendmsg"); + + fprintf(stderr, "sendmsg_ret=3D%d\n", ret); + + if (ret !=3D line_size) + error(1, errno, "Did not send all bytes"); + + wait_compl(socket_fd); + } + + fprintf(stderr, "%s: tx ok\n", TEST_PREFIX); + + free(line); + close(socket_fd); + + if (ys) + ynl_sock_destroy(ys); + + return 0; +} + int main(int argc, char *argv[]) { struct memory_buffer *mem; @@ -728,6 +1004,8 @@ int main(int argc, char *argv[]) =20 ifindex =3D if_nametoindex(ifname); =20 + fprintf(stderr, "using ifindex=3D%u\n", ifindex); + if (!server_ip && !client_ip) { if (start_queue < 0 && num_queues < 0) { num_queues =3D rxq_num(ifindex); @@ -778,7 +1056,7 @@ int main(int argc, char *argv[]) error(1, 0, "Missing -p argument\n"); =20 mem =3D provider->alloc(getpagesize() * NUM_PAGES); - ret =3D is_server ? do_server(mem) : 1; + ret =3D is_server ? do_server(mem) : do_client(mem); provider->free(mem); =20 return ret; --=20 2.48.1.601.g30ceb7b040-goog
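ncdevmem backs the dmabuf it binds for TX (and RX) with udmabuf, the exporter mentioned in the devmem.rst patch earlier in this series. For reference, a minimal allocation along the lines of what its udmabuf provider does, assuming the standard /dev/udmabuf UAPI and with error handling trimmed, looks roughly like this; alloc_udmabuf() is an illustrative name, not a function from the patch:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/udmabuf.h>

    /* Returns a dmabuf fd wrapping size bytes of anonymous memory, or -1.
     * size must be a multiple of the page size.
     */
    static int alloc_udmabuf(size_t size, int *memfd_out)
    {
            struct udmabuf_create create = { 0 };
            int devfd, memfd, dmabuf_fd;

            devfd = open("/dev/udmabuf", O_RDWR);
            memfd = memfd_create("devmem-tx", MFD_ALLOW_SEALING);
            if (devfd < 0 || memfd < 0)
                    return -1;

            /* udmabuf requires the backing memfd to be sealed against shrinking */
            if (ftruncate(memfd, size) || fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK))
                    return -1;

            create.memfd = memfd;
            create.offset = 0;
            create.size = size;

            dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);
            close(devfd);

            *memfd_out = memfd;     /* keep the memfd around for mmap()ing */
            return dmabuf_fd;       /* pass to netdev_bind_tx_req_set_fd() */
    }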