From nobody Mon Feb 9 10:28:41 2026
Date: Thu, 24 Apr 2025 04:02:54 +0000
In-Reply-To: <20250424040301.2480876-1-almasrymina@google.com>
References: <20250424040301.2480876-1-almasrymina@google.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.49.0.805.g082f7c87e0-goog
Message-ID: <20250424040301.2480876-3-almasrymina@google.com>
Subject: [PATCH net-next v11 2/8] net: add get_netmem/put_netmem support
From: Mina Almasry
To: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, io-uring@vger.kernel.org,
	virtualization@lists.linux.dev, kvm@vger.kernel.org
Cc: Mina Almasry, "David S. Miller", Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Donald Hunter, Jonathan Corbet,
	Andrew Lunn, Jeroen de Borst, Harshitha Ramamurthy,
	Kuniyuki Iwashima, Willem de Bruijn, Jens Axboe, Pavel Begunkov,
	David Ahern, Neal Cardwell, Stefan Hajnoczi, Stefano Garzarella,
	"Michael S. Tsirkin", Jason Wang, Xuan Zhuo, Eugenio Pérez,
	sdf@fomichev.me, dw@davidwei.uk, Jamal Hadi Salim,
	Victor Nogueira, Pedro Tammela, Samiullah Khawaja
Content-Type: text/plain; charset="utf-8"

Currently net_iovs support only pp ref counts and do not support a page
ref equivalent. This is fine for the RX path, where net_iovs are used
exclusively with the pp and only pp refcounting is needed. The TX path,
however, does not use pp ref counts, so netmem needs a get_page/put_page
equivalent.

Add get_netmem/put_netmem. Check the type of the netmem before passing
it to page- or net_iov-specific code to obtain a page ref equivalent.

For dmabuf net_iovs, we obtain a ref on the underlying binding. This
ensures the entire binding doesn't disappear until all the net_iovs have
been put_netmem'ed. We do not need to track the refcount of individual
dmabuf net_iovs, as we don't allocate/free them from a pool the way the
buddy allocator does for pages.

This code is written to be extensible by other net_iov implementers.
get_netmem/put_netmem check the type of the netmem and route it to the
correct helper:

pages -> [get|put]_page()
dmabuf net_iovs -> net_devmem_[get|put]_net_iov()
new net_iovs -> new helpers

Signed-off-by: Mina Almasry
Acked-by: Stanislav Fomichev

---

v5: https://lore.kernel.org/netdev/20250227041209.2031104-2-almasrymina@google.com/

- Updated to check that the net_iov is devmem before calling
  net_devmem_put_net_iov().
- Jakub requested that callers of __skb_frag_ref()/skb_page_unref be
  inspected to make sure that they generate / anticipate skbs with the
  correct pp_recycle and unreadable setting:

skb_page_unref
==============

- skb_page_unref is unreachable from these callers due to unreadable
  checks returning early: gro_pull_from_frag0, skb_copy_ubufs,
  __pskb_pull_tail
- Callers that are reachable for unreadable skbs: before this patchset
  these would only see RX unreadable skbs with pp_recycle set, and
  would drop a pp ref count. After this patchset they can also see TX
  unreadable skbs with no pp attached and pp_recycle unset, and will
  now drop a net_iov ref via put_netmem: __pskb_trim,
  __pskb_trim_head, skb_release_data, skb_shift

__skb_frag_ref
==============

Before this patchset __skb_frag_ref would not do the right thing for
any unreadable skb, with pp_recycle set or not, because it
unconditionally tries to acquire a page ref. With RX-only support,
however, I can't reproduce calls to __skb_frag_ref even after enabling
tc forwarding to TX.

After this patchset __skb_frag_ref obtains a page ref equivalent on
dmabuf net_iovs by taking a ref on the binding.

Callers that are unreachable for unreadable skbs:
- veth_xdp_get

Callers that are reachable for unreadable skbs, and from code review
they look specific to the TX path:
- tcp_grow_skb, __skb_zcopy_downgrade_managed, __pskb_copy_fclone,
  pskb_expand_head, skb_zerocopy, skb_split,
  pskb_carve_inside_header, pskb_carve_inside_nonlinear,
  tcp_clone_payload, skb_segment

Callers that are reachable for unreadable skbs, and from code review
they look reachable in the RX path, although my testing never hit
these paths. These are concerning. Maybe we should put this patch in
net and cc stable? However, no drivers currently enable unreadable
netmem, so fixing this in net-next may be fine as well:
- skb_shift, skb_try_coalesce

v2:
- Add comment on top of refcount_t ref explaining the usage in the TX
  path.
- Fix missing definition of net_devmem_dmabuf_binding_put in this
  patch.
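For reviewers who want the calling pattern at a glance, here is a
minimal sketch. The helper example_tx_hold_frag() is hypothetical and
not part of this series; get_netmem()/put_netmem() are the interfaces
added by this patch, and skb_frag_netmem() is the existing frag
accessor the diff below also uses:

	/* Hypothetical TX-path helper, for illustration only: shows how
	 * code that used to call get_page()/put_page() on frag pages can
	 * hold a ref on any netmem type via the new helpers.
	 */
	#include <linux/skbuff.h>
	#include <net/netmem.h>

	static void example_tx_hold_frag(skb_frag_t *frag)
	{
		netmem_ref netmem = skb_frag_netmem(frag);

		/* Pages take a page ref; dmabuf net_iovs take a ref on
		 * their underlying dmabuf binding instead.
		 */
		get_netmem(netmem);

		/* ... queue the frag for transmit ... */

		/* Dropping the ref releases the page or the binding ref. */
		put_netmem(netmem);
	}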
---
 include/linux/skbuff_ref.h |  4 ++--
 include/net/netmem.h       |  3 +++
 net/core/devmem.c          | 10 ++++++++++
 net/core/devmem.h          | 20 ++++++++++++++++++++
 net/core/skbuff.c          | 30 ++++++++++++++++++++++++++++++
 5 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488a..9e49372ef1a05 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,7 @@
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	get_netmem(skb_frag_netmem(frag));
 }
 
 /**
@@ -40,7 +40,7 @@ static inline void skb_page_unref(netmem_ref netmem, bool recycle)
 	if (recycle && napi_pp_put_page(netmem))
 		return;
 #endif
-	put_page(netmem_to_page(netmem));
+	put_netmem(netmem);
 }
 
 /**
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 64af9a288c80c..1b047cfb9e4f7 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -273,4 +273,7 @@ static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
 	return __netmem_clear_lsb(netmem)->dma_addr;
 }
 
+void get_netmem(netmem_ref netmem);
+void put_netmem(netmem_ref netmem);
+
 #endif /* _NET_NETMEM_H */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index f5c3a7e6dbb7b..dca2ff7cf6923 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -295,6 +295,16 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 	return ERR_PTR(err);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_get(net_devmem_iov_binding(niov));
+}
+
+void net_devmem_put_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_put(net_devmem_iov_binding(niov));
+}
+
 /*** "Dmabuf devmem memory provider" ***/
 
 int mp_dmabuf_devmem_init(struct page_pool *pool)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d527293..946f2e0157467 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -29,6 +29,10 @@ struct net_devmem_dmabuf_binding {
	 * The binding undos itself and unmaps the underlying dmabuf once all
	 * those refs are dropped and the binding is no longer desired or in
	 * use.
+	 *
+	 * net_devmem_get_net_iov() on dmabuf net_iovs will increment this
+	 * reference, making sure that the binding remains alive until all
+	 * the net_iovs are no longer used.
	 */
	refcount_t ref;
 
@@ -111,6 +115,9 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
 		__net_devmem_dmabuf_binding_free(binding);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov);
+void net_devmem_put_net_iov(struct net_iov *niov);
+
 struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
 void net_devmem_free_dmabuf(struct net_iov *ppiov);
@@ -120,6 +127,19 @@ bool net_is_devmem_iov(struct net_iov *niov);
 #else
 struct net_devmem_dmabuf_binding;
 
+static inline void
+net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
+{
+}
+
+static inline void net_devmem_get_net_iov(struct net_iov *niov)
+{
+}
+
+static inline void net_devmem_put_net_iov(struct net_iov *niov)
+{
+}
+
 static inline void
 __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d73ad79fe739d..00c22bce98e44 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,7 @@
 #include 
 
 #include "dev.h"
+#include "devmem.h"
 #include "netmem_priv.h"
 #include "sock_destructor.h"
 
@@ -7313,3 +7314,32 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
 	return false;
 }
 EXPORT_SYMBOL(csum_and_copy_from_iter_full);
+
+void get_netmem(netmem_ref netmem)
+{
+	struct net_iov *niov;
+
+	if (netmem_is_net_iov(netmem)) {
+		niov = netmem_to_net_iov(netmem);
+		if (net_is_devmem_iov(niov))
+			net_devmem_get_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+	get_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(get_netmem);
+
+void put_netmem(netmem_ref netmem)
+{
+	struct net_iov *niov;
+
+	if (netmem_is_net_iov(netmem)) {
+		niov = netmem_to_net_iov(netmem);
+		if (net_is_devmem_iov(niov))
+			net_devmem_put_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+
+	put_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(put_netmem);
-- 
2.49.0.805.g082f7c87e0-goog
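
As a closing illustration of the lifetime guarantee described in the
commit message, here is a self-contained userspace-style sketch of the
same binding-refcount pattern. Every name in it (fake_binding,
fake_niov, binding_get, binding_put) is hypothetical and merely models
the behavior of refcount_t ref, net_devmem_get_net_iov(), and
__net_devmem_dmabuf_binding_free() in the patch above:

	/* Hypothetical userspace model of the binding-refcount pattern.
	 * Illustration only; not kernel code.
	 */
	#include <stdio.h>
	#include <stdlib.h>

	struct fake_binding {
		int ref;		/* models the binding's refcount_t ref */
	};

	struct fake_niov {
		struct fake_binding *binding;
	};

	static void binding_get(struct fake_binding *b) { b->ref++; }

	static void binding_put(struct fake_binding *b)
	{
		if (--b->ref == 0) {
			/* models __net_devmem_dmabuf_binding_free(): the
			 * dmabuf is unmapped only after the last ref drops.
			 */
			printf("binding freed\n");
			free(b);
		}
	}

	/* models get_netmem()/put_netmem() on a devmem net_iov */
	static void niov_get(struct fake_niov *n) { binding_get(n->binding); }
	static void niov_put(struct fake_niov *n) { binding_put(n->binding); }

	int main(void)
	{
		struct fake_binding *b = malloc(sizeof(*b));
		struct fake_niov n1 = { b }, n2 = { b };

		b->ref = 1;	/* ref held by the binding's owner */
		niov_get(&n1);	/* TX path takes page-ref equivalents */
		niov_get(&n2);

		binding_put(b);	/* owner unbinds: binding must survive */
		niov_put(&n1);
		niov_put(&n2);	/* last ref: binding is freed here */
		return 0;
	}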