[PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management

Posted by Bobby Eshleman 1 month, 2 weeks ago
From: Bobby Eshleman <bobbyeshleman@meta.com>

Add an alternative token management implementation (autorelease vs
non-autorelease) that replaces xarray-based token lookups with direct
array access, using page offsets as dmabuf tokens. When enabled, this
eliminates the xarray overhead and reduces CPU utilization in devmem RX
threads by approximately 13%.
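
As a simplified sketch of the difference (names match the hunks below,
code condensed for illustration), resolving a returned token goes from
an xarray erase to a plain array index into the binding's net_iov vec:

  /* autorelease (old): the token is an xarray index allocated at recv time */
  netmem = (__force netmem_ref)__xa_erase(&sk->sk_devmem_info.frags, token);

  /* non-autorelease (new): the token is the page offset into the dmabuf */
  niov = sk->sk_devmem_info.binding->vec[token];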

This patch changes the meaning of tokens when this option is used.
Tokens previously referred to unique fragments of pages. With this
option, tokens instead represent references to pages, not fragments.
Because of this, multiple tokens may refer to the same page and so have
identical values (e.g., two small fragments may coexist on the same
page). The token and offset pair that the user receives still uniquely
identifies a fragment if needed. This assumes that the user is not
attempting to sort/uniq the token list using tokens alone.
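
For illustration (made-up offsets, 4 KiB pages assumed), two fragments
that land in the same dmabuf page share a token but remain
distinguishable by their offsets:

  /* hypothetical fragments: A at dmabuf offset 0x6010, B at 0x6400 */
  token_a = 0x6010 >> PAGE_SHIFT;    /* 6 */
  token_b = 0x6400 >> PAGE_SHIFT;    /* 6: same page, same token */
  /* (token, offset) pairs (6, 0x6010) and (6, 0x6400) still identify
   * each fragment uniquely; only the bare token list has duplicates.
   */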

This introduces a restriction: devmem RX sockets cannot switch dmabuf
bindings when autorelease is disabled. This is necessary because 32-bit
tokens lack sufficient bits to encode both large dmabuf page counts and
binding/queue IDs. For example, a system with 8 NICs and 32 queues per
NIC needs 8 bits for binding/queue IDs, leaving only 24 bits for page
indices (64GB max; worked out below). This restriction aligns with
common usage, as steering flows to different queues/devices is often
undesirable for TCP.
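
The bit budget in that example works out as follows (4 KiB pages
assumed):

  8 NICs * 32 queues = 256 binding/queue combinations -> 8 token bits
  32 - 8 = 24 bits left for page indices
  2^24 pages * 4 KiB/page = 64 GiB of dmabuf addressable by a token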

This patch adds an atomic uref counter to net_iov for tracking user
references handed out via binding->vec. To minimize atomic overhead,
pp_ref_count is only updated on uref transitions from zero to one or
from one to zero. If a user fails to return all tokens and closes the
socket, the binding finishes the uref release when it is unbound.
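
In sketch form (condensed from the hunks below), only the 0->1 and 1->0
uref transitions touch pp_ref_count:

  /* RX path: a fragment is handed to user space */
  if (atomic_inc_return(&niov->uref) == 1)
          atomic_long_inc(&niov->pp_ref_count);

  /* SO_DEVMEM_DONTNEED path: the user returns the token */
  if (atomic_dec_and_test(&niov->uref))
          WARN_ON_ONCE(!napi_pp_put_page(net_iov_to_netmem(niov)));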

A flag "autorelease" is added per-socket. This will be used for enabling
the old behavior of the kernel releasing references for the sockets upon
close(2) (autorelease), instead of requiring that socket users do this
themselves. The autorelease flag is always true in this patch, meaning
that the old (non-optimized) behavior is kept unconditionally. A future
patch supports a user-facing knob to toggle this feature and will change
the default to false for the improved performance.
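
For reference, the way users release tokens themselves is unchanged in
form. A minimal user-space sketch (error handling omitted, assuming
uapi headers new enough to provide struct dmabuf_token and
SO_DEVMEM_DONTNEED):

  #include <sys/socket.h>
  #include <linux/uio.h>          /* struct dmabuf_token */

  static int return_one_token(int sock_fd, __u32 token)
  {
          struct dmabuf_token tok = {
                  .token_start = token,  /* autorelease off: dmabuf page index */
                  .token_count = 1,
          };

          /* Ask the kernel to drop the user reference(s) behind this token. */
          return setsockopt(sock_fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
                            &tok, sizeof(tok));
  }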

An outstanding_urefs counter is added per-socket so that changes to the
autorelease mode can be rejected for sockets that still hold user
references.

The dmabuf unbind path always checks for any leaked urefs.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v6:
- remove sk_devmem_info.autorelease, using binding->autorelease instead
- move binding->autorelease check to outside of
  net_devmem_dmabuf_binding_put_urefs() (Mina)
- remove overly defensive net_is_devmem_iov() (Mina)
- add comment about multiple urefs mapping to a single netmem ref (Mina)
- remove overly defensive netmem NULL and netmem_is_net_iov checks (Mina)
- use niov without casting back and forth with netmem (Mina)
- move the autorelease flag from per-binding to per-socket (Mina)
- remove the batching logic in sock_devmem_dontneed_manual_release()
  (Mina)
- move autorelease check inside tcp_xa_pool_commit() (Mina)
- remove single-binding restriction for autorelease mode (Mina)
- unbind always checks for leaked urefs

Changes in v5:
- remove unused variables
- introduce autorelease flag, preparing for future patch toggle new
  behavior

Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory
  footprint
- fallback to cleaning up references in dmabuf unbind if socket leaked
  tokens
- drop ethtool patch

Changes in v2:
- always use GFP_ZERO for binding->vec (Mina)
- remove WARN for changed binding (Mina)
- remove extraneous binding ref get (Mina)
- remove WARNs on invalid user input (Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- fix length of alloc for urefs
---
 include/net/netmem.h     |  1 +
 include/net/sock.h       | 13 +++++++--
 net/core/devmem.c        | 42 ++++++++++++++++++++++-------
 net/core/devmem.h        |  2 +-
 net/core/sock.c          | 55 +++++++++++++++++++++++++++++++++-----
 net/ipv4/tcp.c           | 69 ++++++++++++++++++++++++++++++++++++++----------
 net/ipv4/tcp_ipv4.c      | 11 +++++---
 net/ipv4/tcp_minisocks.c |  5 +++-
 8 files changed, 161 insertions(+), 37 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h
index 9e10f4ac50c3..80d2263ba4ed 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -112,6 +112,7 @@ struct net_iov {
 	};
 	struct net_iov_area *owner;
 	enum net_iov_type type;
+	atomic_t uref;
 };
 
 struct net_iov_area {
diff --git a/include/net/sock.h b/include/net/sock.h
index c7e58b8e8a90..548fabacff7c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -350,7 +350,11 @@ struct sk_filter;
   *	@sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS
   *	@sk_scm_unused: unused flags for scm_recv()
   *	@ns_tracker: tracker for netns reference
-  *	@sk_user_frags: xarray of pages the user is holding a reference on.
+  *	@sk_devmem_info: the devmem binding information for the socket
+  *		@frags: xarray of tokens for autorelease mode
+  *		@binding: pointer to the dmabuf binding
+  *		@outstanding_urefs: count of outstanding user references
+  *		@autorelease: if true, tokens released on close; if false, user must release
   *	@sk_owner: reference to the real owner of the socket that calls
   *		   sock_lock_init_class_and_name().
   */
@@ -579,7 +583,12 @@ struct sock {
 	struct numa_drop_counters *sk_drop_counters;
 	struct rcu_head		sk_rcu;
 	netns_tracker		ns_tracker;
-	struct xarray		sk_user_frags;
+	struct {
+		struct xarray				frags;
+		struct net_devmem_dmabuf_binding	*binding;
+		atomic_t				outstanding_urefs;
+		bool					autorelease;
+	} sk_devmem_info;
 
 #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
 	struct module		*sk_owner;
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 4dee2666dd07..904d19e58f4b 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -11,6 +11,7 @@
 #include <linux/genalloc.h>
 #include <linux/mm.h>
 #include <linux/netdevice.h>
+#include <linux/skbuff_ref.h>
 #include <linux/types.h>
 #include <net/netdev_queues.h>
 #include <net/netdev_rx_queue.h>
@@ -116,6 +117,24 @@ void net_devmem_free_dmabuf(struct net_iov *niov)
 	gen_pool_free(binding->chunk_pool, dma_addr, PAGE_SIZE);
 }
 
+static void
+net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *binding)
+{
+	int i;
+
+	for (i = 0; i < binding->dmabuf->size / PAGE_SIZE; i++) {
+		struct net_iov *niov;
+		netmem_ref netmem;
+
+		niov = binding->vec[i];
+		netmem = net_iov_to_netmem(niov);
+
+		/* Multiple urefs map to only a single netmem ref. */
+		if (atomic_xchg(&niov->uref, 0) > 0)
+			WARN_ON_ONCE(!napi_pp_put_page(netmem));
+	}
+}
+
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 {
 	struct netdev_rx_queue *rxq;
@@ -143,6 +162,10 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 		__net_mp_close_rxq(binding->dev, rxq_idx, &mp_params);
 	}
 
+	/* Clean up any lingering urefs from sockets that had autorelease
+	 * disabled.
+	 */
+	net_devmem_dmabuf_binding_put_urefs(binding);
 	net_devmem_dmabuf_binding_put(binding);
 }
 
@@ -231,14 +254,13 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		goto err_detach;
 	}
 
-	if (direction == DMA_TO_DEVICE) {
-		binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
-					      sizeof(struct net_iov *),
-					      GFP_KERNEL);
-		if (!binding->vec) {
-			err = -ENOMEM;
-			goto err_unmap;
-		}
+	/* Used by TX and also by RX when socket has autorelease disabled */
+	binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
+				      sizeof(struct net_iov *),
+				      GFP_KERNEL | __GFP_ZERO);
+	if (!binding->vec) {
+		err = -ENOMEM;
+		goto err_unmap;
 	}
 
 	/* For simplicity we expect to make PAGE_SIZE allocations, but the
@@ -292,10 +314,10 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 			niov = &owner->area.niovs[i];
 			niov->type = NET_IOV_DMABUF;
 			niov->owner = &owner->area;
+			atomic_set(&niov->uref, 0);
 			page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 						      net_devmem_get_dma_addr(niov));
-			if (direction == DMA_TO_DEVICE)
-				binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
+			binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
 		}
 
 		virtual += len;
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 2ada54fb63d7..d4eb28d079bb 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -61,7 +61,7 @@ struct net_devmem_dmabuf_binding {
 
 	/* Array of net_iov pointers for this binding, sorted by virtual
 	 * address. This array is convenient to map the virtual addresses to
-	 * net_iovs in the TX path.
+	 * net_iovs.
 	 */
 	struct net_iov **vec;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 5562f517d889..465645c1d74f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -87,6 +87,7 @@
 
 #include <linux/unaligned.h>
 #include <linux/capability.h>
+#include <linux/dma-buf.h>
 #include <linux/errno.h>
 #include <linux/errqueue.h>
 #include <linux/types.h>
@@ -151,6 +152,7 @@
 #include <uapi/linux/pidfd.h>
 
 #include "dev.h"
+#include "devmem.h"
 
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
@@ -1081,6 +1083,43 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 #define MAX_DONTNEED_TOKENS 128
 #define MAX_DONTNEED_FRAGS 1024
 
+static noinline_for_stack int
+sock_devmem_dontneed_manual_release(struct sock *sk, struct dmabuf_token *tokens,
+				    unsigned int num_tokens)
+{
+	struct net_iov *niov;
+	unsigned int i, j;
+	netmem_ref netmem;
+	unsigned int token;
+	int num_frags = 0;
+	int ret;
+
+	if (!sk->sk_devmem_info.binding)
+		return -EINVAL;
+
+	for (i = 0; i < num_tokens; i++) {
+		for (j = 0; j < tokens[i].token_count; j++) {
+			token = tokens[i].token_start + j;
+			if (token >= sk->sk_devmem_info.binding->dmabuf->size / PAGE_SIZE)
+				break;
+
+			if (++num_frags > MAX_DONTNEED_FRAGS)
+				return ret;
+
+			niov = sk->sk_devmem_info.binding->vec[token];
+			if (atomic_dec_and_test(&niov->uref)) {
+				netmem = net_iov_to_netmem(niov);
+				WARN_ON_ONCE(!napi_pp_put_page(netmem));
+			}
+			ret++;
+		}
+	}
+
+	atomic_sub(ret, &sk->sk_devmem_info.outstanding_urefs);
+
+	return ret;
+}
+
 static noinline_for_stack int
 sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
 				 unsigned int num_tokens)
@@ -1089,32 +1128,32 @@ sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
 	int ret = 0, num_frags = 0;
 	netmem_ref netmems[16];
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 	for (i = 0; i < num_tokens; i++) {
 		for (j = 0; j < tokens[i].token_count; j++) {
 			if (++num_frags > MAX_DONTNEED_FRAGS)
 				goto frag_limit_reached;
 
 			netmem_ref netmem = (__force netmem_ref)__xa_erase(
-				&sk->sk_user_frags, tokens[i].token_start + j);
+				&sk->sk_devmem_info.frags, tokens[i].token_start + j);
 
 			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
 				continue;
 
 			netmems[netmem_num++] = netmem;
 			if (netmem_num == ARRAY_SIZE(netmems)) {
-				xa_unlock_bh(&sk->sk_user_frags);
+				xa_unlock_bh(&sk->sk_devmem_info.frags);
 				for (k = 0; k < netmem_num; k++)
 					WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 				netmem_num = 0;
-				xa_lock_bh(&sk->sk_user_frags);
+				xa_lock_bh(&sk->sk_devmem_info.frags);
 			}
 			ret++;
 		}
 	}
 
 frag_limit_reached:
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 	for (k = 0; k < netmem_num; k++)
 		WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 
@@ -1145,7 +1184,11 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 		return -EFAULT;
 	}
 
-	ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	if (sk->sk_devmem_info.autorelease)
+		ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	else
+		ret = sock_devmem_dontneed_manual_release(sk, tokens,
+							  num_tokens);
 
 	kvfree(tokens);
 	return ret;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a9345aa5a2e5..052875c1b547 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -260,6 +260,7 @@
 #include <linux/memblock.h>
 #include <linux/highmem.h>
 #include <linux/cache.h>
+#include <linux/dma-buf.h>
 #include <linux/err.h>
 #include <linux/time.h>
 #include <linux/slab.h>
@@ -492,7 +493,10 @@ void tcp_init_sock(struct sock *sk)
 
 	set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
 	sk_sockets_allocated_inc(sk);
-	xa_init_flags(&sk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&sk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	sk->sk_devmem_info.binding = NULL;
+	atomic_set(&sk->sk_devmem_info.outstanding_urefs, 0);
+	sk->sk_devmem_info.autorelease = true;
 }
 EXPORT_IPV6_MOD(tcp_init_sock);
 
@@ -2424,11 +2428,11 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)
 
 	/* Commit part that has been copied to user space. */
 	for (i = 0; i < p->idx; i++)
-		__xa_cmpxchg(&sk->sk_user_frags, p->tokens[i], XA_ZERO_ENTRY,
+		__xa_cmpxchg(&sk->sk_devmem_info.frags, p->tokens[i], XA_ZERO_ENTRY,
 			     (__force void *)p->netmems[i], GFP_KERNEL);
 	/* Rollback what has been pre-allocated and is no longer needed. */
 	for (; i < p->max; i++)
-		__xa_erase(&sk->sk_user_frags, p->tokens[i]);
+		__xa_erase(&sk->sk_devmem_info.frags, p->tokens[i]);
 
 	p->max = 0;
 	p->idx = 0;
@@ -2436,14 +2440,17 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)
 
 static void tcp_xa_pool_commit(struct sock *sk, struct tcp_xa_pool *p)
 {
+	if (!sk->sk_devmem_info.autorelease)
+		return;
+
 	if (!p->max)
 		return;
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 
 	tcp_xa_pool_commit_locked(sk, p);
 
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 }
 
 static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
@@ -2454,18 +2461,18 @@ static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
 	if (p->idx < p->max)
 		return 0;
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 
 	tcp_xa_pool_commit_locked(sk, p);
 
 	for (k = 0; k < max_frags; k++) {
-		err = __xa_alloc(&sk->sk_user_frags, &p->tokens[k],
+		err = __xa_alloc(&sk->sk_devmem_info.frags, &p->tokens[k],
 				 XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL);
 		if (err)
 			break;
 	}
 
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 
 	p->max = k;
 	p->idx = 0;
@@ -2479,12 +2486,14 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			      unsigned int offset, struct msghdr *msg,
 			      int remaining_len)
 {
+	struct net_devmem_dmabuf_binding *binding = NULL;
 	struct dmabuf_cmsg dmabuf_cmsg = { 0 };
 	struct tcp_xa_pool tcp_xa_pool;
 	unsigned int start;
 	int i, copy, n;
 	int sent = 0;
 	int err = 0;
+	int refs;
 
 	tcp_xa_pool.max = 0;
 	tcp_xa_pool.idx = 0;
@@ -2536,6 +2545,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 			struct net_iov *niov;
 			u64 frag_offset;
+			u32 token;
 			int end;
 
 			/* !skb_frags_readable() should indicate that ALL the
@@ -2568,13 +2578,32 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 					      start;
 				dmabuf_cmsg.frag_offset = frag_offset;
 				dmabuf_cmsg.frag_size = copy;
-				err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
-							 skb_shinfo(skb)->nr_frags - i);
-				if (err)
+
+				binding = net_devmem_iov_binding(niov);
+
+				if (!sk->sk_devmem_info.binding)
+					sk->sk_devmem_info.binding = binding;
+
+				if (sk->sk_devmem_info.binding != binding) {
+					err = -EFAULT;
 					goto out;
+				}
+
+				if (sk->sk_devmem_info.autorelease) {
+					err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
+								 skb_shinfo(skb)->nr_frags - i);
+					if (err)
+						goto out;
+
+					dmabuf_cmsg.frag_token =
+						tcp_xa_pool.tokens[tcp_xa_pool.idx];
+				} else {
+					token = net_iov_virtual_addr(niov) >> PAGE_SHIFT;
+					dmabuf_cmsg.frag_token = token;
+				}
+
 
 				/* Will perform the exchange later */
-				dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
 				dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
 
 				offset += copy;
@@ -2587,8 +2616,15 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 				if (err)
 					goto out;
 
-				atomic_long_inc(&niov->pp_ref_count);
-				tcp_xa_pool.netmems[tcp_xa_pool.idx++] = skb_frag_netmem(frag);
+				if (sk->sk_devmem_info.autorelease) {
+					atomic_long_inc(&niov->pp_ref_count);
+					tcp_xa_pool.netmems[tcp_xa_pool.idx++] =
+						skb_frag_netmem(frag);
+				} else {
+					if (atomic_inc_return(&niov->uref) == 1)
+						atomic_long_inc(&niov->pp_ref_count);
+					refs++;
+				}
 
 				sent += copy;
 
@@ -2599,6 +2635,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 		}
 
 		tcp_xa_pool_commit(sk, &tcp_xa_pool);
+
 		if (!remaining_len)
 			goto out;
 
@@ -2617,9 +2654,13 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 
 out:
 	tcp_xa_pool_commit(sk, &tcp_xa_pool);
+
 	if (!sent)
 		sent = err;
 
+	if (refs > 0)
+		atomic_add(refs, &sk->sk_devmem_info.outstanding_urefs);
+
 	return sent;
 }
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 40a76da5364a..dbb7a71e3cce 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -89,6 +89,9 @@
 
 #include <crypto/md5.h>
 
+#include <linux/dma-buf.h>
+#include "../core/devmem.h"
+
 #include <trace/events/tcp.h>
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -2493,7 +2496,7 @@ static void tcp_release_user_frags(struct sock *sk)
 	unsigned long index;
 	void *netmem;
 
-	xa_for_each(&sk->sk_user_frags, index, netmem)
+	xa_for_each(&sk->sk_devmem_info.frags, index, netmem)
 		WARN_ON_ONCE(!napi_pp_put_page((__force netmem_ref)netmem));
 #endif
 }
@@ -2502,9 +2505,11 @@ void tcp_v4_destroy_sock(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	tcp_release_user_frags(sk);
+	if (sk->sk_devmem_info.binding && sk->sk_devmem_info.autorelease)
+		tcp_release_user_frags(sk);
 
-	xa_destroy(&sk->sk_user_frags);
+	xa_destroy(&sk->sk_devmem_info.frags);
+	sk->sk_devmem_info.binding = NULL;
 
 	trace_tcp_destroy_sock(sk);
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index ded2cf1f6006..a017dea35bb8 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -663,7 +663,10 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 
 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
 
-	xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&newsk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	newsk->sk_devmem_info.binding = NULL;
+	atomic_set(&newsk->sk_devmem_info.outstanding_urefs, 0);
+	newsk->sk_devmem_info.autorelease = true;
 
 	return newsk;
 }

-- 
2.47.3
Re: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
Posted by Dan Carpenter 1 month, 1 week ago
Hi Bobby,

kernel test robot noticed the following build warnings:

url:    https://github.com/intel-lab-lkp/linux/commits/Bobby-Eshleman/net-devmem-rename-tx_vec-to-vec-in-dmabuf-binding/20251105-092703
base:   255d75ef029f33f75fcf5015052b7302486f7ad2
patch link:    https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-upstream-v6-3-ea98cf4d40b3%40meta.com
patch subject: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
config: nios2-randconfig-r071-20251105 (https://download.01.org/0day-ci/archive/20251106/202511060352.2kPPX3xE-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 8.5.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202511060352.2kPPX3xE-lkp@intel.com/

New smatch warnings:
net/ipv4/tcp.c:2626 tcp_recvmsg_dmabuf() error: uninitialized symbol 'refs'.

vim +/refs +2626 net/ipv4/tcp.c

8f0b3cc9a4c102 Mina Almasry       2024-09-10  2485  static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2486  			      unsigned int offset, struct msghdr *msg,
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2487  			      int remaining_len)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2488  {
45aa39492cf4dd Bobby Eshleman     2025-11-04  2489  	struct net_devmem_dmabuf_binding *binding = NULL;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2490  	struct dmabuf_cmsg dmabuf_cmsg = { 0 };
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2491  	struct tcp_xa_pool tcp_xa_pool;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2492  	unsigned int start;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2493  	int i, copy, n;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2494  	int sent = 0;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2495  	int err = 0;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2496  	int refs;

refs needs to be initialized to zero.  Best practice is to use
CONFIG_INIT_STACK_ALL_PATTERN for testing.

8f0b3cc9a4c102 Mina Almasry       2024-09-10  2497  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2498  	tcp_xa_pool.max = 0;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2499  	tcp_xa_pool.idx = 0;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2500  	do {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2501  		start = skb_headlen(skb);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2502  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2503  		if (skb_frags_readable(skb)) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2504  			err = -ENODEV;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2505  			goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2506  		}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2507  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2508  		/* Copy header. */
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2509  		copy = start - offset;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2510  		if (copy > 0) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2511  			copy = min(copy, remaining_len);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2512  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2513  			n = copy_to_iter(skb->data + offset, copy,
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2514  					 &msg->msg_iter);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2515  			if (n != copy) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2516  				err = -EFAULT;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2517  				goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2518  			}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2519  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2520  			offset += copy;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2521  			remaining_len -= copy;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2522  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2523  			/* First a dmabuf_cmsg for # bytes copied to user
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2524  			 * buffer.
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2525  			 */
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2526  			memset(&dmabuf_cmsg, 0, sizeof(dmabuf_cmsg));
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2527  			dmabuf_cmsg.frag_size = copy;
18912c520674ec Stanislav Fomichev 2025-02-24  2528  			err = put_cmsg_notrunc(msg, SOL_SOCKET,
18912c520674ec Stanislav Fomichev 2025-02-24  2529  					       SO_DEVMEM_LINEAR,
18912c520674ec Stanislav Fomichev 2025-02-24  2530  					       sizeof(dmabuf_cmsg),
18912c520674ec Stanislav Fomichev 2025-02-24  2531  					       &dmabuf_cmsg);
18912c520674ec Stanislav Fomichev 2025-02-24  2532  			if (err)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2533  				goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2534  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2535  			sent += copy;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2536  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2537  			if (remaining_len == 0)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2538  				goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2539  		}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2540  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2541  		/* after that, send information of dmabuf pages through a
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2542  		 * sequence of cmsg
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2543  		 */
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2544  		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2545  			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2546  			struct net_iov *niov;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2547  			u64 frag_offset;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2548  			u32 token;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2549  			int end;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2550  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2551  			/* !skb_frags_readable() should indicate that ALL the
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2552  			 * frags in this skb are dmabuf net_iovs. We're checking
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2553  			 * for that flag above, but also check individual frags
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2554  			 * here. If the tcp stack is not setting
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2555  			 * skb_frags_readable() correctly, we still don't want
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2556  			 * to crash here.
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2557  			 */
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2558  			if (!skb_frag_net_iov(frag)) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2559  				net_err_ratelimited("Found non-dmabuf skb with net_iov");
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2560  				err = -ENODEV;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2561  				goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2562  			}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2563  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2564  			niov = skb_frag_net_iov(frag);
69e39537b66232 Pavel Begunkov     2025-02-04  2565  			if (!net_is_devmem_iov(niov)) {
69e39537b66232 Pavel Begunkov     2025-02-04  2566  				err = -ENODEV;
69e39537b66232 Pavel Begunkov     2025-02-04  2567  				goto out;
69e39537b66232 Pavel Begunkov     2025-02-04  2568  			}
69e39537b66232 Pavel Begunkov     2025-02-04  2569  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2570  			end = start + skb_frag_size(frag);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2571  			copy = end - offset;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2572  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2573  			if (copy > 0) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2574  				copy = min(copy, remaining_len);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2575  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2576  				frag_offset = net_iov_virtual_addr(niov) +
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2577  					      skb_frag_off(frag) + offset -
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2578  					      start;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2579  				dmabuf_cmsg.frag_offset = frag_offset;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2580  				dmabuf_cmsg.frag_size = copy;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2581  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2582  				binding = net_devmem_iov_binding(niov);
45aa39492cf4dd Bobby Eshleman     2025-11-04  2583  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2584  				if (!sk->sk_devmem_info.binding)
45aa39492cf4dd Bobby Eshleman     2025-11-04  2585  					sk->sk_devmem_info.binding = binding;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2586  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2587  				if (sk->sk_devmem_info.binding != binding) {
45aa39492cf4dd Bobby Eshleman     2025-11-04  2588  					err = -EFAULT;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2589  					goto out;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2590  				}
45aa39492cf4dd Bobby Eshleman     2025-11-04  2591  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2592  				if (sk->sk_devmem_info.autorelease) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2593  					err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2594  								 skb_shinfo(skb)->nr_frags - i);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2595  					if (err)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2596  						goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2597  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2598  					dmabuf_cmsg.frag_token =
45aa39492cf4dd Bobby Eshleman     2025-11-04  2599  						tcp_xa_pool.tokens[tcp_xa_pool.idx];
45aa39492cf4dd Bobby Eshleman     2025-11-04  2600  				} else {
45aa39492cf4dd Bobby Eshleman     2025-11-04  2601  					token = net_iov_virtual_addr(niov) >> PAGE_SHIFT;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2602  					dmabuf_cmsg.frag_token = token;
45aa39492cf4dd Bobby Eshleman     2025-11-04  2603  				}
45aa39492cf4dd Bobby Eshleman     2025-11-04  2604  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2605  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2606  				/* Will perform the exchange later */
297d389e9e5bd4 Pavel Begunkov     2025-02-04  2607  				dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2608  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2609  				offset += copy;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2610  				remaining_len -= copy;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2611  
18912c520674ec Stanislav Fomichev 2025-02-24  2612  				err = put_cmsg_notrunc(msg, SOL_SOCKET,
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2613  						       SO_DEVMEM_DMABUF,
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2614  						       sizeof(dmabuf_cmsg),
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2615  						       &dmabuf_cmsg);
18912c520674ec Stanislav Fomichev 2025-02-24  2616  				if (err)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2617  					goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2618  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2619  				if (sk->sk_devmem_info.autorelease) {
45aa39492cf4dd Bobby Eshleman     2025-11-04  2620  					atomic_long_inc(&niov->pp_ref_count);
45aa39492cf4dd Bobby Eshleman     2025-11-04  2621  					tcp_xa_pool.netmems[tcp_xa_pool.idx++] =
45aa39492cf4dd Bobby Eshleman     2025-11-04  2622  						skb_frag_netmem(frag);
45aa39492cf4dd Bobby Eshleman     2025-11-04  2623  				} else {
45aa39492cf4dd Bobby Eshleman     2025-11-04  2624  					if (atomic_inc_return(&niov->uref) == 1)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2625  						atomic_long_inc(&niov->pp_ref_count);
45aa39492cf4dd Bobby Eshleman     2025-11-04 @2626  					refs++;
                                                                                        ^^^^^^

45aa39492cf4dd Bobby Eshleman     2025-11-04  2627  				}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2628  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2629  				sent += copy;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2630  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2631  				if (remaining_len == 0)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2632  					goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2633  			}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2634  			start = end;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2635  		}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2636  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2637  		tcp_xa_pool_commit(sk, &tcp_xa_pool);
45aa39492cf4dd Bobby Eshleman     2025-11-04  2638  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2639  		if (!remaining_len)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2640  			goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2641  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2642  		/* if remaining_len is not satisfied yet, we need to go to the
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2643  		 * next frag in the frag_list to satisfy remaining_len.
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2644  		 */
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2645  		skb = skb_shinfo(skb)->frag_list ?: skb->next;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2646  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2647  		offset = offset - start;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2648  	} while (skb);
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2649  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2650  	if (remaining_len) {
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2651  		err = -EFAULT;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2652  		goto out;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2653  	}
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2654  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2655  out:
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2656  	tcp_xa_pool_commit(sk, &tcp_xa_pool);
45aa39492cf4dd Bobby Eshleman     2025-11-04  2657  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2658  	if (!sent)
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2659  		sent = err;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2660  
45aa39492cf4dd Bobby Eshleman     2025-11-04  2661  	if (refs > 0)
45aa39492cf4dd Bobby Eshleman     2025-11-04  2662  		atomic_add(refs, &sk->sk_devmem_info.outstanding_urefs);
45aa39492cf4dd Bobby Eshleman     2025-11-04  2663  
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2664  	return sent;
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2665  }
8f0b3cc9a4c102 Mina Almasry       2024-09-10  2666  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
Posted by Dan Carpenter 1 month, 1 week ago
Hi Bobby,

kernel test robot noticed the following build warnings:

url:    https://github.com/intel-lab-lkp/linux/commits/Bobby-Eshleman/net-devmem-rename-tx_vec-to-vec-in-dmabuf-binding/20251105-092703
base:   255d75ef029f33f75fcf5015052b7302486f7ad2
patch link:    https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-upstream-v6-3-ea98cf4d40b3%40meta.com
patch subject: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
config: openrisc-randconfig-r073-20251105 (https://download.01.org/0day-ci/archive/20251106/202511060119.MAzcsLoN-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 10.5.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202511060119.MAzcsLoN-lkp@intel.com/

New smatch warnings:
net/core/sock.c:1107 sock_devmem_dontneed_manual_release() error: uninitialized symbol 'ret'.

vim +/ret +1107 net/core/sock.c

45aa39492cf4dd Bobby Eshleman 2025-11-04  1086  static noinline_for_stack int
45aa39492cf4dd Bobby Eshleman 2025-11-04  1087  sock_devmem_dontneed_manual_release(struct sock *sk, struct dmabuf_token *tokens,
45aa39492cf4dd Bobby Eshleman 2025-11-04  1088  				    unsigned int num_tokens)
45aa39492cf4dd Bobby Eshleman 2025-11-04  1089  {
45aa39492cf4dd Bobby Eshleman 2025-11-04  1090  	struct net_iov *niov;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1091  	unsigned int i, j;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1092  	netmem_ref netmem;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1093  	unsigned int token;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1094  	int num_frags = 0;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1095  	int ret;

ret needs to be = 0;

45aa39492cf4dd Bobby Eshleman 2025-11-04  1096  
45aa39492cf4dd Bobby Eshleman 2025-11-04  1097  	if (!sk->sk_devmem_info.binding)
45aa39492cf4dd Bobby Eshleman 2025-11-04  1098  		return -EINVAL;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1099  
45aa39492cf4dd Bobby Eshleman 2025-11-04  1100  	for (i = 0; i < num_tokens; i++) {
45aa39492cf4dd Bobby Eshleman 2025-11-04  1101  		for (j = 0; j < tokens[i].token_count; j++) {
45aa39492cf4dd Bobby Eshleman 2025-11-04  1102  			token = tokens[i].token_start + j;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1103  			if (token >= sk->sk_devmem_info.binding->dmabuf->size / PAGE_SIZE)
45aa39492cf4dd Bobby Eshleman 2025-11-04  1104  				break;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1105  
45aa39492cf4dd Bobby Eshleman 2025-11-04  1106  			if (++num_frags > MAX_DONTNEED_FRAGS)
45aa39492cf4dd Bobby Eshleman 2025-11-04 @1107  				return ret;

Uninitialized.  It's always a good idea to test code with
CONFIG_INIT_STACK_ALL_PATTERN.

45aa39492cf4dd Bobby Eshleman 2025-11-04  1108  
45aa39492cf4dd Bobby Eshleman 2025-11-04  1109  			niov = sk->sk_devmem_info.binding->vec[token];
45aa39492cf4dd Bobby Eshleman 2025-11-04  1110  			if (atomic_dec_and_test(&niov->uref)) {
45aa39492cf4dd Bobby Eshleman 2025-11-04  1111  				netmem = net_iov_to_netmem(niov);
45aa39492cf4dd Bobby Eshleman 2025-11-04  1112  				WARN_ON_ONCE(!napi_pp_put_page(netmem));
45aa39492cf4dd Bobby Eshleman 2025-11-04  1113  			}
45aa39492cf4dd Bobby Eshleman 2025-11-04  1114  			ret++;

Uninitialized.

45aa39492cf4dd Bobby Eshleman 2025-11-04  1115  		}
45aa39492cf4dd Bobby Eshleman 2025-11-04  1116  	}
45aa39492cf4dd Bobby Eshleman 2025-11-04  1117  
45aa39492cf4dd Bobby Eshleman 2025-11-04  1118  	atomic_sub(ret, &sk->sk_devmem_info.outstanding_urefs);
45aa39492cf4dd Bobby Eshleman 2025-11-04  1119  
45aa39492cf4dd Bobby Eshleman 2025-11-04  1120  	return ret;
45aa39492cf4dd Bobby Eshleman 2025-11-04  1121  }

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
Posted by kernel test robot 1 month, 1 week ago
Hi Bobby,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 255d75ef029f33f75fcf5015052b7302486f7ad2]

url:    https://github.com/intel-lab-lkp/linux/commits/Bobby-Eshleman/net-devmem-rename-tx_vec-to-vec-in-dmabuf-binding/20251105-092703
base:   255d75ef029f33f75fcf5015052b7302486f7ad2
patch link:    https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-upstream-v6-3-ea98cf4d40b3%40meta.com
patch subject: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
config: arc-nsimosci_hs_defconfig (https://download.01.org/0day-ci/archive/20251106/202511060345.AQs0FTNg-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251106/202511060345.AQs0FTNg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511060345.AQs0FTNg-lkp@intel.com/

All warnings (new ones prefixed by >>):

   net/ipv4/tcp.c: In function 'tcp_recvmsg_dmabuf':
>> net/ipv4/tcp.c:2661:12: warning: 'refs' is used uninitialized [-Wuninitialized]
    2661 |         if (refs > 0)
         |            ^
   net/ipv4/tcp.c:2496:13: note: 'refs' was declared here
    2496 |         int refs;
         |             ^~~~


vim +/refs +2661 net/ipv4/tcp.c

  2481	
  2482	/* On error, returns the -errno. On success, returns number of bytes sent to the
  2483	 * user. May not consume all of @remaining_len.
  2484	 */
  2485	static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
  2486				      unsigned int offset, struct msghdr *msg,
  2487				      int remaining_len)
  2488	{
  2489		struct net_devmem_dmabuf_binding *binding = NULL;
  2490		struct dmabuf_cmsg dmabuf_cmsg = { 0 };
  2491		struct tcp_xa_pool tcp_xa_pool;
  2492		unsigned int start;
  2493		int i, copy, n;
  2494		int sent = 0;
  2495		int err = 0;
  2496		int refs;
  2497	
  2498		tcp_xa_pool.max = 0;
  2499		tcp_xa_pool.idx = 0;
  2500		do {
  2501			start = skb_headlen(skb);
  2502	
  2503			if (skb_frags_readable(skb)) {
  2504				err = -ENODEV;
  2505				goto out;
  2506			}
  2507	
  2508			/* Copy header. */
  2509			copy = start - offset;
  2510			if (copy > 0) {
  2511				copy = min(copy, remaining_len);
  2512	
  2513				n = copy_to_iter(skb->data + offset, copy,
  2514						 &msg->msg_iter);
  2515				if (n != copy) {
  2516					err = -EFAULT;
  2517					goto out;
  2518				}
  2519	
  2520				offset += copy;
  2521				remaining_len -= copy;
  2522	
  2523				/* First a dmabuf_cmsg for # bytes copied to user
  2524				 * buffer.
  2525				 */
  2526				memset(&dmabuf_cmsg, 0, sizeof(dmabuf_cmsg));
  2527				dmabuf_cmsg.frag_size = copy;
  2528				err = put_cmsg_notrunc(msg, SOL_SOCKET,
  2529						       SO_DEVMEM_LINEAR,
  2530						       sizeof(dmabuf_cmsg),
  2531						       &dmabuf_cmsg);
  2532				if (err)
  2533					goto out;
  2534	
  2535				sent += copy;
  2536	
  2537				if (remaining_len == 0)
  2538					goto out;
  2539			}
  2540	
  2541			/* after that, send information of dmabuf pages through a
  2542			 * sequence of cmsg
  2543			 */
  2544			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
  2545				skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
  2546				struct net_iov *niov;
  2547				u64 frag_offset;
  2548				u32 token;
  2549				int end;
  2550	
  2551				/* !skb_frags_readable() should indicate that ALL the
  2552				 * frags in this skb are dmabuf net_iovs. We're checking
  2553				 * for that flag above, but also check individual frags
  2554				 * here. If the tcp stack is not setting
  2555				 * skb_frags_readable() correctly, we still don't want
  2556				 * to crash here.
  2557				 */
  2558				if (!skb_frag_net_iov(frag)) {
  2559					net_err_ratelimited("Found non-dmabuf skb with net_iov");
  2560					err = -ENODEV;
  2561					goto out;
  2562				}
  2563	
  2564				niov = skb_frag_net_iov(frag);
  2565				if (!net_is_devmem_iov(niov)) {
  2566					err = -ENODEV;
  2567					goto out;
  2568				}
  2569	
  2570				end = start + skb_frag_size(frag);
  2571				copy = end - offset;
  2572	
  2573				if (copy > 0) {
  2574					copy = min(copy, remaining_len);
  2575	
  2576					frag_offset = net_iov_virtual_addr(niov) +
  2577						      skb_frag_off(frag) + offset -
  2578						      start;
  2579					dmabuf_cmsg.frag_offset = frag_offset;
  2580					dmabuf_cmsg.frag_size = copy;
  2581	
  2582					binding = net_devmem_iov_binding(niov);
  2583	
  2584					if (!sk->sk_devmem_info.binding)
  2585						sk->sk_devmem_info.binding = binding;
  2586	
  2587					if (sk->sk_devmem_info.binding != binding) {
  2588						err = -EFAULT;
  2589						goto out;
  2590					}
  2591	
  2592					if (sk->sk_devmem_info.autorelease) {
  2593						err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
  2594									 skb_shinfo(skb)->nr_frags - i);
  2595						if (err)
  2596							goto out;
  2597	
  2598						dmabuf_cmsg.frag_token =
  2599							tcp_xa_pool.tokens[tcp_xa_pool.idx];
  2600					} else {
  2601						token = net_iov_virtual_addr(niov) >> PAGE_SHIFT;
  2602						dmabuf_cmsg.frag_token = token;
  2603					}
  2604	
  2605	
  2606					/* Will perform the exchange later */
  2607					dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
  2608	
  2609					offset += copy;
  2610					remaining_len -= copy;
  2611	
  2612					err = put_cmsg_notrunc(msg, SOL_SOCKET,
  2613							       SO_DEVMEM_DMABUF,
  2614							       sizeof(dmabuf_cmsg),
  2615							       &dmabuf_cmsg);
  2616					if (err)
  2617						goto out;
  2618	
  2619					if (sk->sk_devmem_info.autorelease) {
  2620						atomic_long_inc(&niov->pp_ref_count);
  2621						tcp_xa_pool.netmems[tcp_xa_pool.idx++] =
  2622							skb_frag_netmem(frag);
  2623					} else {
  2624						if (atomic_inc_return(&niov->uref) == 1)
  2625							atomic_long_inc(&niov->pp_ref_count);
  2626						refs++;
  2627					}
  2628	
  2629					sent += copy;
  2630	
  2631					if (remaining_len == 0)
  2632						goto out;
  2633				}
  2634				start = end;
  2635			}
  2636	
  2637			tcp_xa_pool_commit(sk, &tcp_xa_pool);
  2638	
  2639			if (!remaining_len)
  2640				goto out;
  2641	
  2642			/* if remaining_len is not satisfied yet, we need to go to the
  2643			 * next frag in the frag_list to satisfy remaining_len.
  2644			 */
  2645			skb = skb_shinfo(skb)->frag_list ?: skb->next;
  2646	
  2647			offset = offset - start;
  2648		} while (skb);
  2649	
  2650		if (remaining_len) {
  2651			err = -EFAULT;
  2652			goto out;
  2653		}
  2654	
  2655	out:
  2656		tcp_xa_pool_commit(sk, &tcp_xa_pool);
  2657	
  2658		if (!sent)
  2659			sent = err;
  2660	
> 2661		if (refs > 0)
  2662			atomic_add(refs, &sk->sk_devmem_info.outstanding_urefs);
  2663	
  2664		return sent;
  2665	}
  2666	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
Posted by kernel test robot 1 month, 1 week ago
Hi Bobby,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 255d75ef029f33f75fcf5015052b7302486f7ad2]

url:    https://github.com/intel-lab-lkp/linux/commits/Bobby-Eshleman/net-devmem-rename-tx_vec-to-vec-in-dmabuf-binding/20251105-092703
base:   255d75ef029f33f75fcf5015052b7302486f7ad2
patch link:    https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-upstream-v6-3-ea98cf4d40b3%40meta.com
patch subject: [PATCH net-next v6 3/6] net: devmem: prepare for autorelease rx token management
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20251105/202511052307.doTV8fDn-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251105/202511052307.doTV8fDn-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511052307.doTV8fDn-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/core/sock.c:1107:12: warning: variable 'ret' is uninitialized when used here [-Wuninitialized]
    1107 |                                 return ret;
         |                                        ^~~
   net/core/sock.c:1095:9: note: initialize the variable 'ret' to silence this warning
    1095 |         int ret;
         |                ^
         |                 = 0
   1 warning generated.
--
>> net/ipv4/tcp.c:2626:6: warning: variable 'refs' is uninitialized when used here [-Wuninitialized]
    2626 |                                         refs++;
         |                                         ^~~~
   net/ipv4/tcp.c:2496:10: note: initialize the variable 'refs' to silence this warning
    2496 |         int refs;
         |                 ^
         |                  = 0
   1 warning generated.


vim +/ret +1107 net/core/sock.c

  1085	
  1086	static noinline_for_stack int
  1087	sock_devmem_dontneed_manual_release(struct sock *sk, struct dmabuf_token *tokens,
  1088					    unsigned int num_tokens)
  1089	{
  1090		struct net_iov *niov;
  1091		unsigned int i, j;
  1092		netmem_ref netmem;
  1093		unsigned int token;
  1094		int num_frags = 0;
  1095		int ret;
  1096	
  1097		if (!sk->sk_devmem_info.binding)
  1098			return -EINVAL;
  1099	
  1100		for (i = 0; i < num_tokens; i++) {
  1101			for (j = 0; j < tokens[i].token_count; j++) {
  1102				token = tokens[i].token_start + j;
  1103				if (token >= sk->sk_devmem_info.binding->dmabuf->size / PAGE_SIZE)
  1104					break;
  1105	
  1106				if (++num_frags > MAX_DONTNEED_FRAGS)
> 1107					return ret;
  1108	
  1109				niov = sk->sk_devmem_info.binding->vec[token];
  1110				if (atomic_dec_and_test(&niov->uref)) {
  1111					netmem = net_iov_to_netmem(niov);
  1112					WARN_ON_ONCE(!napi_pp_put_page(netmem));
  1113				}
  1114				ret++;
  1115			}
  1116		}
  1117	
  1118		atomic_sub(ret, &sk->sk_devmem_info.outstanding_urefs);
  1119	
  1120		return ret;
  1121	}
  1122	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki