From nobody Sun Feb 8 19:56:11 2026
From: Bobby Eshleman
Date: Wed, 07 Jan 2026 16:57:35 -0800
Subject: [PATCH net-next v8 1/5] net: devmem: rename tx_vec to vec in dmabuf binding
Message-Id: <20260107-scratch-bobbyeshleman-devmem-tcp-token-upstream-v8-1-92c968631496@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Stanislav Fomichev, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman

Rename the 'tx_vec' field in struct net_devmem_dmabuf_binding to 'vec'.
This field holds pointers to net_iov structures. The rename prepares for
reusing 'vec' for both the TX and RX directions.

No functional change intended.
Reviewed-by: Mina Almasry
Signed-off-by: Bobby Eshleman
---
 net/core/devmem.c | 22 +++++++++++-----------
 net/core/devmem.h |  2 +-
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index ec4217d6c0b4..05a9a9e7abb9 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -75,7 +75,7 @@ void __net_devmem_dmabuf_binding_free(struct work_struct *wq)
 	dma_buf_detach(binding->dmabuf, binding->attachment);
 	dma_buf_put(binding->dmabuf);
 	xa_destroy(&binding->bound_rxqs);
-	kvfree(binding->tx_vec);
+	kvfree(binding->vec);
 	kfree(binding);
 }
 
@@ -232,10 +232,10 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	}
 
 	if (direction == DMA_TO_DEVICE) {
-		binding->tx_vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
-						 sizeof(struct net_iov *),
-						 GFP_KERNEL);
-		if (!binding->tx_vec) {
+		binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
+					      sizeof(struct net_iov *),
+					      GFP_KERNEL);
+		if (!binding->vec) {
 			err = -ENOMEM;
 			goto err_unmap;
 		}
@@ -249,7 +249,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 					dev_to_node(&dev->dev));
 	if (!binding->chunk_pool) {
 		err = -ENOMEM;
-		goto err_tx_vec;
+		goto err_vec;
 	}
 
 	virtual = 0;
@@ -295,7 +295,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 					      net_devmem_get_dma_addr(niov));
 		if (direction == DMA_TO_DEVICE)
-			binding->tx_vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
+			binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
 	}
 
 	virtual += len;
@@ -315,8 +315,8 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	gen_pool_for_each_chunk(binding->chunk_pool,
 				net_devmem_dmabuf_free_chunk_owner, NULL);
 	gen_pool_destroy(binding->chunk_pool);
-err_tx_vec:
-	kvfree(binding->tx_vec);
+err_vec:
+	kvfree(binding->vec);
 err_unmap:
 	dma_buf_unmap_attachment_unlocked(binding->attachment, binding->sgt,
 					  direction);
@@ -363,7 +363,7 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
 	int err = 0;
 
 	binding = net_devmem_lookup_dmabuf(dmabuf_id);
-	if (!binding || !binding->tx_vec) {
+	if (!binding || !binding->vec) {
 		err = -EINVAL;
 		goto out_err;
 	}
@@ -414,7 +414,7 @@ net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding,
 	*off = virt_addr % PAGE_SIZE;
 	*size = PAGE_SIZE - *off;
 
-	return binding->tx_vec[virt_addr / PAGE_SIZE];
+	return binding->vec[virt_addr / PAGE_SIZE];
 }
 
 /*** "Dmabuf devmem memory provider" ***/
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 0b43a648cd2e..1ea6228e4f40 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -63,7 +63,7 @@ struct net_devmem_dmabuf_binding {
 	 * address. This array is convenient to map the virtual addresses to
 	 * net_iovs in the TX path.
 	 */
-	struct net_iov **tx_vec;
+	struct net_iov **vec;
 
 	struct work_struct unbind_w;
 };
-- 
2.47.3

From nobody Sun Feb 8 19:56:11 2026
From: Bobby Eshleman
Date: Wed, 07 Jan 2026 16:57:36 -0800
Subject: [PATCH net-next v8 2/5] net: devmem: refactor sock_devmem_dontneed for autorelease split
Message-Id: <20260107-scratch-bobbyeshleman-devmem-tcp-token-upstream-v8-2-92c968631496@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Stanislav Fomichev, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman

Refactor sock_devmem_dontneed() in preparation for supporting both
autorelease and manual token-release modes. Split the function into two
parts:

- sock_devmem_dontneed(): handles input validation, token allocation,
  and copying from userspace
- sock_devmem_dontneed_autorelease(): performs the actual token release
  via xarray lookup and page-pool put

This separation allows a future commit to add a parallel
sock_devmem_dontneed_manual_release() function that uses a different
token-tracking mechanism (per-niov reference counting) without
duplicating the input-validation logic.

The refactoring is purely mechanical, with no functional change; it is
intended only to minimize the noise in subsequent patches.
Reviewed-by: Mina Almasry
Signed-off-by: Bobby Eshleman
---
 net/core/sock.c | 52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 45c98bf524b2..a5932719b191 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1082,30 +1082,13 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 #define MAX_DONTNEED_FRAGS 1024
 
 static noinline_for_stack int
-sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
+				 unsigned int num_tokens)
 {
-	unsigned int num_tokens, i, j, k, netmem_num = 0;
-	struct dmabuf_token *tokens;
+	unsigned int i, j, k, netmem_num = 0;
 	int ret = 0, num_frags = 0;
 	netmem_ref netmems[16];
 
-	if (!sk_is_tcp(sk))
-		return -EBADF;
-
-	if (optlen % sizeof(*tokens) ||
-	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
-		return -EINVAL;
-
-	num_tokens = optlen / sizeof(*tokens);
-	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
-	if (!tokens)
-		return -ENOMEM;
-
-	if (copy_from_sockptr(tokens, optval, optlen)) {
-		kvfree(tokens);
-		return -EFAULT;
-	}
-
 	xa_lock_bh(&sk->sk_user_frags);
 	for (i = 0; i < num_tokens; i++) {
 		for (j = 0; j < tokens[i].token_count; j++) {
@@ -1135,6 +1118,35 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	for (k = 0; k < netmem_num; k++)
 		WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 
+	return ret;
+}
+
+static noinline_for_stack int
+sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	struct dmabuf_token *tokens;
+	unsigned int num_tokens;
+	int ret;
+
+	if (!sk_is_tcp(sk))
+		return -EBADF;
+
+	if (optlen % sizeof(*tokens) ||
+	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
+		return -EINVAL;
+
+	num_tokens = optlen / sizeof(*tokens);
+	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
+	if (!tokens)
+		return -ENOMEM;
+
+	if (copy_from_sockptr(tokens, optval, optlen)) {
+		kvfree(tokens);
+		return -EFAULT;
+	}
+
+	ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	kvfree(tokens);
 	return ret;
 }
-- 
2.47.3

From nobody Sun Feb 8 19:56:11 2026
From: Bobby Eshleman
Date: Wed, 07 Jan 2026 16:57:37 -0800
Subject: [PATCH net-next v8 3/5] net: devmem: implement autorelease token management
Message-Id: <20260107-scratch-bobbyeshleman-devmem-tcp-token-upstream-v8-3-92c968631496@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Stanislav Fomichev, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman

Add support for toggling token autorelease, using a static branch to
control the system-wide behavior. This allows applications to choose
between two memory-management modes:

1. Autorelease on: leaked tokens are automatically released when the
   socket closes.
2. Autorelease off: leaked tokens are released during dmabuf unbind.

The autorelease mode is requested via the NETDEV_A_DMABUF_AUTORELEASE
attribute of the NETDEV_CMD_BIND_RX message. Having separate modes per
binding is disallowed and is rejected by netlink: the system is "locked"
into the mode that the first binding selects, and the mode can only
change again once there are zero bindings on the system.

Disabling autorelease offers a ~13% improvement in CPU utilization.
Static branching is used to limit the system to one mode or the other.

Signed-off-by: Bobby Eshleman
---
Changes in v8:
- Only reset the static key when the binding count drops to zero,
  defaulting back to disabled (Stan).
- Fix bad usage of the xarray spinlock for sleepable static-branch
  switching; use a mutex instead.
- Access pp_ref_count via niov->desc instead of niov directly.
- Move the static-key reset to __net_devmem_dmabuf_binding_free() so
  that the static key cannot be changed while there are outstanding
  tokens (free is only called when the reference count reaches zero).
- Add net_devmem_dmabuf_rx_bindings_count because tokens may be active
  even after xa_erase(), so static-key changes must wait until all RX
  bindings are finally freed (not just until the xarray is empty). A
  counter is a simple way to track this.
- The socket takes a reference on the binding to avoid a use-after-free
  on sk_devmem_info.binding in the case that the user releases all
  tokens, unbinds, then issues SO_DEVMEM_DONTNEED again (with a bad
  token).
- Removed some unnecessary comments.

Changes in v7:
- implement autorelease with a static branch (Stan)
- use netlink instead of a sockopt (Stan)
- merge the uAPI and implementation patches into one patch (seemed less
  confusing)

Changes in v6:
- remove sk_devmem_info.autorelease, using binding->autorelease instead
- move the binding->autorelease check outside of
  net_devmem_dmabuf_binding_put_urefs() (Mina)
- remove overly defensive net_is_devmem_iov() (Mina)
- add a comment about multiple urefs mapping to a single netmem ref (Mina)
- remove overly defensive netmem NULL and netmem_is_net_iov checks (Mina)
- use niov without casting back and forth with netmem (Mina)
- move the autorelease flag from per-binding to per-socket (Mina)
- remove the batching logic in sock_devmem_dontneed_manual_release() (Mina)
- move the autorelease check inside tcp_xa_pool_commit() (Mina)
- remove the single-binding restriction for autorelease mode (Mina)
- unbind always checks for leaked urefs

Changes in v5:
- remove unused variables
- introduce the autorelease flag, preparing for a future patch to toggle
  the new behavior

Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory footprint
- fall back to cleaning up references in dmabuf unbind if the socket
  leaked tokens
- drop the ethtool patch

Changes in v2:
- always use GFP_ZERO for binding->vec (Mina)
- remove the WARN for a changed binding (Mina)
- remove an extraneous binding ref get (Mina)
- remove WARNs on invalid user input (Mina)
- pre-assign niovs in binding->vec for the RX case (Mina)
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- fix the length of the alloc for urefs
---
 Documentation/netlink/specs/netdev.yaml |  12 ++++
 include/net/netmem.h                    |   1 +
 include/net/sock.h                      |   7 ++-
 include/uapi/linux/netdev.h             |   1 +
 net/core/devmem.c                       | 102 ++++++++++++++++++++++++++++----
 net/core/devmem.h                       |  11 +++-
 net/core/netdev-genl-gen.c              |   5 +-
 net/core/netdev-genl.c                  |  10 +++-
 net/core/sock.c                         |  57 ++++++++++++++++--
 net/ipv4/tcp.c                          |  76 +++++++++++++++++++-----
 net/ipv4/tcp_ipv4.c                     |  11 +++-
 net/ipv4/tcp_minisocks.c                |   3 +-
 tools/include/uapi/linux/netdev.h       |   1 +
 13 files changed, 251 insertions(+), 46 deletions(-)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 82bf5cb2617d..913fccca4c4e 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -562,6 +562,17 @@ attribute-sets:
         type: u32
         checks:
           min: 1
+      -
+        name: autorelease
+        doc: |
+          Token autorelease mode. If true (1), leaked tokens are automatically
+          released when the socket closes. If false (0), leaked tokens are only
+          released when the dmabuf is unbound. Once a binding is created with a
+          specific mode, all subsequent bindings system-wide must use the same
+          mode.
+
+          Optional. Defaults to false if not specified.
+ type: u8 =20 operations: list: @@ -767,6 +778,7 @@ operations: - ifindex - fd - queues + - autorelease reply: attributes: - id diff --git a/include/net/netmem.h b/include/net/netmem.h index 9e10f4ac50c3..80d2263ba4ed 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -112,6 +112,7 @@ struct net_iov { }; struct net_iov_area *owner; enum net_iov_type type; + atomic_t uref; }; =20 struct net_iov_area { diff --git a/include/net/sock.h b/include/net/sock.h index aafe8bdb2c0f..9d3d5bde15e9 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -352,7 +352,7 @@ struct sk_filter; * @sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS * @sk_scm_unused: unused flags for scm_recv() * @ns_tracker: tracker for netns reference - * @sk_user_frags: xarray of pages the user is holding a reference on. + * @sk_devmem_info: the devmem binding information for the socket * @sk_owner: reference to the real owner of the socket that calls * sock_lock_init_class_and_name(). */ @@ -584,7 +584,10 @@ struct sock { struct numa_drop_counters *sk_drop_counters; struct rcu_head sk_rcu; netns_tracker ns_tracker; - struct xarray sk_user_frags; + struct { + struct xarray frags; + struct net_devmem_dmabuf_binding *binding; + } sk_devmem_info; =20 #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES) struct module *sk_owner; diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index e0b579a1df4f..1e5c209cb998 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -207,6 +207,7 @@ enum { NETDEV_A_DMABUF_QUEUES, NETDEV_A_DMABUF_FD, NETDEV_A_DMABUF_ID, + NETDEV_A_DMABUF_AUTORELEASE, =20 __NETDEV_A_DMABUF_MAX, NETDEV_A_DMABUF_MAX =3D (__NETDEV_A_DMABUF_MAX - 1) diff --git a/net/core/devmem.c b/net/core/devmem.c index 05a9a9e7abb9..6961f8386004 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -28,6 +29,19 @@ =20 static 
DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1); =20 +/* If the user unbinds before releasing all tokens, the static key must not + * change until all tokens have been released (to avoid calling the wrong + * SO_DEVMEM_DONTNEED handler). We prevent this by making static key chang= es + * and binding alloc/free atomic with regards to each other, using the + * devmem_ar_lock. This works because binding free does not occur until al= l of + * the outstanding token's references on the binding are dropped. + */ +static DEFINE_MUTEX(devmem_ar_lock); + +DEFINE_STATIC_KEY_FALSE(tcp_devmem_ar_key); +EXPORT_SYMBOL(tcp_devmem_ar_key); +static int net_devmem_dmabuf_rx_bindings_count; + static const struct memory_provider_ops dmabuf_devmem_ops; =20 bool net_is_devmem_iov(struct net_iov *niov) @@ -60,6 +74,12 @@ void __net_devmem_dmabuf_binding_free(struct work_struct= *wq) =20 size_t size, avail; =20 + mutex_lock(&devmem_ar_lock); + net_devmem_dmabuf_rx_bindings_count--; + if (net_devmem_dmabuf_rx_bindings_count =3D=3D 0) + static_branch_disable(&tcp_devmem_ar_key); + mutex_unlock(&devmem_ar_lock); + gen_pool_for_each_chunk(binding->chunk_pool, net_devmem_dmabuf_free_chunk_owner, NULL); =20 @@ -116,6 +136,24 @@ void net_devmem_free_dmabuf(struct net_iov *niov) gen_pool_free(binding->chunk_pool, dma_addr, PAGE_SIZE); } =20 +static void +net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *bind= ing) +{ + int i; + + for (i =3D 0; i < binding->dmabuf->size / PAGE_SIZE; i++) { + struct net_iov *niov; + netmem_ref netmem; + + niov =3D binding->vec[i]; + netmem =3D net_iov_to_netmem(niov); + + /* Multiple urefs map to only a single netmem ref. 
+		 */
+		if (atomic_xchg(&niov->uref, 0) > 0)
+			WARN_ON_ONCE(!napi_pp_put_page(netmem));
+	}
+}
+
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 {
 	struct netdev_rx_queue *rxq;
@@ -143,6 +181,7 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 		__net_mp_close_rxq(binding->dev, rxq_idx, &mp_params);
 	}
 
+	net_devmem_dmabuf_binding_put_urefs(binding);
 	net_devmem_dmabuf_binding_put(binding);
 }
 
@@ -179,8 +218,10 @@ struct net_devmem_dmabuf_binding *
 net_devmem_bind_dmabuf(struct net_device *dev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
-		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
-		       struct netlink_ext_ack *extack)
+		       unsigned int dmabuf_fd,
+		       struct netdev_nl_sock *priv,
+		       struct netlink_ext_ack *extack,
+		       bool autorelease)
 {
 	struct net_devmem_dmabuf_binding *binding;
 	static u32 id_alloc_next;
@@ -231,14 +272,12 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		goto err_detach;
 	}
 
-	if (direction == DMA_TO_DEVICE) {
-		binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
-					      sizeof(struct net_iov *),
-					      GFP_KERNEL);
-		if (!binding->vec) {
-			err = -ENOMEM;
-			goto err_unmap;
-		}
+	binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
+				      sizeof(struct net_iov *),
+				      GFP_KERNEL | __GFP_ZERO);
+	if (!binding->vec) {
+		err = -ENOMEM;
+		goto err_unmap;
 	}
 
 	/* For simplicity we expect to make PAGE_SIZE allocations, but the
@@ -292,25 +331,62 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 			niov = &owner->area.niovs[i];
 			niov->type = NET_IOV_DMABUF;
 			niov->owner = &owner->area;
+			atomic_set(&niov->uref, 0);
 			page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 						      net_devmem_get_dma_addr(niov));
-			if (direction == DMA_TO_DEVICE)
-				binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
+			binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
 		}
 
 		virtual += len;
 	}
 
+	mutex_lock(&devmem_ar_lock);
+
+	if (direction == DMA_FROM_DEVICE) {
+		if (net_devmem_dmabuf_rx_bindings_count > 0) {
+			bool mode;
+
+			mode = static_key_enabled(&tcp_devmem_ar_key);
+
+			/* When bindings exist, enforce that the mode does not
+			 * change.
+			 */
+			if (mode != autorelease) {
+				NL_SET_ERR_MSG_FMT(extack,
+						   "System already configured with autorelease=%d",
+						   mode);
+				err = -EINVAL;
+				goto err_unlock_mutex;
+			}
+		} else if (autorelease) {
+			/* First binding with autorelease enabled sets the
+			 * mode. If autorelease is false, the key is already
+			 * disabled by default so no action is needed.
+			 */
+			static_branch_enable(&tcp_devmem_ar_key);
+		}
+
+		net_devmem_dmabuf_rx_bindings_count++;
+	}
+
 	err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
 			      binding, xa_limit_32b, &id_alloc_next,
 			      GFP_KERNEL);
 	if (err < 0)
-		goto err_free_chunks;
+		goto err_dec_binding_count;
+
+	mutex_unlock(&devmem_ar_lock);
 
 	list_add(&binding->list, &priv->bindings);
 
 	return binding;
 
+err_dec_binding_count:
+	if (direction == DMA_FROM_DEVICE)
+		net_devmem_dmabuf_rx_bindings_count--;
+
+err_unlock_mutex:
+	mutex_unlock(&devmem_ar_lock);
 err_free_chunks:
 	gen_pool_for_each_chunk(binding->chunk_pool,
 				net_devmem_dmabuf_free_chunk_owner, NULL);
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 1ea6228e4f40..33e85ff5f35e 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -12,9 +12,13 @@
 
 #include
 #include
+#include
 
 struct netlink_ext_ack;
 
+/* static key for TCP devmem autorelease */
+extern struct static_key_false tcp_devmem_ar_key;
+
 struct net_devmem_dmabuf_binding {
 	struct dma_buf *dmabuf;
 	struct dma_buf_attachment *attachment;
@@ -61,7 +65,7 @@ struct net_devmem_dmabuf_binding {
 
 	/* Array of net_iov pointers for this binding, sorted by virtual
 	 * address. This array is convenient to map the virtual addresses to
-	 * net_iovs in the TX path.
+	 * net_iovs.
	 */
 	struct net_iov **vec;
 
@@ -88,7 +92,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
-		       struct netlink_ext_ack *extack);
+		       struct netlink_ext_ack *extack, bool autorelease);
 struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 id);
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
 int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
@@ -174,7 +178,8 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd,
 		       struct netdev_nl_sock *priv,
-		       struct netlink_ext_ack *extack)
+		       struct netlink_ext_ack *extack,
+		       bool autorelease)
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index ba673e81716f..01b7765e11ec 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -86,10 +86,11 @@ static const struct nla_policy netdev_qstats_get_nl_policy[NETDEV_A_QSTATS_SCOPE
 };
 
 /* NETDEV_CMD_BIND_RX - do */
-static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1] = {
+static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_AUTORELEASE + 1] = {
 	[NETDEV_A_DMABUF_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
 	[NETDEV_A_DMABUF_FD] = { .type = NLA_U32, },
 	[NETDEV_A_DMABUF_QUEUES] = NLA_POLICY_NESTED(netdev_queue_id_nl_policy),
+	[NETDEV_A_DMABUF_AUTORELEASE] = { .type = NLA_U8, },
 };
 
 /* NETDEV_CMD_NAPI_SET - do */
@@ -188,7 +189,7 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.cmd = NETDEV_CMD_BIND_RX,
 		.doit = netdev_nl_bind_rx_doit,
 		.policy = netdev_bind_rx_nl_policy,
-		.maxattr = NETDEV_A_DMABUF_FD,
+		.maxattr = NETDEV_A_DMABUF_AUTORELEASE,
 		.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 470fabbeacd9..c742bb34865e 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -939,6 +939,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	struct netdev_nl_sock *priv;
 	struct net_device *netdev;
 	unsigned long *rxq_bitmap;
+	bool autorelease = false;
 	struct device *dma_dev;
 	struct sk_buff *rsp;
 	int err = 0;
@@ -952,6 +953,10 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]);
 	dmabuf_fd = nla_get_u32(info->attrs[NETDEV_A_DMABUF_FD]);
 
+	if (info->attrs[NETDEV_A_DMABUF_AUTORELEASE])
+		autorelease =
+			!!nla_get_u8(info->attrs[NETDEV_A_DMABUF_AUTORELEASE]);
+
 	priv = genl_sk_priv_get(&netdev_nl_family, NETLINK_CB(skb).sk);
 	if (IS_ERR(priv))
 		return PTR_ERR(priv);
@@ -1002,7 +1007,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
-					 dmabuf_fd, priv, info->extack);
+					 dmabuf_fd, priv, info->extack,
+					 autorelease);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
 		goto err_rxq_bitmap;
@@ -1097,7 +1103,7 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 
 	dma_dev = netdev_queue_get_dma_dev(netdev, 0);
 	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE,
-					 dmabuf_fd, priv, info->extack);
+					 dmabuf_fd, priv, info->extack, false);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
 		goto err_unlock_netdev;
diff --git a/net/core/sock.c b/net/core/sock.c
index a5932719b191..7f9ed965977b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -87,6 +87,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -151,6 +152,7 @@
 #include
 
 #include "dev.h"
+#include "devmem.h"
 
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
@@ -1081,6 +1083,44 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 #define MAX_DONTNEED_TOKENS 128
 #define MAX_DONTNEED_FRAGS 1024
 
+static noinline_for_stack int
+sock_devmem_dontneed_manual_release(struct sock *sk,
+				    struct dmabuf_token *tokens,
+				    unsigned int num_tokens)
+{
+	struct net_iov *niov;
+	unsigned int i, j;
+	netmem_ref netmem;
+	unsigned int token;
+	int num_frags = 0;
+	int ret = 0;
+
+	if (!sk->sk_devmem_info.binding)
+		return -EINVAL;
+
+	for (i = 0; i < num_tokens; i++) {
+		for (j = 0; j < tokens[i].token_count; j++) {
+			size_t size = sk->sk_devmem_info.binding->dmabuf->size;
+
+			token = tokens[i].token_start + j;
+			if (token >= size / PAGE_SIZE)
+				break;
+
+			if (++num_frags > MAX_DONTNEED_FRAGS)
+				return ret;
+
+			niov = sk->sk_devmem_info.binding->vec[token];
+			if (atomic_dec_and_test(&niov->uref)) {
+				netmem = net_iov_to_netmem(niov);
+				WARN_ON_ONCE(!napi_pp_put_page(netmem));
+			}
+			ret++;
+		}
+	}
+
+	return ret;
+}
+
 static noinline_for_stack int
 sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
 				 unsigned int num_tokens)
@@ -1089,32 +1129,33 @@ sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
 	int ret = 0, num_frags = 0;
 	netmem_ref netmems[16];
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 	for (i = 0; i < num_tokens; i++) {
 		for (j = 0; j < tokens[i].token_count; j++) {
 			if (++num_frags > MAX_DONTNEED_FRAGS)
 				goto frag_limit_reached;
 
 			netmem_ref netmem = (__force netmem_ref)__xa_erase(
-				&sk->sk_user_frags, tokens[i].token_start + j);
+				&sk->sk_devmem_info.frags,
+				tokens[i].token_start + j);
 
 			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
 				continue;
 
 			netmems[netmem_num++] = netmem;
 			if (netmem_num == ARRAY_SIZE(netmems)) {
-				xa_unlock_bh(&sk->sk_user_frags);
+				xa_unlock_bh(&sk->sk_devmem_info.frags);
 				for (k = 0; k < netmem_num; k++)
 					WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 				netmem_num = 0;
-				xa_lock_bh(&sk->sk_user_frags);
+				xa_lock_bh(&sk->sk_devmem_info.frags);
 			}
 			ret++;
 		}
 	}
 
 frag_limit_reached:
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 	for (k = 0; k < netmem_num; k++)
 		WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 
@@ -1145,7 +1186,11 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 		return -EFAULT;
 	}
 
-	ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	if (static_branch_unlikely(&tcp_devmem_ar_key))
+		ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	else
+		ret = sock_devmem_dontneed_manual_release(sk, tokens,
+							  num_tokens);
 
 	kvfree(tokens);
 	return ret;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f035440c475a..b6dc4774f707 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -260,6 +260,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -492,7 +493,8 @@ void tcp_init_sock(struct sock *sk)
 
 	set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
 	sk_sockets_allocated_inc(sk);
-	xa_init_flags(&sk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&sk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	sk->sk_devmem_info.binding = NULL;
 }
 EXPORT_IPV6_MOD(tcp_init_sock);
 
@@ -2424,11 +2426,12 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)
 
 	/* Commit part that has been copied to user space. */
 	for (i = 0; i < p->idx; i++)
-		__xa_cmpxchg(&sk->sk_user_frags, p->tokens[i], XA_ZERO_ENTRY,
-			     (__force void *)p->netmems[i], GFP_KERNEL);
+		__xa_cmpxchg(&sk->sk_devmem_info.frags, p->tokens[i],
+			     XA_ZERO_ENTRY, (__force void *)p->netmems[i],
+			     GFP_KERNEL);
 	/* Rollback what has been pre-allocated and is no longer needed.
	 */
 	for (; i < p->max; i++)
-		__xa_erase(&sk->sk_user_frags, p->tokens[i]);
+		__xa_erase(&sk->sk_devmem_info.frags, p->tokens[i]);
 
 	p->max = 0;
 	p->idx = 0;
@@ -2436,14 +2439,18 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)
 
 static void tcp_xa_pool_commit(struct sock *sk, struct tcp_xa_pool *p)
 {
+	/* Skip xarray operations if autorelease is disabled (manual mode) */
+	if (!static_branch_unlikely(&tcp_devmem_ar_key))
+		return;
+
 	if (!p->max)
 		return;
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 
 	tcp_xa_pool_commit_locked(sk, p);
 
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 }
 
 static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
@@ -2454,24 +2461,41 @@ static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
 	if (p->idx < p->max)
 		return 0;
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 
 	tcp_xa_pool_commit_locked(sk, p);
 
 	for (k = 0; k < max_frags; k++) {
-		err = __xa_alloc(&sk->sk_user_frags, &p->tokens[k],
+		err = __xa_alloc(&sk->sk_devmem_info.frags, &p->tokens[k],
 				 XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL);
 		if (err)
 			break;
 	}
 
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 
 	p->max = k;
 	p->idx = 0;
 	return k ? 0 : err;
}
 
+static void tcp_xa_pool_inc_pp_ref_count(struct tcp_xa_pool *tcp_xa_pool,
+					 skb_frag_t *frag)
+{
+	struct net_iov *niov;
+
+	niov = skb_frag_net_iov(frag);
+
+	if (static_branch_unlikely(&tcp_devmem_ar_key)) {
+		atomic_long_inc(&niov->desc.pp_ref_count);
+		tcp_xa_pool->netmems[tcp_xa_pool->idx++] =
+			skb_frag_netmem(frag);
+	} else {
+		if (atomic_inc_return(&niov->uref) == 1)
+			atomic_long_inc(&niov->desc.pp_ref_count);
+	}
+}
+
 /* On error, returns the -errno. On success, returns number of bytes sent to the
  * user. May not consume all of @remaining_len.
  */
@@ -2533,6 +2557,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			 * sequence of cmsg */
 			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+				struct net_devmem_dmabuf_binding *binding = NULL;
 				skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 				struct net_iov *niov;
 				u64 frag_offset;
@@ -2568,13 +2593,35 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 					    start;
 				dmabuf_cmsg.frag_offset = frag_offset;
 				dmabuf_cmsg.frag_size = copy;
-				err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
-							 skb_shinfo(skb)->nr_frags - i);
-				if (err)
+
+				binding = net_devmem_iov_binding(niov);
+
+				if (!sk->sk_devmem_info.binding) {
+					net_devmem_dmabuf_binding_get(binding);
+					sk->sk_devmem_info.binding = binding;
+				}
+
+				if (sk->sk_devmem_info.binding != binding) {
+					err = -EFAULT;
 					goto out;
+				}
+
+				if (static_branch_unlikely(&tcp_devmem_ar_key)) {
+					err = tcp_xa_pool_refill(sk,
+								 &tcp_xa_pool,
+								 skb_shinfo(skb)->nr_frags - i);
+					if (err)
+						goto out;
+
+					dmabuf_cmsg.frag_token =
+						tcp_xa_pool.tokens[tcp_xa_pool.idx];
+				} else {
+					dmabuf_cmsg.frag_token =
+						net_iov_virtual_addr(niov) >> PAGE_SHIFT;
+				}
+
 
 				/* Will perform the exchange later */
-				dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
 				dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
 
 				offset += copy;
@@ -2587,8 +2634,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 				if (err)
 					goto out;
 
-				atomic_long_inc(&niov->desc.pp_ref_count);
-				tcp_xa_pool.netmems[tcp_xa_pool.idx++] = skb_frag_netmem(frag);
+				tcp_xa_pool_inc_pp_ref_count(&tcp_xa_pool, frag);
 
 				sent += copy;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f8a9596e8f4d..7b1b5a17002f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -89,6 +89,9 @@
 
 #include
 
+#include
+#include "../core/devmem.h"
+
 #include
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -2492,7 +2495,7 @@ static void tcp_release_user_frags(struct sock *sk)
 	unsigned long index;
 	void *netmem;
 
-	xa_for_each(&sk->sk_user_frags, index, netmem)
+	xa_for_each(&sk->sk_devmem_info.frags, index, netmem)
 		WARN_ON_ONCE(!napi_pp_put_page((__force netmem_ref)netmem));
 #endif
 }
@@ -2503,7 +2506,11 @@ void tcp_v4_destroy_sock(struct sock *sk)
 
 	tcp_release_user_frags(sk);
 
-	xa_destroy(&sk->sk_user_frags);
+	xa_destroy(&sk->sk_devmem_info.frags);
+	if (sk->sk_devmem_info.binding) {
+		net_devmem_dmabuf_binding_put(sk->sk_devmem_info.binding);
+		sk->sk_devmem_info.binding = NULL;
+	}
 
 	trace_tcp_destroy_sock(sk);
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index bd5462154f97..2aec977f5c12 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -662,7 +662,8 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 
 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
 
-	xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&newsk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	newsk->sk_devmem_info.binding = NULL;
 
 	return newsk;
 }
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e0b579a1df4f..1e5c209cb998 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -207,6 +207,7 @@ enum {
 	NETDEV_A_DMABUF_QUEUES,
 	NETDEV_A_DMABUF_FD,
 	NETDEV_A_DMABUF_ID,
+	NETDEV_A_DMABUF_AUTORELEASE,
 
 	__NETDEV_A_DMABUF_MAX,
 	NETDEV_A_DMABUF_MAX = (__NETDEV_A_DMABUF_MAX - 1)

-- 
2.47.3
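The manual-release accounting in the patch above hinges on one invariant: many user token references (`uref`) on a net_iov pin exactly one page-pool reference, taken by the first uref and dropped by the last. The following is an illustrative userspace sketch of that invariant, not kernel code; `toy_niov` and both helpers are invented names, and the page-pool side is reduced to a plain counter:

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model: many user token refs (uref) share a single simulated
 * page-pool reference (pp_ref), mirroring the uref scheme sketched
 * in the patch above. */
typedef struct {
	atomic_int uref;    /* outstanding user-space tokens */
	atomic_long pp_ref; /* simulated pp_ref_count */
} toy_niov;

/* RX path hands a token to user space: only the first uref takes a pp ref. */
static void toy_get_token(toy_niov *n)
{
	if (atomic_fetch_add(&n->uref, 1) == 0)
		atomic_fetch_add(&n->pp_ref, 1);
}

/* Manual SO_DEVMEM_DONTNEED: only the last uref drops the pp ref. */
static void toy_put_token(toy_niov *n)
{
	if (atomic_fetch_sub(&n->uref, 1) == 1)
		atomic_fetch_sub(&n->pp_ref, 1);
}

/* Two tokens on the same niov still cost one pp ref; returns final pp_ref. */
int toy_demo(void)
{
	toy_niov n = { 0 };

	toy_get_token(&n);
	toy_get_token(&n); /* second token: no extra pp ref */
	assert(atomic_load(&n.pp_ref) == 1);

	toy_put_token(&n);
	assert(atomic_load(&n.pp_ref) == 1); /* one uref still outstanding */

	toy_put_token(&n);
	return (int)atomic_load(&n.pp_ref); /* fully released */
}
```

This also shows why unbind can safely flush leftovers with a single `atomic_xchg(&uref, 0)` followed by one put, as `net_devmem_dmabuf_binding_put_urefs()` does in the patch.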
From: Bobby Eshleman
Date: Wed, 07 Jan 2026 16:57:38 -0800
Subject: [PATCH net-next v8 4/5] net: devmem: document NETDEV_A_DMABUF_AUTORELEASE netlink attribute
Message-Id: <20260107-scratch-bobbyeshleman-devmem-tcp-token-upstream-v8-4-92c968631496@meta.com>
References: <20260107-scratch-bobbyeshleman-devmem-tcp-token-upstream-v8-0-92c968631496@meta.com>
In-Reply-To: <20260107-scratch-bobbyeshleman-devmem-tcp-token-upstream-v8-0-92c968631496@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Stanislav Fomichev, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman
X-Mailer: b4 0.14.3
From: Bobby Eshleman

Update the devmem.rst documentation to describe the autorelease netlink
attribute used during RX dmabuf binding.

The autorelease attribute is specified at bind time via the netlink API
(NETDEV_CMD_BIND_RX) and controls what happens to outstanding tokens when
the socket closes.

Document the two token release modes (automatic vs. manual), how to
configure the binding for autorelease, the performance benefits, the new
caveats and restrictions, and the way the mode is enforced system-wide.
Signed-off-by: Bobby Eshleman
---
Changes in v7:
- Document netlink instead of sockopt
- Mention system-wide locked to one mode
---
 Documentation/networking/devmem.rst | 70 +++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
index a6cd7236bfbd..67c63bc5a7ae 100644
--- a/Documentation/networking/devmem.rst
+++ b/Documentation/networking/devmem.rst
@@ -235,6 +235,76 @@ can be less than the tokens provided by the user in case of:
 (a) an internal kernel leak bug.
 (b) the user passed more than 1024 frags.
 
+
+Autorelease Control
+~~~~~~~~~~~~~~~~~~~
+
+The autorelease mode controls what happens to outstanding tokens (tokens not
+released via SO_DEVMEM_DONTNEED) when the socket closes. Autorelease is
+configured per-binding at binding creation time via the netlink API::
+
+	struct netdev_bind_rx_req *req;
+	struct netdev_bind_rx_rsp *rsp;
+	struct ynl_sock *ys;
+	struct ynl_error yerr;
+
+	ys = ynl_sock_create(&ynl_netdev_family, &yerr);
+
+	req = netdev_bind_rx_req_alloc();
+	netdev_bind_rx_req_set_ifindex(req, ifindex);
+	netdev_bind_rx_req_set_fd(req, dmabuf_fd);
+	netdev_bind_rx_req_set_autorelease(req, 0); /* 0 = manual, 1 = auto */
+	__netdev_bind_rx_req_set_queues(req, queues, n_queues);
+
+	rsp = netdev_bind_rx(ys, req);
+
+	dmabuf_id = rsp->id;
+
+When autorelease is disabled (0):
+
+- Outstanding tokens are NOT released when the socket closes
+- Outstanding tokens are only released when the dmabuf is unbound
+- Provides better performance by eliminating xarray overhead (~13% CPU reduction)
+- Kernel tracks tokens via atomic reference counters in net_iov structures
+
+When autorelease is enabled (1):
+
+- Outstanding tokens are automatically released when the socket closes
+- Backwards-compatible behavior
+- Kernel tracks tokens in an xarray per socket
+
+The default is autorelease disabled.
+
+Important: In both modes, applications should call SO_DEVMEM_DONTNEED to
+return tokens as soon as they are done processing. The autorelease setting only
+affects what happens to tokens that are still outstanding when close() is
+called.
+
+The mode is enforced system-wide. Once a binding is created with a specific
+autorelease mode, all subsequent bindings system-wide must use the same mode.
+
+
+Performance Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disabling autorelease provides approximately a 13% CPU utilization improvement
+in RX workloads. That said, applications must ensure all tokens are released
+via SO_DEVMEM_DONTNEED before closing the socket, otherwise the backing pages
+will remain pinned until the dmabuf is unbound.
+
+
+Caveats
+~~~~~~~
+
+- Once a system-wide autorelease mode is selected (via the first binding),
+  all subsequent bindings must use the same mode. Attempts to create bindings
+  with a different mode will be rejected with -EINVAL.
+
+- Applications using manual release mode (autorelease=0) must ensure all tokens
+  are returned via SO_DEVMEM_DONTNEED before socket close to avoid resource
+  leaks during the lifetime of the dmabuf binding. Tokens not released before
+  close() will only be freed when the dmabuf is unbound.
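The caveats above all come down to returning token ranges through SO_DEVMEM_DONTNEED promptly. As an illustrative userspace sketch (the local `struct dmabuf_token` mirrors the UAPI layout from linux/uio.h, and `coalesce_tokens` is a hypothetical helper, not part of this series), batching consecutive frag tokens into ranges keeps each setsockopt() call well under the per-call token limit:

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the UAPI struct dmabuf_token layout (linux/uio.h). */
struct dmabuf_token {
	unsigned int token_start;
	unsigned int token_count;
};

/* Hypothetical helper: coalesce a sorted list of frag_token values into
 * (start, count) ranges, the shape SO_DEVMEM_DONTNEED consumes. Fewer
 * entries means fewer of the limited tokens-per-call are used. */
static size_t coalesce_tokens(const unsigned int *ids, size_t n,
			      struct dmabuf_token *out)
{
	size_t m = 0;

	for (size_t i = 0; i < n; i++) {
		if (m && ids[i] == out[m - 1].token_start + out[m - 1].token_count) {
			out[m - 1].token_count++; /* extends the previous range */
		} else {
			out[m].token_start = ids[i];
			out[m].token_count = 1;
			m++;
		}
	}
	return m;
}
```

The resulting array would then be handed to the kernel with something like `setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, out, m * sizeof(*out))`; in manual mode this is the only path that drops the urefs before unbind.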
+
+
 TX Interface
 ============
 
-- 
2.47.3
From: Bobby Eshleman
Date: Wed, 07 Jan 2026 16:57:39 -0800
Subject: [PATCH net-next v8 5/5] selftests: drv-net: devmem: add autorelease test
Message-Id: <20260107-scratch-bobbyeshleman-devmem-tcp-token-upstream-v8-5-92c968631496@meta.com>

Add a test case for autorelease. The test case is the same as the RX
test, but enables autorelease. The original RX test is changed to use
the -a 0 flag to disable autorelease.
TAP version 13
1..4
ok 1 devmem.check_rx
ok 2 devmem.check_rx_autorelease
ok 3 devmem.check_tx
ok 4 devmem.check_tx_chunks

Signed-off-by: Bobby Eshleman
---
Changes in v8:
- removed stale/missing tests

Changes in v7:
- use autorelease netlink
- remove sockopt tests
---
 tools/testing/selftests/drivers/net/hw/devmem.py  | 21 +++++++++++++++++++--
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 19 +++++++++++++------
 2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py
index 45c2d49d55b6..dbe696a445bd 100755
--- a/tools/testing/selftests/drivers/net/hw/devmem.py
+++ b/tools/testing/selftests/drivers/net/hw/devmem.py
@@ -25,7 +25,24 @@ def check_rx(cfg) -> None:
 
     port = rand_port()
     socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
-    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7 -a 0"
+
+    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
+        wait_port_listen(port)
+        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
+            head -c 1K | {socat}", host=cfg.remote, shell=True)
+
+    ksft_eq(ncdevmem.ret, 0)
+
+
+@ksft_disruptive
+def check_rx_autorelease(cfg) -> None:
+    require_devmem(cfg)
+
+    port = rand_port()
+    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} \
+                   -c {cfg.remote_addr} -v 7 -a 1"
 
     with bkg(listen_cmd, exit_wait=True) as ncdevmem:
         wait_port_listen(port)
@@ -68,7 +85,7 @@ def main() -> None:
         cfg.bin_local = path.abspath(path.dirname(__file__) + "/ncdevmem")
         cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
 
-    ksft_run([check_rx, check_tx, check_tx_chunks],
+    ksft_run([check_rx, check_rx_autorelease, check_tx, check_tx_chunks],
              args=(cfg, ))
     ksft_exit()
 
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 3288ed04ce08..406f1771d9ec 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -92,6 +92,7 @@ static char *port;
 static size_t do_validation;
 static int start_queue = -1;
 static int num_queues = -1;
+static int devmem_autorelease;
 static char *ifname;
 static unsigned int ifindex;
 static unsigned int dmabuf_id;
@@ -679,7 +680,8 @@ static int configure_flow_steering(struct sockaddr_in6 *server_sin)
 
 static int bind_rx_queue(unsigned int ifindex, unsigned int dmabuf_fd,
			 struct netdev_queue_id *queues,
-			 unsigned int n_queue_index, struct ynl_sock **ys)
+			 unsigned int n_queue_index, struct ynl_sock **ys,
+			 int autorelease)
 {
 	struct netdev_bind_rx_req *req = NULL;
 	struct netdev_bind_rx_rsp *rsp = NULL;
@@ -695,6 +697,7 @@ static int bind_rx_queue(unsigned int ifindex, unsigned int dmabuf_fd,
 	req = netdev_bind_rx_req_alloc();
 	netdev_bind_rx_req_set_ifindex(req, ifindex);
 	netdev_bind_rx_req_set_fd(req, dmabuf_fd);
+	netdev_bind_rx_req_set_autorelease(req, autorelease);
 	__netdev_bind_rx_req_set_queues(req, queues, n_queue_index);
 
 	rsp = netdev_bind_rx(*ys, req);
@@ -872,7 +875,8 @@ static int do_server(struct memory_buffer *mem)
 		goto err_reset_rss;
 	}
 
-	if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys)) {
+	if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys,
+			  devmem_autorelease)) {
 		pr_err("Failed to bind");
 		goto err_reset_flow_steering;
 	}
@@ -1092,7 +1096,7 @@ int run_devmem_tests(void)
 		goto err_reset_headersplit;
 	}
 
-	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) {
+	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) {
 		pr_err("Binding empty queues array should have failed");
 		goto err_unbind;
 	}
@@ -1108,7
+1112,7 @@ int run_devmem_tests(void) goto err_reset_headersplit; } =20 - if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) { + if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) { pr_err("Configure dmabuf with header split off should have failed"); goto err_unbind; } @@ -1124,7 +1128,7 @@ int run_devmem_tests(void) goto err_reset_headersplit; } =20 - if (bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) { + if (bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) { pr_err("Failed to bind"); goto err_reset_headersplit; } @@ -1397,7 +1401,7 @@ int main(int argc, char *argv[]) int is_server =3D 0, opt; int ret, err =3D 1; =20 - while ((opt =3D getopt(argc, argv, "ls:c:p:v:q:t:f:z:")) !=3D -1) { + while ((opt =3D getopt(argc, argv, "ls:c:p:v:q:t:f:z:a:")) !=3D -1) { switch (opt) { case 'l': is_server =3D 1; @@ -1426,6 +1430,9 @@ int main(int argc, char *argv[]) case 'z': max_chunk =3D atoi(optarg); break; + case 'a': + devmem_autorelease =3D atoi(optarg); + break; case '?': fprintf(stderr, "unknown option: %c\n", optopt); break; --=20 2.47.3