From nobody Sun Feb 8 18:44:10 2026
From: Bobby Eshleman
Date: Thu, 15 Jan 2026 21:02:12 -0800
Subject: [PATCH net-next v10 1/5] net: devmem: rename tx_vec to vec in dmabuf binding
Message-Id: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-1-686d0af71978@meta.com>
References: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
In-Reply-To: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Mina Almasry, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman
X-Mailer: b4 0.14.3

Rename the 'tx_vec' field in struct net_devmem_dmabuf_binding to 'vec'.
This field holds pointers to net_iov structures. The rename prepares for
reusing 'vec' for both the TX and RX directions.

No functional change intended.
Reviewed-by: Mina Almasry
Signed-off-by: Bobby Eshleman
---
 net/core/devmem.c | 22 +++++++++++-----------
 net/core/devmem.h |  2 +-
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 185ed2a73d1c..9dee697a28ee 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -85,7 +85,7 @@ void __net_devmem_dmabuf_binding_free(struct work_struct *wq)
 	dma_buf_put(binding->dmabuf);
 	xa_destroy(&binding->bound_rxqs);
 	percpu_ref_exit(&binding->ref);
-	kvfree(binding->tx_vec);
+	kvfree(binding->vec);
 	kfree(binding);
 }
 
@@ -246,10 +246,10 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	}
 
 	if (direction == DMA_TO_DEVICE) {
-		binding->tx_vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
-						 sizeof(struct net_iov *),
-						 GFP_KERNEL);
-		if (!binding->tx_vec) {
+		binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
+					      sizeof(struct net_iov *),
+					      GFP_KERNEL);
+		if (!binding->vec) {
 			err = -ENOMEM;
 			goto err_unmap;
 		}
@@ -263,7 +263,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 					      dev_to_node(&dev->dev));
 	if (!binding->chunk_pool) {
 		err = -ENOMEM;
-		goto err_tx_vec;
+		goto err_vec;
 	}
 
 	virtual = 0;
@@ -309,7 +309,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 			page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 						      net_devmem_get_dma_addr(niov));
 			if (direction == DMA_TO_DEVICE)
-				binding->tx_vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
+				binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
 		}
 
 		virtual += len;
@@ -329,8 +329,8 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	gen_pool_for_each_chunk(binding->chunk_pool,
 				net_devmem_dmabuf_free_chunk_owner, NULL);
 	gen_pool_destroy(binding->chunk_pool);
-err_tx_vec:
-	kvfree(binding->tx_vec);
+err_vec:
+	kvfree(binding->vec);
 err_unmap:
 	dma_buf_unmap_attachment_unlocked(binding->attachment, binding->sgt,
 					  direction);
@@ -379,7 +379,7 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
 	int err = 0;
 
 	binding = net_devmem_lookup_dmabuf(dmabuf_id);
-	if (!binding || !binding->tx_vec) {
+	if (!binding || !binding->vec) {
 		err = -EINVAL;
 		goto out_err;
 	}
@@ -430,7 +430,7 @@ net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding,
 	*off = virt_addr % PAGE_SIZE;
 	*size = PAGE_SIZE - *off;
 
-	return binding->tx_vec[virt_addr / PAGE_SIZE];
+	return binding->vec[virt_addr / PAGE_SIZE];
 }
 
 /*** "Dmabuf devmem memory provider" ***/
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 2534c8144212..94874b323520 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -63,7 +63,7 @@ struct net_devmem_dmabuf_binding {
 	 * address. This array is convenient to map the virtual addresses to
 	 * net_iovs in the TX path.
 	 */
-	struct net_iov **tx_vec;
+	struct net_iov **vec;
 
 	struct work_struct unbind_w;
 };
-- 
2.47.3

From nobody Sun Feb 8 18:44:10 2026
From: Bobby Eshleman
Date: Thu, 15 Jan 2026 21:02:13 -0800
Subject: [PATCH net-next v10 2/5] net: devmem: refactor sock_devmem_dontneed for autorelease split
Message-Id: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-2-686d0af71978@meta.com>
References: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
In-Reply-To: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Mina Almasry, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman
X-Mailer: b4 0.14.3

Refactor sock_devmem_dontneed() in preparation for supporting both
autorelease and manual token release modes.

Split the function into two parts:

- sock_devmem_dontneed(): handles input validation, token allocation,
  and copying from userspace
- sock_devmem_dontneed_autorelease(): performs the actual token release
  via xarray lookup and page pool put

This separation allows a future commit to add a parallel
sock_devmem_dontneed_manual_release() function that uses a different
token tracking mechanism (per-niov reference counting) without
duplicating the input validation logic.

The refactoring is purely mechanical, with no functional change
intended; it only minimizes the noise in subsequent patches.
Reviewed-by: Mina Almasry
Signed-off-by: Bobby Eshleman
---
 net/core/sock.c | 52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index a1c8b47b0d56..f6526f43aa6e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1082,30 +1082,13 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 #define MAX_DONTNEED_FRAGS 1024
 
 static noinline_for_stack int
-sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
+				 unsigned int num_tokens)
 {
-	unsigned int num_tokens, i, j, k, netmem_num = 0;
-	struct dmabuf_token *tokens;
+	unsigned int i, j, k, netmem_num = 0;
 	int ret = 0, num_frags = 0;
 	netmem_ref netmems[16];
 
-	if (!sk_is_tcp(sk))
-		return -EBADF;
-
-	if (optlen % sizeof(*tokens) ||
-	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
-		return -EINVAL;
-
-	num_tokens = optlen / sizeof(*tokens);
-	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
-	if (!tokens)
-		return -ENOMEM;
-
-	if (copy_from_sockptr(tokens, optval, optlen)) {
-		kvfree(tokens);
-		return -EFAULT;
-	}
-
 	xa_lock_bh(&sk->sk_user_frags);
 	for (i = 0; i < num_tokens; i++) {
 		for (j = 0; j < tokens[i].token_count; j++) {
@@ -1135,6 +1118,35 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	for (k = 0; k < netmem_num; k++)
 		WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 
+	return ret;
+}
+
+static noinline_for_stack int
+sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	struct dmabuf_token *tokens;
+	unsigned int num_tokens;
+	int ret;
+
+	if (!sk_is_tcp(sk))
+		return -EBADF;
+
+	if (optlen % sizeof(*tokens) ||
+	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
+		return -EINVAL;
+
+	num_tokens = optlen / sizeof(*tokens);
+	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
+	if (!tokens)
+		return -ENOMEM;
+
+	if (copy_from_sockptr(tokens, optval, optlen)) {
+		kvfree(tokens);
+		return -EFAULT;
+	}
+
+	ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	kvfree(tokens);
 	return ret;
 }
-- 
2.47.3

From nobody Sun Feb 8 18:44:10 2026
From: Bobby Eshleman
Date: Thu, 15 Jan 2026 21:02:14 -0800
Subject: [PATCH net-next v10 3/5] net: devmem: implement autorelease token management
Message-Id: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-3-686d0af71978@meta.com>
References: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
In-Reply-To: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Mina Almasry, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman
X-Mailer: b4 0.14.3

Add support for autorelease toggling of tokens, using a static branch to
control system-wide behavior.
This allows applications to choose between two memory management modes:

1. Autorelease on: leaked tokens are automatically released when the
   socket closes.
2. Autorelease off: leaked tokens are released during dmabuf unbind.

The autorelease mode is requested via the NETDEV_A_DMABUF_AUTORELEASE
attribute of the NETDEV_CMD_BIND_RX message. Having separate modes per
binding is disallowed and is rejected by netlink. The system is "locked"
into the mode of the first binding, and the mode can only be changed
again once there are zero bindings on the system.

Disabling autorelease offers a ~13% improvement in CPU utilization.
Static branching is used to limit the system to one mode or the other.

The xa_erase(&net_devmem_dmabuf_bindings, ...) call is moved into
__net_devmem_dmabuf_binding_free(...). As a result, static branches can
be switched atomically with regard to xarray state. In the time window
between unbind and free, the socket layer can still find the binding in
the xarray, but it will fail to acquire binding->ref (if unbind
decremented it to zero). This change preserves correct behavior and
avoids more complicated counting schemes for bindings.

Signed-off-by: Bobby Eshleman
---
Changes in v10:
- add binding->users to track socket and rxq users of the binding; defer
  release of urefs until binding->users hits zero to guard against users
  incrementing urefs *after* net_devmem_dmabuf_binding_put_urefs() is
  called. (Mina)
- fix error path failing to restore static key state when xarray alloc
  fails (Jakub)
- add wrappers for setting/unsetting the mode that capture the static
  key + rx binding count logic.
- move xa_erase() into __net_devmem_dmabuf_binding_free()
- remove net_devmem_rx_bindings_count; change xarray management to avoid
  the same race that net_devmem_rx_bindings_count did
- check return of net_devmem_dmabuf_binding_get() in tcp_recvmsg_dmabuf()
- move sk_devmem_info.binding fiddling into the autorelease=off static
  path

Changes in v9:
- Add missing stub for net_devmem_dmabuf_binding_get() when NET_DEVMEM=n
- Add wrapper around tcp_devmem_ar_key accesses so that it may be
  stubbed out when NET_DEVMEM=n
- only dec rx binding count for rx bindings in free (v8 did not exclude
  TX bindings)

Changes in v8:
- Only reset static key when bindings go to zero, defaulting back to
  disabled (Stan).
- Fix bad usage of xarray spinlock for sleepy static branch switching;
  use a mutex instead.
- Access pp_ref_count via niov->desc instead of niov directly.
- Move reset of static key to __net_devmem_dmabuf_binding_free() so that
  the static key cannot be changed while there are outstanding tokens
  (free is only called when the reference count reaches zero).
- Add net_devmem_dmabuf_rx_bindings_count because tokens may be active
  even after xa_erase(), so static key changes must wait until all RX
  bindings are finally freed (not just when the xarray is empty). A
  counter is a simple way to track this.
- socket takes a reference on the binding, to avoid use-after-free on
  sk_devmem_info.binding in the case that the user releases all tokens,
  unbinds, then issues SO_DEVMEM_DONTNEED again (with a bad token).
- removed some comments that were unnecessary

Changes in v7:
- implement autorelease with static branch (Stan)
- use netlink instead of sockopt (Stan)
- merge uAPI and implementation patches into one patch (seemed less
  confusing)

Changes in v6:
- remove sk_devmem_info.autorelease, using binding->autorelease instead
- move binding->autorelease check to outside of
  net_devmem_dmabuf_binding_put_urefs() (Mina)
- remove overly defensive net_is_devmem_iov() (Mina)
- add comment about multiple urefs mapping to a single netmem ref (Mina)
- remove overly defensive netmem NULL and netmem_is_net_iov checks (Mina)
- use niov without casting back and forth with netmem (Mina)
- move the autorelease flag from per-binding to per-socket (Mina)
- remove the batching logic in sock_devmem_dontneed_manual_release()
  (Mina)
- move autorelease check inside tcp_xa_pool_commit() (Mina)
- remove single-binding restriction for autorelease mode (Mina)
- unbind always checks for leaked urefs

Changes in v5:
- remove unused variables
- introduce autorelease flag, preparing for a future patch to toggle the
  new behavior

Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory
  footprint
- fall back to cleaning up references in dmabuf unbind if the socket
  leaked tokens
- drop ethtool patch

Changes in v2:
- always use GFP_ZERO for binding->vec (Mina)
- remove WARN for changed binding (Mina)
- remove extraneous binding ref get (Mina)
- remove WARNs on invalid user input (Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- fix length of alloc for urefs
---
 Documentation/netlink/specs/netdev.yaml |  12 +++
 include/net/netmem.h                    |   1 +
 include/net/sock.h                      |   7 +-
 include/uapi/linux/netdev.h             |   1 +
 net/core/devmem.c                       | 136 +++++++++++++++++++++----
 net/core/devmem.h                       |  64 ++++++++++++-
 net/core/netdev-genl-gen.c              |   5 +-
 net/core/netdev-genl.c                  |  10 ++-
 net/core/sock.c                         |  57 +++++++++--
 net/ipv4/tcp.c                          |  87 ++++++++++++----
 net/ipv4/tcp_ipv4.c                     |  15 +++-
 net/ipv4/tcp_minisocks.c                |   3 +-
 tools/include/uapi/linux/netdev.h       |   1 +
 13 files changed, 345 insertions(+), 54 deletions(-)

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index 596c306ce52b..a5301b150663 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -562,6 +562,17 @@ attribute-sets:
         type: u32
         checks:
           min: 1
+      -
+        name: autorelease
+        doc: |
+          Token autorelease mode. If true (1), leaked tokens are automatically
+          released when the socket closes. If false (0), leaked tokens are only
+          released when the dmabuf is torn down. Once a binding is created with
+          a specific mode, all subsequent bindings system-wide must use the
+          same mode.
+
+          Optional. Defaults to false if not specified.
+        type: u8
 
 operations:
   list:
@@ -769,6 +780,7 @@ operations:
             - ifindex
             - fd
             - queues
+            - autorelease
       reply:
         attributes:
           - id
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 9e10f4ac50c3..80d2263ba4ed 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -112,6 +112,7 @@ struct net_iov {
 	};
 	struct net_iov_area *owner;
 	enum net_iov_type type;
+	atomic_t uref;
 };
 
 struct net_iov_area {
diff --git a/include/net/sock.h b/include/net/sock.h
index aafe8bdb2c0f..9d3d5bde15e9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -352,7 +352,7 @@ struct sk_filter;
   *	@sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS
   *	@sk_scm_unused: unused flags for scm_recv()
   *	@ns_tracker: tracker for netns reference
-  *	@sk_user_frags: xarray of pages the user is holding a reference on.
+  *	@sk_devmem_info: the devmem binding information for the socket
   *	@sk_owner: reference to the real owner of the socket that calls
   *	           sock_lock_init_class_and_name().
   */
@@ -584,7 +584,10 @@ struct sock {
 	struct numa_drop_counters *sk_drop_counters;
 	struct rcu_head sk_rcu;
 	netns_tracker ns_tracker;
-	struct xarray sk_user_frags;
+	struct {
+		struct xarray frags;
+		struct net_devmem_dmabuf_binding *binding;
+	} sk_devmem_info;
 
 #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES)
 	struct module *sk_owner;
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e0b579a1df4f..1e5c209cb998 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -207,6 +207,7 @@ enum {
 	NETDEV_A_DMABUF_QUEUES,
 	NETDEV_A_DMABUF_FD,
 	NETDEV_A_DMABUF_ID,
+	NETDEV_A_DMABUF_AUTORELEASE,
 
 	__NETDEV_A_DMABUF_MAX,
 	NETDEV_A_DMABUF_MAX = (__NETDEV_A_DMABUF_MAX - 1)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 9dee697a28ee..1264d8ee40e3 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -27,6 +28,9 @@
 /* Device memory support */
 
 static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
+static DEFINE_MUTEX(devmem_ar_lock);
+DEFINE_STATIC_KEY_FALSE(tcp_devmem_ar_key);
+EXPORT_SYMBOL(tcp_devmem_ar_key);
 
 static const struct memory_provider_ops dmabuf_devmem_ops;
 
@@ -63,12 +67,71 @@ static void net_devmem_dmabuf_binding_release(struct percpu_ref *ref)
 	schedule_work(&binding->unbind_w);
 }
 
+static bool net_devmem_has_rx_bindings(void)
+{
+	struct net_devmem_dmabuf_binding *binding;
+	unsigned long index;
+
+	lockdep_assert_held(&devmem_ar_lock);
+
+	xa_for_each(&net_devmem_dmabuf_bindings, index, binding) {
+		if (binding->direction == DMA_FROM_DEVICE)
+			return true;
+	}
+	return false;
+}
+
+/* caller must hold devmem_ar_lock */
+static int
+__net_devmem_dmabuf_binding_set_mode(enum dma_data_direction direction,
+				     bool autorelease)
+{
+	lockdep_assert_held(&devmem_ar_lock);
+
+	if (direction != DMA_FROM_DEVICE)
+		return 0;
+
+	if (net_devmem_has_rx_bindings() &&
+	    static_key_enabled(&tcp_devmem_ar_key) != autorelease)
+		return -EBUSY;
+
+	if (autorelease)
+		static_branch_enable(&tcp_devmem_ar_key);
+
+	return 0;
+}
+
+/* caller must hold devmem_ar_lock */
+static void
+__net_devmem_dmabuf_binding_unset_mode(enum dma_data_direction direction)
+{
+	lockdep_assert_held(&devmem_ar_lock);
+
+	if (direction != DMA_FROM_DEVICE)
+		return;
+
+	if (net_devmem_has_rx_bindings())
+		return;
+
+	static_branch_disable(&tcp_devmem_ar_key);
+}
+
 void __net_devmem_dmabuf_binding_free(struct work_struct *wq)
 {
 	struct net_devmem_dmabuf_binding *binding = container_of(wq, typeof(*binding), unbind_w);
 
 	size_t size, avail;
 
+	mutex_lock(&devmem_ar_lock);
+	xa_erase(&net_devmem_dmabuf_bindings, binding->id);
+	__net_devmem_dmabuf_binding_unset_mode(binding->direction);
+	mutex_unlock(&devmem_ar_lock);
+
+	/* Ensure no tx net_devmem_lookup_dmabuf() are in flight after the
+	 * erase.
+	 */
+	synchronize_net();
+
 	gen_pool_for_each_chunk(binding->chunk_pool,
 				net_devmem_dmabuf_free_chunk_owner, NULL);
 
@@ -126,19 +189,30 @@ void net_devmem_free_dmabuf(struct net_iov *niov)
 	gen_pool_free(binding->chunk_pool, dma_addr, PAGE_SIZE);
 }
 
+void
+net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *binding)
+{
+	int i;
+
+	for (i = 0; i < binding->dmabuf->size / PAGE_SIZE; i++) {
+		struct net_iov *niov;
+		netmem_ref netmem;
+
+		niov = binding->vec[i];
+		netmem = net_iov_to_netmem(niov);
+
+		/* Multiple urefs map to only a single netmem ref. */
+		if (atomic_xchg(&niov->uref, 0) > 0)
+			WARN_ON_ONCE(!napi_pp_put_page(netmem));
+	}
+}
+
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
 {
 	struct netdev_rx_queue *rxq;
 	unsigned long xa_idx;
 	unsigned int rxq_idx;
 
-	xa_erase(&net_devmem_dmabuf_bindings, binding->id);
-
-	/* Ensure no tx net_devmem_lookup_dmabuf() are in flight after the
-	 * erase.
- */ - synchronize_net(); - if (binding->list.next) list_del(&binding->list); =20 @@ -151,6 +225,8 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_= binding *binding) rxq_idx =3D get_netdev_rx_queue_index(rxq); =20 __net_mp_close_rxq(binding->dev, rxq_idx, &mp_params); + + net_devmem_dmabuf_binding_user_put(binding); } =20 percpu_ref_kill(&binding->ref); @@ -178,6 +254,8 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *= dev, u32 rxq_idx, if (err) goto err_close_rxq; =20 + atomic_inc(&binding->users); + return 0; =20 err_close_rxq: @@ -189,8 +267,10 @@ struct net_devmem_dmabuf_binding * net_devmem_bind_dmabuf(struct net_device *dev, struct device *dma_dev, enum dma_data_direction direction, - unsigned int dmabuf_fd, struct netdev_nl_sock *priv, - struct netlink_ext_ack *extack) + unsigned int dmabuf_fd, + struct netdev_nl_sock *priv, + struct netlink_ext_ack *extack, + bool autorelease) { struct net_devmem_dmabuf_binding *binding; static u32 id_alloc_next; @@ -225,6 +305,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, if (err < 0) goto err_free_binding; =20 + atomic_set(&binding->users, 0); + mutex_init(&binding->lock); =20 binding->dmabuf =3D dmabuf; @@ -245,14 +327,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, goto err_detach; } =20 - if (direction =3D=3D DMA_TO_DEVICE) { - binding->vec =3D kvmalloc_array(dmabuf->size / PAGE_SIZE, - sizeof(struct net_iov *), - GFP_KERNEL); - if (!binding->vec) { - err =3D -ENOMEM; - goto err_unmap; - } + binding->vec =3D kvmalloc_array(dmabuf->size / PAGE_SIZE, + sizeof(struct net_iov *), + GFP_KERNEL | __GFP_ZERO); + if (!binding->vec) { + err =3D -ENOMEM; + goto err_unmap; } =20 /* For simplicity we expect to make PAGE_SIZE allocations, but the @@ -306,25 +386,41 @@ net_devmem_bind_dmabuf(struct net_device *dev, niov =3D &owner->area.niovs[i]; niov->type =3D NET_IOV_DMABUF; niov->owner =3D &owner->area; + atomic_set(&niov->uref, 0); page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), 
net_devmem_get_dma_addr(niov)); - if (direction =3D=3D DMA_TO_DEVICE) - binding->vec[owner->area.base_virtual / PAGE_SIZE + i] =3D niov; + binding->vec[owner->area.base_virtual / PAGE_SIZE + i] =3D niov; } =20 virtual +=3D len; } =20 + mutex_lock(&devmem_ar_lock); + + err =3D __net_devmem_dmabuf_binding_set_mode(direction, autorelease); + if (err < 0) { + NL_SET_ERR_MSG_FMT(extack, + "System already configured with autorelease=3D%d", + static_key_enabled(&tcp_devmem_ar_key)); + goto err_unlock_mutex; + } + err =3D xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id, binding, xa_limit_32b, &id_alloc_next, GFP_KERNEL); if (err < 0) - goto err_free_chunks; + goto err_unset_mode; + + mutex_unlock(&devmem_ar_lock); =20 list_add(&binding->list, &priv->bindings); =20 return binding; =20 +err_unset_mode: + __net_devmem_dmabuf_binding_unset_mode(direction); +err_unlock_mutex: + mutex_unlock(&devmem_ar_lock); err_free_chunks: gen_pool_for_each_chunk(binding->chunk_pool, net_devmem_dmabuf_free_chunk_owner, NULL); diff --git a/net/core/devmem.h b/net/core/devmem.h index 94874b323520..284f0ad5f381 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -12,9 +12,13 @@ =20 #include #include +#include =20 struct netlink_ext_ack; =20 +/* static key for TCP devmem autorelease */ +extern struct static_key_false tcp_devmem_ar_key; + struct net_devmem_dmabuf_binding { struct dma_buf *dmabuf; struct dma_buf_attachment *attachment; @@ -43,6 +47,12 @@ struct net_devmem_dmabuf_binding { */ struct percpu_ref ref; =20 + /* Counts sockets and rxqs that are using the binding. When this + * reaches zero, all urefs are drained and new sockets cannot join the + * binding. + */ + atomic_t users; + /* The list of bindings currently active. Used for netlink to notify us * of the user dropping the bind. */ @@ -61,7 +71,7 @@ struct net_devmem_dmabuf_binding { =20 /* Array of net_iov pointers for this binding, sorted by virtual * address. 
This array is convenient to map the virtual addresses to - * net_iovs in the TX path. + * net_iovs. */ struct net_iov **vec; =20 @@ -88,7 +98,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, struct device *dma_dev, enum dma_data_direction direction, unsigned int dmabuf_fd, struct netdev_nl_sock *priv, - struct netlink_ext_ack *extack); + struct netlink_ext_ack *extack, bool autorelease); struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 id); void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding); int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, @@ -134,6 +144,26 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf= _binding *binding) percpu_ref_put(&binding->ref); } =20 +void net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding = *binding); + +static inline bool +net_devmem_dmabuf_binding_user_get(struct net_devmem_dmabuf_binding *bindi= ng) +{ + return atomic_inc_not_zero(&binding->users); +} + +static inline void +net_devmem_dmabuf_binding_user_put(struct net_devmem_dmabuf_binding *bindi= ng) +{ + if (atomic_dec_and_test(&binding->users)) + net_devmem_dmabuf_binding_put_urefs(binding); +} + +static inline bool net_devmem_autorelease_enabled(void) +{ + return static_branch_unlikely(&tcp_devmem_ar_key); +} + void net_devmem_get_net_iov(struct net_iov *niov); void net_devmem_put_net_iov(struct net_iov *niov); =20 @@ -151,11 +181,38 @@ net_devmem_get_niov_at(struct net_devmem_dmabuf_bindi= ng *binding, size_t addr, #else struct net_devmem_dmabuf_binding; =20 +static inline bool +net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding) +{ + return false; +} + static inline void net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding) { } =20 +static inline void +net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *bind= ing) +{ +} + +static inline bool +net_devmem_dmabuf_binding_user_get(struct net_devmem_dmabuf_binding *bindi= ng) +{ + 
return false; +} + +static inline void +net_devmem_dmabuf_binding_user_put(struct net_devmem_dmabuf_binding *bindi= ng) +{ +} + +static inline bool net_devmem_autorelease_enabled(void) +{ + return false; +} + static inline void net_devmem_get_net_iov(struct net_iov *niov) { } @@ -170,7 +227,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, enum dma_data_direction direction, unsigned int dmabuf_fd, struct netdev_nl_sock *priv, - struct netlink_ext_ack *extack) + struct netlink_ext_ack *extack, + bool autorelease) { return ERR_PTR(-EOPNOTSUPP); } diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c index ba673e81716f..01b7765e11ec 100644 --- a/net/core/netdev-genl-gen.c +++ b/net/core/netdev-genl-gen.c @@ -86,10 +86,11 @@ static const struct nla_policy netdev_qstats_get_nl_pol= icy[NETDEV_A_QSTATS_SCOPE }; =20 /* NETDEV_CMD_BIND_RX - do */ -static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD= + 1] =3D { +static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_AU= TORELEASE + 1] =3D { [NETDEV_A_DMABUF_IFINDEX] =3D NLA_POLICY_MIN(NLA_U32, 1), [NETDEV_A_DMABUF_FD] =3D { .type =3D NLA_U32, }, [NETDEV_A_DMABUF_QUEUES] =3D NLA_POLICY_NESTED(netdev_queue_id_nl_policy), + [NETDEV_A_DMABUF_AUTORELEASE] =3D { .type =3D NLA_U8, }, }; =20 /* NETDEV_CMD_NAPI_SET - do */ @@ -188,7 +189,7 @@ static const struct genl_split_ops netdev_nl_ops[] =3D { .cmd =3D NETDEV_CMD_BIND_RX, .doit =3D netdev_nl_bind_rx_doit, .policy =3D netdev_bind_rx_nl_policy, - .maxattr =3D NETDEV_A_DMABUF_FD, + .maxattr =3D NETDEV_A_DMABUF_AUTORELEASE, .flags =3D GENL_ADMIN_PERM | GENL_CMD_CAP_DO, }, { diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index 470fabbeacd9..c742bb34865e 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -939,6 +939,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct = genl_info *info) struct netdev_nl_sock *priv; struct net_device *netdev; unsigned long *rxq_bitmap; + bool autorelease =3D 
false; struct device *dma_dev; struct sk_buff *rsp; int err =3D 0; @@ -952,6 +953,10 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct= genl_info *info) ifindex =3D nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]); dmabuf_fd =3D nla_get_u32(info->attrs[NETDEV_A_DMABUF_FD]); =20 + if (info->attrs[NETDEV_A_DMABUF_AUTORELEASE]) + autorelease =3D + !!nla_get_u8(info->attrs[NETDEV_A_DMABUF_AUTORELEASE]); + priv =3D genl_sk_priv_get(&netdev_nl_family, NETLINK_CB(skb).sk); if (IS_ERR(priv)) return PTR_ERR(priv); @@ -1002,7 +1007,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struc= t genl_info *info) } =20 binding =3D net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE, - dmabuf_fd, priv, info->extack); + dmabuf_fd, priv, info->extack, + autorelease); if (IS_ERR(binding)) { err =3D PTR_ERR(binding); goto err_rxq_bitmap; @@ -1097,7 +1103,7 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struc= t genl_info *info) =20 dma_dev =3D netdev_queue_get_dma_dev(netdev, 0); binding =3D net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE, - dmabuf_fd, priv, info->extack); + dmabuf_fd, priv, info->extack, false); if (IS_ERR(binding)) { err =3D PTR_ERR(binding); goto err_unlock_netdev; diff --git a/net/core/sock.c b/net/core/sock.c index f6526f43aa6e..6355c2ccfb8a 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -87,6 +87,7 @@ =20 #include #include +#include #include #include #include @@ -151,6 +152,7 @@ #include =20 #include "dev.h" +#include "devmem.h" =20 static DEFINE_MUTEX(proto_list_mutex); static LIST_HEAD(proto_list); @@ -1081,6 +1083,44 @@ static int sock_reserve_memory(struct sock *sk, int = bytes) #define MAX_DONTNEED_TOKENS 128 #define MAX_DONTNEED_FRAGS 1024 =20 +static noinline_for_stack int +sock_devmem_dontneed_manual_release(struct sock *sk, + struct dmabuf_token *tokens, + unsigned int num_tokens) +{ + struct net_iov *niov; + unsigned int i, j; + netmem_ref netmem; + unsigned int token; + int num_frags =3D 0; + int ret =3D 0; + + if 
(!sk->sk_devmem_info.binding) + return -EINVAL; + + for (i =3D 0; i < num_tokens; i++) { + for (j =3D 0; j < tokens[i].token_count; j++) { + size_t size =3D sk->sk_devmem_info.binding->dmabuf->size; + + token =3D tokens[i].token_start + j; + if (token >=3D size / PAGE_SIZE) + break; + + if (++num_frags > MAX_DONTNEED_FRAGS) + return ret; + + niov =3D sk->sk_devmem_info.binding->vec[token]; + if (atomic_dec_and_test(&niov->uref)) { + netmem =3D net_iov_to_netmem(niov); + WARN_ON_ONCE(!napi_pp_put_page(netmem)); + } + ret++; + } + } + + return ret; +} + static noinline_for_stack int sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tok= ens, unsigned int num_tokens) @@ -1089,32 +1129,33 @@ sock_devmem_dontneed_autorelease(struct sock *sk, s= truct dmabuf_token *tokens, int ret =3D 0, num_frags =3D 0; netmem_ref netmems[16]; =20 - xa_lock_bh(&sk->sk_user_frags); + xa_lock_bh(&sk->sk_devmem_info.frags); for (i =3D 0; i < num_tokens; i++) { for (j =3D 0; j < tokens[i].token_count; j++) { if (++num_frags > MAX_DONTNEED_FRAGS) goto frag_limit_reached; =20 netmem_ref netmem =3D (__force netmem_ref)__xa_erase( - &sk->sk_user_frags, tokens[i].token_start + j); + &sk->sk_devmem_info.frags, + tokens[i].token_start + j); =20 if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem))) continue; =20 netmems[netmem_num++] =3D netmem; if (netmem_num =3D=3D ARRAY_SIZE(netmems)) { - xa_unlock_bh(&sk->sk_user_frags); + xa_unlock_bh(&sk->sk_devmem_info.frags); for (k =3D 0; k < netmem_num; k++) WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); netmem_num =3D 0; - xa_lock_bh(&sk->sk_user_frags); + xa_lock_bh(&sk->sk_devmem_info.frags); } ret++; } } =20 frag_limit_reached: - xa_unlock_bh(&sk->sk_user_frags); + xa_unlock_bh(&sk->sk_devmem_info.frags); for (k =3D 0; k < netmem_num; k++) WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); =20 @@ -1145,7 +1186,11 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optv= al, unsigned int optlen) return -EFAULT; } =20 - ret =3D 
sock_devmem_dontneed_autorelease(sk, tokens, num_tokens); + if (net_devmem_autorelease_enabled()) + ret =3D sock_devmem_dontneed_autorelease(sk, tokens, num_tokens); + else + ret =3D sock_devmem_dontneed_manual_release(sk, tokens, + num_tokens); =20 kvfree(tokens); return ret; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index d5319ebe2452..73a577bd8765 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -260,6 +260,7 @@ #include #include #include +#include #include #include #include @@ -492,7 +493,8 @@ void tcp_init_sock(struct sock *sk) =20 set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); sk_sockets_allocated_inc(sk); - xa_init_flags(&sk->sk_user_frags, XA_FLAGS_ALLOC1); + xa_init_flags(&sk->sk_devmem_info.frags, XA_FLAGS_ALLOC1); + sk->sk_devmem_info.binding =3D NULL; } EXPORT_IPV6_MOD(tcp_init_sock); =20 @@ -2424,11 +2426,12 @@ static void tcp_xa_pool_commit_locked(struct sock *= sk, struct tcp_xa_pool *p) =20 /* Commit part that has been copied to user space. */ for (i =3D 0; i < p->idx; i++) - __xa_cmpxchg(&sk->sk_user_frags, p->tokens[i], XA_ZERO_ENTRY, - (__force void *)p->netmems[i], GFP_KERNEL); + __xa_cmpxchg(&sk->sk_devmem_info.frags, p->tokens[i], + XA_ZERO_ENTRY, (__force void *)p->netmems[i], + GFP_KERNEL); /* Rollback what has been pre-allocated and is no longer needed. 
*/ for (; i < p->max; i++) - __xa_erase(&sk->sk_user_frags, p->tokens[i]); + __xa_erase(&sk->sk_devmem_info.frags, p->tokens[i]); =20 p->max =3D 0; p->idx =3D 0; @@ -2436,14 +2439,17 @@ static void tcp_xa_pool_commit_locked(struct sock *= sk, struct tcp_xa_pool *p) =20 static void tcp_xa_pool_commit(struct sock *sk, struct tcp_xa_pool *p) { + if (!net_devmem_autorelease_enabled()) + return; + if (!p->max) return; =20 - xa_lock_bh(&sk->sk_user_frags); + xa_lock_bh(&sk->sk_devmem_info.frags); =20 tcp_xa_pool_commit_locked(sk, p); =20 - xa_unlock_bh(&sk->sk_user_frags); + xa_unlock_bh(&sk->sk_devmem_info.frags); } =20 static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p, @@ -2454,24 +2460,41 @@ static int tcp_xa_pool_refill(struct sock *sk, stru= ct tcp_xa_pool *p, if (p->idx < p->max) return 0; =20 - xa_lock_bh(&sk->sk_user_frags); + xa_lock_bh(&sk->sk_devmem_info.frags); =20 tcp_xa_pool_commit_locked(sk, p); =20 for (k =3D 0; k < max_frags; k++) { - err =3D __xa_alloc(&sk->sk_user_frags, &p->tokens[k], + err =3D __xa_alloc(&sk->sk_devmem_info.frags, &p->tokens[k], XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL); if (err) break; } =20 - xa_unlock_bh(&sk->sk_user_frags); + xa_unlock_bh(&sk->sk_devmem_info.frags); =20 p->max =3D k; p->idx =3D 0; return k ? 0 : err; } =20 +static void tcp_xa_pool_inc_pp_ref_count(struct tcp_xa_pool *tcp_xa_pool, + skb_frag_t *frag) +{ + struct net_iov *niov; + + niov =3D skb_frag_net_iov(frag); + + if (net_devmem_autorelease_enabled()) { + atomic_long_inc(&niov->desc.pp_ref_count); + tcp_xa_pool->netmems[tcp_xa_pool->idx++] =3D + skb_frag_netmem(frag); + } else { + if (atomic_inc_return(&niov->uref) =3D=3D 1) + atomic_long_inc(&niov->desc.pp_ref_count); + } +} + /* On error, returns the -errno. On success, returns number of bytes sent = to the * user. May not consume all of @remaining_len. 
*/ @@ -2533,6 +2556,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const = struct sk_buff *skb, * sequence of cmsg */ for (i =3D 0; i < skb_shinfo(skb)->nr_frags; i++) { + struct net_devmem_dmabuf_binding *binding =3D NULL; skb_frag_t *frag =3D &skb_shinfo(skb)->frags[i]; struct net_iov *niov; u64 frag_offset; @@ -2568,13 +2592,45 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, cons= t struct sk_buff *skb, start; dmabuf_cmsg.frag_offset =3D frag_offset; dmabuf_cmsg.frag_size =3D copy; - err =3D tcp_xa_pool_refill(sk, &tcp_xa_pool, - skb_shinfo(skb)->nr_frags - i); - if (err) - goto out; + + binding =3D net_devmem_iov_binding(niov); + + if (net_devmem_autorelease_enabled()) { + err =3D tcp_xa_pool_refill(sk, + &tcp_xa_pool, + skb_shinfo(skb)->nr_frags - i); + if (err) + goto out; + + dmabuf_cmsg.frag_token =3D + tcp_xa_pool.tokens[tcp_xa_pool.idx]; + } else { + if (!sk->sk_devmem_info.binding) { + if (!net_devmem_dmabuf_binding_user_get(binding)) { + err =3D -ENODEV; + goto out; + } + + if (!net_devmem_dmabuf_binding_get(binding)) { + net_devmem_dmabuf_binding_user_put(binding); + err =3D -ENODEV; + goto out; + } + + sk->sk_devmem_info.binding =3D binding; + } + + if (sk->sk_devmem_info.binding !=3D binding) { + err =3D -EFAULT; + goto out; + } + + dmabuf_cmsg.frag_token =3D + net_iov_virtual_addr(niov) >> PAGE_SHIFT; + } + =20 /* Will perform the exchange later */ - dmabuf_cmsg.frag_token =3D tcp_xa_pool.tokens[tcp_xa_pool.idx]; dmabuf_cmsg.dmabuf_id =3D net_devmem_iov_binding_id(niov); =20 offset +=3D copy; @@ -2587,8 +2643,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const = struct sk_buff *skb, if (err) goto out; =20 - atomic_long_inc(&niov->desc.pp_ref_count); - tcp_xa_pool.netmems[tcp_xa_pool.idx++] =3D skb_frag_netmem(frag); + tcp_xa_pool_inc_pp_ref_count(&tcp_xa_pool, frag); =20 sent +=3D copy; =20 diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index f8a9596e8f4d..420e8c8ebf6d 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c 
@@ -89,6 +89,9 @@ =20 #include =20 +#include +#include "../core/devmem.h" + #include =20 #ifdef CONFIG_TCP_MD5SIG @@ -2492,7 +2495,7 @@ static void tcp_release_user_frags(struct sock *sk) unsigned long index; void *netmem; =20 - xa_for_each(&sk->sk_user_frags, index, netmem) + xa_for_each(&sk->sk_devmem_info.frags, index, netmem) WARN_ON_ONCE(!napi_pp_put_page((__force netmem_ref)netmem)); #endif } @@ -2503,7 +2506,15 @@ void tcp_v4_destroy_sock(struct sock *sk) =20 tcp_release_user_frags(sk); =20 - xa_destroy(&sk->sk_user_frags); + if (!net_devmem_autorelease_enabled() && sk->sk_devmem_info.binding) { + net_devmem_dmabuf_binding_user_put(sk->sk_devmem_info.binding); + net_devmem_dmabuf_binding_put(sk->sk_devmem_info.binding); + sk->sk_devmem_info.binding =3D NULL; + WARN_ONCE(!xa_empty(&sk->sk_devmem_info.frags), + "non-empty xarray discovered in autorelease off mode"); + } + + xa_destroy(&sk->sk_devmem_info.frags); =20 trace_tcp_destroy_sock(sk); =20 diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index bd5462154f97..2aec977f5c12 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -662,7 +662,8 @@ struct sock *tcp_create_openreq_child(const struct sock= *sk, =20 __TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS); =20 - xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1); + xa_init_flags(&newsk->sk_devmem_info.frags, XA_FLAGS_ALLOC1); + newsk->sk_devmem_info.binding =3D NULL; =20 return newsk; } diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/n= etdev.h index e0b579a1df4f..1e5c209cb998 100644 --- a/tools/include/uapi/linux/netdev.h +++ b/tools/include/uapi/linux/netdev.h @@ -207,6 +207,7 @@ enum { NETDEV_A_DMABUF_QUEUES, NETDEV_A_DMABUF_FD, NETDEV_A_DMABUF_ID, + NETDEV_A_DMABUF_AUTORELEASE, =20 __NETDEV_A_DMABUF_MAX, NETDEV_A_DMABUF_MAX =3D (__NETDEV_A_DMABUF_MAX - 1) --=20 2.47.3 From nobody Sun Feb 8 18:44:10 2026 Received: from mail-yx1-f67.google.com (mail-yx1-f67.google.com [74.125.224.67]) 
From: Bobby Eshleman
Date: Thu, 15 Jan 2026 21:02:15 -0800
Subject: [PATCH net-next v10 4/5] net: devmem: document NETDEV_A_DMABUF_AUTORELEASE netlink attribute
Message-Id: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-4-686d0af71978@meta.com>

Update devmem.rst documentation to describe the autorelease netlink
attribute used during RX dmabuf binding.

The autorelease attribute is specified at bind-time via the netlink API
(NETDEV_CMD_BIND_RX) and controls what happens to outstanding tokens
when the socket closes. Document the two token release modes (automatic
vs manual), how to configure the binding for autorelease, the
performance benefits, new caveats and restrictions, and the way the
mode is enforced system-wide.
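The difference between the two token release modes can be sketched as a toy Python model. This is purely illustrative: `Binding`, `Socket`, `recv_frag`, and the counters here are stand-ins for the kernel's per-niov uref / page-pool-ref accounting, not real kernel APIs.

```python
class Binding:
    """Stand-in for a dmabuf binding: per-page user refs plus pp refs."""
    def __init__(self, npages):
        self.uref = [0] * npages   # manual mode: per-net_iov user refcount
        self.pp_refs = 0           # page-pool references held for userspace

class Socket:
    def __init__(self, binding):
        self.binding = binding
        self.frags = {}            # autorelease mode: token -> page (xarray)
        self.next_token = 1

def recv_frag(sk, page, autorelease):
    """Hand one received fragment to userspace; return its token."""
    b = sk.binding
    if autorelease:
        token = sk.next_token
        sk.next_token += 1
        sk.frags[token] = page     # tracked per socket, one pp ref per token
        b.pp_refs += 1
    else:
        token = page               # token is just the niov's page index
        if b.uref[page] == 0:
            b.pp_refs += 1         # many urefs collapse to a single pp ref
        b.uref[page] += 1
    return token

def dontneed(sk, token, autorelease):
    """Model of SO_DEVMEM_DONTNEED: userspace returns a token."""
    b = sk.binding
    if autorelease:
        del sk.frags[token]
        b.pp_refs -= 1
    else:
        b.uref[token] -= 1
        if b.uref[token] == 0:
            b.pp_refs -= 1

def close_sock(sk, autorelease):
    b = sk.binding
    if autorelease:
        b.pp_refs -= len(sk.frags)  # outstanding tokens released at close
        sk.frags.clear()
    # manual mode: urefs survive close; they are only drained when the
    # binding's last user (socket or bound rx queue) goes away

# Manual mode: the same page received twice costs one pp ref, and
# close() does not release what userspace still holds.
b = Binding(8)
sk = Socket(b)
t1 = recv_frag(sk, 0, autorelease=False)
t2 = recv_frag(sk, 0, autorelease=False)
assert b.pp_refs == 1
close_sock(sk, autorelease=False)
assert b.pp_refs == 1              # still pinned until the binding dies
```

The key design point this models is the commit's claim that manual release trades the per-token xarray for a per-page atomic counter, at the cost of close() no longer being a release point.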
Signed-off-by: Bobby Eshleman
---
Changes in v7:
- Document netlink instead of sockopt
- Mention system-wide locked to one mode
---
 Documentation/networking/devmem.rst | 73 ++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
index a6cd7236bfbd..f85f1dcc9621 100644
--- a/Documentation/networking/devmem.rst
+++ b/Documentation/networking/devmem.rst
@@ -235,6 +235,79 @@ can be less than the tokens provided by the user in case of:
 (a) an internal kernel leak bug.
 (b) the user passed more than 1024 frags.
 
+
+Autorelease Control
+~~~~~~~~~~~~~~~~~~~
+
+The autorelease mode controls what happens to outstanding tokens (tokens not
+released via SO_DEVMEM_DONTNEED) when the socket closes. Autorelease is
+configured per-binding at binding creation time via the netlink API::
+
+    struct netdev_bind_rx_req *req;
+    struct netdev_bind_rx_rsp *rsp;
+    struct ynl_sock *ys;
+    struct ynl_error yerr;
+
+    ys = ynl_sock_create(&ynl_netdev_family, &yerr);
+
+    req = netdev_bind_rx_req_alloc();
+    netdev_bind_rx_req_set_ifindex(req, ifindex);
+    netdev_bind_rx_req_set_fd(req, dmabuf_fd);
+    netdev_bind_rx_req_set_autorelease(req, 0); /* 0 = manual, 1 = auto */
+    __netdev_bind_rx_req_set_queues(req, queues, n_queues);
+
+    rsp = netdev_bind_rx(ys, req);
+
+    dmabuf_id = rsp->id;
+
+When autorelease is disabled (0):
+
+- Outstanding tokens are NOT released when the socket closes
+- Outstanding tokens are only released when all RX queues are unbound AND all
+  sockets that called recvmsg() are closed
+- Provides better performance by eliminating xarray overhead (~13% CPU reduction)
+- Kernel tracks tokens via atomic reference counters in net_iov structures
+
+When autorelease is enabled (1):
+
+- Outstanding tokens are automatically released when the socket closes
+- Backwards compatible behavior
+- Kernel tracks tokens in an xarray per socket
+
+The default is autorelease disabled.
+
+Important: In both modes, applications should call SO_DEVMEM_DONTNEED to
+return tokens as soon as they are done processing. The autorelease setting only
+affects what happens to tokens that are still outstanding when close() is called.
+
+The mode is enforced system-wide. Once a binding is created with a specific
+autorelease mode, all subsequent bindings system-wide must use the same mode.
+
+
+Performance Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disabling autorelease provides approximately 13% CPU utilization improvement
+in RX workloads. That said, applications must ensure all tokens are released
+via SO_DEVMEM_DONTNEED before closing the socket, otherwise the backing pages
+will remain pinned until all RX queues are unbound AND all sockets that called
+recvmsg() are closed.
+
+
+Caveats
+~~~~~~~
+
+- Once a system-wide autorelease mode is selected (via the first binding),
+  all subsequent bindings must use the same mode. Attempts to create bindings
+  with a different mode will be rejected with -EBUSY.
+
+- Applications using manual release mode (autorelease=0) must ensure all tokens
+  are returned via SO_DEVMEM_DONTNEED before socket close to avoid resource
+  leaks during the lifetime of the dmabuf binding. Tokens not released before
+  close() will only be freed when all RX queues are unbound AND all sockets
+  that called recvmsg() are closed.
+ + TX Interface =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 --=20 2.47.3 From nobody Sun Feb 8 18:44:10 2026 Received: from mail-yw1-f179.google.com (mail-yw1-f179.google.com [209.85.128.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D2C2D34B43F for ; Fri, 16 Jan 2026 05:03:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768539809; cv=none; b=quNMxHLt/2ezLlbcFU3Kxb8WXP91PhL0MZbYXVQwT39CuxNCL+gUdRIVF5tUZqCt4ljLyusehssRsEWc/EPTY+f0gaAFeUcu6jMHkT3LqTaaR7ioSKo0wGvYKybaA3fyanuT3AdR44wLgaZzmYFgmOdm6gydSAC/U1UWoYketo4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768539809; c=relaxed/simple; bh=c/RMTvI113l1hFS34MtbsQYRQNKpL3Sqqwbb3gjFmTU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=ohdaNB44qTa4z/k8INlmJBnEp2mPa9RIPy2YpSCKWdPq7Ga5a+uiHkvfuHwTRRpX45uu2pDOkpa+awonKjcZsjc6CfzrHCJV7GcT8wnHhe31C2KkYR9XrHrM2kSorVGXV2CHkEPmPksbzQojTQeuSJMYBw3MRn70zZUpTo6MWTo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=UTL83YtJ; arc=none smtp.client-ip=209.85.128.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="UTL83YtJ" Received: by mail-yw1-f179.google.com with SMTP id 00721157ae682-78e6dc6d6d7so17448057b3.3 for ; Thu, 15 Jan 2026 21:03:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768539802; 
x=1769144602; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=kXwkpjnmiMoU36VnzxxQMbXTyAXIjsqE2khAXdKokBU=; b=UTL83YtJyPRozeJPQ+zbZaBeLx3mxtEB4NGFV6y2JBimPyOlfLCWsiQvagcAjrNGhi fZbsqqFiMw/ZsH+13Ddy+ExcpUHuyNC4zJKYzWjl24O+ylRX86U3b88R5KuD5Jy5OGcB m2TXk8iL0FSjuKCK3dYmdXoORI/n6IGtSCTv6EpcCCrcqDTfwX/BBkFY1D3Iu754HsAu IJan2+ZG04vDtgDyZJP/OMZCg6Olkjr99r4K/IeroOsQcjFiHauIHcqrPK1w3BNx8NOO 90T+p3C+JzGkmzSozcLm+/2AIEPV7AV1u+CM2Bi7l/tYoN8VTbOIAJ9UZzD+jQjf9Olw nVOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768539802; x=1769144602; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=kXwkpjnmiMoU36VnzxxQMbXTyAXIjsqE2khAXdKokBU=; b=XgSHrEvsGiXSdhFjjgbq+PP4SeCUYBos5eQK2XeEK16NKXQlnCFQYp048WarY3eHvQ 8vZSUSUBYnTG5MAXN/wcj2bcBhGmhmoCpYUzw8QKPRmb/LhIlvdwahwfS/a3wly0elZY W/6ovhOpNp8tyxabmOv3b7QopwpcktaSFqUIdQfnvSBW56kRjff4hc24OMvo3FMk5xFi OnsILKmET1qdFSyt7TNMJ2aB6Sc638VXxJ2L8Q1oUG/jNTYJ58uPPjb7fcWXucbB18bb 0Z6OVzXJE/XLMW8T/qsY+kAcewPBL7jLmuNLLIZhdLrBP49AuIIrRsscEodPXSHoVXz5 iUQw== X-Forwarded-Encrypted: i=1; AJvYcCV8mqk82XYgxHUJ5c9/ea+qWcUVsEjZt/qk7e0ZMKX2llCIYjtb2QxkWVgl+Mgufoc8dbRWQ+QedhrelWA=@vger.kernel.org X-Gm-Message-State: AOJu0YxnI7ybulrWPdLV6tMZkq/0bN90QhSRu7dDTdc7Rlm8wsn173dm 1a3ZwBr1J1v1Shw+N209jNDGjy3SciFxViPJu58+e1cgyDJRhtveajGP X-Gm-Gg: AY/fxX4vfITg0Ijf7kJjHpcrxzpDDxELA10RypzXrL8ibTV99d18F7XmwbfqhOzecP4 jVF0WHNWz/N1MmvxXXQdxYHrbiEEDP54rk3ZzeqTb+qk2xU8S3Up/9znw0tNtlPF5U+6ci/5DaZ wMzHjVIde6L+GmLZ8Ks0Awoa5GPIzl4UMiRM4trEzlZe30tcz9IBJQUqRJoumuL+QixOLc6pUWd W70eLb4bPtPS2mIaZ3O0CundC3ftSZ/QBdXFojAd0oYsV/Qkr0csOZFDg46oDEpWlbr/j8dkyr0 SF+rHwFseBjGvJbRbMlHtvX/O0ZSB7vhPgXJnBRk7Ml5iCj0fbejh8uQX594gtKkB4vONItr711 W/WZfQpIpkch4y1bf+BGd2AiRdzeNWnSvpbYMTN0PfpWHveaatP7xLn6TrOY7gVaMFMyVNQu7Lm 
From: Bobby Eshleman
Date: Thu, 15 Jan 2026 21:02:16 -0800
Subject: [PATCH net-next v10 5/5] selftests: drv-net: devmem: add autorelease tests
Message-Id: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-5-686d0af71978@meta.com>
References: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
In-Reply-To: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell,
 David Ahern, Mina Almasry, Arnd Bergmann, Jonathan Corbet,
 Andrew Lunn, Shuah Khan, Donald Hunter
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kselftest@vger.kernel.org, asml.silence@gmail.com, matttbe@kernel.org,
 skhawaja@google.com, Bobby Eshleman

Add test cases for autorelease and new edge cases. A happy-path
check_rx / check_rx_autorelease test is added.

The check_unbind_before_recv/_autorelease tests verify that unbind
behaves correctly when it happens after a connection is accepted but
before recv is called.

The check_unbind_after_recv/_autorelease tests verify that unbind
behaves correctly when it happens after a connection is accepted and
after recv has been called.
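For orientation, the six RX test variants reduce to two ncdevmem flags:
-a toggles autorelease and -U selects the unbind mode (1 = before recv,
2 = after recv). The helper below is a hypothetical sketch, not part of
the patch; the `ncdevmem_flags` name and default-unbind convention are
invented for illustration only:

```python
# Sketch (illustration only): map the test matrix onto ncdevmem's
# new -a (autorelease) and -U (unbind mode) flags from this patch.
UNBIND_NORMAL, UNBIND_BEFORE_RECV, UNBIND_AFTER_RECV = 0, 1, 2

def ncdevmem_flags(autorelease: bool, unbind_mode: int = UNBIND_NORMAL) -> str:
    """Build the flag suffix appended to the ncdevmem listen command."""
    flags = f"-a {int(autorelease)}"
    if unbind_mode != UNBIND_NORMAL:
        flags += f" -U {unbind_mode}"
    return flags

# The six RX variants added by this patch:
matrix = {
    "check_rx": ncdevmem_flags(False),
    "check_rx_autorelease": ncdevmem_flags(True),
    "check_unbind_before_recv": ncdevmem_flags(False, UNBIND_BEFORE_RECV),
    "check_unbind_before_recv_autorelease": ncdevmem_flags(True, UNBIND_BEFORE_RECV),
    "check_unbind_after_recv": ncdevmem_flags(False, UNBIND_AFTER_RECV),
    "check_unbind_after_recv_autorelease": ncdevmem_flags(True, UNBIND_AFTER_RECV),
}
```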
To facilitate the unbind tests, ncdevmem is changed to take an "unbind"
mode. The unbind modes are defined as the following:

UNBIND_MODE_NORMAL: unbind after done with normal traffic

UNBIND_MODE_BEFORE_RECV: Unbind before any recvmsg. The socket hasn't
become a user yet, so binding->users reaches zero and recvmsg should
fail with ENODEV. This validates that sockets can't access devmem after
the binding is torn down.

UNBIND_MODE_AFTER_RECV: Do one recvmsg first (socket becomes a user),
then unbind, then continue receiving. This validates that
binding->users keeps the binding alive for sockets that already
acquired a reference via recvmsg.

ncdevmem is also changed to take an autorelease flag for toggling the
autorelease mode.

TAP version 13
1..8
ok 1 devmem.check_rx
ok 2 devmem.check_rx_autorelease
ok 3 devmem.check_unbind_before_recv
ok 4 devmem.check_unbind_before_recv_autorelease
ok 5 devmem.check_unbind_after_recv
ok 6 devmem.check_unbind_after_recv_autorelease
ok 7 devmem.check_tx
ok 8 devmem.check_tx_chunks

Signed-off-by: Bobby Eshleman
---
Changes in v10:
- add tests for "unbind before/after recv" edge cases

Changes in v8:
- removed stale/missing tests

Changes in v7:
- use autorelease netlink
- remove sockopt tests
---
 tools/testing/selftests/drivers/net/hw/devmem.py  | 98 ++++++++++++++++++++++-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 68 ++++++++++++++--
 2 files changed, 157 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py
index 45c2d49d55b6..0bbfdf19e23d 100755
--- a/tools/testing/selftests/drivers/net/hw/devmem.py
+++ b/tools/testing/selftests/drivers/net/hw/devmem.py
@@ -25,7 +25,98 @@ def check_rx(cfg) -> None:
 
     port = rand_port()
     socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
-    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} \
+                   -c {cfg.remote_addr} -v 7 -a 0"
+
+    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
+        wait_port_listen(port)
+        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
+            head -c 1K | {socat}", host=cfg.remote, shell=True)
+
+    ksft_eq(ncdevmem.ret, 0)
+
+
+@ksft_disruptive
+def check_rx_autorelease(cfg) -> None:
+    """Test devmem TCP receive with autorelease mode enabled."""
+    require_devmem(cfg)
+
+    port = rand_port()
+    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} \
+                   -c {cfg.remote_addr} -v 7 -a 1"
+
+    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
+        wait_port_listen(port)
+        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
+            head -c 1K | {socat}", host=cfg.remote, shell=True)
+
+    ksft_eq(ncdevmem.ret, 0)
+
+
+@ksft_disruptive
+def check_unbind_before_recv(cfg) -> None:
+    """Test dmabuf unbind before socket recv with autorelease disabled."""
+    require_devmem(cfg)
+
+    port = rand_port()
+    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} \
+                   -c {cfg.remote_addr} -v 7 -a 0 -U 1"
+
+    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
+        wait_port_listen(port)
+        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
+            head -c 1K | {socat}", host=cfg.remote, shell=True)
+
+    ksft_eq(ncdevmem.ret, 0)
+
+
+@ksft_disruptive
+def check_unbind_before_recv_autorelease(cfg) -> None:
+    """Test dmabuf unbind before socket recv with autorelease enabled."""
+    require_devmem(cfg)
+
+    port = rand_port()
+    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} \
+                   -c {cfg.remote_addr} -v 7 -a 1 -U 1"
+
+    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
+        wait_port_listen(port)
+        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
+            head -c 1K | {socat}", host=cfg.remote, shell=True)
+
+    ksft_eq(ncdevmem.ret, 0)
+
+
+@ksft_disruptive
+def check_unbind_after_recv(cfg) -> None:
+    """Test dmabuf unbind after socket recv with autorelease disabled."""
+    require_devmem(cfg)
+
+    port = rand_port()
+    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} \
+                   -c {cfg.remote_addr} -v 7 -a 0 -U 2"
+
+    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
+        wait_port_listen(port)
+        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
+            head -c 1K | {socat}", host=cfg.remote, shell=True)
+
+    ksft_eq(ncdevmem.ret, 0)
+
+
+@ksft_disruptive
+def check_unbind_after_recv_autorelease(cfg) -> None:
+    """Test dmabuf unbind after socket recv with autorelease enabled."""
+    require_devmem(cfg)
+
+    port = rand_port()
+    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} \
+                   -c {cfg.remote_addr} -v 7 -a 1 -U 2"
 
     with bkg(listen_cmd, exit_wait=True) as ncdevmem:
         wait_port_listen(port)
@@ -68,7 +159,10 @@ def main() -> None:
         cfg.bin_local = path.abspath(path.dirname(__file__) + "/ncdevmem")
         cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
 
-        ksft_run([check_rx, check_tx, check_tx_chunks],
+        ksft_run([check_rx, check_rx_autorelease,
+                  check_unbind_before_recv, check_unbind_before_recv_autorelease,
+                  check_unbind_after_recv, check_unbind_after_recv_autorelease,
+                  check_tx, check_tx_chunks],
                  args=(cfg, ))
     ksft_exit()
 
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 3288ed04ce08..5cbff3c602b2 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -85,6 +85,13 @@
 
 #define MAX_IOV 1024
 
+enum unbind_mode_type {
+	UNBIND_MODE_NORMAL,
+	UNBIND_MODE_BEFORE_RECV,
+	UNBIND_MODE_AFTER_RECV,
+	UNBIND_MODE_INVAL,
+};
+
 static size_t max_chunk;
 static char *server_ip;
 static char *client_ip;
@@ -92,6 +99,8 @@ static char *port;
 static size_t do_validation;
 static int start_queue = -1;
 static int num_queues = -1;
+static int devmem_autorelease;
+static enum unbind_mode_type unbind_mode;
 static char *ifname;
 static unsigned int ifindex;
 static unsigned int dmabuf_id;
@@ -679,7 +688,8 @@ static int configure_flow_steering(struct sockaddr_in6 *server_sin)
 
 static int bind_rx_queue(unsigned int ifindex, unsigned int dmabuf_fd,
 			 struct netdev_queue_id *queues,
-			 unsigned int n_queue_index, struct ynl_sock **ys)
+			 unsigned int n_queue_index, struct ynl_sock **ys,
+			 int autorelease)
 {
 	struct netdev_bind_rx_req *req = NULL;
 	struct netdev_bind_rx_rsp *rsp = NULL;
@@ -695,6 +705,7 @@ static int bind_rx_queue(unsigned int ifindex, unsigned int dmabuf_fd,
 	req = netdev_bind_rx_req_alloc();
 	netdev_bind_rx_req_set_ifindex(req, ifindex);
 	netdev_bind_rx_req_set_fd(req, dmabuf_fd);
+	netdev_bind_rx_req_set_autorelease(req, autorelease);
 	__netdev_bind_rx_req_set_queues(req, queues, n_queue_index);
 
 	rsp = netdev_bind_rx(*ys, req);
@@ -872,7 +883,8 @@ static int do_server(struct memory_buffer *mem)
 		goto err_reset_rss;
 	}
 
-	if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys)) {
+	if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys,
+			  devmem_autorelease)) {
 		pr_err("Failed to bind");
 		goto err_reset_flow_steering;
 	}
@@ -922,6 +934,23 @@ static int do_server(struct memory_buffer *mem)
 	fprintf(stderr, "Got connection from %s:%d\n", buffer,
 		ntohs(client_addr.sin6_port));
 
+	if (unbind_mode == UNBIND_MODE_BEFORE_RECV) {
+		struct pollfd pfd = {
+			.fd = client_fd,
+			.events = POLLIN,
+		};
+
+		/* Wait for data then unbind (before recvmsg) */
+		ret = poll(&pfd, 1, 5000);
+		if (ret <= 0) {
+			pr_err("poll failed or timed out waiting for data");
+			goto err_close_client;
+		}
+
+		ynl_sock_destroy(ys);
+		ys = NULL;
+	}
+
 	while (1) {
 		struct iovec iov = { .iov_base = iobuf,
 				     .iov_len = sizeof(iobuf) };
@@ -942,11 +971,19 @@ static int do_server(struct memory_buffer *mem)
 		if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
 			continue;
 		if (ret < 0) {
+			if (unbind_mode == UNBIND_MODE_BEFORE_RECV &&
+			    errno == ENODEV)
+				goto cleanup;
+
 			perror("recvmsg");
 			if (errno == EFAULT) {
 				pr_err("received EFAULT, won't recover");
 				goto err_close_client;
 			}
+			if (errno == ENODEV) {
+				pr_err("unexpected ENODEV");
+				goto err_close_client;
+			}
 			continue;
 		}
 		if (ret == 0) {
@@ -1025,6 +1062,11 @@ static int do_server(struct memory_buffer *mem)
 			goto err_close_client;
 		}
 
+		if (unbind_mode == UNBIND_MODE_AFTER_RECV && ys) {
+			ynl_sock_destroy(ys);
+			ys = NULL;
+		}
+
 		fprintf(stderr, "total_received=%lu\n", total_received);
 	}
 
@@ -1043,7 +1085,8 @@ static int do_server(struct memory_buffer *mem)
 err_free_tmp:
 	free(tmp_mem);
 err_unbind:
-	ynl_sock_destroy(ys);
+	if (ys)
+		ynl_sock_destroy(ys);
 err_reset_flow_steering:
 	reset_flow_steering();
 err_reset_rss:
@@ -1092,7 +1135,7 @@ int run_devmem_tests(void)
 		goto err_reset_headersplit;
 	}
 
-	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) {
+	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) {
 		pr_err("Binding empty queues array should have failed");
 		goto err_unbind;
 	}
@@ -1108,7 +1151,7 @@ int run_devmem_tests(void)
 		goto err_reset_headersplit;
 	}
 
-	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) {
+	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) {
 		pr_err("Configure dmabuf with header split off should have failed");
 		goto err_unbind;
 	}
@@ -1124,7 +1167,7 @@ int run_devmem_tests(void)
 		goto err_reset_headersplit;
 	}
 
-	if (bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) {
+	if (bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) {
 		pr_err("Failed to bind");
 		goto err_reset_headersplit;
 	}
@@ -1397,7 +1440,7 @@ int main(int argc, char *argv[])
 	int is_server = 0, opt;
 	int ret, err = 1;
 
-	while ((opt = getopt(argc, argv, "ls:c:p:v:q:t:f:z:")) != -1) {
+	while ((opt = getopt(argc, argv, "ls:c:p:v:q:t:f:z:a:U:")) != -1) {
 		switch (opt) {
 		case 'l':
 			is_server = 1;
@@ -1426,6 +1469,12 @@ int main(int argc, char *argv[])
 		case 'z':
 			max_chunk = atoi(optarg);
 			break;
+		case 'a':
+			devmem_autorelease = atoi(optarg);
+			break;
+		case 'U':
+			unbind_mode = atoi(optarg);
+			break;
 		case '?':
 			fprintf(stderr, "unknown option: %c\n", optopt);
 			break;
@@ -1437,6 +1486,11 @@ int main(int argc, char *argv[])
 		return 1;
 	}
 
+	if (unbind_mode >= UNBIND_MODE_INVAL) {
+		pr_err("invalid unbind mode %u\n", unbind_mode);
+		return 1;
+	}
+
 	ifindex = if_nametoindex(ifname);
 
 	fprintf(stderr, "using ifindex=%u\n", ifindex);
-- 
2.47.3
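As a rough illustration of the TAP version 13 output quoted in the commit
message, the sketch below parses that transcript and checks the plan line
against the per-test results. This is illustration only, not part of the
patch or of the kselftest harness:

```python
import re

# TAP transcript as quoted in the commit message above.
tap = """TAP version 13
1..8
ok 1 devmem.check_rx
ok 2 devmem.check_rx_autorelease
ok 3 devmem.check_unbind_before_recv
ok 4 devmem.check_unbind_before_recv_autorelease
ok 5 devmem.check_unbind_after_recv
ok 6 devmem.check_unbind_after_recv_autorelease
ok 7 devmem.check_tx
ok 8 devmem.check_tx_chunks"""

lines = tap.splitlines()
# The plan line ("1..8") declares how many test results to expect.
lo, hi = map(int, re.match(r"(\d+)\.\.(\d+)", lines[1]).groups())
# Each result line is "ok <number> <name>" (or "not ok" on failure).
results = [re.match(r"(ok|not ok) (\d+) (\S+)", l).groups() for l in lines[2:]]
assert len(results) == hi - lo + 1                       # all 8 tests reported
assert all(status == "ok" for status, _, _ in results)   # all 8 passed
```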