From: Bobby Eshleman
Date: Wed, 19 Nov 2025 19:37:08 -0800
Subject: [PATCH net-next v7 1/5] net: devmem: rename tx_vec to vec in dmabuf binding
Message-Id: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-1-1abc8467354c@meta.com>
References: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
In-Reply-To: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: b4 0.14.3
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Stanislav Fomichev, Bobby Eshleman

Rename the 'tx_vec' field in struct net_devmem_dmabuf_binding to 'vec'.
This field holds pointers to net_iov structures. The rename prepares for
reusing 'vec' for both TX and RX directions.

No functional change intended.
Reviewed-by: Mina Almasry
Signed-off-by: Bobby Eshleman
---
 net/core/devmem.c | 22 +++++++++++-----------
 net/core/devmem.h |  2 +-
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 1d04754bc756..4dee2666dd07 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -75,7 +75,7 @@ void __net_devmem_dmabuf_binding_free(struct work_struct *wq)
 	dma_buf_detach(binding->dmabuf, binding->attachment);
 	dma_buf_put(binding->dmabuf);
 	xa_destroy(&binding->bound_rxqs);
-	kvfree(binding->tx_vec);
+	kvfree(binding->vec);
 	kfree(binding);
 }
 
@@ -232,10 +232,10 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	}
 
 	if (direction == DMA_TO_DEVICE) {
-		binding->tx_vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
-						 sizeof(struct net_iov *),
-						 GFP_KERNEL);
-		if (!binding->tx_vec) {
+		binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
+					      sizeof(struct net_iov *),
+					      GFP_KERNEL);
+		if (!binding->vec) {
 			err = -ENOMEM;
 			goto err_unmap;
 		}
@@ -249,7 +249,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 					      dev_to_node(&dev->dev));
 	if (!binding->chunk_pool) {
 		err = -ENOMEM;
-		goto err_tx_vec;
+		goto err_vec;
 	}
 
 	virtual = 0;
@@ -295,7 +295,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
 					      net_devmem_get_dma_addr(niov));
 		if (direction == DMA_TO_DEVICE)
-			binding->tx_vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
+			binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
 	}
 
 	virtual += len;
@@ -315,8 +315,8 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	gen_pool_for_each_chunk(binding->chunk_pool,
 				net_devmem_dmabuf_free_chunk_owner, NULL);
 	gen_pool_destroy(binding->chunk_pool);
-err_tx_vec:
-	kvfree(binding->tx_vec);
+err_vec:
+	kvfree(binding->vec);
 err_unmap:
 	dma_buf_unmap_attachment_unlocked(binding->attachment, binding->sgt,
 					  direction);
@@ -363,7 +363,7 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
 	int err = 0;
 
 	binding = net_devmem_lookup_dmabuf(dmabuf_id);
-	if (!binding || !binding->tx_vec) {
+	if (!binding || !binding->vec) {
 		err = -EINVAL;
 		goto out_err;
 	}
@@ -414,7 +414,7 @@ net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding,
 	*off = virt_addr % PAGE_SIZE;
 	*size = PAGE_SIZE - *off;
 
-	return binding->tx_vec[virt_addr / PAGE_SIZE];
+	return binding->vec[virt_addr / PAGE_SIZE];
 }
 
 /*** "Dmabuf devmem memory provider" ***/

diff --git a/net/core/devmem.h b/net/core/devmem.h
index 0b43a648cd2e..1ea6228e4f40 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -63,7 +63,7 @@ struct net_devmem_dmabuf_binding {
 	 * address. This array is convenient to map the virtual addresses to
 	 * net_iovs in the TX path.
 	 */
-	struct net_iov **tx_vec;
+	struct net_iov **vec;
 
 	struct work_struct unbind_w;
 };
-- 
2.47.3
From: Bobby Eshleman
Date: Wed, 19 Nov 2025 19:37:09 -0800
Subject: [PATCH net-next v7 2/5] net: devmem: refactor sock_devmem_dontneed for autorelease split
Message-Id: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-2-1abc8467354c@meta.com>
References: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
In-Reply-To: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Stanislav Fomichev, Bobby Eshleman

Refactor sock_devmem_dontneed() in preparation for supporting both
autorelease and manual token release modes.

Split the function into two parts:

- sock_devmem_dontneed(): handles input validation, token allocation,
  and copying from userspace
- sock_devmem_dontneed_autorelease(): performs the actual token release
  via xarray lookup and page pool put

This separation allows a future commit to add a parallel
sock_devmem_dontneed_manual_release() function that uses a different
token tracking mechanism (per-niov reference counting) without
duplicating the input validation logic.

The refactoring is purely mechanical, with no functional change; it is
intended only to minimize the noise in subsequent patches.
Reviewed-by: Mina Almasry
Signed-off-by: Bobby Eshleman
---
 net/core/sock.c | 52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 3b74fc71f51c..41274bd0394e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1082,30 +1082,13 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 #define MAX_DONTNEED_FRAGS 1024
 
 static noinline_for_stack int
-sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
+				 unsigned int num_tokens)
 {
-	unsigned int num_tokens, i, j, k, netmem_num = 0;
-	struct dmabuf_token *tokens;
+	unsigned int i, j, k, netmem_num = 0;
 	int ret = 0, num_frags = 0;
 	netmem_ref netmems[16];
 
-	if (!sk_is_tcp(sk))
-		return -EBADF;
-
-	if (optlen % sizeof(*tokens) ||
-	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
-		return -EINVAL;
-
-	num_tokens = optlen / sizeof(*tokens);
-	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
-	if (!tokens)
-		return -ENOMEM;
-
-	if (copy_from_sockptr(tokens, optval, optlen)) {
-		kvfree(tokens);
-		return -EFAULT;
-	}
-
 	xa_lock_bh(&sk->sk_user_frags);
 	for (i = 0; i < num_tokens; i++) {
 		for (j = 0; j < tokens[i].token_count; j++) {
@@ -1135,6 +1118,35 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 	for (k = 0; k < netmem_num; k++)
 		WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 
+	return ret;
+}
+
+static noinline_for_stack int
+sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+	struct dmabuf_token *tokens;
+	unsigned int num_tokens;
+	int ret;
+
+	if (!sk_is_tcp(sk))
+		return -EBADF;
+
+	if (optlen % sizeof(*tokens) ||
+	    optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
+		return -EINVAL;
+
+	num_tokens = optlen / sizeof(*tokens);
+	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
+	if (!tokens)
+		return -ENOMEM;
+
+	if (copy_from_sockptr(tokens, optval, optlen)) {
+		kvfree(tokens);
+		return -EFAULT;
+	}
+
+	ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	kvfree(tokens);
 	return ret;
 }
-- 
2.47.3
From: Bobby Eshleman
Date: Wed, 19 Nov 2025 19:37:10 -0800
Subject: [PATCH net-next v7 3/5] net: devmem: implement autorelease token management
Message-Id: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-3-1abc8467354c@meta.com>
References: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
In-Reply-To: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Stanislav Fomichev, Bobby Eshleman

Add support for autorelease toggling of tokens, using a static branch to
control system-wide behavior.
This allows applications to choose between two memory management modes: 1. Autorelease on: Leaked tokens are automatically released when the socket closes. 2. Autorelease off: Leaked tokens are released during dmabuf unbind. The autorelease mode is requested via the NETDEV_A_DMABUF_AUTORELEASE attribute of the NETDEV_CMD_BIND_RX message. Having separate modes per binding is disallowed and is rejected by netlink. The system will be "locked" into the mode that the first binding is set to. It can only be changed again once there are zero bindings on the system. Disabling autorelease offers ~13% improvement in CPU utilization. Static branching is used to limit the system to one mode or the other. Signed-off-by: Bobby Eshleman --- Changes in v7: - implement autorelease with static branch (Stan) - use netlink instead of sockopt (Stan) - merge uAPI and implementation patches into one patch (seemed less confusing) Changes in v6: - remove sk_devmem_info.autorelease, using binding->autorelease instead - move binding->autorelease check to outside of net_devmem_dmabuf_binding_put_urefs() (Mina) - remove overly defensive net_is_devmem_iov() (Mina) - add comment about multiple urefs mapping to a single netmem ref (Mina) - remove overly defense netmem NULL and netmem_is_net_iov checks (Mina) - use niov without casting back and forth with netmem (Mina) - move the autorelease flag from per-binding to per-socket (Mina) - remove the batching logic in sock_devmem_dontneed_manual_release() (Mina) - move autorelease check inside tcp_xa_pool_commit() (Mina) - remove single-binding restriction for autorelease mode (Mina) - unbind always checks for leaked urefs Changes in v5: - remove unused variables - introduce autorelease flag, preparing for future patch toggle new behavior Changes in v3: - make urefs per-binding instead of per-socket, reducing memory footprint - fallback to cleaning up references in dmabuf unbind if socket leaked tokens - drop ethtool patch Changes in v2: - always use 
GFP_ZERO for binding->vec (Mina) - remove WARN for changed binding (Mina) - remove extraneous binding ref get (Mina) - remove WARNs on invalid user input (Mina) - pre-assign niovs in binding->vec for RX case (Mina) - use atomic_set(, 0) to initialize sk_user_frags.urefs - fix length of alloc for urefs --- Documentation/netlink/specs/netdev.yaml | 12 ++++ include/net/netmem.h | 1 + include/net/sock.h | 7 +- include/uapi/linux/netdev.h | 1 + net/core/devmem.c | 109 +++++++++++++++++++++++++++-= ---- net/core/devmem.h | 11 +++- net/core/netdev-genl-gen.c | 5 +- net/core/netdev-genl.c | 13 +++- net/core/sock.c | 57 +++++++++++++++-- net/ipv4/tcp.c | 78 ++++++++++++++++++----- net/ipv4/tcp_ipv4.c | 13 +++- net/ipv4/tcp_minisocks.c | 3 +- tools/include/uapi/linux/netdev.h | 1 + 13 files changed, 262 insertions(+), 49 deletions(-) diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlin= k/specs/netdev.yaml index 82bf5cb2617d..913fccca4c4e 100644 --- a/Documentation/netlink/specs/netdev.yaml +++ b/Documentation/netlink/specs/netdev.yaml @@ -562,6 +562,17 @@ attribute-sets: type: u32 checks: min: 1 + - + name: autorelease + doc: | + Token autorelease mode. If true (1), leaked tokens are automatic= ally + released when the socket closes. If false (0), leaked tokens are= only + released when the dmabuf is unbound. Once a binding is created w= ith a + specific mode, all subsequent bindings system-wide must use the = same + mode. + + Optional. Defaults to false if not specified. 
+ type: u8 =20 operations: list: @@ -767,6 +778,7 @@ operations: - ifindex - fd - queues + - autorelease reply: attributes: - id diff --git a/include/net/netmem.h b/include/net/netmem.h index 9e10f4ac50c3..80d2263ba4ed 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -112,6 +112,7 @@ struct net_iov { }; struct net_iov_area *owner; enum net_iov_type type; + atomic_t uref; }; =20 struct net_iov_area { diff --git a/include/net/sock.h b/include/net/sock.h index a5f36ea9d46f..797b21148945 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -350,7 +350,7 @@ struct sk_filter; * @sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS * @sk_scm_unused: unused flags for scm_recv() * @ns_tracker: tracker for netns reference - * @sk_user_frags: xarray of pages the user is holding a reference on. + * @sk_devmem_info: the devmem binding information for the socket * @sk_owner: reference to the real owner of the socket that calls * sock_lock_init_class_and_name(). */ @@ -579,7 +579,10 @@ struct sock { struct numa_drop_counters *sk_drop_counters; struct rcu_head sk_rcu; netns_tracker ns_tracker; - struct xarray sk_user_frags; + struct { + struct xarray frags; + struct net_devmem_dmabuf_binding *binding; + } sk_devmem_info; =20 #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES) struct module *sk_owner; diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index 048c8de1a130..dff0be8223a4 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -206,6 +206,7 @@ enum { NETDEV_A_DMABUF_QUEUES, NETDEV_A_DMABUF_FD, NETDEV_A_DMABUF_ID, + NETDEV_A_DMABUF_AUTORELEASE, =20 __NETDEV_A_DMABUF_MAX, NETDEV_A_DMABUF_MAX =3D (__NETDEV_A_DMABUF_MAX - 1) diff --git a/net/core/devmem.c b/net/core/devmem.c index 4dee2666dd07..bba21c6cb195 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -28,6 +29,17 @@ =20 static 
DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1); =20 +/* Static key to lock down autorelease to a single mode on a system. When + * enabled: autorelease mode (leaked tokens released on socket close). Wh= en + * disabled: manual mode (leaked tokens released on dmabuf unbind). Once = the + * first binding is created, the mode is locked system-wide and can not be + * changed until the system has zero bindings again. + * + * Protected by xa_lock of net_devmem_dmabuf_bindings. + */ +DEFINE_STATIC_KEY_FALSE(tcp_devmem_ar_key); +EXPORT_SYMBOL(tcp_devmem_ar_key); + static const struct memory_provider_ops dmabuf_devmem_ops; =20 bool net_is_devmem_iov(struct net_iov *niov) @@ -116,6 +128,24 @@ void net_devmem_free_dmabuf(struct net_iov *niov) gen_pool_free(binding->chunk_pool, dma_addr, PAGE_SIZE); } =20 +static void +net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *bind= ing) +{ + int i; + + for (i =3D 0; i < binding->dmabuf->size / PAGE_SIZE; i++) { + struct net_iov *niov; + netmem_ref netmem; + + niov =3D binding->vec[i]; + netmem =3D net_iov_to_netmem(niov); + + /* Multiple urefs map to only a single netmem ref. */ + if (atomic_xchg(&niov->uref, 0) > 0) + WARN_ON_ONCE(!napi_pp_put_page(netmem)); + } +} + void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding) { struct netdev_rx_queue *rxq; @@ -143,6 +173,10 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf= _binding *binding) __net_mp_close_rxq(binding->dev, rxq_idx, &mp_params); } =20 + /* Clean up any lingering urefs from sockets that had autorelease + * disabled. 
+ */ + net_devmem_dmabuf_binding_put_urefs(binding); net_devmem_dmabuf_binding_put(binding); } =20 @@ -179,8 +213,10 @@ struct net_devmem_dmabuf_binding * net_devmem_bind_dmabuf(struct net_device *dev, struct device *dma_dev, enum dma_data_direction direction, - unsigned int dmabuf_fd, struct netdev_nl_sock *priv, - struct netlink_ext_ack *extack) + unsigned int dmabuf_fd, + struct netdev_nl_sock *priv, + struct netlink_ext_ack *extack, + bool autorelease) { struct net_devmem_dmabuf_binding *binding; static u32 id_alloc_next; @@ -231,14 +267,13 @@ net_devmem_bind_dmabuf(struct net_device *dev, goto err_detach; } =20 - if (direction =3D=3D DMA_TO_DEVICE) { - binding->vec =3D kvmalloc_array(dmabuf->size / PAGE_SIZE, - sizeof(struct net_iov *), - GFP_KERNEL); - if (!binding->vec) { - err =3D -ENOMEM; - goto err_unmap; - } + /* Used by TX and also by RX when socket has autorelease disabled */ + binding->vec =3D kvmalloc_array(dmabuf->size / PAGE_SIZE, + sizeof(struct net_iov *), + GFP_KERNEL | __GFP_ZERO); + if (!binding->vec) { + err =3D -ENOMEM; + goto err_unmap; } =20 /* For simplicity we expect to make PAGE_SIZE allocations, but the @@ -292,25 +327,67 @@ net_devmem_bind_dmabuf(struct net_device *dev, niov =3D &owner->area.niovs[i]; niov->type =3D NET_IOV_DMABUF; niov->owner =3D &owner->area; + atomic_set(&niov->uref, 0); page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), net_devmem_get_dma_addr(niov)); - if (direction =3D=3D DMA_TO_DEVICE) - binding->vec[owner->area.base_virtual / PAGE_SIZE + i] =3D niov; + binding->vec[owner->area.base_virtual / PAGE_SIZE + i] =3D niov; } =20 virtual +=3D len; } =20 - err =3D xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id, - binding, xa_limit_32b, &id_alloc_next, - GFP_KERNEL); + /* Enforce system-wide autorelease mode consistency for RX bindings. + * TX bindings don't use autorelease (always false) since tokens aren't + * leaked in TX path. Only RX bindings must all have the same + * autorelease mode, never mixed. 
+ * + * We use the xarray's lock to atomically check xa_empty() and toggle + * the static key, avoiding the race where two new bindings may try to + * set the static key to different states. + */ + xa_lock(&net_devmem_dmabuf_bindings); + + if (direction =3D=3D DMA_FROM_DEVICE) { + if (!xa_empty(&net_devmem_dmabuf_bindings)) { + bool mode; + + mode =3D static_key_enabled(&tcp_devmem_ar_key); + + /* When bindings exist, enforce that the mode does not + * change. + */ + if (mode !=3D autorelease) { + NL_SET_ERR_MSG_FMT(extack, + "System already configured with autorelease=3D%d", + mode); + err =3D -EINVAL; + goto err_unlock_xa; + } + } else { + /* First binding sets the mode for all subsequent + * bindings. + */ + if (autorelease) + static_branch_enable(&tcp_devmem_ar_key); + else + static_branch_disable(&tcp_devmem_ar_key); + } + } + + err =3D __xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id, + binding, xa_limit_32b, &id_alloc_next, + GFP_KERNEL); if (err < 0) - goto err_free_chunks; + goto err_unlock_xa; + + xa_unlock(&net_devmem_dmabuf_bindings); =20 list_add(&binding->list, &priv->bindings); =20 return binding; =20 +err_unlock_xa: + xa_unlock(&net_devmem_dmabuf_bindings); err_free_chunks: gen_pool_for_each_chunk(binding->chunk_pool, net_devmem_dmabuf_free_chunk_owner, NULL); diff --git a/net/core/devmem.h b/net/core/devmem.h index 1ea6228e4f40..33e85ff5f35e 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -12,9 +12,13 @@ =20 #include #include +#include =20 struct netlink_ext_ack; =20 +/* static key for TCP devmem autorelease */ +extern struct static_key_false tcp_devmem_ar_key; + struct net_devmem_dmabuf_binding { struct dma_buf *dmabuf; struct dma_buf_attachment *attachment; @@ -61,7 +65,7 @@ struct net_devmem_dmabuf_binding { =20 /* Array of net_iov pointers for this binding, sorted by virtual * address. This array is convenient to map the virtual addresses to - * net_iovs in the TX path. + * net_iovs. 
 	 */
 	struct net_iov **vec;
 
@@ -88,7 +92,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
-		       struct netlink_ext_ack *extack);
+		       struct netlink_ext_ack *extack, bool autorelease);
 struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 id);
 void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
 int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
@@ -174,7 +178,8 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd,
 		       struct netdev_nl_sock *priv,
-		       struct netlink_ext_ack *extack)
+		       struct netlink_ext_ack *extack,
+		       bool autorelease)
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index ff20435c45d2..ecbd8ae2a3fa 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -85,10 +85,11 @@ static const struct nla_policy netdev_qstats_get_nl_policy[NETDEV_A_QSTATS_SCOPE
 };
 
 /* NETDEV_CMD_BIND_RX - do */
-static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1] = {
+static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_AUTORELEASE + 1] = {
 	[NETDEV_A_DMABUF_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
 	[NETDEV_A_DMABUF_FD] = { .type = NLA_U32, },
 	[NETDEV_A_DMABUF_QUEUES] = NLA_POLICY_NESTED(netdev_queue_id_nl_policy),
+	[NETDEV_A_DMABUF_AUTORELEASE] = { .type = NLA_U8, },
 };
 
 /* NETDEV_CMD_NAPI_SET - do */
@@ -187,7 +188,7 @@ static const struct genl_split_ops netdev_nl_ops[] = {
 		.cmd = NETDEV_CMD_BIND_RX,
 		.doit = netdev_nl_bind_rx_doit,
 		.policy = netdev_bind_rx_nl_policy,
-		.maxattr = NETDEV_A_DMABUF_FD,
+		.maxattr = NETDEV_A_DMABUF_AUTORELEASE,
 		.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 470fabbeacd9..5f06a677f056 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -939,6 +939,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	struct netdev_nl_sock *priv;
 	struct net_device *netdev;
 	unsigned long *rxq_bitmap;
+	bool autorelease = false;
 	struct device *dma_dev;
 	struct sk_buff *rsp;
 	int err = 0;
@@ -952,6 +953,10 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]);
 	dmabuf_fd = nla_get_u32(info->attrs[NETDEV_A_DMABUF_FD]);
 
+	if (info->attrs[NETDEV_A_DMABUF_AUTORELEASE])
+		autorelease =
+			!!nla_get_u8(info->attrs[NETDEV_A_DMABUF_AUTORELEASE]);
+
 	priv = genl_sk_priv_get(&netdev_nl_family, NETLINK_CB(skb).sk);
 	if (IS_ERR(priv))
 		return PTR_ERR(priv);
@@ -1002,7 +1007,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
-					 dmabuf_fd, priv, info->extack);
+					 dmabuf_fd, priv, info->extack,
+					 autorelease);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
 		goto err_rxq_bitmap;
@@ -1096,8 +1102,11 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	dma_dev = netdev_queue_get_dma_dev(netdev, 0);
+	/* TX bindings don't use autorelease. Autorelease is only meaningful
+	 * for RX where tokens may be leaked by userspace.
+	 */
 	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE,
-					 dmabuf_fd, priv, info->extack);
+					 dmabuf_fd, priv, info->extack, false);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
 		goto err_unlock_netdev;
diff --git a/net/core/sock.c b/net/core/sock.c
index 41274bd0394e..f945cdb5a337 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -87,6 +87,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -151,6 +152,7 @@
 #include
 
 #include "dev.h"
+#include "devmem.h"
 
 static DEFINE_MUTEX(proto_list_mutex);
 static LIST_HEAD(proto_list);
@@ -1081,6 +1083,44 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 #define MAX_DONTNEED_TOKENS 128
 #define MAX_DONTNEED_FRAGS 1024
 
+static noinline_for_stack int
+sock_devmem_dontneed_manual_release(struct sock *sk,
+				    struct dmabuf_token *tokens,
+				    unsigned int num_tokens)
+{
+	struct net_iov *niov;
+	unsigned int i, j;
+	netmem_ref netmem;
+	unsigned int token;
+	int num_frags = 0;
+	int ret = 0;
+
+	if (!sk->sk_devmem_info.binding)
+		return -EINVAL;
+
+	for (i = 0; i < num_tokens; i++) {
+		for (j = 0; j < tokens[i].token_count; j++) {
+			size_t size = sk->sk_devmem_info.binding->dmabuf->size;
+
+			token = tokens[i].token_start + j;
+			if (token >= size / PAGE_SIZE)
+				break;
+
+			if (++num_frags > MAX_DONTNEED_FRAGS)
+				return ret;
+
+			niov = sk->sk_devmem_info.binding->vec[token];
+			if (atomic_dec_and_test(&niov->uref)) {
+				netmem = net_iov_to_netmem(niov);
+				WARN_ON_ONCE(!napi_pp_put_page(netmem));
+			}
+			ret++;
+		}
+	}
+
+	return ret;
+}
+
 static noinline_for_stack int
 sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
 				 unsigned int num_tokens)
@@ -1089,32 +1129,33 @@ sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
 	int ret = 0, num_frags = 0;
 	netmem_ref netmems[16];
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 	for (i = 0; i < num_tokens; i++) {
 		for (j = 0; j < tokens[i].token_count; j++) {
 			if (++num_frags > MAX_DONTNEED_FRAGS)
 				goto frag_limit_reached;
 
 			netmem_ref netmem = (__force netmem_ref)__xa_erase(
-				&sk->sk_user_frags, tokens[i].token_start + j);
+				&sk->sk_devmem_info.frags,
+				tokens[i].token_start + j);
 
 			if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
 				continue;
 
 			netmems[netmem_num++] = netmem;
 			if (netmem_num == ARRAY_SIZE(netmems)) {
-				xa_unlock_bh(&sk->sk_user_frags);
+				xa_unlock_bh(&sk->sk_devmem_info.frags);
 				for (k = 0; k < netmem_num; k++)
 					WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 				netmem_num = 0;
-				xa_lock_bh(&sk->sk_user_frags);
+				xa_lock_bh(&sk->sk_devmem_info.frags);
 			}
 			ret++;
 		}
 	}
 
 frag_limit_reached:
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 	for (k = 0; k < netmem_num; k++)
 		WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 
@@ -1145,7 +1186,11 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 			return -EFAULT;
 	}
 
-	ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	if (static_branch_unlikely(&tcp_devmem_ar_key))
+		ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	else
+		ret = sock_devmem_dontneed_manual_release(sk, tokens,
+							  num_tokens);
 
 	kvfree(tokens);
 	return ret;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dee578aad690..e17b71244922 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -260,6 +260,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -492,7 +493,8 @@ void tcp_init_sock(struct sock *sk)
 
 	set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
 	sk_sockets_allocated_inc(sk);
-	xa_init_flags(&sk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&sk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	sk->sk_devmem_info.binding = NULL;
 }
 EXPORT_IPV6_MOD(tcp_init_sock);
 
@@ -2424,11 +2426,12 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)
 
 	/* Commit part that has been copied to user space. */
 	for (i = 0; i < p->idx; i++)
-		__xa_cmpxchg(&sk->sk_user_frags, p->tokens[i], XA_ZERO_ENTRY,
-			     (__force void *)p->netmems[i], GFP_KERNEL);
+		__xa_cmpxchg(&sk->sk_devmem_info.frags, p->tokens[i],
+			     XA_ZERO_ENTRY, (__force void *)p->netmems[i],
+			     GFP_KERNEL);
 	/* Rollback what has been pre-allocated and is no longer needed. */
 	for (; i < p->max; i++)
-		__xa_erase(&sk->sk_user_frags, p->tokens[i]);
+		__xa_erase(&sk->sk_devmem_info.frags, p->tokens[i]);
 
 	p->max = 0;
 	p->idx = 0;
@@ -2436,14 +2439,18 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)
 
 static void tcp_xa_pool_commit(struct sock *sk, struct tcp_xa_pool *p)
 {
+	/* Skip xarray operations if autorelease is disabled (manual mode) */
+	if (!static_branch_unlikely(&tcp_devmem_ar_key))
+		return;
+
 	if (!p->max)
 		return;
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 
 	tcp_xa_pool_commit_locked(sk, p);
 
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 }
 
 static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
@@ -2454,24 +2461,42 @@ static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
 	if (p->idx < p->max)
 		return 0;
 
-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);
 
 	tcp_xa_pool_commit_locked(sk, p);
 
 	for (k = 0; k < max_frags; k++) {
-		err = __xa_alloc(&sk->sk_user_frags, &p->tokens[k],
+		err = __xa_alloc(&sk->sk_devmem_info.frags, &p->tokens[k],
 				 XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL);
 		if (err)
 			break;
 	}
 
-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 
 	p->max = k;
 	p->idx = 0;
 	return k ? 0 : err;
 }
 
+static void tcp_xa_pool_inc_pp_ref_count(struct tcp_xa_pool *tcp_xa_pool,
+					 skb_frag_t *frag, int *refs)
+{
+	struct net_iov *niov;
+
+	niov = skb_frag_net_iov(frag);
+
+	if (static_branch_unlikely(&tcp_devmem_ar_key)) {
+		atomic_long_inc(&niov->pp_ref_count);
+		tcp_xa_pool->netmems[tcp_xa_pool->idx++] =
+			skb_frag_netmem(frag);
+	} else {
+		if (atomic_inc_return(&niov->uref) == 1)
+			atomic_long_inc(&niov->pp_ref_count);
+		(*refs)++;
+	}
+}
+
 /* On error, returns the -errno. On success, returns number of bytes sent to the
  * user. May not consume all of @remaining_len.
  */
@@ -2479,10 +2504,12 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			      unsigned int offset, struct msghdr *msg,
 			      int remaining_len)
 {
+	struct net_devmem_dmabuf_binding *binding = NULL;
 	struct dmabuf_cmsg dmabuf_cmsg = { 0 };
 	struct tcp_xa_pool tcp_xa_pool;
 	unsigned int start;
 	int i, copy, n;
+	int refs = 0;
 	int sent = 0;
 	int err = 0;
 
@@ -2536,6 +2563,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 		struct net_iov *niov;
 		u64 frag_offset;
+		u32 token;
 		int end;
 
 		/* !skb_frags_readable() should indicate that ALL the
@@ -2568,13 +2596,32 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 				      start;
 			dmabuf_cmsg.frag_offset = frag_offset;
 			dmabuf_cmsg.frag_size = copy;
-			err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
-						 skb_shinfo(skb)->nr_frags - i);
-			if (err)
+
+			binding = net_devmem_iov_binding(niov);
+
+			if (!sk->sk_devmem_info.binding)
+				sk->sk_devmem_info.binding = binding;
+
+			if (sk->sk_devmem_info.binding != binding) {
+				err = -EFAULT;
 				goto out;
+			}
+
+			if (static_branch_unlikely(&tcp_devmem_ar_key)) {
+				err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
							 skb_shinfo(skb)->nr_frags - i);
+				if (err)
+					goto out;
+
+				dmabuf_cmsg.frag_token =
+					tcp_xa_pool.tokens[tcp_xa_pool.idx];
+			} else {
+				token = net_iov_virtual_addr(niov) >> PAGE_SHIFT;
+				dmabuf_cmsg.frag_token = token;
+			}
+
 			/* Will perform the exchange later */
-			dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
 			dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
 
 			offset += copy;
@@ -2587,8 +2634,8 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			if (err)
 				goto out;
 
-			atomic_long_inc(&niov->pp_ref_count);
-			tcp_xa_pool.netmems[tcp_xa_pool.idx++] = skb_frag_netmem(frag);
+			tcp_xa_pool_inc_pp_ref_count(&tcp_xa_pool, frag,
+						     &refs);
 
 			sent += copy;
 
@@ -2617,6 +2664,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 
 out:
 	tcp_xa_pool_commit(sk, &tcp_xa_pool);
+
 	if (!sent)
 		sent = err;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6fcaecb67284..bfa19aeec6b5 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -89,6 +89,9 @@
 
 #include
 
+#include
+#include "../core/devmem.h"
+
 #include
 
 #ifdef CONFIG_TCP_MD5SIG
@@ -2492,7 +2495,7 @@ static void tcp_release_user_frags(struct sock *sk)
 	unsigned long index;
 	void *netmem;
 
-	xa_for_each(&sk->sk_user_frags, index, netmem)
+	xa_for_each(&sk->sk_devmem_info.frags, index, netmem)
 		WARN_ON_ONCE(!napi_pp_put_page((__force netmem_ref)netmem));
 #endif
 }
@@ -2501,9 +2504,15 @@ void tcp_v4_destroy_sock(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
+	/* No static branch because sockets may outlive the binding, which
+	 * opens the opportunity for static key state to change. In any
+	 * scenario, if the xarray is non-empty then we need to free those
+	 * frags.
+	 */
 	tcp_release_user_frags(sk);
 
-	xa_destroy(&sk->sk_user_frags);
+	xa_destroy(&sk->sk_devmem_info.frags);
+	sk->sk_devmem_info.binding = NULL;
 
 	trace_tcp_destroy_sock(sk);
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index bd5462154f97..2aec977f5c12 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -662,7 +662,8 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 
 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
 
-	xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&newsk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	newsk->sk_devmem_info.binding = NULL;
 
 	return newsk;
 }
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 048c8de1a130..dff0be8223a4 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -206,6 +206,7 @@ enum {
 	NETDEV_A_DMABUF_QUEUES,
 	NETDEV_A_DMABUF_FD,
 	NETDEV_A_DMABUF_ID,
+	NETDEV_A_DMABUF_AUTORELEASE,
 
 	__NETDEV_A_DMABUF_MAX,
 	NETDEV_A_DMABUF_MAX = (__NETDEV_A_DMABUF_MAX - 1)
-- 
2.47.3

From nobody Tue Dec 2 02:05:00 2025
From: Bobby Eshleman
Date: Wed, 19 Nov 2025 19:37:11 -0800
Subject: [PATCH net-next v7 4/5] net: devmem: document NETDEV_A_DMABUF_AUTORELEASE netlink attribute
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id:
 <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-4-1abc8467354c@meta.com>
References: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
In-Reply-To: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell,
    David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan,
    Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Stanislav Fomichev, Bobby Eshleman
X-Mailer: b4 0.14.3

From: Bobby Eshleman

Update devmem.rst documentation to describe the autorelease netlink
attribute used during RX dmabuf binding.

The autorelease attribute is specified at bind-time via the netlink API
(NETDEV_CMD_BIND_RX) and controls what happens to outstanding tokens
when the socket closes.

Document the two token release modes (automatic vs manual), how to
configure the binding for autorelease, the perf benefits, new caveats
and restrictions, and the way the mode is enforced system-wide.

Signed-off-by: Bobby Eshleman
---
Changes in v7:
- Document netlink instead of sockopt
- Mention system-wide locked to one mode
---
 Documentation/networking/devmem.rst | 70 +++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
index a6cd7236bfbd..67c63bc5a7ae 100644
--- a/Documentation/networking/devmem.rst
+++ b/Documentation/networking/devmem.rst
@@ -235,6 +235,76 @@ can be less than the tokens provided by the user in case of:
 
 (a) an internal kernel leak bug.
 (b) the user passed more than 1024 frags.
 
+
+Autorelease Control
+~~~~~~~~~~~~~~~~~~~
+
+The autorelease mode controls what happens to outstanding tokens (tokens not
+released via SO_DEVMEM_DONTNEED) when the socket closes. Autorelease is
+configured per-binding at binding creation time via the netlink API::
+
+  struct netdev_bind_rx_req *req;
+  struct netdev_bind_rx_rsp *rsp;
+  struct ynl_sock *ys;
+  struct ynl_error yerr;
+
+  ys = ynl_sock_create(&ynl_netdev_family, &yerr);
+
+  req = netdev_bind_rx_req_alloc();
+  netdev_bind_rx_req_set_ifindex(req, ifindex);
+  netdev_bind_rx_req_set_fd(req, dmabuf_fd);
+  netdev_bind_rx_req_set_autorelease(req, 0); /* 0 = manual, 1 = auto */
+  __netdev_bind_rx_req_set_queues(req, queues, n_queues);
+
+  rsp = netdev_bind_rx(ys, req);
+
+  dmabuf_id = rsp->id;
+
+When autorelease is disabled (0):
+
+- Outstanding tokens are NOT released when the socket closes
+- Outstanding tokens are only released when the dmabuf is unbound
+- Provides better performance by eliminating xarray overhead (~13% CPU reduction)
+- Kernel tracks tokens via atomic reference counters in net_iov structures
+
+When autorelease is enabled (1):
+
+- Outstanding tokens are automatically released when the socket closes
+- Backwards compatible behavior
+- Kernel tracks tokens in an xarray per socket
+
+The default is autorelease disabled.
+
+Important: In both modes, applications should call SO_DEVMEM_DONTNEED to
+return tokens as soon as they are done processing. The autorelease setting only
+affects what happens to tokens that are still outstanding when close() is called.
+
+The mode is enforced system-wide. Once a binding is created with a specific
+autorelease mode, all subsequent bindings system-wide must use the same mode.
+
+
+Performance Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disabling autorelease provides approximately ~13% CPU utilization improvement
+in RX workloads. That said, applications must ensure all tokens are released
+via SO_DEVMEM_DONTNEED before closing the socket, otherwise the backing pages
+will remain pinned until the dmabuf is unbound.
+
+
+Caveats
+~~~~~~~
+
+- Once a system-wide autorelease mode is selected (via the first binding),
+  all subsequent bindings must use the same mode. Attempts to create bindings
+  with a different mode will be rejected with -EINVAL.
+
+- Applications using manual release mode (autorelease=0) must ensure all tokens
+  are returned via SO_DEVMEM_DONTNEED before socket close to avoid resource
+  leaks during the lifetime of the dmabuf binding. Tokens not released before
+  close() will only be freed when the dmabuf is unbound.
+
+
 TX Interface
 ============
 
-- 
2.47.3

From nobody Tue Dec 2 02:05:00 2025
From: Bobby Eshleman
Date: Wed, 19 Nov 2025 19:37:12 -0800
Subject: [PATCH net-next v7 5/5] selftests: drv-net: devmem: add autorelease tests
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Message-Id: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-5-1abc8467354c@meta.com>
References: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
In-Reply-To: <20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
    Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell,
    David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan,
    Donald Hunter, Mina Almasry
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org, Stanislav Fomichev, Bobby Eshleman
X-Mailer: b4 0.14.3

From: Bobby Eshleman

Add tests cases that check the autorelease modes (on and off). The new
tests are the same as the old ones, but just pass a flag to ncdevmem to
select the autorelease mode.

Only for RX tests is autorelease checked, as the autorelease ncdevmem
flag is unused in the TX case and doesn't apply to TX bind operations.

Signed-off-by: Bobby Eshleman
---
Note: I tested successfully with kperf, but I'm troubleshooting some
mlx5 issues with ncdevmem so this patch, though simple, is not fully
validated. Will respond to this thread when solve the issue.

Changes in v7:
- use autorelease netlink
- remove sockopt tests
---
 tools/testing/selftests/drivers/net/hw/devmem.py  | 22 +++++++++++++++++++--
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 19 +++++++++++++------
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py
index 45c2d49d55b6..dddb9d77cb28 100755
--- a/tools/testing/selftests/drivers/net/hw/devmem.py
+++ b/tools/testing/selftests/drivers/net/hw/devmem.py
@@ -25,7 +25,23 @@ def check_rx(cfg) -> None:
 
     port = rand_port()
     socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
-    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7 -a 0"
+
+    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
+        wait_port_listen(port)
+        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
+            head -c 1K | {socat}", host=cfg.remote, shell=True)
+
+    ksft_eq(ncdevmem.ret, 0)
+
+
+@ksft_disruptive
+def check_rx_autorelease(cfg) -> None:
+    require_devmem(cfg)
+
+    port = rand_port()
+    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
+    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7 -a 1"
 
     with bkg(listen_cmd, exit_wait=True) as ncdevmem:
         wait_port_listen(port)
@@ -68,7 +84,9 @@ def main() -> None:
         cfg.bin_local = path.abspath(path.dirname(__file__) + "/ncdevmem")
         cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
 
-    ksft_run([check_rx, check_tx, check_tx_chunks],
+    ksft_run([check_rx, check_rx_autorelease,
+              check_tx, check_tx_autorelease,
+              check_tx_chunks, check_tx_chunks_autorelease],
              args=(cfg, ))
     ksft_exit()
 
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 3288ed04ce08..406f1771d9ec 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -92,6 +92,7 @@ static char *port;
 static size_t do_validation;
 static int start_queue = -1;
 static int num_queues = -1;
+static int devmem_autorelease;
 static char *ifname;
 static unsigned int ifindex;
 static unsigned int dmabuf_id;
@@ -679,7 +680,8 @@ static int configure_flow_steering(struct sockaddr_in6 *server_sin)
 
 static int bind_rx_queue(unsigned int ifindex, unsigned int dmabuf_fd,
			 struct netdev_queue_id *queues,
-			 unsigned int n_queue_index, struct ynl_sock **ys)
+			 unsigned int n_queue_index, struct ynl_sock **ys,
+			 int autorelease)
 {
 	struct netdev_bind_rx_req *req = NULL;
 	struct netdev_bind_rx_rsp *rsp = NULL;
@@ -695,6 +697,7 @@ static int bind_rx_queue(unsigned int ifindex, unsigned int dmabuf_fd,
 	req = netdev_bind_rx_req_alloc();
 	netdev_bind_rx_req_set_ifindex(req, ifindex);
 	netdev_bind_rx_req_set_fd(req, dmabuf_fd);
+	netdev_bind_rx_req_set_autorelease(req, autorelease);
 	__netdev_bind_rx_req_set_queues(req, queues, n_queue_index);
 
 	rsp = netdev_bind_rx(*ys, req);
@@ -872,7 +875,8 @@ static int do_server(struct memory_buffer *mem)
 		goto err_reset_rss;
 	}
 
-	if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys)) {
+	if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys,
+			  devmem_autorelease)) {
 		pr_err("Failed to bind");
 		goto err_reset_flow_steering;
 	}
@@ -1092,7 +1096,7 @@ int run_devmem_tests(void)
 		goto err_reset_headersplit;
 	}
 
-	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) {
+	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) {
 		pr_err("Binding empty queues array should have failed");
 		goto err_unbind;
 	}
@@ -1108,7 +1112,7 @@ int run_devmem_tests(void)
 		goto err_reset_headersplit;
 	}
 
-	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) {
+	if (!bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) {
 		pr_err("Configure dmabuf with header split off should have failed");
 		goto err_unbind;
 	}
@@ -1124,7 +1128,7 @@ int run_devmem_tests(void)
 		goto err_reset_headersplit;
 	}
 
-	if (bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys)) {
+	if (bind_rx_queue(ifindex, mem->fd, queues, num_queues, &ys, 0)) {
 		pr_err("Failed to bind");
 		goto err_reset_headersplit;
 	}
@@ -1397,7 +1401,7 @@ int main(int argc, char *argv[])
 	int is_server = 0, opt;
 	int ret, err = 1;
 
-	while ((opt = getopt(argc, argv, "ls:c:p:v:q:t:f:z:")) != -1) {
+	while ((opt = getopt(argc, argv, "ls:c:p:v:q:t:f:z:a:")) != -1) {
 		switch (opt) {
 		case 'l':
 			is_server = 1;
@@ -1426,6 +1430,9 @@ int main(int argc, char *argv[])
 		case 'z':
 			max_chunk = atoi(optarg);
 			break;
+		case 'a':
+			devmem_autorelease = atoi(optarg);
+			break;
 		case '?':
 			fprintf(stderr, "unknown option: %c\n", optopt);
 			break;
-- 
2.47.3