From nobody Sun Feb 8 07:14:35 2026
From: Bobby Eshleman
Date: Thu, 23 Oct 2025 13:58:20 -0700
Subject: [PATCH net-next v5 1/4] net: devmem: rename tx_vec to vec in dmabuf binding
Message-Id: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-1-47cb85f5259e@meta.com>
References: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
In-Reply-To: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Mina Almasry
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Bobby Eshleman
X-Mailer: b4 0.13.0

Rename the 'tx_vec' field in struct net_devmem_dmabuf_binding to 'vec'.
This field holds pointers to net_iov structures. The rename prepares for
reusing 'vec' for both the TX and RX directions.

No functional change intended.
Signed-off-by: Bobby Eshleman
Reviewed-by: Mina Almasry
---
 net/core/devmem.c | 22 +++++++++++-----------
 net/core/devmem.h |  2 +-
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/net/core/devmem.c b/net/core/devmem.c
index d9de31a6cc7f..b4c570d4f37a 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -74,7 +74,7 @@ void __net_devmem_dmabuf_binding_free(struct work_struct *wq)
     dma_buf_detach(binding->dmabuf, binding->attachment);
     dma_buf_put(binding->dmabuf);
     xa_destroy(&binding->bound_rxqs);
-    kvfree(binding->tx_vec);
+    kvfree(binding->vec);
     kfree(binding);
 }
 
@@ -231,10 +231,10 @@ net_devmem_bind_dmabuf(struct net_device *dev,
     }
 
     if (direction == DMA_TO_DEVICE) {
-        binding->tx_vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
-                                         sizeof(struct net_iov *),
-                                         GFP_KERNEL);
-        if (!binding->tx_vec) {
+        binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE,
+                                      sizeof(struct net_iov *),
+                                      GFP_KERNEL);
+        if (!binding->vec) {
             err = -ENOMEM;
             goto err_unmap;
         }
@@ -248,7 +248,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
                                        dev_to_node(&dev->dev));
     if (!binding->chunk_pool) {
         err = -ENOMEM;
-        goto err_tx_vec;
+        goto err_vec;
     }
 
     virtual = 0;
@@ -294,7 +294,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
             page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
                                           net_devmem_get_dma_addr(niov));
             if (direction == DMA_TO_DEVICE)
-                binding->tx_vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
+                binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov;
         }
 
         virtual += len;
@@ -314,8 +314,8 @@ net_devmem_bind_dmabuf(struct net_device *dev,
     gen_pool_for_each_chunk(binding->chunk_pool,
                             net_devmem_dmabuf_free_chunk_owner, NULL);
     gen_pool_destroy(binding->chunk_pool);
-err_tx_vec:
-    kvfree(binding->tx_vec);
+err_vec:
+    kvfree(binding->vec);
 err_unmap:
     dma_buf_unmap_attachment_unlocked(binding->attachment, binding->sgt,
                                       direction);
@@ -361,7 +361,7 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
     int err = 0;
 
     binding = net_devmem_lookup_dmabuf(dmabuf_id);
-    if (!binding || !binding->tx_vec) {
+    if (!binding || !binding->vec) {
         err = -EINVAL;
         goto out_err;
     }
@@ -393,7 +393,7 @@ net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding,
     *off = virt_addr % PAGE_SIZE;
     *size = PAGE_SIZE - *off;
 
-    return binding->tx_vec[virt_addr / PAGE_SIZE];
+    return binding->vec[virt_addr / PAGE_SIZE];
 }
 
 /*** "Dmabuf devmem memory provider" ***/
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 101150d761af..2ada54fb63d7 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -63,7 +63,7 @@ struct net_devmem_dmabuf_binding {
      * address. This array is convenient to map the virtual addresses to
      * net_iovs in the TX path.
      */
-    struct net_iov **tx_vec;
+    struct net_iov **vec;
 
     struct work_struct unbind_w;
 };
-- 
2.47.3
From nobody Sun Feb 8 07:14:35 2026
From: Bobby Eshleman
Date: Thu, 23 Oct 2025 13:58:21 -0700
Subject: [PATCH net-next v5 2/4] net: devmem: refactor sock_devmem_dontneed for autorelease split
Message-Id: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-2-47cb85f5259e@meta.com>
References: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
In-Reply-To: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
To:
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell,
 David Ahern, Mina Almasry
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Bobby Eshleman
X-Mailer: b4 0.13.0

Refactor sock_devmem_dontneed() in preparation for supporting both
autorelease and manual token release modes.

Split the function into two parts:

- sock_devmem_dontneed(): handles input validation, token allocation,
  and copying from userspace
- sock_devmem_dontneed_autorelease(): performs the actual token release
  via xarray lookup and page pool put

This separation allows a future commit to add a parallel
sock_devmem_dontneed_manual_release() function that uses a different
token tracking mechanism (per-niov reference counting) without
duplicating the input validation logic.

The refactoring is purely mechanical, with no functional change; it is
intended only to minimize noise in subsequent patches.

Signed-off-by: Bobby Eshleman
Reviewed-by: Mina Almasry
---
 net/core/sock.c | 52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index a99132cc0965..e7b378753763 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1082,30 +1082,13 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
 #define MAX_DONTNEED_FRAGS 1024
 
 static noinline_for_stack int
-sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens,
+                                 unsigned int num_tokens)
 {
-    unsigned int num_tokens, i, j, k, netmem_num = 0;
-    struct dmabuf_token *tokens;
+    unsigned int i, j, k, netmem_num = 0;
     int ret = 0, num_frags = 0;
     netmem_ref netmems[16];
 
-    if (!sk_is_tcp(sk))
-        return -EBADF;
-
-    if (optlen % sizeof(*tokens) ||
-        optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
-        return -EINVAL;
-
-    num_tokens = optlen / sizeof(*tokens);
-    tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
-    if (!tokens)
-        return -ENOMEM;
-
-    if (copy_from_sockptr(tokens, optval, optlen)) {
-        kvfree(tokens);
-        return -EFAULT;
-    }
-
     xa_lock_bh(&sk->sk_user_frags);
     for (i = 0; i < num_tokens; i++) {
         for (j = 0; j < tokens[i].token_count; j++) {
@@ -1135,6 +1118,35 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
     for (k = 0; k < netmem_num; k++)
         WARN_ON_ONCE(!napi_pp_put_page(netmems[k]));
 
+    return ret;
+}
+
+static noinline_for_stack int
+sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
+{
+    struct dmabuf_token *tokens;
+    unsigned int num_tokens;
+    int ret;
+
+    if (!sk_is_tcp(sk))
+        return -EBADF;
+
+    if (optlen % sizeof(*tokens) ||
+        optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS)
+        return -EINVAL;
+
+    num_tokens = optlen / sizeof(*tokens);
+    tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
+    if (!tokens)
+        return -ENOMEM;
+
+    if (copy_from_sockptr(tokens, optval, optlen)) {
+        kvfree(tokens);
+        return -EFAULT;
+    }
+
+    ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+    kvfree(tokens);
     return ret;
 }
-- 
2.47.3
From nobody Sun Feb 8 07:14:35 2026
From: Bobby Eshleman
Date: Thu, 23 Oct 2025 13:58:22 -0700
Subject: [PATCH net-next v5 3/4] net: devmem: use niov array for token management
Message-Id: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-3-47cb85f5259e@meta.com>
References: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
In-Reply-To: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Mina Almasry
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Bobby Eshleman
X-Mailer: b4 0.13.0

Replace xarray-based token lookups with direct array access, using page
offsets as dmabuf tokens. When enabled, this eliminates the xarray
overhead and reduces CPU utilization in devmem RX threads by
approximately 13%.

This patch changes the meaning of tokens. Tokens previously referred to
unique fragments of pages; they now represent references to whole
pages. Because of this, multiple tokens may refer to the same page and
therefore carry identical values (e.g., two small fragments may coexist
on the same page). The token and offset pair that the user receives
still uniquely identifies a fragment if needed. This assumes the user
is not attempting to sort / uniq the token list by token value alone.

This introduces a restriction: devmem RX sockets cannot switch dmabuf
bindings. This is necessary because 32-bit tokens lack sufficient bits
to encode both large dmabuf page counts and binding/queue IDs. For
example, a system with 8 NICs and 32 queues would need 8 bits for
binding IDs, leaving only 24 bits for pages (64GB max). This
restriction aligns with common usage, as steering flows to different
queues/devices is often undesirable for TCP.

This patch adds an atomic uref counter to net_iov for tracking user
references via binding->vec.
The pp_ref_count is only updated on uref transitions from zero to one or from one to zero, to minimize atomic overhead. If a user fails to refill and closes before returning all tokens, the binding will finish the uref release when unbound. A flag "autorelease" is added. This will be used for enabling the old behavior of the kernel releasing references for the sockets upon close(2) (autorelease), instead of requiring that socket users do this themselves. The autorelease flag is always true in this patch, meaning that the old (non-optimized) behavior is kept unconditionally. A future patch supports a user-facing knob to toggle this feature and will change the default to false for the improved performance. Signed-off-by: Bobby Eshleman --- Changes in v5: - remove unused variables - introduce autorelease flag, preparing for future patch toggle new behavior Changes in v3: - make urefs per-binding instead of per-socket, reducing memory footprint - fallback to cleaning up references in dmabuf unbind if socket leaked tokens - drop ethtool patch Changes in v2: - always use GFP_ZERO for binding->vec (Mina) - remove WARN for changed binding (Mina) - remove extraneous binding ref get (Mina) - remove WARNs on invalid user input (Mina) - pre-assign niovs in binding->vec for RX case (Mina) - use atomic_set(, 0) to initialize sk_user_frags.urefs - fix length of alloc for urefs --- include/net/netmem.h | 1 + include/net/sock.h | 8 ++++-- net/core/devmem.c | 45 ++++++++++++++++++++++------- net/core/devmem.h | 11 ++++++- net/core/sock.c | 75 ++++++++++++++++++++++++++++++++++++++++++++= ---- net/ipv4/tcp.c | 69 +++++++++++++++++++++++++++++++++----------- net/ipv4/tcp_ipv4.c | 12 ++++++-- net/ipv4/tcp_minisocks.c | 3 +- 8 files changed, 185 insertions(+), 39 deletions(-) diff --git a/include/net/netmem.h b/include/net/netmem.h index 651e2c62d1dd..de39afaede8d 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -116,6 +116,7 @@ struct net_iov { }; struct net_iov_area 
*owner; enum net_iov_type type; + atomic_t uref; }; =20 struct net_iov_area { diff --git a/include/net/sock.h b/include/net/sock.h index 01ce231603db..1963ab54c465 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -350,7 +350,7 @@ struct sk_filter; * @sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS * @sk_scm_unused: unused flags for scm_recv() * @ns_tracker: tracker for netns reference - * @sk_user_frags: xarray of pages the user is holding a reference on. + * @sk_devmem_info: the devmem binding information for the socket * @sk_owner: reference to the real owner of the socket that calls * sock_lock_init_class_and_name(). */ @@ -579,7 +579,11 @@ struct sock { struct numa_drop_counters *sk_drop_counters; struct rcu_head sk_rcu; netns_tracker ns_tracker; - struct xarray sk_user_frags; + struct { + struct xarray frags; + struct net_devmem_dmabuf_binding *binding; + bool autorelease; + } sk_devmem_info; =20 #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES) struct module *sk_owner; diff --git a/net/core/devmem.c b/net/core/devmem.c index b4c570d4f37a..8f3199fe0f7b 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -115,6 +116,29 @@ void net_devmem_free_dmabuf(struct net_iov *niov) gen_pool_free(binding->chunk_pool, dma_addr, PAGE_SIZE); } =20 +static void +net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *bind= ing) +{ + int i; + + if (binding->autorelease) + return; + + for (i =3D 0; i < binding->dmabuf->size / PAGE_SIZE; i++) { + struct net_iov *niov; + netmem_ref netmem; + + niov =3D binding->vec[i]; + + if (!net_is_devmem_iov(niov)) + continue; + + netmem =3D net_iov_to_netmem(niov); + if (atomic_xchg(&niov->uref, 0) > 0) + WARN_ON_ONCE(!napi_pp_put_page(netmem)); + } +} + void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding) { struct netdev_rx_queue *rxq; @@ -142,6 +166,7 @@ void 
net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_= binding *binding) __net_mp_close_rxq(binding->dev, rxq_idx, &mp_params); } =20 + net_devmem_dmabuf_binding_put_urefs(binding); net_devmem_dmabuf_binding_put(binding); } =20 @@ -230,14 +255,13 @@ net_devmem_bind_dmabuf(struct net_device *dev, goto err_detach; } =20 - if (direction =3D=3D DMA_TO_DEVICE) { - binding->vec =3D kvmalloc_array(dmabuf->size / PAGE_SIZE, - sizeof(struct net_iov *), - GFP_KERNEL); - if (!binding->vec) { - err =3D -ENOMEM; - goto err_unmap; - } + /* used by tx and also rx if !binding->autorelease */ + binding->vec =3D kvmalloc_array(dmabuf->size / PAGE_SIZE, + sizeof(struct net_iov *), + GFP_KERNEL | __GFP_ZERO); + if (!binding->vec) { + err =3D -ENOMEM; + goto err_unmap; } =20 /* For simplicity we expect to make PAGE_SIZE allocations, but the @@ -291,10 +315,10 @@ net_devmem_bind_dmabuf(struct net_device *dev, niov =3D &owner->area.niovs[i]; niov->type =3D NET_IOV_DMABUF; niov->owner =3D &owner->area; + atomic_set(&niov->uref, 0); page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), net_devmem_get_dma_addr(niov)); - if (direction =3D=3D DMA_TO_DEVICE) - binding->vec[owner->area.base_virtual / PAGE_SIZE + i] =3D niov; + binding->vec[owner->area.base_virtual / PAGE_SIZE + i] =3D niov; } =20 virtual +=3D len; @@ -307,6 +331,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, goto err_free_chunks; =20 list_add(&binding->list, &priv->bindings); + binding->autorelease =3D true; =20 return binding; =20 diff --git a/net/core/devmem.h b/net/core/devmem.h index 2ada54fb63d7..7662e9e42c35 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -61,11 +61,20 @@ struct net_devmem_dmabuf_binding { =20 /* Array of net_iov pointers for this binding, sorted by virtual * address. This array is convenient to map the virtual addresses to - * net_iovs in the TX path. + * net_iovs. 
*/ struct net_iov **vec; =20 struct work_struct unbind_w; + + /* If true, outstanding tokens will be automatically released upon each + * socket's close(2). + * + * If false, then sockets are responsible for releasing tokens before + * close(2). The kernel will only release lingering tokens when the + * dmabuf is unbound. + */ + bool autorelease; }; =20 #if defined(CONFIG_NET_DEVMEM) diff --git a/net/core/sock.c b/net/core/sock.c index e7b378753763..595b5a858d03 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -87,6 +87,7 @@ =20 #include #include +#include #include #include #include @@ -151,6 +152,7 @@ #include =20 #include "dev.h" +#include "devmem.h" =20 static DEFINE_MUTEX(proto_list_mutex); static LIST_HEAD(proto_list); @@ -1081,6 +1083,57 @@ static int sock_reserve_memory(struct sock *sk, int = bytes) #define MAX_DONTNEED_TOKENS 128 #define MAX_DONTNEED_FRAGS 1024 =20 +static noinline_for_stack int +sock_devmem_dontneed_manual_release(struct sock *sk, struct dmabuf_token *= tokens, + unsigned int num_tokens) +{ + unsigned int netmem_num =3D 0; + int ret =3D 0, num_frags =3D 0; + netmem_ref netmems[16]; + struct net_iov *niov; + unsigned int i, j, k; + + for (i =3D 0; i < num_tokens; i++) { + for (j =3D 0; j < tokens[i].token_count; j++) { + struct net_iov *niov; + unsigned int token; + netmem_ref netmem; + + token =3D tokens[i].token_start + j; + if (token >=3D sk->sk_devmem_info.binding->dmabuf->size / PAGE_SIZE) + break; + + if (++num_frags > MAX_DONTNEED_FRAGS) + goto frag_limit_reached; + niov =3D sk->sk_devmem_info.binding->vec[token]; + netmem =3D net_iov_to_netmem(niov); + + if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + continue; + + netmems[netmem_num++] =3D netmem; + if (netmem_num =3D=3D ARRAY_SIZE(netmems)) { + for (k =3D 0; k < netmem_num; k++) { + niov =3D netmem_to_net_iov(netmems[k]); + if (atomic_dec_and_test(&niov->uref)) + WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); + } + netmem_num =3D 0; + } + ret++; + } + } + 
+frag_limit_reached: + for (k =3D 0; k < netmem_num; k++) { + niov =3D netmem_to_net_iov(netmems[k]); + if (atomic_dec_and_test(&niov->uref)) + WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); + } + + return ret; +} + static noinline_for_stack int sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tok= ens, unsigned int num_tokens) @@ -1089,32 +1142,32 @@ sock_devmem_dontneed_autorelease(struct sock *sk, s= truct dmabuf_token *tokens, int ret =3D 0, num_frags =3D 0; netmem_ref netmems[16]; =20 - xa_lock_bh(&sk->sk_user_frags); + xa_lock_bh(&sk->sk_devmem_info.frags); for (i =3D 0; i < num_tokens; i++) { for (j =3D 0; j < tokens[i].token_count; j++) { if (++num_frags > MAX_DONTNEED_FRAGS) goto frag_limit_reached; =20 netmem_ref netmem =3D (__force netmem_ref)__xa_erase( - &sk->sk_user_frags, tokens[i].token_start + j); + &sk->sk_devmem_info.frags, tokens[i].token_start + j); =20 if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem))) continue; =20 netmems[netmem_num++] =3D netmem; if (netmem_num =3D=3D ARRAY_SIZE(netmems)) { - xa_unlock_bh(&sk->sk_user_frags); + xa_unlock_bh(&sk->sk_devmem_info.frags); for (k =3D 0; k < netmem_num; k++) WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); netmem_num =3D 0; - xa_lock_bh(&sk->sk_user_frags); + xa_lock_bh(&sk->sk_devmem_info.frags); } ret++; } } =20 frag_limit_reached: - xa_unlock_bh(&sk->sk_user_frags); + xa_unlock_bh(&sk->sk_devmem_info.frags); for (k =3D 0; k < netmem_num; k++) WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); =20 @@ -1135,6 +1188,12 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optv= al, unsigned int optlen) optlen > sizeof(*tokens) * MAX_DONTNEED_TOKENS) return -EINVAL; =20 + /* recvmsg() has never returned a token for this socket, which needs to + * happen before we know if the dmabuf has autorelease set or not. 
+	 */
+	if (!sk->sk_devmem_info.binding)
+		return -EINVAL;
+
 	num_tokens = optlen / sizeof(*tokens);
 	tokens = kvmalloc_array(num_tokens, sizeof(*tokens), GFP_KERNEL);
 	if (!tokens)
@@ -1145,7 +1204,11 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen)
 		return -EFAULT;
 	}

-	ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	if (sk->sk_devmem_info.autorelease)
+		ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens);
+	else
+		ret = sock_devmem_dontneed_manual_release(sk, tokens,
+							  num_tokens);

 	kvfree(tokens);
 	return ret;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e15b38f6bd2d..cfa77c852e64 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -260,6 +260,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -492,7 +493,9 @@ void tcp_init_sock(struct sock *sk)

 	set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
 	sk_sockets_allocated_inc(sk);
-	xa_init_flags(&sk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&sk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	sk->sk_devmem_info.binding = NULL;
+	sk->sk_devmem_info.autorelease = false;
 }
 EXPORT_IPV6_MOD(tcp_init_sock);

@@ -2422,11 +2425,11 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p)

 	/* Commit part that has been copied to user space. */
 	for (i = 0; i < p->idx; i++)
-		__xa_cmpxchg(&sk->sk_user_frags, p->tokens[i], XA_ZERO_ENTRY,
+		__xa_cmpxchg(&sk->sk_devmem_info.frags, p->tokens[i], XA_ZERO_ENTRY,
 			     (__force void *)p->netmems[i], GFP_KERNEL);
 	/* Rollback what has been pre-allocated and is no longer needed.
	 */
 	for (; i < p->max; i++)
-		__xa_erase(&sk->sk_user_frags, p->tokens[i]);
+		__xa_erase(&sk->sk_devmem_info.frags, p->tokens[i]);

 	p->max = 0;
 	p->idx = 0;
@@ -2437,11 +2440,11 @@ static void tcp_xa_pool_commit(struct sock *sk, struct tcp_xa_pool *p)
 	if (!p->max)
 		return;

-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);

 	tcp_xa_pool_commit_locked(sk, p);

-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);
 }

 static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
@@ -2452,18 +2455,18 @@ static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p,
 	if (p->idx < p->max)
 		return 0;

-	xa_lock_bh(&sk->sk_user_frags);
+	xa_lock_bh(&sk->sk_devmem_info.frags);

 	tcp_xa_pool_commit_locked(sk, p);

 	for (k = 0; k < max_frags; k++) {
-		err = __xa_alloc(&sk->sk_user_frags, &p->tokens[k],
+		err = __xa_alloc(&sk->sk_devmem_info.frags, &p->tokens[k],
 				 XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL);
 		if (err)
 			break;
 	}

-	xa_unlock_bh(&sk->sk_user_frags);
+	xa_unlock_bh(&sk->sk_devmem_info.frags);

 	p->max = k;
 	p->idx = 0;
@@ -2477,6 +2480,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			      unsigned int offset, struct msghdr *msg,
 			      int remaining_len)
 {
+	struct net_devmem_dmabuf_binding *binding = NULL;
 	struct dmabuf_cmsg dmabuf_cmsg = { 0 };
 	struct tcp_xa_pool tcp_xa_pool;
 	unsigned int start;
@@ -2534,6 +2538,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 		struct net_iov *niov;
 		u64 frag_offset;
+		u32 token;
 		int end;

 		/* !skb_frags_readable() should indicate that ALL the
@@ -2566,13 +2571,35 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 				    start;
 			dmabuf_cmsg.frag_offset = frag_offset;
 			dmabuf_cmsg.frag_size = copy;
-			err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
-						 skb_shinfo(skb)->nr_frags - i);
-			if (err)
+
+			binding =
+				net_devmem_iov_binding(niov);
+
+			if (!sk->sk_devmem_info.binding) {
+				sk->sk_devmem_info.binding = binding;
+				sk->sk_devmem_info.autorelease =
+					binding->autorelease;
+			}
+
+			if (sk->sk_devmem_info.binding != binding) {
+				err = -EFAULT;
 				goto out;
+			}
+
+			if (binding->autorelease) {
+				err = tcp_xa_pool_refill(sk, &tcp_xa_pool,
+							 skb_shinfo(skb)->nr_frags - i);
+				if (err)
+					goto out;
+
+				dmabuf_cmsg.frag_token =
+					tcp_xa_pool.tokens[tcp_xa_pool.idx];
+			} else {
+				token = net_iov_virtual_addr(niov) >> PAGE_SHIFT;
+				dmabuf_cmsg.frag_token = token;
+			}
+

 			/* Will perform the exchange later */
-			dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
 			dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);

 			offset += copy;
@@ -2585,8 +2612,14 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			if (err)
 				goto out;

-			atomic_long_inc(&niov->pp_ref_count);
-			tcp_xa_pool.netmems[tcp_xa_pool.idx++] = skb_frag_netmem(frag);
+			if (sk->sk_devmem_info.autorelease) {
+				atomic_long_inc(&niov->pp_ref_count);
+				tcp_xa_pool.netmems[tcp_xa_pool.idx++] =
+					skb_frag_netmem(frag);
+			} else {
+				if (atomic_inc_return(&niov->uref) == 1)
+					atomic_long_inc(&niov->pp_ref_count);
+			}

 			sent += copy;

@@ -2596,7 +2629,9 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 			start = end;
 		}

-		tcp_xa_pool_commit(sk, &tcp_xa_pool);
+		if (sk->sk_devmem_info.autorelease)
+			tcp_xa_pool_commit(sk, &tcp_xa_pool);
+
 		if (!remaining_len)
 			goto out;

@@ -2614,7 +2649,9 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
 	}

 out:
-	tcp_xa_pool_commit(sk, &tcp_xa_pool);
+	if (sk->sk_devmem_info.autorelease)
+		tcp_xa_pool_commit(sk, &tcp_xa_pool);
+
 	if (!sent)
 		sent = err;

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 40a76da5364a..feb15440cac4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -89,6 +89,9 @@

 #include

+#include
+#include
"../core/devmem.h"
+
 #include

 #ifdef CONFIG_TCP_MD5SIG
@@ -2493,7 +2496,7 @@ static void tcp_release_user_frags(struct sock *sk)
 	unsigned long index;
 	void *netmem;

-	xa_for_each(&sk->sk_user_frags, index, netmem)
+	xa_for_each(&sk->sk_devmem_info.frags, index, netmem)
 		WARN_ON_ONCE(!napi_pp_put_page((__force netmem_ref)netmem));
 #endif
 }
@@ -2502,9 +2505,12 @@ void tcp_v4_destroy_sock(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);

-	tcp_release_user_frags(sk);
+	if (sk->sk_devmem_info.binding &&
+	    sk->sk_devmem_info.binding->autorelease)
+		tcp_release_user_frags(sk);

-	xa_destroy(&sk->sk_user_frags);
+	xa_destroy(&sk->sk_devmem_info.frags);
+	sk->sk_devmem_info.binding = NULL;

 	trace_tcp_destroy_sock(sk);

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index ded2cf1f6006..512a3dbb57a4 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -663,7 +663,8 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,

 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);

-	xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1);
+	xa_init_flags(&newsk->sk_devmem_info.frags, XA_FLAGS_ALLOC1);
+	newsk->sk_devmem_info.binding = NULL;

 	return newsk;
 }
--
2.47.3
From: Bobby Eshleman
Date: Thu, 23 Oct 2025 13:58:23 -0700
Subject: [PATCH net-next v5 4/4] net: add per-netns sysctl for devmem autorelease
Message-Id: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-4-47cb85f5259e@meta.com>
References: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
In-Reply-To: <20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com>
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Mina Almasry
Cc: Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Bobby Eshleman

From: Bobby Eshleman

Add a new per-namespace sysctl that controls the autorelease behavior
of devmem dmabuf bindings. The sysctl is found at:

  /proc/sys/net/core/devmem_autorelease

When a binding is created, it inherits the autorelease setting from the
network namespace of the device it is being bound to.

If autorelease is enabled (1):
  - Tokens are stored in the socket's xarray.
  - Tokens are automatically released when the socket is closed.

If autorelease is disabled (0):
  - Tokens are tracked via the uref counter in each net_iov.
  - The user must manually release tokens via SO_DEVMEM_DONTNEED.
  - Lingering tokens are released when the dmabuf is unbound.
  - This is the new default behavior, chosen for better performance.

This lets application developers choose between automatic cleanup
(easier, backwards compatible) and manual control (more explicit token
management, but more performant).

The default changes to autorelease=0, so that users gain the
performance benefit by default.
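[Editor's sketch: the operator-side workflow implied above, shown with illustrative commands. Only the sysctl path and the 0/1 semantics come from this patch; the surrounding commands are hypothetical.]

```shell
# Read the default that new dmabuf bindings in this netns will inherit
cat /proc/sys/net/core/devmem_autorelease

# Opt back into the legacy behavior (tokens auto-released on close(2))
# before creating a binding
sysctl -w net.core.devmem_autorelease=1
```

Note that the setting is sampled at bind time (net_devmem_bind_dmabuf above), so toggling it has no effect on bindings that already exist.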
Signed-off-by: Bobby Eshleman
---
 include/net/netns/core.h   | 1 +
 net/core/devmem.c          | 2 +-
 net/core/net_namespace.c   | 1 +
 net/core/sysctl_net_core.c | 9 +++++++++
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 9ef3d70e5e9c..7af5ab0d757b 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -18,6 +18,7 @@ struct netns_core {
 	u8 sysctl_txrehash;
 	u8 sysctl_tstamp_allow_data;
 	u8 sysctl_bypass_prot_mem;
+	u8 sysctl_devmem_autorelease;

 #ifdef CONFIG_PROC_FS
 	struct prot_inuse __percpu *prot_inuse;
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 8f3199fe0f7b..9cd6d93676f9 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -331,7 +331,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 		goto err_free_chunks;

 	list_add(&binding->list, &priv->bindings);
-	binding->autorelease = true;
+	binding->autorelease = dev_net(dev)->core.sysctl_devmem_autorelease;

 	return binding;

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index adcfef55a66f..890826b113d6 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -396,6 +396,7 @@ static __net_init void preinit_net_sysctl(struct net *net)
 	net->core.sysctl_txrehash = SOCK_TXREHASH_ENABLED;
 	net->core.sysctl_tstamp_allow_data = 1;
 	net->core.sysctl_txq_reselection = msecs_to_jiffies(1000);
+	net->core.sysctl_devmem_autorelease = 0;
 }

 /* init code that must occur even if setup_net() is not called.
 */
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 8d4decb2606f..375ec395227e 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -692,6 +692,15 @@ static struct ctl_table netns_core_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE
 	},
+	{
+		.procname	= "devmem_autorelease",
+		.data		= &init_net.core.sysctl_devmem_autorelease,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE
+	},
 	/* sysctl_core_net_init() will set the values after this
 	 * to readonly in network namespaces
 	 */
--
2.47.3