From: Bobby Eshleman <bobbyeshleman@meta.com>
Update devmem.rst documentation to describe the new SO_DEVMEM_AUTORELEASE
socket option and its usage.
Document the following:
- The two token release modes (automatic vs manual)
- How to use SO_DEVMEM_AUTORELEASE to control the behavior
- Performance benefits of disabling autorelease (~10% CPU reduction)
- Restrictions and caveats of manual token release
- Usage examples for both getsockopt and setsockopt
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Documentation/networking/devmem.rst | 70 +++++++++++++++++++++++++++++++++++--
1 file changed, 68 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
index a6cd7236bfbd..1bfce686dce6 100644
--- a/Documentation/networking/devmem.rst
+++ b/Documentation/networking/devmem.rst
@@ -215,8 +215,8 @@ Freeing frags
-------------
Frags received via SCM_DEVMEM_DMABUF are pinned by the kernel while the user
-processes the frag. The user must return the frag to the kernel via
-SO_DEVMEM_DONTNEED::
+processes the frag. Users should return tokens to the kernel via
+SO_DEVMEM_DONTNEED when they are done processing the data::
ret = setsockopt(client_fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, &token,
sizeof(token));
@@ -235,6 +235,72 @@ can be less than the tokens provided by the user in case of:
(a) an internal kernel leak bug.
(b) the user passed more than 1024 frags.
+
+Autorelease Control
+~~~~~~~~~~~~~~~~~~~
+
+The SO_DEVMEM_AUTORELEASE socket option controls what happens to outstanding
+tokens (tokens not released via SO_DEVMEM_DONTNEED) when the socket closes::
+
+ int autorelease = 0; /* 0 = manual release, 1 = automatic release */
+ ret = setsockopt(client_fd, SOL_SOCKET, SO_DEVMEM_AUTORELEASE,
+ &autorelease, sizeof(autorelease));
+
+ /* Query current setting */
+ int current_val;
+ socklen_t len = sizeof(current_val);
+ ret = getsockopt(client_fd, SOL_SOCKET, SO_DEVMEM_AUTORELEASE,
+ &current_val, &len);
+
+When autorelease is disabled (default):
+
+- Outstanding tokens are NOT released when the socket closes
+- Outstanding tokens are only released when the dmabuf is unbound
+- Provides better performance by eliminating xarray overhead (~10% CPU reduction)
+- Kernel tracks tokens via atomic reference counters in net_iov structures
+
+When autorelease is enabled:
+
+- Outstanding tokens are automatically released when the socket closes
+- Backwards compatible behavior
+- Kernel tracks tokens in an xarray per socket
+
+Important: In both modes, applications should call SO_DEVMEM_DONTNEED to
+return tokens as soon as they are done processing. The autorelease setting only
+affects what happens to tokens that are still outstanding when close() is called.
+
+The autorelease setting can only be changed when the socket has no outstanding
+tokens. If tokens are present, setsockopt returns -EBUSY.
+
+
+Performance Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Disabling autorelease reduces CPU utilization by approximately 10% in
+RX workloads by:
+
+- Eliminating xarray allocations and lookups for token tracking
+- Using atomic reference counters instead
+- Reducing lock contention on the xarray spinlock
+
+However, applications must ensure all tokens are released via
+SO_DEVMEM_DONTNEED before closing the socket, otherwise the backing pages will
+remain pinned until the dmabuf is unbound.
+
+
+Caveats
+~~~~~~~
+
+- With autorelease disabled, sockets cannot switch between different dmabuf
+ bindings. This restriction exists because tokens in this mode do not encode
+ the binding information necessary to perform the token release.
+
+- Applications using manual release mode (autorelease=0) must ensure all tokens
+ are returned via SO_DEVMEM_DONTNEED before socket close to avoid resource
+ leaks during the lifetime of the dmabuf binding. Tokens not released before
+ close() will only be freed when the dmabuf is unbound.
+
+
TX Interface
============
--
2.47.3
On 11/04, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
>
> [..]
>
> +Autorelease Control
> +~~~~~~~~~~~~~~~~~~~

Have you considered an option to have this flag on the dmabuf binding
itself? This will let us keep everything in ynl and not add a new socket
option. I think also semantically, this is a property of the binding
and not the socket? (not sure what's gonna happen if we have
autorelease=on and autorelease=off sockets receiving to the same
dmabuf)
On Wed, Nov 05, 2025 at 09:34:03AM -0800, Stanislav Fomichev wrote:
> On 11/04, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > [..]
> >
> > +Autorelease Control
> > +~~~~~~~~~~~~~~~~~~~
>
> Have you considered an option to have this flag on the dmabuf binding
> itself? This will let us keep everything in ynl and not add a new socket
> option. I think also semantically, this is a property of the binding
> and not the socket? (not sure what's gonna happen if we have
> autorelease=on and autorelease=off sockets receiving to the same
> dmabuf)

This was our initial instinct too and was the implementation in the
prior version, but we opted for a socket-based property because it
simplifies backwards compatibility with multi-binding steering rules. In
this case, where bindings may have different autorelease settings, the
recv path would need to error out once any binding with a different
autorelease value was detected, because the dont_need path doesn't have
any context to know if any specific token is part of the socket's xarray
(autorelease=on) or part of the binding->vec (autorelease=off).

At the socket level we can just prevent the mode switch by counting
outstanding references... to do this at the binding level, I think we
have to revert back to the ethtool approach we experimented with earlier
(trying to resolve steering rules to queues, and then check their
binding->autorelease values and make sure they are consistent).

This should work out of the box for mixed modes, given the outstanding
ref rule. Probably should add a test for specifically that...

Best,
Bobby
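
The "outstanding ref rule" described above (and documented in the patch as setsockopt returning -EBUSY while tokens are outstanding) can be illustrated with a small userspace model. This is only a sketch, not the kernel implementation: `struct model_sock` and the `model_*` helpers are hypothetical names made up for illustration, and the -EINVAL returned for a surplus token return is an assumption, not something the patch specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of a devmem socket. All names are hypothetical. */
struct model_sock {
	bool autorelease;         /* current release mode */
	unsigned int outstanding; /* tokens handed out and not yet returned */
};

static void model_sock_init(struct model_sock *sk, bool autorelease)
{
	assert(sk);
	sk->autorelease = autorelease;
	sk->outstanding = 0;
}

/* Models the recv path handing one token to userspace. */
static void model_token_get(struct model_sock *sk)
{
	assert(sk);
	sk->outstanding++;
}

/* Models SO_DEVMEM_DONTNEED returning one token. */
static int model_token_put(struct model_sock *sk)
{
	assert(sk);
	if (sk->outstanding == 0)
		return -EINVAL; /* assumed errno, for illustration only */
	sk->outstanding--;
	return 0;
}

/* Models SO_DEVMEM_AUTORELEASE: refuse the switch while tokens are out. */
static int model_set_autorelease(struct model_sock *sk, bool autorelease)
{
	assert(sk);
	if (sk->outstanding > 0)
		return -EBUSY;
	sk->autorelease = autorelease;
	return 0;
}
```

Because the mode can only change when the counter is zero, every outstanding token on a socket was issued under the mode the socket currently has, so the release path in this model never has to guess which tracking scheme a token belongs to.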
On Wed, Nov 5, 2025 at 9:34 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 11/04, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> >
> > [..]
> >
> > +Autorelease Control
> > +~~~~~~~~~~~~~~~~~~~
>
> Have you considered an option to have this flag on the dmabuf binding
> itself? This will let us keep everything in ynl and not add a new socket
> option. I think also semantically, this is a property of the binding
> and not the socket? (not sure what's gonna happen if we have
> autorelease=on and autorelease=off sockets receiving to the same
> dmabuf)

I think this thread (and maybe other comments on that patch) is the
context that missed your inbox:

https://lore.kernel.org/netdev/aQIoxVO3oICd8U8Q@devvm11784.nha0.facebook.com/

Let us know if you disagree.

--
Thanks,
Mina
On 11/05, Mina Almasry wrote:
> On Wed, Nov 5, 2025 at 9:34 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 11/04, Bobby Eshleman wrote:
> > > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > >
> > > [..]
> > >
> > > +Autorelease Control
> > > +~~~~~~~~~~~~~~~~~~~
> >
> > Have you considered an option to have this flag on the dmabuf binding
> > itself? This will let us keep everything in ynl and not add a new socket
> > option. I think also semantically, this is a property of the binding
> > and not the socket? (not sure what's gonna happen if we have
> > autorelease=on and autorelease=off sockets receiving to the same
> > dmabuf)
>
> I think this thread (and maybe other comments on that patch) is the
> context that missed your inbox:
>
> https://lore.kernel.org/netdev/aQIoxVO3oICd8U8Q@devvm11784.nha0.facebook.com/
>
> Let us know if you disagree.

Thanks, I did miss that whole v5 because I was OOO, let me take a look!
On 11/05, Stanislav Fomichev wrote:
> On 11/05, Mina Almasry wrote:
> > On Wed, Nov 5, 2025 at 9:34 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > >
> > > On 11/04, Bobby Eshleman wrote:
> > > > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > > >
> > > > [..]
> > > >
> > > > +Autorelease Control
> > > > +~~~~~~~~~~~~~~~~~~~
> > >
> > > Have you considered an option to have this flag on the dmabuf binding
> > > itself? This will let us keep everything in ynl and not add a new socket
> > > option. I think also semantically, this is a property of the binding
> > > and not the socket? (not sure what's gonna happen if we have
> > > autorelease=on and autorelease=off sockets receiving to the same
> > > dmabuf)
> >
> > I think this thread (and maybe other comments on that patch) is the
> > context that missed your inbox:
> >
> > https://lore.kernel.org/netdev/aQIoxVO3oICd8U8Q@devvm11784.nha0.facebook.com/
> >
> > Let us know if you disagree.
>
> Thanks, I did miss that whole v5 because I was OOO, let me take a look!

Thank you for the context!

I think that the current approach is ok, we can go with that, but I
wonder whether we can simplify things a bit? What if we prohibit the
co-existence of autorelease=on and autorelease=off sockets on the
system? The first binding basically locks the kernel path into one way or
the other (presumably by using static-branch) and prohibits new bindings
that use a different mode. It will let us still keep the mode on the binding
and will help us not think about the co-existence (we can also still keep
things like one-dmabuf-per-socket restrictions in the new mode, etc).

I think for you, Mina, this should still work? You have a knob to go back
to the old mode if needed. At the same time, we keep the UAPI surface
smaller and keep the path simpler. Ideally, we can also deprecate
the old mode at some point (if you manage to successfully migrate of
course). WDYT?
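
The proposal above (the first binding locks the system into one mode, and later bindings with a different mode are rejected) can be sketched as a small userspace model. This is an illustration only, under several assumptions that the thread does not settle: the names `struct model_system`, `model_bind` and `model_unbind` are hypothetical, the -EBUSY and -ENOENT error codes are placeholders, and the model assumes the mode becomes free to choose again once the last binding goes away. The static-branch mechanism mentioned above is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of system-wide binding state. All names are hypothetical. */
struct model_system {
	unsigned int nr_bindings; /* number of live dmabuf bindings */
	bool autorelease;         /* locked mode; meaningful only while nr_bindings > 0 */
};

/* Models creating a binding that requests a given autorelease mode. */
static int model_bind(struct model_system *sys, bool autorelease)
{
	assert(sys);
	if (sys->nr_bindings == 0)
		sys->autorelease = autorelease; /* first binding picks the mode */
	else if (sys->autorelease != autorelease)
		return -EBUSY; /* assumed errno for a mode mismatch */
	sys->nr_bindings++;
	return 0;
}

/* Models unbinding; with no bindings left, the next bind may pick any mode. */
static int model_unbind(struct model_system *sys)
{
	assert(sys);
	if (sys->nr_bindings == 0)
		return -ENOENT; /* assumed errno: nothing to unbind */
	sys->nr_bindings--;
	return 0;
}
```

In this model the two modes never coexist, which is the property the proposal relies on to avoid reasoning about mixed-mode sockets receiving into the same dmabuf.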
On Wed, Nov 05, 2025 at 03:17:06PM -0800, Stanislav Fomichev wrote:
> On 11/05, Stanislav Fomichev wrote:
>
> Thank you for the context!
>
> I think that the current approach is ok, we can go with that, but I
> wonder whether we can simplify things a bit? What if we prohibit the
> co-existence of autorelease=on and autorelease=off sockets on the
> system? The first binding basically locks the kernel path into one way or
> the other (presumably by using static-branch) and prohibits new bindings
> that use a different mode. It will let us still keep the mode on the binding
> and will help us not think about the co-existence (we can also still keep
> things like one-dmabuf-per-socket restrictions in the new mode, etc).
>

That approach is okay by me.

Best,
Bobby

> I think for you, Mina, this should still work? You have a knob to go back
> to the old mode if needed. At the same time, we keep the UAPI surface
> smaller and keep the path simpler. Ideally, we can also deprecate
> the old mode at some point (if you manage to successfully migrate of
> course). WDYT?