From: Bobby Eshleman <bobbyeshleman@meta.com>
Add a new per-namespace sysctl to control the autorelease
behavior of devmem dmabuf bindings. The sysctl is found at:
/proc/sys/net/core/devmem_autorelease

When a binding is created, it inherits the autorelease setting from the
network namespace of the device to which it's being bound.

If autorelease is enabled (1):
- Tokens are stored in socket's xarray
- Tokens are automatically released when socket is closed

If autorelease is disabled (0):
- Tokens are tracked via uref counter in each net_iov
- User must manually release tokens via SO_DEVMEM_DONTNEED
- Lingering tokens are released when dmabuf is unbound
- This is the new default behavior for better performance

This allows application developers to choose between automatic cleanup
(easier, backwards compatible) and manual control (more explicit token
management, but more performant).

Changes the default to autorelease=0, so that users gain the performance
benefit by default.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
include/net/netns/core.h | 1 +
net/core/devmem.c | 2 +-
net/core/net_namespace.c | 1 +
net/core/sysctl_net_core.c | 9 +++++++++
4 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/include/net/netns/core.h b/include/net/netns/core.h
index 9ef3d70e5e9c..7af5ab0d757b 100644
--- a/include/net/netns/core.h
+++ b/include/net/netns/core.h
@@ -18,6 +18,7 @@ struct netns_core {
u8 sysctl_txrehash;
u8 sysctl_tstamp_allow_data;
u8 sysctl_bypass_prot_mem;
+ u8 sysctl_devmem_autorelease;
#ifdef CONFIG_PROC_FS
struct prot_inuse __percpu *prot_inuse;
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 8f3199fe0f7b..9cd6d93676f9 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -331,7 +331,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
goto err_free_chunks;
list_add(&binding->list, &priv->bindings);
- binding->autorelease = true;
+ binding->autorelease = dev_net(dev)->core.sysctl_devmem_autorelease;
return binding;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index adcfef55a66f..890826b113d6 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -396,6 +396,7 @@ static __net_init void preinit_net_sysctl(struct net *net)
net->core.sysctl_txrehash = SOCK_TXREHASH_ENABLED;
net->core.sysctl_tstamp_allow_data = 1;
net->core.sysctl_txq_reselection = msecs_to_jiffies(1000);
+ net->core.sysctl_devmem_autorelease = 0;
}
/* init code that must occur even if setup_net() is not called. */
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 8d4decb2606f..375ec395227e 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -692,6 +692,15 @@ static struct ctl_table netns_core_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE
},
+ {
+ .procname = "devmem_autorelease",
+ .data = &init_net.core.sysctl_devmem_autorelease,
+ .maxlen = sizeof(u8),
+ .mode = 0644,
+ .proc_handler = proc_dou8vec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE
+ },
/* sysctl_core_net_init() will set the values after this
* to readonly in network namespaces
*/
--
2.47.3
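
The bind-time inheritance described in the commit message can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_netns`, `model_binding`, `model_sysctl_write` and `model_bind` are hypothetical names invented for illustration and are not kernel APIs. The model assumes the 0..1 range check behaves like the SYSCTL_ZERO/SYSCTL_ONE bounds in the table entry, and that a binding copies the flag once when it is created.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-namespace knob (not the kernel struct). */
struct model_netns {
	uint8_t devmem_autorelease;	/* the patch initializes this to 0 */
};

/* Hypothetical stand-in for a dmabuf binding. */
struct model_binding {
	bool autorelease;		/* copied once, at bind time */
};

/* Mirror the default chosen by the patch: autorelease off. */
static void model_netns_init(struct model_netns *ns)
{
	assert(ns != NULL);
	ns->devmem_autorelease = 0;
}

/* Accept only 0 or 1, modelling the extra1/extra2 bounds in the patch. */
static int model_sysctl_write(struct model_netns *ns, int val)
{
	assert(ns != NULL);
	if (val < 0 || val > 1)
		return -EINVAL;
	ns->devmem_autorelease = (uint8_t)val;
	return 0;
}

/* A new binding snapshots the namespace setting; later writes do not reach it. */
static struct model_binding model_bind(const struct model_netns *ns)
{
	struct model_binding b;

	assert(ns != NULL);
	b.autorelease = ns->devmem_autorelease != 0;
	return b;
}
```

In this model, a binding created before `model_sysctl_write(&ns, 1)` keeps `autorelease == false`, while a binding created afterwards sees `true`.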
On Thu, Oct 23, 2025 at 2:00 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> From: Bobby Eshleman <bobbyeshleman@meta.com>
>
> Add a new per-namespace sysctl to control the autorelease
> behavior of devmem dmabuf bindings. The sysctl is found at:
> /proc/sys/net/core/devmem_autorelease
>
> When a binding is created, it inherits the autorelease setting from the
> network namespace of the device to which it's being bound.
>
> If autorelease is enabled (1):
> - Tokens are stored in socket's xarray
> - Tokens are automatically released when socket is closed
>
> If autorelease is disabled (0):
> - Tokens are tracked via uref counter in each net_iov
> - User must manually release tokens via SO_DEVMEM_DONTNEED
> - Lingering tokens are released when dmabuf is unbound
> - This is the new default behavior for better performance
>
Maybe quote the significantly better performance in the docs and commit message.
> This allows application developers to choose between automatic cleanup
> (easier, backwards compatible) and manual control (more explicit token
> management, but more performant).
>
> Changes the default to autorelease=0, so that users gain the performance
> benefit by default.
>
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
> include/net/netns/core.h | 1 +
> net/core/devmem.c | 2 +-
> net/core/net_namespace.c | 1 +
> net/core/sysctl_net_core.c | 9 +++++++++
> 4 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/netns/core.h b/include/net/netns/core.h
> index 9ef3d70e5e9c..7af5ab0d757b 100644
> --- a/include/net/netns/core.h
> +++ b/include/net/netns/core.h
> @@ -18,6 +18,7 @@ struct netns_core {
> u8 sysctl_txrehash;
> u8 sysctl_tstamp_allow_data;
> u8 sysctl_bypass_prot_mem;
> + u8 sysctl_devmem_autorelease;
>
> #ifdef CONFIG_PROC_FS
> struct prot_inuse __percpu *prot_inuse;
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 8f3199fe0f7b..9cd6d93676f9 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -331,7 +331,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> goto err_free_chunks;
>
> list_add(&binding->list, &priv->bindings);
> - binding->autorelease = true;
> + binding->autorelease = dev_net(dev)->core.sysctl_devmem_autorelease;
>
Do you need to READ_ONCE this and WRITE_ONCE the write site? Or is
that silly for a u8? Maybe better be safe.

Could we not make this an optional netlink argument? I thought that
was a bit nicer than a sysctl.

Needs a doc update.

--
Thanks,
Mina
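
For readers unfamiliar with the READ_ONCE/WRITE_ONCE question raised above, the following userspace sketch shows what annotating both sites might look like. It is an assumption-laden illustration: `MODEL_READ_ONCE_U8` and `MODEL_WRITE_ONCE_U8` are simplified u8-only stand-ins built from volatile accesses, not the kernel's macro definitions, and `model_store_autorelease` / `model_load_autorelease` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins, u8 only; NOT the kernel's READ_ONCE/WRITE_ONCE. */
#define MODEL_READ_ONCE_U8(x)		(*(const volatile uint8_t *)&(x))
#define MODEL_WRITE_ONCE_U8(x, v)	(*(volatile uint8_t *)&(x) = (v))

/* Stands in for net->core.sysctl_devmem_autorelease. */
static uint8_t model_sysctl_devmem_autorelease;

/* Write site: a hypothetical handler path updating the knob. */
static void model_store_autorelease(uint8_t val)
{
	assert(val <= 1);
	MODEL_WRITE_ONCE_U8(model_sysctl_devmem_autorelease, val);
}

/* Read site: a single annotated load, as the bind path would perform. */
static bool model_load_autorelease(void)
{
	return MODEL_READ_ONCE_U8(model_sysctl_devmem_autorelease) != 0;
}
```

The point of the annotation is documentation and a single, untorn access at each site; for a u8 the generated code is usually unchanged.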
On Mon, Oct 27, 2025 at 06:22:16PM -0700, Mina Almasry wrote:
> On Thu, Oct 23, 2025 at 2:00 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:

[...]

> > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > index 8f3199fe0f7b..9cd6d93676f9 100644
> > --- a/net/core/devmem.c
> > +++ b/net/core/devmem.c
> > @@ -331,7 +331,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > goto err_free_chunks;
> >
> > list_add(&binding->list, &priv->bindings);
> > - binding->autorelease = true;
> > + binding->autorelease = dev_net(dev)->core.sysctl_devmem_autorelease;
> >
>
> Do you need to READ_ONCE this and WRITE_ONCE the write site? Or is
> that silly for a u8? Maybe better be safe.

Probably worth it to be safe.

>
> Could we not make this an optional netlink argument? I thought that
> was a bit nicer than a sysctl.
>
> Needs a doc update.
>
>
> --
> Thanks,
> Mina

Sounds good, I'll change to nl for the next rev. Thanks for the review!

Best,
Bobby
On Tue, Oct 28, 2025 at 2:14 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> On Mon, Oct 27, 2025 at 06:22:16PM -0700, Mina Almasry wrote:
> > On Thu, Oct 23, 2025 at 2:00 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> [...]
>
> > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > index 8f3199fe0f7b..9cd6d93676f9 100644
> > > --- a/net/core/devmem.c
> > > +++ b/net/core/devmem.c
> > > @@ -331,7 +331,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > goto err_free_chunks;
> > >
> > > list_add(&binding->list, &priv->bindings);
> > > - binding->autorelease = true;
> > > + binding->autorelease = dev_net(dev)->core.sysctl_devmem_autorelease;
> > >
> >
> > Do you need to READ_ONCE this and WRITE_ONCE the write site? Or is
> > that silly for a u8? Maybe better be safe.
>
> Probably worth it to be safe.
>
> >
> > Could we not make this an optional netlink argument? I thought that
> > was a bit nicer than a sysctl.
> >
> > Needs a doc update.
> >
> >
> > --
> > Thanks,
> > Mina
>
> Sounds good, I'll change to nl for the next rev. Thanks for the review!
>

Sorry to pile the requests, but any chance we can have the kselftest
improved to cover the default case and the autorelease=on case?

I'm thinking out loud here: if we make autorelease a property of the
socket like I say in the other thread, does changing the value at
runtime blow everything up. My thinking is that no, what's important
is that the sk->devmem_info.autorelease **never** gets toggled for any
active sockets, but as long as the value is constant, everything
should work fine, yes?

--
Thanks,
Mina
On Tue, Oct 28, 2025 at 07:09:58PM -0700, Mina Almasry wrote:
> On Tue, Oct 28, 2025 at 2:14 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
> >
> > On Mon, Oct 27, 2025 at 06:22:16PM -0700, Mina Almasry wrote:
> > > On Thu, Oct 23, 2025 at 2:00 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
> >
> > [...]
> >
> > > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > > index 8f3199fe0f7b..9cd6d93676f9 100644
> > > > --- a/net/core/devmem.c
> > > > +++ b/net/core/devmem.c
> > > > @@ -331,7 +331,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
> > > > goto err_free_chunks;
> > > >
> > > > list_add(&binding->list, &priv->bindings);
> > > > - binding->autorelease = true;
> > > > + binding->autorelease = dev_net(dev)->core.sysctl_devmem_autorelease;
> > > >
> > >
> > > Do you need to READ_ONCE this and WRITE_ONCE the write site? Or is
> > > that silly for a u8? Maybe better be safe.
> >
> > Probably worth it to be safe.
> >
> > >
> > > Could we not make this an optional netlink argument? I thought that
> > > was a bit nicer than a sysctl.
> > >
> > > Needs a doc update.
> > >
> > >
> > > --
> > > Thanks,
> > > Mina
> >
> > Sounds good, I'll change to nl for the next rev. Thanks for the review!
> >
>
> Sorry to pile the requests, but any chance we can have the kselftest
> improved to cover the default case and the autorelease=on case?
>

No problem, I had the same thought.

> I'm thinking out loud here: if we make autorelease a property of the
> socket like I say in the other thread, does changing the value at
> runtime blow everything up. My thinking is that no, what's important
> is that the sk->devmem_info.autorelease **never** gets toggled for any
> active sockets, but as long as the value is constant, everything
> should work fine, yes?

I agree, autorelease can be toggled so long as the xarray is empty and
there are no outstanding urefs (to avoid sock_devmem_dontneed from doing
the wrong thing with the tokens).

Best,
Bobby
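
The invariant agreed on in the last reply (toggle only when the xarray is empty and no urefs are outstanding) can be expressed as a small guard. This is a userspace sketch, not code from the series: `model_devmem_sock` and `model_set_autorelease` are hypothetical names, the two counters are simplified stand-ins for the socket's xarray and the per-net_iov uref counts, and returning -EBUSY is an assumed policy rather than anything decided in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket state, loosely modelled on sk->devmem_info. */
struct model_devmem_sock {
	bool autorelease;
	unsigned int xa_tokens;		/* tokens held in the xarray (autorelease on) */
	unsigned int outstanding_urefs;	/* user refs not yet returned (autorelease off) */
};

/*
 * Flip the mode only when no tokens are outstanding in either scheme, so a
 * later release is never applied to tokens tracked under the other mode.
 */
static int model_set_autorelease(struct model_devmem_sock *sk, bool enable)
{
	assert(sk != NULL);

	if (sk->autorelease == enable)
		return 0;		/* no change requested, always safe */
	if (sk->xa_tokens != 0 || sk->outstanding_urefs != 0)
		return -EBUSY;		/* assumed policy: refuse while tokens exist */

	sk->autorelease = enable;
	return 0;
}
```

A kselftest covering both modes, as requested earlier in the thread, could exercise the same cases: a toggle with tokens outstanding, then a toggle after all tokens have been returned.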