[v4] mm: bpf kfuncs to access memcg data

[PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Roman Gushchin 1 month, 2 weeks ago

Introduce a BPF kfunc to get a trusted pointer to the root memory
cgroup. It's very handy for traversing the full memcg tree, e.g.
when handling a system-wide OOM.

It's possible to obtain this pointer by traversing the memcg tree
up from any known memcg, but it's sub-optimal and makes BPF programs
more complex and less efficient.

bpf_get_root_mem_cgroup() has KF_ACQUIRE | KF_RET_NULL semantics.
In reality, however, it's not necessary to bump the corresponding
reference counter: the root memory cgroup is immortal and reference
counting is skipped for it, see css_get(). Once set, root_mem_cgroup is
always a valid memcg pointer. It's safe to call bpf_put_mem_cgroup() on
the pointer obtained with bpf_get_root_mem_cgroup(); it's effectively a
no-op.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 mm/bpf_memcontrol.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
index 82eb95de77b7..187919eb2fe2 100644
--- a/mm/bpf_memcontrol.c
+++ b/mm/bpf_memcontrol.c
@@ -10,6 +10,25 @@
 
 __bpf_kfunc_start_defs();
 
+/**
+ * bpf_get_root_mem_cgroup - Returns a pointer to the root memory cgroup
+ *
+ * The function has KF_ACQUIRE semantics, even though the root memory
+ * cgroup is never destroyed after being created and doesn't require
+ * reference counting. It's also perfectly safe to pass it to
+ * bpf_put_mem_cgroup().
+ *
+ * Return: A pointer to the root memory cgroup, or NULL if memcg is disabled.
+ */
+__bpf_kfunc struct mem_cgroup *bpf_get_root_mem_cgroup(void)
+{
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	/* css_get() is not needed */
+	return root_mem_cgroup;
+}
+
 /**
  * bpf_get_mem_cgroup - Get a reference to a memory cgroup
  * @css: pointer to the css structure
@@ -64,6 +83,7 @@ __bpf_kfunc void bpf_put_mem_cgroup(struct mem_cgroup *memcg)
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
+BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_get_mem_cgroup, KF_ACQUIRE | KF_RET_NULL | KF_RCU)
 BTF_ID_FLAGS(func, bpf_put_mem_cgroup, KF_RELEASE)
 
-- 
2.52.0

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Matt Bobrowski 1 month, 1 week ago

On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
> [...]
>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)

I feel as though relying on KF_ACQUIRE semantics here is somewhat
odd. Users of this BPF kfunc will now be forced to call
bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
completely unnecessary.

Perhaps we should consider introducing a new KF bit/value which
essentially allows such BPF kfuncs to also have their returned
pointers implicitly marked as "trusted", similar to that of the legacy
RET_PTR_TO_BTF_ID_TRUSTED. What do you think? That way it obviates the
requirement to call into any backing KF_RELEASE BPF kfunc after the
fact.
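
For anyone following along who is less familiar with why KF_ACQUIRE forces the extra call: the verifier tracks each acquired reference and refuses a program that can exit while one is still outstanding. Below is a toy userspace model of that bookkeeping. It is not verifier code, and every name in it (toy_state, toy_acquire, toy_release, toy_can_exit) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_REFS 8

/* Toy stand-in for the per-program reference state (hypothetical). */
struct toy_state {
	int next_id;                 /* last reference id handed out */
	bool live[TOY_MAX_REFS + 1]; /* live[id] is true while id is unreleased */
};

/*
 * Models a call to a KF_ACQUIRE kfunc: returns a fresh non-zero
 * reference id, or 0 if the toy table is full.
 */
int toy_acquire(struct toy_state *st)
{
	if (st->next_id >= TOY_MAX_REFS)
		return 0;
	st->next_id++;
	st->live[st->next_id] = true;
	return st->next_id;
}

/* Models a call to a KF_RELEASE kfunc: fails unless id is a live reference. */
bool toy_release(struct toy_state *st, int id)
{
	if (id <= 0 || id > TOY_MAX_REFS || !st->live[id])
		return false;
	st->live[id] = false;
	return true;
}

/* Models the exit check: accepted only if no reference is still live. */
bool toy_can_exit(const struct toy_state *st)
{
	for (int id = 1; id <= st->next_id; id++)
		if (st->live[id])
			return false;
	return true;
}
```

With a zero-initialized toy_state, toy_can_exit() returns false between toy_acquire() and the matching toy_release(). That is the situation a caller of bpf_get_root_mem_cgroup() is in until it calls bpf_put_mem_cgroup(), even though the put does nothing at runtime.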

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Roman Gushchin 1 month, 1 week ago

Matt Bobrowski <mattbobrowski@google.com> writes:

> On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
>> [...]
>>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
>> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
>
> I feel as though relying on KF_ACQUIRE semantics here is somewhat
> odd. Users of this BPF kfunc will now be forced to call
> bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
> completely unnecessary.

I agree that it's annoying, but I doubt this extra call makes any
difference in the real world.

Also, the corresponding kernel code is designed to hide the special
handling of the root cgroup. css_get()/css_put() are simple no-ops for
the root cgroup, but are totally valid. So in most places the root
cgroup is handled as any other, which simplifies the code. I guess
the same will be true for many bpf programs.

Thanks!
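
To illustrate the point about css_get()/css_put(), here is a simplified userspace model. It assumes, as I understand the kernel code, that both helpers skip the counter when the css carries the CSS_NO_REF flag. The toy_* names and the plain long counter (standing in for percpu_ref) are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Plays the role of the kernel's CSS_NO_REF flag in this model. */
#define TOY_CSS_NO_REF 0x1

struct toy_css {
	unsigned int flags;
	long refcnt;            /* plain counter standing in for percpu_ref */
	struct toy_css *parent; /* NULL for the root */
};

/* Takes a reference unless the css is marked as not refcounted. */
void toy_css_get(struct toy_css *css)
{
	if (!(css->flags & TOY_CSS_NO_REF))
		css->refcnt++;
}

/* Drops a reference unless the css is marked as not refcounted. */
void toy_css_put(struct toy_css *css)
{
	if (!(css->flags & TOY_CSS_NO_REF)) {
		assert(css->refcnt > 0);
		css->refcnt--;
	}
}

/*
 * Walks from @css up to the root, pinning each node while it is visited.
 * Returns the number of nodes visited. The root needs no special case:
 * get/put are valid calls on it, they just leave the counter alone.
 */
int toy_walk_to_root(struct toy_css *css)
{
	int visited = 0;

	for (; css; css = css->parent) {
		toy_css_get(css);
		visited++;
		toy_css_put(css);
	}
	return visited;
}
```

The walker is the interesting part: it never asks whether a node is the root, which is the simplification I would expect many bpf programs to benefit from as well.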

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Matt Bobrowski 1 month, 1 week ago

On Tue, Dec 30, 2025 at 09:00:28PM +0000, Roman Gushchin wrote:
> Matt Bobrowski <mattbobrowski@google.com> writes:
> 
> > On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
> >> [...]
> >>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> >> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
> >
> > I feel as though relying on KF_ACQUIRE semantics here is somewhat
> > odd. Users of this BPF kfunc will now be forced to call
> > bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
> > completely unnecessary.
> 
> I agree that it's annoying, but I doubt this extra call makes any
> difference in the real world.

Sure, that certainly holds true.

> Also, the corresponding kernel code is designed to hide the special
> handling of the root cgroup. css_get()/css_put() are simple no-ops for
> the root cgroup, but are totally valid.

Yes, I do see that.

> So in most places the root cgroup is handled as any other, which
> simplifies the code. I guess the same will be true for many bpf
> programs.

I see, however the same might not necessarily hold for all other
global pointers which end up being handed out by a BPF kfunc (not
necessarily bpf_get_root_mem_cgroup()). This is why I was wondering
whether there's some sense to introducing another KF flag (or
something similar) which allows returned values from BPF kfuncs to be
implicitly treated as trusted.

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Alexei Starovoitov 1 month, 1 week ago

On Tue, Dec 30, 2025 at 11:42 PM Matt Bobrowski
<mattbobrowski@google.com> wrote:
>
> On Tue, Dec 30, 2025 at 09:00:28PM +0000, Roman Gushchin wrote:
> > Matt Bobrowski <mattbobrowski@google.com> writes:
> >
> > > On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
> > >> [...]
> > >>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> > >> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
> > >
> > > I feel as though relying on KF_ACQUIRE semantics here is somewhat
> > > odd. Users of this BPF kfunc will now be forced to call
> > > bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
> > > completely unnecessary.
> >
> > I agree that it's annoying, but I doubt this extra call makes any
> > difference in the real world.
>
> Sure, that certainly holds true.
>
> > Also, the corresponding kernel code is designed to hide the special
> > handling of the root cgroup. css_get()/css_put() are simple no-ops for
> > the root cgroup, but are totally valid.
>
> Yes, I do see that.
>
> > So in most places the root cgroup is handled as any other, which
> > simplifies the code. I guess the same will be true for many bpf
> > programs.
>
> I see, however the same might not necessarily hold for all other
> global pointers which end up being handed out by a BPF kfunc (not
> necessarily bpf_get_root_mem_cgroup()). This is why I was wondering
> whether there's some sense to introducing another KF flag (or
> something similar) which allows returned values from BPF kfuncs to be
> implicitly treated as trusted.

No need for a new KF flag. Any struct returned by kfunc should be
trusted or trusted_or_null if KF_RET_NULL was specified.
I don't remember off the top of my head, but this behavior
is already implemented or we discussed making it this way.

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Matt Bobrowski 1 month ago

On Wed, Dec 31, 2025 at 09:32:17AM -0800, Alexei Starovoitov wrote:
> On Tue, Dec 30, 2025 at 11:42 PM Matt Bobrowski
> <mattbobrowski@google.com> wrote:
> >
> > On Tue, Dec 30, 2025 at 09:00:28PM +0000, Roman Gushchin wrote:
> > > Matt Bobrowski <mattbobrowski@google.com> writes:
> > >
> > > > On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
> > > >> [...]
> > > >>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> > > >> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
> > > >
> > > > I feel as though relying on KF_ACQUIRE semantics here is somewhat
> > > > odd. Users of this BPF kfunc will now be forced to call
> > > > bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
> > > > completely unnecessary.
> > >
> > > I agree that it's annoying, but I doubt this extra call makes any
> > > difference in the real world.
> >
> > Sure, that certainly holds true.
> >
> > > Also, the corresponding kernel code is designed to hide the special
> > > handling of the root cgroup. css_get()/css_put() are simple no-ops for
> > > the root cgroup, but are totally valid.
> >
> > Yes, I do see that.
> >
> > > So in most places the root cgroup is handled as any other, which
> > > simplifies the code. I guess the same will be true for many bpf
> > > programs.
> >
> > I see, however the same might not necessarily hold for all other
> > global pointers which end up being handed out by a BPF kfunc (not
> > necessarily bpf_get_root_mem_cgroup()). This is why I was wondering
> > whether there's some sense to introducing another KF flag (or
> > something similar) which allows returned values from BPF kfuncs to be
> > implicitly treated as trusted.
> 
> No need for a new KF flag. Any struct returned by kfunc should be
> trusted or trusted_or_null if KF_RET_NULL was specified.
> I don't remember off the top of my head, but this behavior
> is already implemented or we discussed making it this way.

Hm, I do not see any evidence of this kind of semantic currently
implemented, so perhaps it was only discussed at some point. Would you
like me to put forward a patch that introduces this kind of implicit
trust semantic for BPF kfuncs returning pointer to struct types?

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Alexei Starovoitov 1 month ago

On Sun, Jan 4, 2026 at 11:49 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
>
> >
> > No need for a new KF flag. Any struct returned by kfunc should be
> > trusted or trusted_or_null if KF_RET_NULL was specified.
> > I don't remember off the top of my head, but this behavior
> > is already implemented or we discussed making it this way.
>
> Hm, I do not see any evidence of this kind of semantic currently
> implemented, so perhaps it was only discussed at some point. Would you
> like me to put forward a patch that introduces this kind of implicit
> trust semantic for BPF kfuncs returning pointer to struct types?

Hmm. What about these:
BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_RET_NULL)
BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_RET_NULL | KF_RCU_PROTECTED)

I thought they're returning a trusted pointer without acquiring it.
iirc the last one returns trusted in RCU CS,
but the first two return just a legacy ptr_to_btf_id ?
This is something to fix asap then.

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Matt Bobrowski 1 month ago

On Mon, Jan 05, 2026 at 08:05:54AM -0800, Alexei Starovoitov wrote:
> On Sun, Jan 4, 2026 at 11:49 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > >
> > > No need for a new KF flag. Any struct returned by kfunc should be
> > > trusted or trusted_or_null if KF_RET_NULL was specified.
> > > I don't remember off the top of my head, but this behavior
> > > is already implemented or we discussed making it this way.
> >
> > Hm, I do not see any evidence of this kind of semantic currently
> > implemented, so perhaps it was only discussed at some point. Would you
> > like me to put forward a patch that introduces this kind of implicit
> > trust semantic for BPF kfuncs returning pointer to struct types?
> 
> Hmm. What about these:
> BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
> BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_RET_NULL)
> BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_RET_NULL | KF_RCU_PROTECTED)
> 
> I thought they're returning a trusted pointer without acquiring it.
> iirc the last one returns trusted in RCU CS,
> but the first two return just a legacy ptr_to_btf_id ?
> This is something to fix asap then.

No, AFAIU they do not. These simply return a regular pointer to BTF ID
(PTR_TO_BTF_ID), rather than a formally "trusted" pointer (which would
carry the PTR_TRUSTED flag or a ref_obj_id). scx_bpf_cpu_curr returns
a MEM_RCU pointer (via KF_RCU_PROTECTED), which is considered trusted
*ONLY* within an RCU read-side critical section.

Kumar/Tejun,

Please keep me honest here.

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Kumar Kartikeya Dwivedi 1 month ago

On Mon, 5 Jan 2026 at 22:04, Matt Bobrowski <mattbobrowski@google.com> wrote:
>
> On Mon, Jan 05, 2026 at 08:05:54AM -0800, Alexei Starovoitov wrote:
> > On Sun, Jan 4, 2026 at 11:49 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > >
> > > >
> > > > No need for a new KF flag. Any struct returned by kfunc should be
> > > > trusted or trusted_or_null if KF_RET_NULL was specified.
> > > > I don't remember off the top of my head, but this behavior
> > > > is already implemented or we discussed making it this way.
> > >
> > > Hm, I do not see any evidence of this kind of semantic currently
> > > implemented, so perhaps it was only discussed at some point. Would you
> > > like me to put forward a patch that introduces this kind of implicit
> > > trust semantic for BPF kfuncs returning pointer to struct types?
> >
> > Hmm. What about these:
> > BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
> > BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_RET_NULL)
> > BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_RET_NULL | KF_RCU_PROTECTED)
> >
> > I thought they're returning a trusted pointer without acquiring it.
> > iirc the last one returns trusted in RCU CS,
> > but the first two return just a legacy ptr_to_btf_id ?
> > This is something to fix asap then.
>
> No, AFAIU they do not. These simply return a regular pointer to BTF ID
> (PTR_TO_BTF_ID), rather than a formally "trusted" pointer (which would
> carry the PTR_TRUSTED flag or a ref_obj_id). scx_bpf_cpu_curr returns
> a MEM_RCU pointer (via KF_RCU_PROTECTED), which is somewhat considered
> to be trusted within a RCU read-side critical section *ONLY*.
>
> Kumar/Tejun,

Yeah, they don't return a trusted pointer. I think it would make sense
to change the behavior here by default.
A non-trusted pointer cannot be passed to kfuncs taking trusted
arguments, so hopefully the change will only make things more
permissive and won't break anything.

>
> Please keep me honest here.
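
To spell out the "only more permissive" argument, here is a deliberately tiny userspace model. It is a sketch under stated assumptions, not verifier logic: it covers only the plain PTR_TO_BTF_ID and trusted cases discussed above, leaves the KF_RCU_PROTECTED case out entirely, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy pointer states, loosely named after the terms used in this thread. */
enum toy_ptr_kind {
	TOY_PTR_LEGACY,  /* plain PTR_TO_BTF_ID, no trust flag */
	TOY_PTR_TRUSTED, /* PTR_TRUSTED, or carries a ref_obj_id */
};

/*
 * State of a struct pointer returned by a kfunc.
 * @kf_acquire:     the kfunc is flagged KF_ACQUIRE
 * @implicit_trust: models the proposed default (false = today's behavior)
 */
enum toy_ptr_kind toy_ret_kind(bool kf_acquire, bool implicit_trust)
{
	if (kf_acquire || implicit_trust)
		return TOY_PTR_TRUSTED;
	return TOY_PTR_LEGACY;
}

/* A kfunc taking trusted arguments accepts only trusted pointers. */
bool toy_ok_as_trusted_arg(enum toy_ptr_kind kind)
{
	return kind == TOY_PTR_TRUSTED;
}
```

In this model, flipping implicit_trust from false to true never turns an accepted argument into a rejected one. It only lets pointers from kfuncs without KF_ACQUIRE through, which matches the hope expressed above.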

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Matt Bobrowski 1 month ago

On Tue, Jan 06, 2026 at 04:13:24PM +0100, Kumar Kartikeya Dwivedi wrote:
> On Mon, 5 Jan 2026 at 22:04, Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > On Mon, Jan 05, 2026 at 08:05:54AM -0800, Alexei Starovoitov wrote:
> > > On Sun, Jan 4, 2026 at 11:49 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > >
> > > > >
> > > > > No need for a new KF flag. Any struct returned by kfunc should be
> > > > > trusted or trusted_or_null if KF_RET_NULL was specified.
> > > > > I don't remember off the top of my head, but this behavior
> > > > > is already implemented or we discussed making it this way.
> > > >
> > > > Hm, I do not see any evidence of this kind of semantic currently
> > > > implemented, so perhaps it was only discussed at some point. Would you
> > > > like me to put forward a patch that introduces this kind of implicit
> > > > trust semantic for BPF kfuncs returning pointer to struct types?
> > >
> > > Hmm. What about these:
> > > BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
> > > BTF_ID_FLAGS(func, scx_bpf_locked_rq, KF_RET_NULL)
> > > BTF_ID_FLAGS(func, scx_bpf_cpu_curr, KF_RET_NULL | KF_RCU_PROTECTED)
> > >
> > > I thought they're returning a trusted pointer without acquiring it.
> > > iirc the last one returns trusted in RCU CS,
> > > but the first two return just a legacy ptr_to_btf_id ?
> > > This is something to fix asap then.
> >
> > No, AFAIU they do not. These simply return a regular pointer to BTF ID
> > (PTR_TO_BTF_ID), rather than a formally "trusted" pointer (which would
> > carry the PTR_TRUSTED flag or a ref_obj_id). scx_bpf_cpu_curr returns
> > a MEM_RCU pointer (via KF_RCU_PROTECTED), which is somewhat considered
> > to be trusted within a RCU read-side critical section *ONLY*.
> >
> > Kumar/Tejun,
> 
> Yeah, they don't return a trusted pointer. I think it would make sense
> to change the behavior here by default.

Thanks for chiming in and confirming this, Kumar! I also agree that any
BPF kfunc returning a pointer should be treated as being implicitly
trusted by default. I can't think of any scenario whereby a BPF kfunc
would want to return a pointer that'd fundamentally be untrusted, but
there always could be some exceptions. Anyway, I will work on this and
send something through for review soon.

> A non-trusted pointer cannot be passed to kfuncs taking trusted
> arguments, so hopefully it will only make things more permissive and
> doesn't break anything.

We can only hope! ;)

Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc

Posted by Roman Gushchin 1 month, 1 week ago

Matt Bobrowski <mattbobrowski@google.com> writes:

> On Tue, Dec 30, 2025 at 09:00:28PM +0000, Roman Gushchin wrote:
>> Matt Bobrowski <mattbobrowski@google.com> writes:
>> 
>> > On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
>> >> [...]
>> >>  BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
>> >> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
>> >
>> > I feel as though relying on KF_ACQUIRE semantics here is somewhat
>> > odd. Users of this BPF kfunc will now be forced to call
>> > bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
>> > completely unnecessary.
>> 
>> I agree that it's annoying, but I doubt this extra call makes any
>> difference in the real world.
>
> Sure, that certainly holds true.
>
>> Also, the corresponding kernel code is designed to hide the special
>> handling of the root cgroup. css_get()/css_put() are simple no-ops for
>> the root cgroup, but are totally valid.
>
> Yes, I do see that.
>
>> So in most places the root cgroup is handled as any other, which
>> simplifies the code. I guess the same will be true for many bpf
>> programs.
>
> I see, however the same might not necessarily hold for all other
> global pointers which end up being handed out by a BPF kfunc (not
> necessarily bpf_get_root_mem_cgroup()). This is why I was wondering
> whether there's some sense to introducing another KF flag (or
> something similar) which allows returned values from BPF kfuncs to be
> implicitly treated as trusted.

Agree. It sounds like a good idea to me.