[v3] mm: BPF OOM

[PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Roman Gushchin 1 week, 5 days ago

Export tsk_is_oom_victim() helper as a BPF kfunc.
It's very useful to avoid redundant oom kills.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Suggested-by: Michal Hocko <mhocko@suse.com>
---
 mm/oom_kill.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8f63a370b8f5..53f9f9674658 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1381,10 +1381,24 @@ __bpf_kfunc int bpf_out_of_memory(struct mem_cgroup *memcg__nullable,
 	return ret;
 }
 
+/**
+ * bpf_task_is_oom_victim - Check if the task has been marked as an OOM victim
+ * @task: task to check
+ *
+ * Returns true if the task has been previously selected by the OOM killer
+ * to be killed. It's expected that the task will be destroyed soon and some
+ * memory will be freed, so no additional action may be required.
+ */
+__bpf_kfunc bool bpf_task_is_oom_victim(struct task_struct *task)
+{
+	return tsk_is_oom_victim(task);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_oom_kfuncs)
 BTF_ID_FLAGS(func, bpf_oom_kill_process, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_task_is_oom_victim)
 BTF_KFUNCS_END(bpf_oom_kfuncs)
 
 BTF_ID_LIST_SINGLE(bpf_oom_ops_ids, struct, bpf_oom_ops)
-- 
2.52.0

Re: [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Matt Bobrowski 6 days, 12 hours ago

On Mon, Jan 26, 2026 at 06:44:13PM -0800, Roman Gushchin wrote:
> Export tsk_is_oom_victim() helper as a BPF kfunc.
> It's very useful to avoid redundant oom kills.
> 
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/oom_kill.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8f63a370b8f5..53f9f9674658 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -1381,10 +1381,24 @@ __bpf_kfunc int bpf_out_of_memory(struct mem_cgroup *memcg__nullable,
>  	return ret;
>  }
>  
> +/**
> + * bpf_task_is_oom_victim - Check if the task has been marked as an OOM victim
> + * @task: task to check
> + *
> + * Returns true if the task has been previously selected by the OOM killer
> + * to be killed. It's expected that the task will be destroyed soon and some
> + * memory will be freed, so maybe no additional actions required.
> + */
> +__bpf_kfunc bool bpf_task_is_oom_victim(struct task_struct *task)
> +{
> +	return tsk_is_oom_victim(task);
> +}

Why not just do a direct memory read (i.e., task->signal->oom_mm)
within the BPF program? I'm not quite convinced that a BPF kfunc
wrapper for something like tsk_is_oom_victim() is warranted as you can
literally achieve the same semantics without one.
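
For illustration only, a minimal userspace sketch of the equivalence being
claimed here. The struct definitions are mock stand-ins that model only the
fields involved, and task_oom_mm_set() and oom_victim_selfcheck() are
hypothetical names introduced for this example. A real BPF program would
dereference the kernel's own types (for example via vmlinux.h) rather than
these mocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for kernel types; only the fields used here are modelled. */
struct mm_struct { int unused; };
struct signal_struct { struct mm_struct *oom_mm; };
struct task_struct { struct signal_struct *signal; };

/* Mirrors the kernel helper: a task is a victim once signal->oom_mm is set. */
static inline bool tsk_is_oom_victim(const struct task_struct *tsk)
{
	return tsk->signal->oom_mm != NULL;
}

/* Stand-in for the proposed kfunc: a plain wrapper around the helper. */
static bool bpf_task_is_oom_victim(const struct task_struct *task)
{
	return tsk_is_oom_victim(task);
}

/* Open-coded variant (hypothetical name): the same two pointer loads. */
static inline bool task_oom_mm_set(const struct task_struct *task)
{
	return task->signal->oom_mm != NULL;
}

/* Usage sketch: both paths agree before and after the task is marked. */
static void oom_victim_selfcheck(void)
{
	struct mm_struct mm = { 0 };
	struct signal_struct sig = { .oom_mm = NULL };
	struct task_struct task = { .signal = &sig };

	assert(!task_oom_mm_set(&task));
	assert(!bpf_task_is_oom_victim(&task));

	sig.oom_mm = &mm; /* simulate the task being marked as an OOM victim */
	assert(task_oom_mm_set(&task));
	assert(bpf_task_is_oom_victim(&task));
}
```

Under these assumptions the wrapper and the open-coded read return the same
value for any task, so the kfunc adds no semantics beyond the field read.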

Re: [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Alexei Starovoitov 6 days, 1 hour ago

On Sun, Feb 1, 2026 at 9:39 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
>
> On Mon, Jan 26, 2026 at 06:44:13PM -0800, Roman Gushchin wrote:
> > Export tsk_is_oom_victim() helper as a BPF kfunc.
> > It's very useful to avoid redundant oom kills.
> >
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/oom_kill.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 8f63a370b8f5..53f9f9674658 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -1381,10 +1381,24 @@ __bpf_kfunc int bpf_out_of_memory(struct mem_cgroup *memcg__nullable,
> >       return ret;
> >  }
> >
> > +/**
> > + * bpf_task_is_oom_victim - Check if the task has been marked as an OOM victim
> > + * @task: task to check
> > + *
> > + * Returns true if the task has been previously selected by the OOM killer
> > + * to be killed. It's expected that the task will be destroyed soon and some
> > + * memory will be freed, so maybe no additional actions required.
> > + */
> > +__bpf_kfunc bool bpf_task_is_oom_victim(struct task_struct *task)
> > +{
> > +     return tsk_is_oom_victim(task);
> > +}
>
> Why not just do a direct memory read (i.e., task->signal->oom_mm)
> within the BPF program? I'm not quite convinced that a BPF kfunc
> wrapper for something like tsk_is_oom_victim() is warranted as you can
> literally achieve the same semantics without one.

+1
there is no need for this kfunc.

Re: [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Roman Gushchin 5 days, 18 hours ago

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sun, Feb 1, 2026 at 9:39 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
>>
>> On Mon, Jan 26, 2026 at 06:44:13PM -0800, Roman Gushchin wrote:
>> > Export tsk_is_oom_victim() helper as a BPF kfunc.
>> > It's very useful to avoid redundant oom kills.
>> >
>> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
>> > Suggested-by: Michal Hocko <mhocko@suse.com>
>> > ---
>> >  mm/oom_kill.c | 14 ++++++++++++++
>> >  1 file changed, 14 insertions(+)
>> >
>> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> > index 8f63a370b8f5..53f9f9674658 100644
>> > --- a/mm/oom_kill.c
>> > +++ b/mm/oom_kill.c
>> > @@ -1381,10 +1381,24 @@ __bpf_kfunc int bpf_out_of_memory(struct mem_cgroup *memcg__nullable,
>> >       return ret;
>> >  }
>> >
>> > +/**
>> > + * bpf_task_is_oom_victim - Check if the task has been marked as an OOM victim
>> > + * @task: task to check
>> > + *
>> > + * Returns true if the task has been previously selected by the OOM killer
>> > + * to be killed. It's expected that the task will be destroyed soon and some
>> > + * memory will be freed, so maybe no additional actions required.
>> > + */
>> > +__bpf_kfunc bool bpf_task_is_oom_victim(struct task_struct *task)
>> > +{
>> > +     return tsk_is_oom_victim(task);
>> > +}
>>
>> Why not just do a direct memory read (i.e., task->signal->oom_mm)
>> within the BPF program? I'm not quite convinced that a BPF kfunc
>> wrapper for something like tsk_is_oom_victim() is warranted as you can
>> literally achieve the same semantics without one.
>
> +1
> there is no need for this kfunc.

It was explicitly asked by Michal Hocko, who is (co)maintaining the oom
code. I don't have a strong opinion here. I agree that it can be easily
open-coded without a kfunc, but at the same time the cost of having an
extra kfunc is not high and it makes the API more consistent.

Michal, do you feel strongly about having a dedicated kfunc vs the
direct memory read?

Re: [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Michal Hocko 5 days, 5 hours ago

On Mon 02-02-26 16:14:37, Roman Gushchin wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Sun, Feb 1, 2026 at 9:39 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >>
> >> On Mon, Jan 26, 2026 at 06:44:13PM -0800, Roman Gushchin wrote:
> >> > Export tsk_is_oom_victim() helper as a BPF kfunc.
> >> > It's very useful to avoid redundant oom kills.
> >> >
> >> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> >> > Suggested-by: Michal Hocko <mhocko@suse.com>
> >> > ---
> >> >  mm/oom_kill.c | 14 ++++++++++++++
> >> >  1 file changed, 14 insertions(+)
> >> >
> >> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >> > index 8f63a370b8f5..53f9f9674658 100644
> >> > --- a/mm/oom_kill.c
> >> > +++ b/mm/oom_kill.c
> >> > @@ -1381,10 +1381,24 @@ __bpf_kfunc int bpf_out_of_memory(struct mem_cgroup *memcg__nullable,
> >> >       return ret;
> >> >  }
> >> >
> >> > +/**
> >> > + * bpf_task_is_oom_victim - Check if the task has been marked as an OOM victim
> >> > + * @task: task to check
> >> > + *
> >> > + * Returns true if the task has been previously selected by the OOM killer
> >> > + * to be killed. It's expected that the task will be destroyed soon and some
> >> > + * memory will be freed, so maybe no additional actions required.
> >> > + */
> >> > +__bpf_kfunc bool bpf_task_is_oom_victim(struct task_struct *task)
> >> > +{
> >> > +     return tsk_is_oom_victim(task);
> >> > +}
> >>
> >> Why not just do a direct memory read (i.e., task->signal->oom_mm)
> >> within the BPF program? I'm not quite convinced that a BPF kfunc
> >> wrapper for something like tsk_is_oom_victim() is warranted as you can
> >> literally achieve the same semantics without one.
> >
> > +1
> > there is no need for this kfunc.
> 
> It was explicitly asked by Michal Hocko, who is (co)maintaining the oom
> code. I don't have a strong opinion here. I agree that it can be easily
> open-coded without a kfunc, but at the same time the cost of having an
> extra kfunc is not high and it makes the API more consistent.
> 
> Michal, do you feel strongly about having a dedicated kfunc vs the
> direct memory read?

The reason I wanted this as an explicit API is that oom states are quite
an internal part of the oom synchronization. And I would really like to
have that completely transparent for oom policies. In other words, I do
not want to touch all potential oom policies, or break them in the worst
case, just because we need to change this. So while it is a trivial
interface now (and hopefully for a long time), it is really an internal thing.

Do I insist? No, I do not but I would like to hear why this is a bad
idea.

-- 
Michal Hocko
SUSE Labs

Re: [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Alexei Starovoitov 5 days, 2 hours ago

On Tue, Feb 3, 2026 at 5:23 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 02-02-26 16:14:37, Roman Gushchin wrote:
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >
> > > On Sun, Feb 1, 2026 at 9:39 PM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > >>
> > >> On Mon, Jan 26, 2026 at 06:44:13PM -0800, Roman Gushchin wrote:
> > >> > Export tsk_is_oom_victim() helper as a BPF kfunc.
> > >> > It's very useful to avoid redundant oom kills.
> > >> >
> > >> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > >> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > >> > ---
> > >> >  mm/oom_kill.c | 14 ++++++++++++++
> > >> >  1 file changed, 14 insertions(+)
> > >> >
> > >> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > >> > index 8f63a370b8f5..53f9f9674658 100644
> > >> > --- a/mm/oom_kill.c
> > >> > +++ b/mm/oom_kill.c
> > >> > @@ -1381,10 +1381,24 @@ __bpf_kfunc int bpf_out_of_memory(struct mem_cgroup *memcg__nullable,
> > >> >       return ret;
> > >> >  }
> > >> >
> > >> > +/**
> > >> > + * bpf_task_is_oom_victim - Check if the task has been marked as an OOM victim
> > >> > + * @task: task to check
> > >> > + *
> > >> > + * Returns true if the task has been previously selected by the OOM killer
> > >> > + * to be killed. It's expected that the task will be destroyed soon and some
> > >> > + * memory will be freed, so maybe no additional actions required.
> > >> > + */
> > >> > +__bpf_kfunc bool bpf_task_is_oom_victim(struct task_struct *task)
> > >> > +{
> > >> > +     return tsk_is_oom_victim(task);
> > >> > +}
> > >>
> > >> Why not just do a direct memory read (i.e., task->signal->oom_mm)
> > >> within the BPF program? I'm not quite convinced that a BPF kfunc
> > >> wrapper for something like tsk_is_oom_victim() is warranted as you can
> > >> literally achieve the same semantics without one.
> > >
> > > +1
> > > there is no need for this kfunc.
> >
> > It was explicitly asked by Michal Hocko, who is (co)maintaining the oom
> > code. I don't have a strong opinion here. I agree that it can be easily
> > open-coded without a kfunc, but at the same time the cost of having an
> > extra kfunc is not high and it makes the API more consistent.
> >
> > Michal, do you feel strongly about having a dedicated kfunc vs the
> > direct memory read?
>
> The reason I wanted this an explicit API is that oom states are quite
> internal part of the oom synchronization. And I would really like to
> have that completely transparent for oom policies. In other words I do
> not want to touch all potential oom policies or break them in the worst
> case just because we need to change this. So while a trivial interface
> now (and hopefully for a long time) it is really an internal thing.
>
> Do I insist? No, I do not but I would like to hear why this is a bad
> idea.

It's a bad idea, since it doesn't address your goal.
bpf prog can access task->signal->oom_mm without kfunc just fine
and it will be doing so because performance matters and
static inline bool foo(task)
{
  return task->signal->oom_mm;
}

will be inlined as 2 loads while kfunc is a function call with 6 registers
being scratched.

If anything changes and, say, oom_mm gets renamed, whether it was
accessed via a kfunc or not doesn't change much. progs will adapt to the
new way easily with CO-RE. kfuncs can also be renamed/deleted, etc.
You're thinking about kfuncs as a stable api. It's definitely not.
It's not a layer of isolation either. kfuncs are necessary only
for the cases where bpf prog cannot do it on its own.

"internal thing" is also a wrong way of thinking of bpf-oom.
bpf-oom _will_ look into oom, cgroup and kernel internals in general.
All bpf progs do because they have to do that to achieve their goals.
Everything in mm/internal.h has been available for bpf progs to access
for a decade now. Did it cause any issues for mm development? No.
So let's not build some non-existent wall or "internal oom thing".

Re: [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Michal Hocko 4 days, 9 hours ago

On Tue 03-02-26 08:31:19, Alexei Starovoitov wrote:
> On Tue, Feb 3, 2026 at 5:23 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 02-02-26 16:14:37, Roman Gushchin wrote:
[...]
> > > Michal, do you feel strongly about having a dedicated kfunc vs the
> > > direct memory read?
> >
> > The reason I wanted this an explicit API is that oom states are quite
> > internal part of the oom synchronization. And I would really like to
> > have that completely transparent for oom policies. In other words I do
> > not want to touch all potential oom policies or break them in the worst
> > case just because we need to change this. So while a trivial interface
> > now (and hopefully for a long time) it is really an internal thing.
> >
> > Do I insist? No, I do not but I would like to hear why this is a bad
> > idea.
> 
> It's a bad idea, since it doesn't address your goal.
> bpf prog can access task->signal->oom_mm without kfunc just fine
> and it will be doing so because performance matters and
> static inline bool foo(task)
> {
>   return task->signal->oom_mm;
> }

OK, so my understanding was that BPF can only use exported
functionality. If those progs can access whatever they get a pointer to
and then traverse from there, then this is largely moot.

> will be inlined as 2 loads while kfunc is a function call with 6 registers
> being scratched.

Performance is not really crucial in this context. We are OOM; a couple of
loads vs. scratched registers will not make much difference. It is really
more about what code writers can/should be using. OOM is a piece of
complex code with many loose ends that might not be obvious.

> If anything changes and, say, oom_mm will get renamed whether
> it was kfunc or not doesn't change much. progs will adopt to a new
> way easily with CORE. kfuncs can also be renamed/deleted, etc.
> You're thinking about kfuncs as a stable api. It's definitely not.
> It's not a layer of isolation either. kfuncs are necessary only
> for the cases where bpf prog cannot do it on its own.

It is obviously not clear to me where that line is for BPF progs. Where
is this documented?

-- 
Michal Hocko
SUSE Labs

Re: [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc

Posted by Alexei Starovoitov 3 days, 18 hours ago

On Wed, Feb 4, 2026 at 1:02 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 03-02-26 08:31:19, Alexei Starovoitov wrote:
> > On Tue, Feb 3, 2026 at 5:23 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 02-02-26 16:14:37, Roman Gushchin wrote:
> [...]
> > > > Michal, do you feel strongly about having a dedicated kfunc vs the
> > > > direct memory read?
> > >
> > > The reason I wanted this an explicit API is that oom states are quite
> > > internal part of the oom synchronization. And I would really like to
> > > have that completely transparent for oom policies. In other words I do
> > > not want to touch all potential oom policies or break them in the worst
> > > case just because we need to change this. So while a trivial interface
> > > now (and hopefully for a long time) it is really an internal thing.
> > >
> > > Do I insist? No, I do not but I would like to hear why this is a bad
> > > idea.
> >
> > It's a bad idea, since it doesn't address your goal.
> > bpf prog can access task->signal->oom_mm without kfunc just fine
> > and it will be doing so because performance matters and
> > static inline bool foo(task)
> > {
> >   return task->signal->oom_mm;
> > }
>
> OK, so my understanding was that BPF can only use exported
> functionality. If those progs can access whatever they get a pointer for
> and than traverse down the road then this is moot from a large part.

bpf could access all kernel internals from day one 10 years ago.
We made it more ergonomic over the years.

> > If anything changes and, say, oom_mm will get renamed whether
> > it was kfunc or not doesn't change much. progs will adopt to a new
> > way easily with CORE. kfuncs can also be renamed/deleted, etc.
> > You're thinking about kfuncs as a stable api. It's definitely not.
> > It's not a layer of isolation either. kfuncs are necessary only
> > for the cases where bpf prog cannot do it on its own.
>
> It is obviously not clear to me where that line is for BPF progs. Where
> is this documented?

See Documentation/bpf/kfuncs.rst
Especially "kfunc lifecycle expectations" section.