From: JP Kobryn <inwardvessel@gmail.com>
Introduce BPF kfunc to access memory events, e.g.:
MEMCG_LOW, MEMCG_MAX, MEMCG_OOM, MEMCG_OOM_KILL etc.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/bpf_memcontrol.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
index d84fe6f3ed43..858eb43766ce 100644
--- a/mm/bpf_memcontrol.c
+++ b/mm/bpf_memcontrol.c
@@ -103,6 +103,22 @@ __bpf_kfunc unsigned long bpf_mem_cgroup_usage(struct mem_cgroup *memcg)
return mem_cgroup_usage(memcg, false) * PAGE_SIZE;
}

+/**
+ * bpf_mem_cgroup_memory_events - Read memory cgroup's memory event value
+ * @memcg: memory cgroup
+ * @event: memory event id
+ *
+ * Returns current memory event count.
+ */
+__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
+ enum memcg_memory_event event)
+{
+ if (event >= MEMCG_NR_MEMORY_EVENTS)
+ return (unsigned long)-1;
+
+ return atomic_long_read(&memcg->memory_events[event]);
+}
+
/**
* bpf_mem_cgroup_page_state - Read memory cgroup's page state counter
* @memcg: memory cgroup
@@ -137,6 +153,7 @@ BTF_ID_FLAGS(func, bpf_get_mem_cgroup, KF_TRUSTED_ARGS | KF_ACQUIRE | KF_RET_NUL
BTF_ID_FLAGS(func, bpf_put_mem_cgroup, KF_TRUSTED_ARGS | KF_RELEASE)
BTF_ID_FLAGS(func, bpf_mem_cgroup_vm_events, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_mem_cgroup_memory_events, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_mem_cgroup_usage, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_mem_cgroup_page_state, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, bpf_mem_cgroup_flush_stats, KF_TRUSTED_ARGS | KF_SLEEPABLE)
--
2.52.0
> diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
> index d84fe6f3ed43..858eb43766ce 100644
> --- a/mm/bpf_memcontrol.c
> +++ b/mm/bpf_memcontrol.c
> @@ -103,6 +103,22 @@ __bpf_kfunc unsigned long bpf_mem_cgroup_usage(struct mem_cgroup *memcg)
> return mem_cgroup_usage(memcg, false) * PAGE_SIZE;
> }
>
> +/**
> + * bpf_mem_cgroup_memory_events - Read memory cgroup's memory event value
> + * @memcg: memory cgroup
> + * @event: memory event id
> + *
> + * Returns current memory event count.
> + */
> +__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
> + enum memcg_memory_event event)
> +{
> + if (event >= MEMCG_NR_MEMORY_EVENTS)
> + return (unsigned long)-1;
Should this also check for negative values? The enum type is typically
signed, so a BPF program passing a negative event value would pass this
check but cause an out-of-bounds read from memcg->memory_events[].
The similar bpf_mem_cgroup_page_state() in this same file uses:
if (idx < 0 || idx >= MEMCG_NR_STAT)
which checks both bounds. Could the check here be changed to:
if (event < 0 || event >= MEMCG_NR_MEMORY_EVENTS)
to match that pattern?
> +
> + return atomic_long_read(&memcg->memory_events[event]);
> +}
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/20389033088
bot+bpf-ci@kernel.org writes:
>> diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
>> index d84fe6f3ed43..858eb43766ce 100644
>> --- a/mm/bpf_memcontrol.c
>> +++ b/mm/bpf_memcontrol.c
>> @@ -103,6 +103,22 @@ __bpf_kfunc unsigned long bpf_mem_cgroup_usage(struct mem_cgroup *memcg)
>> return mem_cgroup_usage(memcg, false) * PAGE_SIZE;
>> }
>>
>> +/**
>> + * bpf_mem_cgroup_memory_events - Read memory cgroup's memory event value
>> + * @memcg: memory cgroup
>> + * @event: memory event id
>> + *
>> + * Returns current memory event count.
>> + */
>> +__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
>> + enum memcg_memory_event event)
>> +{
>> + if (event >= MEMCG_NR_MEMORY_EVENTS)
>> + return (unsigned long)-1;
>
> Should this also check for negative values? The enum type is typically
> signed, so a BPF program passing a negative event value would pass this
> check but cause an out-of-bounds read from memcg->memory_events[].
>
> The similar bpf_mem_cgroup_page_state() in this same file uses:
>
> if (idx < 0 || idx >= MEMCG_NR_STAT)
>
> which checks both bounds. Could the check here be changed to:
>
> if (event < 0 || event >= MEMCG_NR_MEMORY_EVENTS)
>
> to match that pattern?
No, the bpf verifier enforces event > 0.
It's a false positive.
On 12/19/25 11:41 PM, Roman Gushchin wrote:
> bot+bpf-ci@kernel.org writes:
>
>>> [ ... ]
>>> +__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
>>> + enum memcg_memory_event event)
>>> +{
>>> + if (event >= MEMCG_NR_MEMORY_EVENTS)
>>> + return (unsigned long)-1;
>>
>> Should this also check for negative values? The enum type is typically
>> signed, so a BPF program passing a negative event value would pass this
>> check but cause an out-of-bounds read from memcg->memory_events[].
>>
>> The similar bpf_mem_cgroup_page_state() in this same file uses:
>>
>> if (idx < 0 || idx >= MEMCG_NR_STAT)
>>
>> which checks both bounds. Could the check here be changed to:
>>
>> if (event < 0 || event >= MEMCG_NR_MEMORY_EVENTS)
>>
>> to match that pattern?
>
> No, the bpf verifier enforces event > 0.
> It's a false positive.
I'll add some words here to the bpf prompts, thanks Roman.
-chris
Chris Mason <clm@meta.com> writes:
> On 12/19/25 11:41 PM, Roman Gushchin wrote:
>> bot+bpf-ci@kernel.org writes:
>>
>>>> [ ... ]
>>>> +__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
>>>> + enum memcg_memory_event event)
>>>> +{
>>>> + if (event >= MEMCG_NR_MEMORY_EVENTS)
>>>> + return (unsigned long)-1;
>>>
>>> Should this also check for negative values? The enum type is typically
>>> signed, so a BPF program passing a negative event value would pass this
>>> check but cause an out-of-bounds read from memcg->memory_events[].
>>>
>>> The similar bpf_mem_cgroup_page_state() in this same file uses:
>>>
>>> if (idx < 0 || idx >= MEMCG_NR_STAT)
>>>
>>> which checks both bounds. Could the check here be changed to:
>>>
>>> if (event < 0 || event >= MEMCG_NR_MEMORY_EVENTS)
>>>
>>> to match that pattern?
>>
>> No, the bpf verifier enforces event > 0.
>> It's a false positive.
>
> I'll add some words here to the bpf prompts, thanks Roman.
I'll try to play with it too, hopefully we can fix it.
Thanks!
On 12/20/25 1:43 PM, Roman Gushchin wrote:
> Chris Mason <clm@meta.com> writes:
>
>> On 12/19/25 11:41 PM, Roman Gushchin wrote:
>>> bot+bpf-ci@kernel.org writes:
>>>
>>>>> [ ... ]
>>>>> +__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
>>>>> + enum memcg_memory_event event)
>>>>> +{
>>>>> + if (event >= MEMCG_NR_MEMORY_EVENTS)
>>>>> + return (unsigned long)-1;
>>>>
>>>> Should this also check for negative values? The enum type is typically
>>>> signed, so a BPF program passing a negative event value would pass this
>>>> check but cause an out-of-bounds read from memcg->memory_events[].
>>>>
>>>> The similar bpf_mem_cgroup_page_state() in this same file uses:
>>>>
>>>> if (idx < 0 || idx >= MEMCG_NR_STAT)
>>>>
>>>> which checks both bounds. Could the check here be changed to:
>>>>
>>>> if (event < 0 || event >= MEMCG_NR_MEMORY_EVENTS)
>>>>
>>>> to match that pattern?
>>>
>>> No, the bpf verifier enforces event > 0.
>>> It's a false positive.
>>
>> I'll add some words here to the bpf prompts, thanks Roman.
>
> I'll try to play with it too, hopefully we can fix it.
>
https://github.com/masoncl/review-prompts/commit/fcc3bf704798f6be64cbb2e28b05a5c91eee9c7b
Seems to do the trick here:
[ ... ]
6. Missing negative check:
- The function only checks event >= MEMCG_NR_MEMORY_EVENTS
- Unlike bpf_mem_cgroup_page_state() which checks idx < 0 || idx >= MEMCG_NR_STAT
- However, event is an enum memcg_memory_event, not int
- Enums in C are typically unsigned when all values are non-negative (starting from 0)
- The enum starts at MEMCG_LOW = 0 (implicit) and all values are positive
- A negative value passed to this enum parameter would be extremely unusual and would require explicit casting
Let me verify the enum values are all non-negative:
Looking at memcg_memory_event enum:
enum memcg_memory_event {
MEMCG_LOW, // 0
MEMCG_HIGH, // 1
MEMCG_MAX, // 2
MEMCG_OOM, // 3
MEMCG_OOM_KILL, // 4
MEMCG_OOM_GROUP_KILL, // 5
MEMCG_SWAP_HIGH, // 6
MEMCG_SWAP_MAX, // 7
MEMCG_SWAP_FAIL, // 8
MEMCG_SOCK_THROTTLED, // 9
MEMCG_NR_MEMORY_EVENTS, // 10
}
All values are 0 or positive. Since the parameter is typed as enum memcg_memory_event (not int), and BPF programs use constants from vmlinux.h, the lack of a < 0 check is acceptable:
1. BPF programs using enum constants can only pass valid values (0-9)
2. Even if a computed value were used, the enum type constrains the interpretation
3. If a negative value were somehow cast to this enum, the unsigned comparison event >= MEMCG_NR_MEMORY_EVENTS would catch it (negative becomes large positive in unsigned comparison)
Wait, let me reconsider this. The function parameter is enum memcg_memory_event event. In C, enums are typically represented as int under the hood. If someone passes a negative value, the comparison event >=
MEMCG_NR_MEMORY_EVENTS would be signed comparison if event is treated as signed int.
Let me check if this is a concern:
● plugin:semcode:semcode - grep_functions (MCP)(pattern: "memcg_memory_event", path_pattern: "bpf_memcontrol")
⎿ Searching function bodies for pattern: memcg_memory_event (filtering paths matching: bpf_memcontrol, limit: 100) at git commit 0f92b93b6e42765a13c933e4e9cb268f8d77cf3c
Found 1 matching line(s):
… +3 lines (ctrl+o to expand)
● Let me think more carefully about the signed/unsigned comparison issue. The comparison event >= MEMCG_NR_MEMORY_EVENTS involves:
- event - an enum (which in C is typically int, signed)
- MEMCG_NR_MEMORY_EVENTS - an enum constant (value 10)
If event were negative (e.g., -1), then -1 >= 10 is false, and the function would proceed to access memcg->memory_events[-1] - an out-of-bounds access.
However, I need to consider the BPF context:
1. Per bpf.md: BPF programs use enum constants from vmlinux.h, and the compiler substitutes valid values at compile time
2. The verifier tracks scalar values: Even if computed at runtime, the verifier tracks value ranges
3. KF_TRUSTED_ARGS: The memcg pointer is validated, but this doesn't affect the enum parameter
Let me compare with bpf_mem_cgroup_page_state() which takes int idx and explicitly checks idx < 0:
__bpf_kfunc unsigned long bpf_mem_cgroup_page_state(struct mem_cgroup *memcg, int idx)
{
if (idx < 0 || idx >= MEMCG_NR_STAT)
return (unsigned long)-1;
...
}
The difference is:
- bpf_mem_cgroup_page_state takes int idx - explicitly signed, needs both checks
- bpf_mem_cgroup_memory_events takes enum memcg_memory_event event - enum typed
For the enum case, per the BPF documentation loaded earlier:
DO NOT report as bugs: Kfuncs that take enum-typed parameters and use constants from vmlinux.h without explicit bounds checks.
-chris
Chris Mason <clm@meta.com> writes:
> On 12/20/25 1:43 PM, Roman Gushchin wrote:
>> Chris Mason <clm@meta.com> writes:
>>
>>> On 12/19/25 11:41 PM, Roman Gushchin wrote:
>>>> bot+bpf-ci@kernel.org writes:
>>>>
>>>>>> [ ... ]
>>>>>> +__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
>>>>>> + enum memcg_memory_event event)
>>>>>> +{
>>>>>> + if (event >= MEMCG_NR_MEMORY_EVENTS)
>>>>>> + return (unsigned long)-1;
>>>>>
>>>>> Should this also check for negative values? The enum type is typically
>>>>> signed, so a BPF program passing a negative event value would pass this
>>>>> check but cause an out-of-bounds read from memcg->memory_events[].
>>>>>
>>>>> The similar bpf_mem_cgroup_page_state() in this same file uses:
>>>>>
>>>>> if (idx < 0 || idx >= MEMCG_NR_STAT)
>>>>>
>>>>> which checks both bounds. Could the check here be changed to:
>>>>>
>>>>> if (event < 0 || event >= MEMCG_NR_MEMORY_EVENTS)
>>>>>
>>>>> to match that pattern?
>>>>
>>>> No, the bpf verifier enforces event > 0.
>>>> It's a false positive.
>>>
>>> I'll add some words here to the bpf prompts, thanks Roman.
>>
>> I'll try to play with it too, hopefully we can fix it.
>>
>
> https://github.com/masoncl/review-prompts/commit/fcc3bf704798f6be64cbb2e28b05a5c91eee9c7b
Hi Chris!
I'm sorry, apparently I was dead wrong and overestimated the bpf
verifier (and the AI was correct, lol). Somebody told me, as feedback
on an earlier version, that enums are fully covered, and I didn't
check.

In reality the verifier doesn't guarantee the correctness of the value
passed as an enum, only that it's a u32. So we need to check the value.
I've added the necessary checks in v3 of my patchset. It passes the
local AI review without your latest change. Please revert it.
Thanks and sorry for the hassle
On 12/22/25 5:23 PM, Roman Gushchin wrote:
> Chris Mason <clm@meta.com> writes:
>
>> On 12/20/25 1:43 PM, Roman Gushchin wrote:
>>>>> No, the bpf verifier enforces event > 0.
>>>>> It's a false positive.
>>>>
>>>> I'll add some words here to the bpf prompts, thanks Roman.
>>>
>>> I'll try to play with it too, hopefully we can fix it.
>>>
>>
>> https://github.com/masoncl/review-prompts/commit/fcc3bf704798f6be64cbb2e28b05a5c91eee9c7b
>
> Hi Chris!
>
> I'm sorry, apparently I was dead wrong and overestimated the bpf
> verifier (and the AI was correct, lol). Somebody told me, as feedback
> on an earlier version, that enums are fully covered, and I didn't
> check.
>
> In reality the verifier doesn't guarantee the correctness of the value
> passed as an enum, only that it's a u32. So we need to check the value.
> I've added the necessary checks in v3 of my patchset. It passes the
> local AI review without your latest change. Please revert it.
>
> Thanks and sorry for the hassle

Thanks Roman, I adjusted the prompt changes and looked harder for proof
of exactly what checks are done.

-chris
On Fri, Dec 19, 2025 at 6:13 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> [ ... ]
> +__bpf_kfunc unsigned long bpf_mem_cgroup_memory_events(struct mem_cgroup *memcg,
> + enum memcg_memory_event event)
> +{
> + if (event >= MEMCG_NR_MEMORY_EVENTS)
> + return (unsigned long)-1;
> +
> + return atomic_long_read(&memcg->memory_events[event]);
> +}
Why is patch 5 not squashed with patch 4?
I'd think placing bpf_mem_cgroup_memory_events()
right next to bpf_mem_cgroup_vm_events() in the same patch
will make the difference more clear.
For non-mm people the names are very close, and at first glance
it looks like a duplicate.
On Sun, Dec 21, 2025 at 2:49 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Dec 19, 2025 at 6:13 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > [ ... ]
>
> Why is patch 5 not squashed with patch 4?
> I'd think placing bpf_mem_cgroup_memory_events()
> right next to bpf_mem_cgroup_vm_events() in the same patch
> will make the difference more clear.
> For non-mm people the names are very close and on the first glance
> it looks like a duplicate.
Also see a bunch of kdoc warnings flagged by CI.