[RFC PATCH 0/3] Memory Controller eBPF support

Hui Zhu posted 3 patches 1 week, 6 days ago
From: Hui Zhu <zhuhui@kylinos.cn>

This series proposes adding eBPF support to the Linux memory
controller, enabling dynamic and extensible memory management
policies at runtime.

Background

The memory controller (memcg) currently provides fixed memory
accounting and reclamation policies through static kernel code.
This limits flexibility for specialized workloads and use cases
that require custom memory management strategies.

By enabling eBPF programs to hook into key memory control
operations, administrators can implement custom policies without
recompiling the kernel, while maintaining the safety guarantees
provided by the BPF verifier.

Use Cases

1. Custom memory reclamation strategies for specialized workloads
2. Dynamic memory pressure monitoring and telemetry
3. Memory accounting adjustments based on runtime conditions
4. Integration with container orchestration systems for
   intelligent resource management
5. Research and experimentation with novel memory management
   algorithms

Design Overview

This series introduces:

1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
   programs to implement custom behavior for memory charging
   operations (a program-side sketch follows this list).

2. A hook point in the `try_charge_memcg()` fast path that
   invokes registered eBPF programs to determine if custom
   memory management should be applied.

3. An eBPF handler interface that can inspect memory cgroup
   context and optionally modify certain parameters (e.g.,
   `nr_pages` to adjust the reclamation size).

4. A reference counting mechanism using `percpu_ref` to safely
   manage the lifecycle of registered eBPF struct ops instances.

5. Configuration via `CONFIG_MEMCG_BPF` to allow disabling this
   feature at build time.
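To make the interface concrete, here is a minimal program-side
sketch. The op name, section names, and handler signature below are
illustrative assumptions, not the actual UAPI of this series:

    /* Hedged sketch of a memcg_ops implementation; the real
     * definitions live in mm/memcontrol_bpf.h. */
    #include <vmlinux.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    SEC("struct_ops/handle_try_charge")
    int BPF_PROG(handle_try_charge, struct mem_cgroup *memcg,
                 unsigned int *nr_pages)
    {
            /* Ask the charge path to reclaim a larger batch;
             * nr_pages is the knob the series says a handler
             * may modify. */
            if (*nr_pages < 32)
                    *nr_pages = 32;
            return 0;
    }

    SEC(".struct_ops")
    struct memcg_ops sample_ops = {
            .handle_try_charge = (void *)handle_try_charge,
    };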

Implementation Details

- Uses BPF struct ops for a cleaner integration model
- Leverages static branch keys for minimal overhead when the
  feature is unused (see the kernel-side sketch after this list)
- RCU synchronization ensures safe replacement of handlers
- Sample eBPF program demonstrates monitoring capabilities
- Comprehensive selftest suite validates core functionality
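As a rough illustration of how these points fit together, the
charge-path hook could be wired up along these lines. The symbol
names (`memcg_bpf_enabled_key`, `memcg_bpf_ops`) and the handler
signature are assumptions, not the actual code in
mm/memcontrol_bpf.c:

    /* Sketch only; illustrative names throughout. */
    DEFINE_STATIC_KEY_FALSE(memcg_bpf_enabled_key);
    static struct memcg_ops __rcu *memcg_bpf_ops;

    static void memcg_bpf_try_charge_hook(struct mem_cgroup *memcg,
                                          unsigned int *nr_pages)
    {
            struct memcg_ops *ops;

            /* A patched-out jump when no program is registered. */
            if (!static_branch_unlikely(&memcg_bpf_enabled_key))
                    return;

            rcu_read_lock();
            ops = rcu_dereference(memcg_bpf_ops);
            /* The percpu_ref keeps the struct ops instance alive
             * while the handler runs, per the lifecycle design. */
            if (ops && percpu_ref_tryget(&ops->ref)) {
                    ops->handle_try_charge(memcg, nr_pages);
                    percpu_ref_put(&ops->ref);
            }
            rcu_read_unlock();
    }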

Performance Considerations

- Zero overhead when the feature is disabled or no eBPF program is
  loaded (the static branch stays disabled)
- Minimal overhead when enabled: one indirect function call per
  charge attempt
- eBPF programs run under the restrictions of the BPF verifier

Patch Overview

PATCH 1/3: Core kernel implementation
  - Adds eBPF struct ops support to memcg
  - Introduces CONFIG_MEMCG_BPF option
  - Implements safe registration/unregistration mechanism

PATCH 2/3: Selftest suite
  - prog_tests/memcg_ops.c: Test entry points
  - progs/memcg_ops.c: Test eBPF program
  - Validates load, attach, and single-handler constraints

PATCH 3/3: Sample userspace program
  - samples/bpf/memcg_printk.bpf.c: Monitoring eBPF program
  - samples/bpf/memcg_printk.c: Userspace loader
  - Demonstrates real-world usage and debugging capabilities
    (a typical struct_ops loader flow is sketched below)
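For orientation, the loader presumably follows the standard libbpf
struct_ops flow sketched here; the skeleton name is derived from
the sample's file name, and the struct_ops map name is an
assumption:

    #include <stdio.h>
    #include <unistd.h>
    #include <bpf/libbpf.h>
    #include "memcg_printk.skel.h" /* from bpftool gen skeleton */

    int main(void)
    {
            struct memcg_printk *skel;
            struct bpf_link *link;

            skel = memcg_printk__open_and_load();
            if (!skel)
                    return 1;

            /* Registers the memcg_ops instance with the kernel;
             * assumes the struct_ops map is named "sample_ops". */
            link = bpf_map__attach_struct_ops(skel->maps.sample_ops);
            if (!link) {
                    memcg_printk__destroy(skel);
                    return 1;
            }

            puts("memcg_ops attached; output appears in trace_pipe");
            pause();

            bpf_link__destroy(link);
            memcg_printk__destroy(skel);
            return 0;
    }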

Open Questions & Discussion Points

1. Should the eBPF handler have access to additional memory
   cgroup state? Current design exposes minimal context to
   reduce attack surface.

2. Are there other memory control operations that would benefit
   from eBPF extensibility (e.g., uncharge, reclaim)?

3. Should there be permission checks or restrictions on who can
   load memcg eBPF programs? Currently inherits BPF's
   CAP_PERFMON/CAP_SYS_ADMIN requirements.

4. How should we handle multiple eBPF programs trying to
   register? Current implementation allows only one active
   handler.

5. Is the context currently exposed at the `try_charge_memcg()`
   hook sufficient, or should additional fields be added?

Testing

The selftests provide comprehensive coverage of the core
functionality. The sample program can be used for manual
testing and as a reference for implementing additional
monitoring tools.

Hui Zhu (3):
  memcg: add eBPF struct ops support for memory charging
  selftests/bpf: add memcg eBPF struct ops test
  samples/bpf: add example memcg eBPF program

 MAINTAINERS                                   |   5 +
 init/Kconfig                                  |  38 ++++
 mm/Makefile                                   |   1 +
 mm/memcontrol.c                               |  26 ++-
 mm/memcontrol_bpf.c                           | 200 ++++++++++++++++++
 mm/memcontrol_bpf.h                           | 103 +++++++++
 samples/bpf/Makefile                          |   2 +
 samples/bpf/memcg_printk.bpf.c                |  30 +++
 samples/bpf/memcg_printk.c                    |  82 +++++++
 .../selftests/bpf/prog_tests/memcg_ops.c      | 117 ++++++++++
 tools/testing/selftests/bpf/progs/memcg_ops.c |  20 ++
 11 files changed, 617 insertions(+), 7 deletions(-)
 create mode 100644 mm/memcontrol_bpf.c
 create mode 100644 mm/memcontrol_bpf.h
 create mode 100644 samples/bpf/memcg_printk.bpf.c
 create mode 100644 samples/bpf/memcg_printk.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_ops.c
 create mode 100644 tools/testing/selftests/bpf/progs/memcg_ops.c

-- 
2.43.0
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by Roman Gushchin 1 week, 4 days ago
Hui Zhu <hui.zhu@linux.dev> writes:

> From: Hui Zhu <zhuhui@kylinos.cn>
>
> [...]
>
> Design Overview
>
> This series introduces:
>
> 1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
>    programs to implement custom behavior for memory charging
>    operations.
>
> 2. A hook point in the `try_charge_memcg()` fast path that
>    invokes registered eBPF programs to determine if custom
>    memory management should be applied.
>
> 3. An eBPF handler interface that can inspect memory cgroup
>    context and optionally modify certain parameters (e.g.,
>    `nr_pages` to adjust the reclamation size).
>
> 4. A reference counting mechanism using `percpu_ref` to safely
>    manage the lifecycle of registered eBPF struct ops instances.

Can you please describe how these hooks will be used in practice?
What's the problem you can solve with it and can't without?

I generally agree with an idea to use BPF for various memcg-related
policies, but I'm not sure how specific callbacks can be used in
practice.

Thanks!
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by hui.zhu@linux.dev 1 week, 4 days ago
On 2025-11-20 11:04, "Roman Gushchin" <roman.gushchin@linux.dev> wrote:


> 
> Hui Zhu <hui.zhu@linux.dev> writes:
> 
> > [...]
>
> Can you please describe how these hooks will be used in practice?
> What's the problem you can solve with it and can't without?
> 
> I generally agree with an idea to use BPF for various memcg-related
> policies, but I'm not sure how specific callbacks can be used in
> practice.

Hi Roman,

Here are some ideas that could use eBPF in memcg:

Priority-Based Reclaim and Limits in Multi-Tenant Environments:
On a single machine with multiple tenants / namespaces / containers,
it is hard to decide under memory pressure "who should be squeezed
first" with static policies baked into the kernel. Assign a BPF
profile to each tenant's memcg; under high global pressure, BPF can
decide which memcgs' memory.high should be raised (delaying reclaim)
and which memcgs should be scanned and reclaimed more aggressively.

Online Profiling / Diagnosing Memory Hotspots:
A cgroup's memory keeps growing, but without patching the kernel it
is difficult to obtain fine-grained information. Attach BPF to the
memcg charge/uncharge path: record large allocations (greater than
N KB) with call stacks and the owning file/module, and send them to
user space via a BPF ring buffer (a sketch follows below). Based on
the sampled data, generate "top N memory allocation stacks in this
container over the last 10 minutes" reports showing which objects /
call paths are growing fastest. This makes it possible to pinpoint
the root cause of host memory anomalies without changing application
code, which is very useful in operations scenarios.
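A minimal sketch of that sampling idea, assuming a charge-path hook
exists and that stack collection is permitted from it; the section
name and handler signature are illustrative:

    #include <vmlinux.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    struct {
            __uint(type, BPF_MAP_TYPE_RINGBUF);
            __uint(max_entries, 1 << 20);
    } events SEC(".maps");

    struct alloc_event {
            __u64 nr_pages;
            __u64 stack[16];
    };

    SEC("struct_ops/handle_try_charge") /* assumed hook */
    int BPF_PROG(handle_try_charge, struct mem_cgroup *memcg,
                 unsigned int *nr_pages)
    {
            struct alloc_event *e;

            if (*nr_pages < 256) /* sample only charges above ~1MB */
                    return 0;

            e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
            if (!e)
                    return 0;
            e->nr_pages = *nr_pages;
            /* Whether stacks can be captured from this hook is an
             * open question; shown for the sake of the idea. */
            bpf_get_stack(ctx, e->stack, sizeof(e->stack), 0);
            bpf_ringbuf_submit(e, 0);
            return 0;
    }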

SLO-Driven Auto Throttling / Scale-In/Out Signals:
Use eBPF to observe the memory usage slope, frequent reclaim, or
near-OOM behavior within a memcg. When it decides OOM is imminent,
instead of just killing tasks or raising limits, it can emit a
signal to a control-plane component, for example sending an event
to a user-space agent that triggers automatic scaling, QPS
adjustment, or throttling.

Prevent a cgroup from launching a large-scale fork+malloc attack:
BPF checks per-uid or per-cgroup allocation behavior over the last
few seconds during memcg charge.

I also maintain a software project, https://github.com/teawater/mem-agent,
for specialized memory management and related functions. However,
I found that implementing certain memory QoS categories for memcg
solely from user space is rather inefficient, as it requires
frequent access to values inside memcg. This is why I want memcg
to support eBPF: it would let me place custom memory management
logic directly into the kernel, greatly improving efficiency.

Best,
Hui

> 
> Thanks!
>
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by Michal Hocko 1 week, 4 days ago
On Thu 20-11-25 09:29:52, hui.zhu@linux.dev wrote:
[...]
> > I generally agree with an idea to use BPF for various memcg-related
> > policies, but I'm not sure how specific callbacks can be used in
> > practice.
> 
> Hi Roman,
> 
> Here are some ideas that could use eBPF in memcg:
> [...]

AFAIU, these are just very high level ideas rather than anything you are
trying to target with this patch series, right?

All I can see is that you add a reclaim hook, but it is not really clear
to me how feasible it is to actually implement a real memory reclaim
strategy this way.

In principle I am not really opposed, but the memory reclaim process is
rather involved and I would really like to see that something real can
be done without exporting all the MM code to BPF for any practical use.
Is there any POC out there?
-- 
Michal Hocko
SUSE Labs
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by hui.zhu@linux.dev 1 week, 3 days ago
On 2025-11-21 03:20, "Michal Hocko" <mhocko@suse.com> wrote:


> 
> On Thu 20-11-25 09:29:52, hui.zhu@linux.dev wrote:
> [...]
> 
> > [...]
>
> AFAIU, these are just very high level ideas rather than anything you are
> trying to target with this patch series, right?
> 
> All I can see is that you add a reclaim hook, but it is not really clear
> to me how feasible it is to actually implement a real memory reclaim
> strategy this way.
> 
> In principle I am not really opposed, but the memory reclaim process is
> rather involved and I would really like to see that something real can
> be done without exporting all the MM code to BPF for any practical use.
> Is there any POC out there?

Hi Michal,

I apologize for not delivering a more substantial POC.

I was hesitant to add extensive eBPF support to memcg
because I wasn't certain it aligned with the community's
vision, and such support would require introducing many
eBPF hooks into memcg.

I will add more eBPF hooks to memcg and provide a more
meaningful POC in the next version.

Best,
Hui


> -- 
> Michal Hocko
> SUSE Labs
>
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by Michal Hocko 6 days, 14 hours ago
On Fri 21-11-25 02:46:31, hui.zhu@linux.dev wrote:
> [...]
> 
> Hi Michal,
> 
> I apologize for not delivering a more substantial POC.
> 
> I was hesitant to add extensive eBPF support to memcg
> because I wasn't certain it aligned with the community's
> vision, and such support would require introducing many
> eBPF hooks into memcg.
> 
> I will add more eBPF hooks to memcg and provide a more
> meaningful POC in the next version.

Just to make sure we are on the same page: I am not suggesting we need
more of those hooks. I just want to see how many we really need in
order to have a sensible eBPF-driven reclaim policy, which seems to be
the main usecase you want to pursue, right?
-- 
Michal Hocko
SUSE Labs
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by hui.zhu@linux.dev 6 days, 13 hours ago
On 2025-11-25 20:12, "Michal Hocko" <mhocko@suse.com> wrote:


> 
> On Fri 21-11-25 02:46:31, hui.zhu@linux.dev wrote:
> 
> > [...]
>
> Just to make sure we are on the same page: I am not suggesting we need
> more of those hooks. I just want to see how many we really need in
> order to have a sensible eBPF-driven reclaim policy, which seems to be
> the main usecase you want to pursue, right?

I got your point.

My goal is to implement dynamic memory reclamation for memcgs without
limits, triggered by specific conditions.

For instance, with memcg A and memcg B both unlimited, when memcg A
faces high PSI pressure, eBPF can make memcg B do some memory reclaim
work when it tries to charge, along the lines of the sketch below.
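A hedged fragment of that policy under this series' mechanism
(raising nr_pages on the charge path); the pressure map, the hook
name, and the way memcg B is identified are all illustrative:

    /* Set by a PSI-side BPF program (e.g. built on Roman's PSI
     * eBPF work) when memcg A's pressure crosses a threshold. */
    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, __u32);
            __type(value, __u32);
    } a_under_pressure SEC(".maps");

    #define MEMCG_B_ID 42 /* illustrative cgroup id of memcg B */

    SEC("struct_ops/handle_try_charge") /* assumed hook */
    int BPF_PROG(handle_try_charge, struct mem_cgroup *memcg,
                 unsigned int *nr_pages)
    {
            __u32 key = 0, *pressed;

            pressed = bpf_map_lookup_elem(&a_under_pressure, &key);
            /* Direct pointer chase shown for brevity; a real
             * program may need BPF_CORE_READ(). */
            if (pressed && *pressed &&
                memcg->css.cgroup->kn->id == MEMCG_B_ID)
                    *nr_pages *= 4; /* reclaim harder on B */
            return 0;
    }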

Best,
Hui

> -- 
> Michal Hocko
> SUSE Labs
>
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by Michal Hocko 6 days, 13 hours ago
On Tue 25-11-25 12:39:11, hui.zhu@linux.dev wrote:
> My goal is to implement dynamic memory reclamation for memcgs without
> limits, triggered by specific conditions.
> 
> For instance, with memcg A and memcg B both unlimited, when memcg A
> faces high PSI pressure, eBPF can make memcg B do some memory reclaim
> work when it tries to charge.

Understood. Please also consider whether this is already possible with
existing interfaces and, if not, what the roadblocks are in that
direction.

Thanks!
-- 
Michal Hocko
SUSE Labs
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by hui.zhu@linux.dev 5 days, 23 hours ago
On 2025-11-25 20:55, "Michal Hocko" <mhocko@suse.com> wrote:


> 
> On Tue 25-11-25 12:39:11, hui.zhu@linux.dev wrote:
> 
> > [...]
>
> Understood. Please also consider whether this is already possible with
> existing interfaces and, if not, what the roadblocks are in that
> direction.

I think it's possible to implement a userspace program using the existing
PSI userspace interfaces and the control interfaces provided by memcg to
accomplish this task.
However, this approach has several limitations:
the entire process depends on the continuous execution of the userspace
program, response latency is higher, and we cannot perform fine-grained
operations on the target memcg.

Now that Roman has provided PSI eBPF functionality at
https://lore.kernel.org/lkml/20251027231727.472628-1-roman.gushchin@linux.dev/
maybe we could add eBPF support to memcg as well, allowing us to implement
the entire functionality directly in the kernel through eBPF.

Best,
Hui

> 
> Thanks!
> -- 
> Michal Hocko
> SUSE Labs
>
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by Michal Hocko 5 days, 10 hours ago
On Wed 26-11-25 03:05:32, hui.zhu@linux.dev wrote:
> [...]
> 
> I think it's possible to implement a userspace program using the existing
> PSI userspace interfaces and the control interfaces provided by memcg to
> accomplish this task.
> However, this approach has several limitations:
> the entire process depends on the continuous execution of the userspace
> program, response latency is higher, and we cannot perform fine-grained
> operations on the target memcg.

I will need these arguments backed by some actual numbers.

> Now that Roman has provided PSI eBPF functionality at
> https://lore.kernel.org/lkml/20251027231727.472628-1-roman.gushchin@linux.dev/
> maybe we could add eBPF support to memcg as well, allowing us to implement
> the entire functionality directly in the kernel through eBPF.

His usecase is very specific to OOM handling, and we have agreed that
this specific usecase is really tricky to achieve from userspace. I
haven't seen sound arguments for this usecase yet.
-- 
Michal Hocko
SUSE Labs
Re: [RFC PATCH 0/3] Memory Controller eBPF support
Posted by hui.zhu@linux.dev 4 days, 17 hours ago
On 2025-11-27 00:01, "Michal Hocko" <mhocko@suse.com> wrote:


> 
> On Wed 26-11-25 03:05:32, hui.zhu@linux.dev wrote:
> 
> > [...]
>
> I will need these arguments backed by some actual numbers.

Agreed. I'll implement a PoC to show it.

Best,
Hui

> 
> > 
> > Now that Roman has provided PSI eBPF functionality at
> >  https://lore.kernel.org/lkml/20251027231727.472628-1-roman.gushchin@linux.dev/
> >  maybe we could add eBPF support to memcg as well, allowing us to implement
> >  the entire functionality directly in the kernel through eBPF.
> > 
> His usecase is very specific to OOM handling, and we have agreed that
> this specific usecase is really tricky to achieve from userspace. I
> haven't seen sound arguments for this usecase yet.
> -- 
> Michal Hocko
> SUSE Labs
>