From: Hui Zhu <zhuhui@kylinos.cn>

This series proposes adding eBPF support to the Linux memory
controller, enabling dynamic and extensible memory management
policies at runtime.

Background

The memory controller (memcg) currently provides fixed memory
accounting and reclamation policies through static kernel code.
This limits flexibility for specialized workloads and use cases
that require custom memory management strategies.

By enabling eBPF programs to hook into key memory control
operations, administrators can implement custom policies without
recompiling the kernel, while maintaining the safety guarantees
provided by the BPF verifier.

Use Cases

1. Custom memory reclamation strategies for specialized workloads
2. Dynamic memory pressure monitoring and telemetry
3. Memory accounting adjustments based on runtime conditions
4. Integration with container orchestration systems for
   intelligent resource management
5. Research and experimentation with novel memory management
   algorithms

Design Overview

This series introduces:

1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
   programs to implement custom behavior for memory charging
   operations.

2. A hook point in the `try_charge_memcg()` fast path that
   invokes registered eBPF programs to determine whether custom
   memory management should be applied.

3. An eBPF handler that can inspect memory cgroup context and
   optionally modify certain parameters (e.g., `nr_pages` for
   reclamation size).

4. A reference counting mechanism using `percpu_ref` to safely
   manage the lifecycle of registered eBPF struct ops instances.

5. Configuration via `CONFIG_MEMCG_BPF` to allow disabling this
   feature at build time.

Implementation Details

- Uses BPF struct ops for a cleaner integration model
- Leverages static branch keys for minimal overhead when the
  feature is unused
- RCU synchronization ensures safe replacement of handlers
- A sample eBPF program demonstrates monitoring capabilities
- A selftest suite validates the core functionality

Performance Considerations

- Zero overhead when the feature is disabled or no eBPF program is
  loaded (the static branch is disabled)
- Minimal overhead when enabled: one indirect function call per
  charge attempt
- eBPF programs run under the restrictions of the BPF verifier

Patch Overview

PATCH 1/3: Core kernel implementation
- Adds eBPF struct ops support to memcg
- Introduces the CONFIG_MEMCG_BPF option
- Implements a safe registration/unregistration mechanism

PATCH 2/3: Selftest suite
- prog_tests/memcg_ops.c: Test entry points
- progs/memcg_ops.c: Test eBPF program
- Validates load, attach, and single-handler constraints

PATCH 3/3: Sample userspace program
- samples/bpf/memcg_printk.bpf.c: Monitoring eBPF program
- samples/bpf/memcg_printk.c: Userspace loader
- Demonstrates real-world usage and debugging capabilities

Open Questions & Discussion Points

1. Should the eBPF handler have access to additional memory cgroup
   state? The current design exposes minimal context to reduce the
   attack surface.

2. Are there other memory control operations that would benefit
   from eBPF extensibility (e.g., uncharge, reclaim)?

3. Should there be permission checks or restrictions on who can
   load memcg eBPF programs? Loading currently inherits BPF's
   CAP_PERFMON/CAP_SYS_ADMIN requirements.

4. How should multiple eBPF programs trying to register be
   handled? The current implementation allows only one active
   handler.

5. Is the context currently exposed by the `try_charge_memcg` hook
   sufficient, or should additional fields be added?
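Example

To illustrate, a minimal `memcg_ops` program might look roughly like
the sketch below. The handler name (handle_try_charge) and the
context layout are assumptions made for this sketch; the
authoritative definitions are in mm/memcontrol_bpf.h and in the
sample samples/bpf/memcg_printk.bpf.c.

// SPDX-License-Identifier: GPL-2.0
/* Sketch only: the handler name and the context layout below are
 * assumed. With this series applied, vmlinux.h generated against
 * the patched kernel would provide struct memcg_ops and the real
 * context type.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Assumed charge context handed to the hook. */
struct memcg_charge_ctx {
	struct mem_cgroup *memcg;
	unsigned int nr_pages;
};

SEC("struct_ops/handle_try_charge")
int BPF_PROG(handle_try_charge, struct memcg_charge_ctx *cc)
{
	/* Observe the charge; returning 0 keeps the default behavior. */
	bpf_printk("memcg charge: %u pages", cc->nr_pages);
	return 0;
}

SEC(".struct_ops.link")
struct memcg_ops sample_ops = {
	.handle_try_charge = (void *)handle_try_charge,
};

Loading this with libbpf's bpf_map__attach_struct_ops() would then
make it the single active handler.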
Testing

The selftests provide comprehensive coverage of the core
functionality. The sample program can be used for manual testing
and as a reference for implementing additional monitoring tools.

Hui Zhu (3):
  memcg: add eBPF struct ops support for memory charging
  selftests/bpf: add memcg eBPF struct ops test
  samples/bpf: add example memcg eBPF program

 MAINTAINERS                                   |   5 +
 init/Kconfig                                  |  38 ++++
 mm/Makefile                                   |   1 +
 mm/memcontrol.c                               |  26 ++-
 mm/memcontrol_bpf.c                           | 200 ++++++++++++++++++
 mm/memcontrol_bpf.h                           | 103 +++++++++
 samples/bpf/Makefile                          |   2 +
 samples/bpf/memcg_printk.bpf.c                |  30 +++
 samples/bpf/memcg_printk.c                    |  82 +++++++
 .../selftests/bpf/prog_tests/memcg_ops.c      | 117 ++++++++++
 tools/testing/selftests/bpf/progs/memcg_ops.c |  20 ++
 11 files changed, 617 insertions(+), 7 deletions(-)
 create mode 100644 mm/memcontrol_bpf.c
 create mode 100644 mm/memcontrol_bpf.h
 create mode 100644 samples/bpf/memcg_printk.bpf.c
 create mode 100644 samples/bpf/memcg_printk.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_ops.c
 create mode 100644 tools/testing/selftests/bpf/progs/memcg_ops.c

-- 
2.43.0
Hui Zhu <hui.zhu@linux.dev> writes:

> From: Hui Zhu <zhuhui@kylinos.cn>
>
> This series proposes adding eBPF support to the Linux memory
> controller, enabling dynamic and extensible memory management
> policies at runtime.
>
> Background
>
> The memory controller (memcg) currently provides fixed memory
> accounting and reclamation policies through static kernel code.
> This limits flexibility for specialized workloads and use cases
> that require custom memory management strategies.
>
> By enabling eBPF programs to hook into key memory control
> operations, administrators can implement custom policies without
> recompiling the kernel, while maintaining the safety guarantees
> provided by the BPF verifier.
>
> Use Cases
>
> 1. Custom memory reclamation strategies for specialized workloads
> 2. Dynamic memory pressure monitoring and telemetry
> 3. Memory accounting adjustments based on runtime conditions
> 4. Integration with container orchestration systems for
>    intelligent resource management
> 5. Research and experimentation with novel memory management
>    algorithms
>
> Design Overview
>
> This series introduces:
>
> 1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
>    programs to implement custom behavior for memory charging
>    operations.
>
> 2. A hook point in the `try_charge_memcg()` fast path that
>    invokes registered eBPF programs to determine whether custom
>    memory management should be applied.
>
> 3. An eBPF handler that can inspect memory cgroup context and
>    optionally modify certain parameters (e.g., `nr_pages` for
>    reclamation size).
>
> 4. A reference counting mechanism using `percpu_ref` to safely
>    manage the lifecycle of registered eBPF struct ops instances.

Can you please describe how these hooks will be used in practice?
What problem can you solve with it that you can't solve without it?

I generally agree with the idea of using BPF for various
memcg-related policies, but I'm not sure how the specific callbacks
can be used in practice.

Thanks!
On 2025-11-20 11:04, Roman Gushchin <roman.gushchin@linux.dev> wrote:
> Hui Zhu <hui.zhu@linux.dev> writes:
>
> > [...]
>
> Can you please describe how these hooks will be used in practice?
> What problem can you solve with it that you can't solve without it?
>
> I generally agree with the idea of using BPF for various
> memcg-related policies, but I'm not sure how the specific callbacks
> can be used in practice.

Hi Roman,

Following are some ideas that could use memcg eBPF:

Priority-Based Reclaim and Limits in Multi-Tenant Environments:
On a single machine with multiple tenants / namespaces / containers,
it is hard to decide under memory pressure "who should be squeezed
first" with static policies baked into the kernel. Assign a BPF
profile to each tenant's memcg; under high global pressure, BPF can
decide:
- which memcgs' memory.high should be raised (delaying reclaim),
- which memcgs should be scanned and reclaimed more aggressively.

Online Profiling / Diagnosing Memory Hotspots:
A cgroup's memory keeps growing, but without patching the kernel it
is difficult to obtain fine-grained information. Attach BPF to the
memcg charge/uncharge path: record large allocations (greater than
N KB) with call stacks and the owning file/module, and send them to
user space via a BPF ring buffer. Based on the sampled data,
generate:
- "top N memory allocation stacks in this container over the last
  10 minutes",
- reports of which objects / call paths are growing fastest.
This makes it possible to pinpoint the root cause of host memory
anomalies without changing application code, which is very useful
in operations scenarios.
SLO-Driven Auto Throttling / Scale-In/Out Signals:
Use eBPF to observe the memory usage slope, frequent reclaim, or
near-OOM behavior within a memcg. When it decides "OOM is imminent",
instead of just killing or raising limits, it can emit a signal to a
control-plane component, for example sending an event to a
user-space agent to trigger automatic scaling, QPS adjustment, or
throttling.

Prevent a cgroup from launching a large-scale fork+malloc attack:
BPF checks per-uid or per-cgroup allocation behavior over the last
few seconds during memcg charge.

I maintain a software project, https://github.com/teawater/mem-agent,
for specialized memory management and related functions. However, I
found that implementing certain memory QoS categories for memcg
solely from user space is rather inefficient, as it requires
frequent access to values within memcg. This is why I want memcg to
support eBPF: so that I can place custom memory management logic
directly into the kernel using eBPF, greatly improving efficiency.

Best,
Hui

> Thanks!
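To make the profiling idea above concrete, a sketch along these
lines could record large charges together with kernel stacks. Two
assumptions: an fentry attach to try_charge_memcg(), which is a
static function in mm/memcontrol.c and may be inlined away (a real
tool would want a dedicated hook or tracepoint from this series),
and an arbitrary 1 MiB threshold:

// SPDX-License-Identifier: GPL-2.0
/* Sketch: report charges above a threshold, plus the kernel stack,
 * through a BPF ring buffer. The attach point is an assumption; see
 * the caveat above about try_charge_memcg() being static.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

#define THRESHOLD_PAGES 256	/* report charges above 1 MiB (4K pages) */

struct event {
	u64 cgroup_id;
	u32 nr_pages;
	u64 stack[16];
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 20);
} events SEC(".maps");

SEC("fentry/try_charge_memcg")
int BPF_PROG(on_charge, struct mem_cgroup *memcg, gfp_t gfp_mask,
	     unsigned int nr_pages)
{
	struct event *e;

	if (nr_pages <= THRESHOLD_PAGES)
		return 0;

	e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
	if (!e)
		return 0;

	/* The kernfs node id doubles as the cgroup id. */
	e->cgroup_id = memcg->css.cgroup->kn->id;
	e->nr_pages = nr_pages;
	bpf_get_stack(ctx, e->stack, sizeof(e->stack), 0);
	bpf_ringbuf_submit(e, 0);
	return 0;
}

A userspace reader would then drain the events ring buffer and
aggregate the stacks, e.g. into the "top N allocation stacks" report
described above.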
On Thu 20-11-25 09:29:52, hui.zhu@linux.dev wrote:
[...]
> > I generally agree with the idea of using BPF for various
> > memcg-related policies, but I'm not sure how the specific callbacks
> > can be used in practice.
>
> Hi Roman,
>
> Following are some ideas that could use memcg eBPF:
>
> Priority-Based Reclaim and Limits in Multi-Tenant Environments:
> On a single machine with multiple tenants / namespaces / containers,
> it is hard to decide under memory pressure "who should be squeezed
> first" with static policies baked into the kernel. Assign a BPF
> profile to each tenant's memcg; under high global pressure, BPF can
> decide:
> - which memcgs' memory.high should be raised (delaying reclaim),
> - which memcgs should be scanned and reclaimed more aggressively.
>
> Online Profiling / Diagnosing Memory Hotspots:
> [...]
>
> SLO-Driven Auto Throttling / Scale-In/Out Signals:
> [...]
>
> Prevent a cgroup from launching a large-scale fork+malloc attack:
> BPF checks per-uid or per-cgroup allocation behavior over the last
> few seconds during memcg charge.

AFAIU, these are just very high level ideas rather than anything you
are trying to target with this patch series, right?

All I can see is that you add a reclaim hook, but it is not really
clear to me how feasible it is to actually implement a real memory
reclaim strategy this way.

In principle I am not really opposed, but memory reclaim is a rather
involved process and I would really like to see that there is
something real to be done without exporting all the MM code to BPF
for any practical use. Is there any POC out there?

-- 
Michal Hocko
SUSE Labs
On 2025-11-21 03:20, Michal Hocko <mhocko@suse.com> wrote:
> On Thu 20-11-25 09:29:52, hui.zhu@linux.dev wrote:
> [...]
>
> AFAIU, these are just very high level ideas rather than anything you
> are trying to target with this patch series, right?
>
> All I can see is that you add a reclaim hook, but it is not really
> clear to me how feasible it is to actually implement a real memory
> reclaim strategy this way.
>
> In principle I am not really opposed, but memory reclaim is a rather
> involved process and I would really like to see that there is
> something real to be done without exporting all the MM code to BPF
> for any practical use. Is there any POC out there?

Hi Michal,

I apologize for not delivering a more substantial POC.

I was hesitant to add extensive eBPF support to memcg because I
wasn't certain it aligned with the community's vision, and such
support would require introducing many eBPF hooks into memcg.

I will add more eBPF hooks to memcg and provide a more meaningful
POC in the next version.

Best,
Hui

> -- 
> Michal Hocko
> SUSE Labs
On Fri 21-11-25 02:46:31, hui.zhu@linux.dev wrote:
> On 2025-11-21 03:20, Michal Hocko <mhocko@suse.com> wrote:
> > [...]
> >
> > In principle I am not really opposed, but memory reclaim is a
> > rather involved process and I would really like to see that there
> > is something real to be done without exporting all the MM code to
> > BPF for any practical use. Is there any POC out there?
>
> Hi Michal,
>
> I apologize for not delivering a more substantial POC.
>
> I was hesitant to add extensive eBPF support to memcg because I
> wasn't certain it aligned with the community's vision, and such
> support would require introducing many eBPF hooks into memcg.
>
> I will add more eBPF hooks to memcg and provide a more meaningful
> POC in the next version.

Just to make sure we are on the same page. I am not suggesting we
need more of those hooks. I just want to see how many we really need
in order to have a sensible eBPF-driven reclaim policy, which seems
to be the main usecase you want to pursue, right?

-- 
Michal Hocko
SUSE Labs
On 2025-11-25 20:12, Michal Hocko <mhocko@suse.com> wrote:
> On Fri 21-11-25 02:46:31, hui.zhu@linux.dev wrote:
> > [...]
> >
> > I will add more eBPF hooks to memcg and provide a more meaningful
> > POC in the next version.
>
> Just to make sure we are on the same page. I am not suggesting we
> need more of those hooks. I just want to see how many we really need
> in order to have a sensible eBPF-driven reclaim policy, which seems
> to be the main usecase you want to pursue, right?

I got your point.

My goal is to implement dynamic memory reclamation for memcgs
without limits, triggered by specific conditions.

For instance, with memcg A and memcg B both unlimited, when memcg A
faces high PSI pressure, eBPF can make memcg B do some memory
reclaim work when it tries to charge.

Best,
Hui

> -- 
> Michal Hocko
> SUSE Labs
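A rough sketch of that policy, reusing the charge context assumed in
the cover-letter example (every name here is illustrative): a
PSI-side program, not shown, sets a flag in a shared map when memcg
A is under pressure, and the charge-path handler then raises
nr_pages, which per the cover letter controls the reclamation size:

// SPDX-License-Identifier: GPL-2.0
/* Illustrative only: handler and context names are assumptions. */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

struct memcg_charge_ctx {	/* assumed layout, as in the earlier sketch */
	struct mem_cgroup *memcg;
	unsigned int nr_pages;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, u32);
	__type(value, u64);	/* nonzero => memcg A under PSI pressure */
} pressure SEC(".maps");

SEC("struct_ops/handle_try_charge")
int BPF_PROG(handle_try_charge, struct memcg_charge_ctx *cc)
{
	u32 key = 0;
	u64 *under_pressure = bpf_map_lookup_elem(&pressure, &key);

	/* While A is under pressure, ask the hook to reclaim more
	 * aggressively on each charge attempt. */
	if (under_pressure && *under_pressure)
		cc->nr_pages *= 2;
	return 0;
}

SEC(".struct_ops.link")
struct memcg_ops reclaim_policy = {
	.handle_try_charge = (void *)handle_try_charge,
};

Whether nr_pages is really writable from the handler, and how the
policy would single out memcg B rather than every charging memcg,
are exactly the interface questions such a PoC would need to answer.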
On Tue 25-11-25 12:39:11, hui.zhu@linux.dev wrote:
> My goal is to implement dynamic memory reclamation for memcgs
> without limits, triggered by specific conditions.
>
> For instance, with memcg A and memcg B both unlimited, when memcg A
> faces high PSI pressure, eBPF can make memcg B do some memory
> reclaim work when it tries to charge.

Understood. Please also think about whether this is already possible
with existing interfaces and, if not, what the roadblocks are in
that direction.

Thanks!
-- 
Michal Hocko
SUSE Labs
On 2025-11-25 20:55, Michal Hocko <mhocko@suse.com> wrote:
> On Tue 25-11-25 12:39:11, hui.zhu@linux.dev wrote:
> > [...]
>
> Understood. Please also think about whether this is already possible
> with existing interfaces and, if not, what the roadblocks are in
> that direction.

I think it's possible to implement a userspace program using the
existing PSI userspace interfaces and the control interfaces
provided by memcg to accomplish this task.
However, this approach has several limitations: the entire process
depends on the continuous execution of the userspace program,
response latency is higher, and we cannot perform fine-grained
operations on the target memcg.

Now that Roman has provided PSI eBPF functionality at
https://lore.kernel.org/lkml/20251027231727.472628-1-roman.gushchin@linux.dev/
maybe we could add eBPF support to memcg as well, allowing us to
implement the entire functionality directly in the kernel through
eBPF.

Best,
Hui

> Thanks!
> -- 
> Michal Hocko
> SUSE Labs
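For comparison, the userspace-only baseline can be built today from
existing interfaces alone: register a PSI trigger on memcg A's
memory.pressure and, on each pressure event, write to memcg B's
memory.reclaim (both cgroup v2 files). A minimal sketch, with
example cgroup paths and arbitrary thresholds:

/* Userspace baseline using only existing interfaces (cgroup v2). */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Trigger: >=150ms of "some" memory stall within any 1s window. */
	const char trig[] = "some 150000 1000000";
	int pfd = open("/sys/fs/cgroup/A/memory.pressure", O_RDWR | O_NONBLOCK);
	int rfd = open("/sys/fs/cgroup/B/memory.reclaim", O_WRONLY);
	struct pollfd fds = { .fd = pfd, .events = POLLPRI };

	if (pfd < 0 || rfd < 0 || write(pfd, trig, sizeof(trig)) < 0) {
		perror("setup");
		return 1;
	}
	for (;;) {
		if (poll(&fds, 1, -1) < 0)
			break;
		if (fds.revents & POLLERR)
			break;			/* cgroup went away */
		if (fds.revents & POLLPRI) {
			/* Ask the kernel to reclaim 64 MiB from B. */
			if (write(rfd, "64M", 3) < 0)
				perror("memory.reclaim");
		}
	}
	return 0;
}

The wakeup latency of this loop and its coarse, whole-memcg
granularity are the limitations contrasted above with an in-kernel
eBPF hook.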
On Wed 26-11-25 03:05:32, hui.zhu@linux.dev wrote:
> [...]
>
> I think it's possible to implement a userspace program using the
> existing PSI userspace interfaces and the control interfaces
> provided by memcg to accomplish this task.
> However, this approach has several limitations: the entire process
> depends on the continuous execution of the userspace program,
> response latency is higher, and we cannot perform fine-grained
> operations on the target memcg.

These arguments will need to be backed by some actual numbers.

> Now that Roman has provided PSI eBPF functionality at
> https://lore.kernel.org/lkml/20251027231727.472628-1-roman.gushchin@linux.dev/
> maybe we could add eBPF support to memcg as well, allowing us to
> implement the entire functionality directly in the kernel through
> eBPF.

His usecase is very specific to OOM handling and we have agreed that
this specific usecase is really tricky to achieve from userspace. I
haven't seen sound arguments for this usecase yet.

-- 
Michal Hocko
SUSE Labs
On 2025-11-27 00:01, Michal Hocko <mhocko@suse.com> wrote:
> On Wed 26-11-25 03:05:32, hui.zhu@linux.dev wrote:
> > [...]
> >
> > However, this approach has several limitations: the entire process
> > depends on the continuous execution of the userspace program,
> > response latency is higher, and we cannot perform fine-grained
> > operations on the target memcg.
>
> These arguments will need to be backed by some actual numbers.

Agreed. I'll implement a PoC to show it.

Best,
Hui

> > Now that Roman has provided PSI eBPF functionality at
> > https://lore.kernel.org/lkml/20251027231727.472628-1-roman.gushchin@linux.dev/
> > maybe we could add eBPF support to memcg as well, allowing us to
> > implement the entire functionality directly in the kernel through
> > eBPF.
>
> His usecase is very specific to OOM handling and we have agreed that
> this specific usecase is really tricky to achieve from userspace. I
> haven't seen sound arguments for this usecase yet.
> -- 
> Michal Hocko
> SUSE Labs