Just like we have memory.peak, introduce a dmem.peak, which uses the
page_counter support for that.
It can be written to in order to reset the peak, but different from
memory.peak, which expects any write, dmem.peak expects the region name to
be written to it. That region peak is the one that is reset.
That requires ofp_peak to carry a pointer to the pool that was reset.
Writing a different region name will reset the different region and make
the original region peak get back to its non-reset value.
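
The per-descriptor behavior described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `struct region`, `struct peak_fd`, the helper names and the region names `region_a`/`region_b` are all hypothetical, and the model covers a single open file descriptor (no peer watchers). It shows that one descriptor holds one "reset" slot, so writing a second region name moves the slot and the first region reports its all-time peak again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; these are not the kernel data structures. */
struct region {
	const char *name;
	unsigned long usage;           /* current charge */
	unsigned long watermark;       /* peak since creation */
	unsigned long local_watermark; /* peak since the last reset */
};

/* One open peak file: a single slot, so at most one region is "reset". */
struct peak_fd {
	const struct region *reset_region; /* NULL until the first write */
};

static void charge(struct region *r, unsigned long n)
{
	r->usage += n;
	if (r->usage > r->watermark)
		r->watermark = r->usage;
	if (r->usage > r->local_watermark)
		r->local_watermark = r->usage;
}

static void uncharge(struct region *r, unsigned long n)
{
	assert(n <= r->usage);
	r->usage -= n;
}

/* Models writing a region name: 0 on success, -1 for an unknown region. */
static int peak_write(struct peak_fd *fd, struct region *regions, size_t nr,
		      const char *name)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(regions[i].name, name) == 0) {
			/* Restart peak tracking from the current usage. */
			regions[i].local_watermark = regions[i].usage;
			/* The single slot now points at this region only. */
			fd->reset_region = &regions[i];
			return 0;
		}
	}
	return -1;
}

/* Models reading one region's peak through this descriptor. */
static unsigned long peak_read(const struct peak_fd *fd, const struct region *r)
{
	return fd->reset_region == r ? r->local_watermark : r->watermark;
}
```

In this model, resetting `region_a` and then `region_b` through the same descriptor leaves only `region_b` reset; a user who wants both reset at the same time would open one descriptor per region.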
While at it, we reuse a helper from memcontrol, which we moved to
kernel/cgroup/cgroup.c.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Thadeu Lima de Souza Cascardo (2):
      mm/page_counter: decouple peak_reset from peak_write
      cgroup/dmem: introduce a peak file

 Documentation/admin-guide/cgroup-v2.rst |  10 +++
 include/linux/cgroup-defs.h             |   7 ++
 kernel/cgroup/cgroup.c                  |  32 ++++++++
 kernel/cgroup/dmem.c                    | 132 ++++++++++++++++++++++++++++++--
 mm/memcontrol.c                         |  42 ++--------
 5 files changed, 183 insertions(+), 40 deletions(-)
---
base-commit: 7fd2df204f342fc17d1a0bfcd474b24232fb0f32
change-id: 20260409-dmem_peak-3abc1be95072
Best regards,
--
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>

Hello Thadeu.

On Wed, May 06, 2026 at 08:58:23AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> Just like we have memory.peak, introduce a dmem.peak, which uses the
> page_counter support for that.
>
> It can be written to in order to reset the peak, but different from
> memory.peak, which expects any write, dmem.peak expects the region name to
> be written to it. That region peak is the one that is reset.
>
> That requires ofp_peak to carry a pointer to the pool that was reset.

(It'd be nicer to have generic data in that generic structure, at least
some void *priv. But see below.)

> Writing a different region name will reset the different region and make
> the original region peak get back to its non-reset value.

I'm slightly confused by this fds x pool matricity when there's only
a single slot in cgroup_file_ctx::cgroup_of_peak.

The intended use case is that users should maintain one fd per pool and
not mix it up?
This stanza would better fit to cgroup-v2.rst proper than the commit
message. Or make it simpler and start with non-resettable peak file
(like memory.peak had started too) and see how it fares. WDYT?

Thanks,
Michal

On Wed, May 06, 2026 at 03:53:59PM +0200, Michal Koutný wrote:
> Hello Thadeu.
>
> On Wed, May 06, 2026 at 08:58:23AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> > Just like we have memory.peak, introduce a dmem.peak, which uses the
> > page_counter support for that.
> >
> > It can be written to in order to reset the peak, but different from
> > memory.peak, which expects any write, dmem.peak expects the region name to
> > be written to it. That region peak is the one that is reset.
> >
> > That requires ofp_peak to carry a pointer to the pool that was reset.
>
> (It'd be nicer to have generic data in that generic structure, at least
> some void *priv. But see below.)
>

I used void *, at first, but as the only current use is for the pool and as
mixing different uses may lead to misuse, I thought it would be safer to
use the type directly. This has been pointed out before for other members
of cgroup_file_ctx. See [1].

> > Writing a different region name will reset the different region and make
> > the original region peak get back to its non-reset value.
>
> I'm slightly confused by this fds x pool matricity when there's only
> a single slot in cgroup_file_ctx::cgroup_of_peak.
>
> The intended use case is that users should maintain one fd per pool and
> not mix it up?
> This stanza would better fit to cgroup-v2.rst proper than the commit
> message. Or make it simpler and start with non-resettable peak file
> (like memory.peak had started too) and see how it fares. WDYT?
>

That is also due to the limitation that each file descriptor has a single
linked list under cgroup_file_ctx::cgroup_of_peak. To allow for all the
pools to be reset at once, we would need one list per file descriptor.

But, on the other hand, as you pointed out, this leads to the flexibility
of being able to reset only one pool, while leaving the others as is.
Whereas, if one needs to reset all pools, they can use one file descriptor
per region.

I started with a non-resettable peak file, but as memory.peak can be reset,
I added that feature too. If we want to merge a non-resettable support
earlier and need to take longer to discuss how to work on the resettable
support given the above, I can resubmit. But I guess we can see if we can
reach an agreement sooner rather than later.

Thanks.
Cascardo.

>
> Thanks,
> Michal

[1] https://lore.kernel.org/all/CAHk-=wgiYkECT=hZRKj8ZwfBPw2Uz=gpOGBGd4ny0KYhSsjC0w@mail.gmail.com/

On Wed, May 06, 2026 at 11:18:26AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> I used void *, at first, but as the only current use is for the pool and as
> mixing different uses may lead to misuse, I thought it would be safer to
> use the type directly. This has been pointed out before for other members
> of cgroup_file_ctx. See [1].

That mail reacts to union overlaps and pointer vs embedded struct
allocations. Correct me if I missed your part.

I agree that having a properly typed pointer is safer. cgroup_file_ctx
sub-structs are for generic cgroup files. But here somehow a specific
controller needs to be propagated to the generic member.
What about also storing the `list_head *watchers` inside `struct
cgroup_of_peak`, so that each subsys would manage it as needed?
(ofp->watchers == NULL could also substitute ofp->value == OFP_PEAK_UNSET)

> I started with a non-resettable peak file, but as memory.peak can be reset,
> I added that feature too.

At the same time pids.peak has survived without a reset option till today.

> If we want to merge a non-resettable support earlier and need to take
> longer to discuss how to work on the resettable support given the
> above, I can resubmit. But I guess we can see if we can reach an
> agreement sooner rather than later.

What kind of users do you envision (i.e. would they need resets at all)?
Anyway, the behavior should be explained in cgroup-v2.rst since that's
where they'll look for it.

HTH,
Michal