[PATCH 0/2] cgroup/dmem: introduce a peak file

Thadeu Lima de Souza Cascardo posted 2 patches 1 month, 1 week ago
There is a newer version of this series
Documentation/admin-guide/cgroup-v2.rst |  10 +++
include/linux/cgroup-defs.h             |   7 ++
kernel/cgroup/cgroup.c                  |  32 ++++++++
kernel/cgroup/dmem.c                    | 132 ++++++++++++++++++++++++++++++--
mm/memcontrol.c                         |  42 ++--------
5 files changed, 183 insertions(+), 40 deletions(-)
[PATCH 0/2] cgroup/dmem: introduce a peak file
Posted by Thadeu Lima de Souza Cascardo 1 month, 1 week ago
Just like we have memory.peak, introduce a dmem.peak, which uses the
page_counter support for that.

It can be written to in order to reset the peak, but different from
memory.peak, which expects any write, dmem.peak expects the region name to
be written to it. That region peak is the one that is reset.

That requires ofp_peak to carry a pointer to the pool that was reset.

Writing a different region name will reset the different region and make
the original region peak get back to its non-reset value.

While at it, we reuse a helper from memcontrol, which we moved to
kernel/cgroup/cgroup.c.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Thadeu Lima de Souza Cascardo (2):
      mm/page_counter: decouple peak_reset from peak_write
      cgroup/dmem: introduce a peak file

 Documentation/admin-guide/cgroup-v2.rst |  10 +++
 include/linux/cgroup-defs.h             |   7 ++
 kernel/cgroup/cgroup.c                  |  32 ++++++++
 kernel/cgroup/dmem.c                    | 132 ++++++++++++++++++++++++++++++--
 mm/memcontrol.c                         |  42 ++--------
 5 files changed, 183 insertions(+), 40 deletions(-)
---
base-commit: 7fd2df204f342fc17d1a0bfcd474b24232fb0f32
change-id: 20260409-dmem_peak-3abc1be95072

Best regards,
--  
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Re: [PATCH 0/2] cgroup/dmem: introduce a peak file
Posted by Michal Koutný 1 month, 1 week ago
Hello Thadeu.

On Wed, May 06, 2026 at 08:58:23AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> Just like we have memory.peak, introduce a dmem.peak, which uses the
> page_counter support for that.
> 
> It can be written to in order to reset the peak, but different from
> memory.peak, which expects any write, dmem.peak expects the region name to
> be written to it. That region peak is the one that is reset.
> 
> That requires ofp_peak to carry a pointer to the pool that was reset.

(It'd be nicer to have generic data in that generic structure, at least
some void *priv. But see below.)

> Writing a different region name will reset the different region and make
> the original region peak get back to its non-reset value.

I'm slightly confused by this fds x pool matricity when there's only
a single slot in cgroup_file_ctx::cgroup_of_peak.

The intended use case is that users should maintain one fd per pool and
not mix it up?
This stanza would better fit to cgroup-v2.rst proper than the commit
message. Or make it simpler and start with non-resettable peak file
(like memory.peak had started too) and see how it fares. WDYT?


Thanks,
Michal
Re: [PATCH 0/2] cgroup/dmem: introduce a peak file
Posted by Thadeu Lima de Souza Cascardo 1 month, 1 week ago
On Wed, May 06, 2026 at 03:53:59PM +0200, Michal Koutný wrote:
> Hello Thadeu.
> 
> On Wed, May 06, 2026 at 08:58:23AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> > Just like we have memory.peak, introduce a dmem.peak, which uses the
> > page_counter support for that.
> > 
> > It can be written to in order to reset the peak, but different from
> > memory.peak, which expects any write, dmem.peak expects the region name to
> > be written to it. That region peak is the one that is reset.
> > 
> > That requires ofp_peak to carry a pointer to the pool that was reset.
> 
> (It'd be nicer to have generic data in that generic structure, at least
> some void *priv. But see below.)
> 

I used void *, at first, but as the only current use is for the pool and as
mixing different uses may lead to misuse, I thought it would be safer to
use the type directly. This has been pointed out before for other members
of cgroup_file_ctx. See [1].

> > Writing a different region name will reset the different region and make
> > the original region peak get back to its non-reset value.
> 
> I'm slightly confused by this fds x pool matricity when there's only
> a single slot in cgroup_file_ctx::cgroup_of_peak.
> 
> The intended use case is that users should maintain one fd per pool and
> not mix it up?
> This stanza would better fit to cgroup-v2.rst proper than the commit
> message. Or make it simpler and start with non-resettable peak file
> (like memory.peak had started too) and see how it fares. WDYT?
> 

That is also due to the limitation that each file descriptor has a single
linked list under cgroup_file_ctx::cgroup_of_peak. To allow for all the
pools to be reset at once, we would need one list per file descriptor.

But, on the other hand, as you pointed out, this leads to the flexibility
of being able to reset only one pool, while leaving the others as is.
Whereas, if one needs to reset all pools, they can use one file descriptor
per region.

I started with a non-resettable peak file, but as memory.peak can be reset,
I added that feature too. If we want to merge a non-resettable support
ealier and need to take longer to discuss how to work on the resettable
support given the above, I can resubmit. But I guess we can see if we can
reach an agreement sonner rather than later.

Thanks.
Cascardo.

> 
> Thanks,
> Michal

[1] https://lore.kernel.org/all/CAHk-=wgiYkECT=hZRKj8ZwfBPw2Uz=gpOGBGd4ny0KYhSsjC0w@mail.gmail.com/
Re: [PATCH 0/2] cgroup/dmem: introduce a peak file
Posted by Michal Koutný 1 month, 1 week ago
On Wed, May 06, 2026 at 11:18:26AM -0300, Thadeu Lima de Souza Cascardo <cascardo@igalia.com> wrote:
> I used void *, at first, but as the only current use is for the pool and as
> mixing different uses may lead to misuse, I thought it would be safer to
> use the type directly. This has been pointed out before for other members
> of cgroup_file_ctx. See [1].

That mail reacts to union overlaps and pointer vs embedded struct
allocations. Correct me if I missed your part.

I agree that having properly typed pointer is safer.
cgroup_file_ctx sub-structs are for generic cgroup files. But here
somehow a specific controller needs propagated to the generic member.

What about storing also the `list_head *watchers` inside `struct
cgroup_of_peak` and each subsys would manage it as needed?
(ofp->watchers == NULL could also substitute ofp->value ==
OFP_PEAK_UNSET)


> I started with a non-resettable peak file, but as memory.peak can be reset,
> I added that feature too. 

At the same time pids.peak has survived without reset option till today.

> If we want to merge a non-resettable support ealier and need to take
> longer to discuss how to work on the resettable support given the
> above, I can resubmit. But I guess we can see if we can reach an
> agreement sonner rather than later.

What kind of users do you envision (i.e. would they need resets at all)?
Anyway, the behavior should be explained in cgroup-v2.rst since that's
where they'll look for it.

HTH,
Michal