[PATCH 0/3] cgroup: nmi safe css_rstat_updated
Posted by Shakeel Butt 4 months ago
BPF programs can run in nmi context and may trigger memcg charged memory
allocation in such context. Recently Linux added support for nmi safe
page allocation along with memcg charging of such allocations. However
the kmalloc/slab support and the corresponding memcg charging are still
lacking.

To provide nmi safe memcg charging for kmalloc/slab allocations, we
need nmi safe memcg stats, and for that we need an nmi safe
css_rstat_updated(), which adds the given cgroup state whose stats
have been updated into the per-cpu per-ss update tree. This series
aims to make css_rstat_updated() nmi safe.

This series makes css_rstat_updated() nmi safe by using per-cpu
lockless lists, where the node is embedded in each individual struct
cgroup_subsys_state and the per-cpu head is placed in struct
cgroup_subsys. For rstat users without a cgroup_subsys, a global
per-cpu lockless list head is created.
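
The per-cpu layout is roughly the following (a simplified sketch; the
exact struct and field names in the patches may differ):

        /* Per-cpu, per-css rstat bookkeeping: one llist node per css per cpu. */
        struct css_rstat_cpu {
                struct llist_node lnode;                /* queued on the per-cpu head */
                struct cgroup_subsys_state *owner;      /* back pointer for the flusher */
                /* ... existing update-tree fields ... */
        };

        struct cgroup_subsys {
                /* ... */
                struct llist_head __percpu *lhead;      /* per-cpu heads of updated csses */
        };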

The main challenge in using lockless lists in this scenario is that
multiple inserters may use the same lockless node of a
cgroup_subsys_state, which is different from traditional users of
lockless lists. This race is resolved by making the inserters compete
to atomically reset the node: the single winner of that race gets to
insert the lockless node into the corresponding lockless list.
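
A minimal sketch of the update side (not the exact patch;
ss_lhead_cpu() and css_rstat_cpu() stand in for the per-cpu
accessors):

        static void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
        {
                struct llist_head *lhead = ss_lhead_cpu(css->ss, cpu);
                struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
                struct llist_node *self = &rstatc->lnode;

                /* A node that is not queued points to itself (init_llist_node()). */
                if (llist_on_list(&rstatc->lnode))
                        return;

                /*
                 * Race to reset the node: the context that atomically flips
                 * the self pointer to NULL wins and performs the insertion.
                 * Losers simply return; the node is already queued or about
                 * to be queued by the winner, so one successful insertion is
                 * guaranteed.
                 */
                if (!try_cmpxchg(&rstatc->lnode.next, &self, NULL))
                        return;

                llist_add(&rstatc->lnode, lhead);
        }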


Changelog since v1:
- Based on Yosry's suggestion, always use llist on the update side and
  create the update tree on the flush side

[v1] https://lore.kernel.org/cgroups/20250429061211.1295443-1-shakeel.butt@linux.dev/

Shakeel Butt (3):
  cgroup: support to enable nmi-safe css_rstat_updated
  cgroup: make css_rstat_updated nmi safe
  memcg: cgroup: call memcg_rstat_updated irrespective of in_nmi()

 include/linux/cgroup-defs.h |  4 ++
 kernel/cgroup/rstat.c       | 80 ++++++++++++++++++++++++++++++-------
 mm/memcontrol.c             | 10 ++---
 3 files changed, 75 insertions(+), 19 deletions(-)

-- 
2.47.1
Re: [PATCH 0/3] cgroup: nmi safe css_rstat_updated
Posted by Michal Koutný 4 months ago
On Mon, Jun 09, 2025 at 03:56:08PM -0700, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> BPF programs can run in nmi context and may trigger memcg charged memory
> allocation in such context. Recently Linux added support for nmi safe
> page allocation along with memcg charging of such allocations. However
> the kmalloc/slab support and the corresponding memcg charging are still
> lacking.
> 
> To provide nmi safe memcg charging for kmalloc/slab allocations, we
> need nmi safe memcg stats, and for that we need an nmi safe
> css_rstat_updated(), which adds the given cgroup state whose stats
> have been updated into the per-cpu per-ss update tree. This series
> aims to make css_rstat_updated() nmi safe.

memcg charging relies on page counters and per-cpu stocks.
css_rstat_updated() is "only" for statistics (which admittedly has
some in-kernel consumers, but those are already affected by batching
and flushing errors).

Have I missed some updates that make css_rstat_updated() calls critical
for memcg charging? I'd find it useful to explain this aspect more in
the cover letter.

Thanks,
Michal
Re: [PATCH 0/3] cgroup: nmi safe css_rstat_updated
Posted by Shakeel Butt 4 months ago
On Tue, Jun 10, 2025 at 12:53:11PM +0200, Michal Koutný wrote:
> On Mon, Jun 09, 2025 at 03:56:08PM -0700, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > BPF programs can run in nmi context and may trigger memcg charged memory
> > allocation in such context. Recently Linux added support for nmi safe
> > page allocation along with memcg charging of such allocations. However
> > the kmalloc/slab support and the corresponding memcg charging are still
> > lacking.
> > 
> > To provide nmi safe memcg charging for kmalloc/slab allocations, we
> > need nmi safe memcg stats, and for that we need an nmi safe
> > css_rstat_updated(), which adds the given cgroup state whose stats
> > have been updated into the per-cpu per-ss update tree. This series
> > aims to make css_rstat_updated() nmi safe.
> 
> memcg charging relies on page counters and per-cpu stocks.
> css_rstat_updated() is "only" for statistics (which admittedly has
> some in-kernel consumers, but those are already affected by batching
> and flushing errors).
> 
> Have I missed some updates that make css_rstat_updated() calls critical
> for memcg charging? I'd find it useful to explain this aspect more in
> the cover letter.

For kernel memory, the charging and the corresponding stats updates
(MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B, NR_SLAB_UNRECLAIMABLE_B) happen
together. I will add a line or two in the next version.
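
For reference, the slab accounting path does roughly the following
(simplified, modulo the exact helper names), which is why an nmi safe
charge path also needs nmi safe stats:

        /* Charge the objcg, then account the slab stat right away. */
        if (obj_cgroup_charge(objcg, flags, obj_full_size(s)))
                return false;
        mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
                        obj_full_size(s));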
Re: [PATCH 0/3] cgroup: nmi safe css_rstat_updated
Posted by Andrew Morton 4 months ago
On Mon,  9 Jun 2025 15:56:08 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:

> BPF programs can run in nmi context and may trigger memcg charged memory
> allocation in such context. Recently Linux added support for nmi safe
> page allocation along with memcg charging of such allocations. However
> the kmalloc/slab support and the corresponding memcg charging are still
> lacking.
> 
> To provide nmi safe memcg charging for kmalloc/slab allocations, we
> need nmi safe memcg stats, and for that we need an nmi safe
> css_rstat_updated(), which adds the given cgroup state whose stats
> have been updated into the per-cpu per-ss update tree. This series
> aims to make css_rstat_updated() nmi safe.
> 
> This series makes css_rstat_updated() nmi safe by using per-cpu
> lockless lists, where the node is embedded in each individual struct
> cgroup_subsys_state and the per-cpu head is placed in struct
> cgroup_subsys. For rstat users without a cgroup_subsys, a global
> per-cpu lockless list head is created.
> 
> The main challenge in using lockless lists in this scenario is that
> multiple inserters may use the same lockless node of a
> cgroup_subsys_state, which is different from traditional users of
> lockless lists. This race is resolved by making the inserters compete
> to atomically reset the node: the single winner of that race gets to
> insert the lockless node into the corresponding lockless list.

And what happens with the losers?
Re: [PATCH 0/3] cgroup: nmi safe css_rstat_updated
Posted by Shakeel Butt 4 months ago
On Mon, Jun 09, 2025 at 04:44:10PM -0700, Andrew Morton wrote:
> On Mon,  9 Jun 2025 15:56:08 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> 
> > BPF programs can run in nmi context and may trigger memcg charged memory
> > allocation in such context. Recently Linux added support for nmi safe
> > page allocation along with memcg charging of such allocations. However
> > the kmalloc/slab support and the corresponding memcg charging are still
> > lacking.
> > 
> > To provide nmi safe memcg charging for kmalloc/slab allocations, we
> > need nmi safe memcg stats, and for that we need an nmi safe
> > css_rstat_updated(), which adds the given cgroup state whose stats
> > have been updated into the per-cpu per-ss update tree. This series
> > aims to make css_rstat_updated() nmi safe.
> > 
> > This series makes css_rstat_updated() nmi safe by using per-cpu
> > lockless lists, where the node is embedded in each individual struct
> > cgroup_subsys_state and the per-cpu head is placed in struct
> > cgroup_subsys. For rstat users without a cgroup_subsys, a global
> > per-cpu lockless list head is created.
> > 
> > The main challenge in using lockless lists in this scenario is that
> > multiple inserters may use the same lockless node of a
> > cgroup_subsys_state, which is different from traditional users of
> > lockless lists. This race is resolved by making the inserters compete
> > to atomically reset the node: the single winner of that race gets to
> > insert the lockless node into the corresponding lockless list.
> 
> And what happens with the losers?

The losers can continue their normal work without worrying about this
specific insertion; we only need one successful insertion. Note that
this is contention between process context, softirq, hardirq and nmi
on the same cpu for the same cgroup, which should be very unlikely.
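
The flush side then drains the per-cpu lockless list and rebuilds the
regular update tree from it, roughly like this (a sketch under the
same assumptions as the cover letter; llist_del_first_init()
re-initializes the popped node so it can be queued again):

        static void css_process_update_tree(struct cgroup_subsys *ss, int cpu)
        {
                struct llist_head *lhead = ss_lhead_cpu(ss, cpu);
                struct llist_node *lnode;

                while ((lnode = llist_del_first_init(lhead))) {
                        struct css_rstat_cpu *rstatc;

                        rstatc = container_of(lnode, struct css_rstat_cpu, lnode);
                        /* ... insert rstatc->owner into the per-cpu update tree ... */
                }
        }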