BPF programs can run in NMI context and may trigger memcg-charged
memory allocations in such a context. Linux recently added support
for NMI-safe page allocation, along with memcg charging of such
allocations. However, the kmalloc/slab support and the corresponding
memcg charging are still lacking.

To provide NMI-safe memcg charging for kmalloc/slab allocations, we
need NMI-safe memcg stats, and for that we need an NMI-safe
css_rstat_updated(), which adds the given cgroup subsystem state,
whose stats have been updated, into the per-cpu per-ss update tree.
This series aims to make css_rstat_updated() NMI safe.

This series makes css_rstat_updated() NMI safe by using per-cpu
lockless lists whose nodes are embedded in the individual struct
cgroup_subsys_state and whose per-cpu heads are placed in struct
cgroup_subsys. For rstat users without a cgroup_subsys, a global
per-cpu lockless list head is created. The main challenge in using
lockless lists here is that multiple inserters may race on the same
lockless node of a cgroup_subsys_state, which differs from
traditional uses of lockless lists.

The race between multiple inserters using the same lockless node is
resolved by letting exactly one of them succeed in resetting the
lockless node; the winner then gets to insert the node into the
corresponding lockless list.

Changelog since v1:
- Based on Yosry's suggestion, always use llist on the update side and
  create the update tree on the flush side.

[v1] https://lore.kernel.org/cgroups/20250429061211.1295443-1-shakeel.butt@linux.dev/

Shakeel Butt (3):
  cgroup: support to enable nmi-safe css_rstat_updated
  cgroup: make css_rstat_updated nmi safe
  memcg: cgroup: call memcg_rstat_updated irrespective of in_nmi()

 include/linux/cgroup-defs.h |  4 ++
 kernel/cgroup/rstat.c       | 80 ++++++++++++++++++++++++++++++-------
 mm/memcontrol.c             | 10 ++---
 3 files changed, 75 insertions(+), 19 deletions(-)

-- 
2.47.1
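For illustration, a minimal sketch of this update-side scheme,
assuming the self-pointing llist_node convention from
include/linux/llist.h (init_llist_node()/llist_on_list()); the helpers
ss_lhead_cpu() and css_rstat_cpu() are illustrative names, not
necessarily the patch code:

/*
 * Racing contexts on the same CPU (task, softirq, hardirq, NMI) may
 * all arrive here with the same per-cpu node of a css. Exactly one
 * wins the try_cmpxchg() on the node's self-pointer and performs the
 * insertion; the losers rely on the winner's insertion.
 */
static void sketch_css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
{
	struct llist_head *lhead = ss_lhead_cpu(css->ss, cpu);      /* assumed */
	struct llist_node *lnode = &css_rstat_cpu(css, cpu)->lnode; /* assumed */
	struct llist_node *self = lnode;

	/* Fast path: this css is already queued on this CPU's list. */
	if (llist_on_list(lnode))
		return;

	/* Claim the node by flipping its self-pointer to NULL. */
	if (!try_cmpxchg(&lnode->next, &self, NULL))
		return;

	llist_add(lnode, lhead);
}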
On Mon, Jun 09, 2025 at 03:56:08PM -0700, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> BPF programs can run in NMI context and may trigger memcg-charged
> memory allocations in such a context. Linux recently added support
> for NMI-safe page allocation, along with memcg charging of such
> allocations. However, the kmalloc/slab support and the corresponding
> memcg charging are still lacking.
>
> To provide NMI-safe memcg charging for kmalloc/slab allocations, we
> need NMI-safe memcg stats, and for that we need an NMI-safe
> css_rstat_updated(), which adds the given cgroup subsystem state,
> whose stats have been updated, into the per-cpu per-ss update tree.
> This series aims to make css_rstat_updated() NMI safe.

memcg charging relies on page counters and per-cpu stocks.
css_rstat_updated() is "only" for statistics (which admittedly has
some in-kernel consumers, but those are already affected by batching
and flushing errors).

Have I missed some updates that make css_rstat_updated() calls
critical for memcg charging? I'd find it useful to explain this
aspect more in the cover letter.

Thanks,
Michal
On Tue, Jun 10, 2025 at 12:53:11PM +0200, Michal Koutný wrote:
> On Mon, Jun 09, 2025 at 03:56:08PM -0700, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > BPF programs can run in NMI context and may trigger memcg-charged
> > memory allocations in such a context. Linux recently added support
> > for NMI-safe page allocation, along with memcg charging of such
> > allocations. However, the kmalloc/slab support and the
> > corresponding memcg charging are still lacking.
> >
> > To provide NMI-safe memcg charging for kmalloc/slab allocations,
> > we need NMI-safe memcg stats, and for that we need an NMI-safe
> > css_rstat_updated(), which adds the given cgroup subsystem state,
> > whose stats have been updated, into the per-cpu per-ss update
> > tree. This series aims to make css_rstat_updated() NMI safe.
>
> memcg charging relies on page counters and per-cpu stocks.
> css_rstat_updated() is "only" for statistics (which admittedly has
> some in-kernel consumers, but those are already affected by batching
> and flushing errors).
>
> Have I missed some updates that make css_rstat_updated() calls
> critical for memcg charging? I'd find it useful to explain this
> aspect more in the cover letter.

For kernel memory, the charging and the stats updates (MEMCG_KMEM,
NR_SLAB_RECLAIMABLE_B, NR_SLAB_UNRECLAIMABLE_B) happen together. I
will add a line or two about this in the next version.
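To illustrate that coupling, a paraphrased sketch of the slab
post-allocation path, assuming the obj_cgroup charging API;
sketch_charge_slab_obj() and the exact stat call are illustrative,
not verbatim mm code:

/*
 * Paraphrased sketch: the memcg charge and the slab stat update are
 * issued back to back, so an NMI-safe charge path is only complete if
 * the stat update (which ends up in css_rstat_updated()) is NMI safe
 * too. Names and the exact stat index are illustrative.
 */
static int sketch_charge_slab_obj(struct obj_cgroup *objcg,
				  struct pglist_data *pgdat, size_t size)
{
	if (obj_cgroup_charge(objcg, GFP_NOWAIT, size))
		return -ENOMEM;

	/* The stat update immediately follows the charge. */
	mod_objcg_state(objcg, pgdat, NR_SLAB_UNRECLAIMABLE_B, size); /* assumed */

	return 0;
}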
On Mon, 9 Jun 2025 15:56:08 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:

> BPF programs can run in NMI context and may trigger memcg-charged
> memory allocations in such a context. Linux recently added support
> for NMI-safe page allocation, along with memcg charging of such
> allocations. However, the kmalloc/slab support and the corresponding
> memcg charging are still lacking.
>
> To provide NMI-safe memcg charging for kmalloc/slab allocations, we
> need NMI-safe memcg stats, and for that we need an NMI-safe
> css_rstat_updated(), which adds the given cgroup subsystem state,
> whose stats have been updated, into the per-cpu per-ss update tree.
> This series aims to make css_rstat_updated() NMI safe.
>
> This series makes css_rstat_updated() NMI safe by using per-cpu
> lockless lists whose nodes are embedded in the individual struct
> cgroup_subsys_state and whose per-cpu heads are placed in struct
> cgroup_subsys. For rstat users without a cgroup_subsys, a global
> per-cpu lockless list head is created. The main challenge in using
> lockless lists here is that multiple inserters may race on the same
> lockless node of a cgroup_subsys_state, which differs from
> traditional uses of lockless lists.
>
> The race between multiple inserters using the same lockless node is
> resolved by letting exactly one of them succeed in resetting the
> lockless node; the winner then gets to insert the node into the
> corresponding lockless list.

And what happens with the losers?
On Mon, Jun 09, 2025 at 04:44:10PM -0700, Andrew Morton wrote:
> On Mon, 9 Jun 2025 15:56:08 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> > BPF programs can run in NMI context and may trigger memcg-charged
> > memory allocations in such a context. Linux recently added support
> > for NMI-safe page allocation, along with memcg charging of such
> > allocations. However, the kmalloc/slab support and the
> > corresponding memcg charging are still lacking.
> >
> > To provide NMI-safe memcg charging for kmalloc/slab allocations,
> > we need NMI-safe memcg stats, and for that we need an NMI-safe
> > css_rstat_updated(), which adds the given cgroup subsystem state,
> > whose stats have been updated, into the per-cpu per-ss update
> > tree. This series aims to make css_rstat_updated() NMI safe.
> >
> > This series makes css_rstat_updated() NMI safe by using per-cpu
> > lockless lists whose nodes are embedded in the individual struct
> > cgroup_subsys_state and whose per-cpu heads are placed in struct
> > cgroup_subsys. For rstat users without a cgroup_subsys, a global
> > per-cpu lockless list head is created. The main challenge in using
> > lockless lists here is that multiple inserters may race on the
> > same lockless node of a cgroup_subsys_state, which differs from
> > traditional uses of lockless lists.
> >
> > The race between multiple inserters using the same lockless node
> > is resolved by letting exactly one of them succeed in resetting
> > the lockless node; the winner then gets to insert the node into
> > the corresponding lockless list.
>
> And what happens with the losers?

Losers can continue their normal work without worrying about this
specific insertion; all we need is one successful insertion. In
addition, this is contention between process context, softirq,
hardirq, and NMI on the same CPU for the same cgroup, which should be
very unlikely.
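For completeness, a rough sketch of the flush side under this scheme
(matching the v1 -> v2 change of building the update tree at flush
time); struct css_rstat_cpu with its owner field and
add_to_update_tree() are assumed names, not necessarily the patch
code:

/*
 * Sketch of the flush side: drain the per-cpu lockless list and
 * rebuild the update tree from it. Re-initializing each node to its
 * self-pointing "not on list" state re-arms it so that a later
 * css_rstat_updated() can win the try_cmpxchg() and insert it again.
 */
static void sketch_process_update_list(struct llist_head *lhead, int cpu)
{
	struct llist_node *lnode;

	while ((lnode = llist_del_first(lhead))) {
		struct css_rstat_cpu *rstatc =
			container_of(lnode, struct css_rstat_cpu, lnode);

		/* Back to "not on list"; new updaters may claim it. */
		init_llist_node(lnode);

		add_to_update_tree(rstatc->owner, cpu); /* assumed helper */
	}
}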