BPF programs can run in nmi context and may trigger memcg-charged memory allocations in that context. Linux recently gained support for nmi-safe page allocation, along with memcg charging of such allocations. However, the corresponding kmalloc/slab support and its memcg charging are still lacking.

To support nmi-safe memcg charging for kmalloc/slab allocations, we need nmi-safe memcg stats, because for kernel memory, charging and stats updates happen together. At the moment, memcg charging and memcg stats are already nmi safe; the only remaining piece that is not nmi safe is adding the cgroup to the per-cpu rstat update tree, i.e. css_rstat_updated(), which is what this series addresses.

This series makes css_rstat_updated() nmi safe by using per-cpu lockless lists whose node is embedded in each struct cgroup_subsys_state, with the per-cpu head placed in struct cgroup_subsys. For rstat users without a cgroup_subsys, a global per-cpu lockless list head is created. The main challenge in using lockless lists here is the potential for multiple inserters from stacked contexts, i.e. process, softirq, hardirq and nmi, all potentially using the same per-cpu lockless node of a given cgroup_subsys_state. The normal lockless list does not protect against this scenario.

The race between multiple stacked inserters on the same lockless node is resolved by letting exactly one of them succeed in resetting the lockless node; the winner then inserts the node into the corresponding lockless list. The losers can assume the insertion will eventually succeed and continue their operation.

Changelog since v2:
- Add a clearer explanation in the cover letter and in the comments, as suggested by Andrew, Michal & Tejun.
- Use this_cpu_cmpxchg() instead of try_cmpxchg(), as suggested by Tejun.
- Remove the per-cpu ss locks as they are no longer needed.

Changelog since v1:
- Based on Yosry's suggestion, always use llist on the update side and create the update tree on the flush side.

[v1] https://lore.kernel.org/cgroups/20250429061211.1295443-1-shakeel.butt@linux.dev/

Shakeel Butt (4):
  cgroup: support to enable nmi-safe css_rstat_updated
  cgroup: make css_rstat_updated nmi safe
  cgroup: remove per-cpu per-subsystem locks
  memcg: cgroup: call css_rstat_updated irrespective of in_nmi()

 include/linux/cgroup-defs.h   |  11 +--
 include/trace/events/cgroup.h |  47 ----------
 kernel/cgroup/rstat.c         | 169 +++++++++++++--------------------
 mm/memcontrol.c               |  10 +-
 4 files changed, 74 insertions(+), 163 deletions(-)

-- 
2.47.1
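[Editor's note: to make the winner-selection scheme in the cover letter concrete, here is a simplified sketch of the update-side insertion. This is not the exact patch: the css_rstat_cpu()/ss_lhead_cpu() helpers and the rstat_cpu/lnode field layout are assumptions based on the cover letter, and the update is assumed to run on the local CPU, which is what makes this_cpu_cmpxchg() sufficient.]

/*
 * Sketch only. An llist_node that is not on any list points to itself
 * (see init_llist_node()); the flush side is assumed to re-initialize
 * the node to that state after removing it from the list.
 */
static void css_rstat_updated_sketch(struct cgroup_subsys_state *css, int cpu)
{
	struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);	/* assumed helper */
	struct llist_head *lhead = ss_lhead_cpu(css->ss, cpu);	/* assumed helper */
	struct llist_node *self = &rstatc->lnode;

	/* Fast path: this css is already queued on this CPU's update list. */
	if (llist_on_list(self))
		return;

	/*
	 * Stacked contexts (process -> softirq -> hardirq -> nmi) can race
	 * on this same node. Only one context can win the transition of
	 * ->next from the self-pointing "off list" sentinel to NULL; the
	 * losers either fail this cmpxchg or already saw llist_on_list()
	 * above, and simply rely on the winner's llist_add() below.
	 */
	if (this_cpu_cmpxchg(css->rstat_cpu->lnode.next, self, NULL) != self)
		return;

	llist_add(self, lhead);
}

[The flush side would then splice the per-cpu llist out with llist_del_all(), build the update tree from it, and re-initialize each node with init_llist_node() so future updates can re-insert it.]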
On 6/11/25 3:15 PM, Shakeel Butt wrote:
> [...]

I tested this series by doing some updates/flushes on a cgroup hierarchy with four levels. This tag can be added to the patches in this series.

Tested-by: JP Kobryn <inwardvessel@gmail.com>
On Mon, Jun 16, 2025 at 01:08:49PM -0700, JP Kobryn wrote:
> I tested this series by doing some updates/flushes on a cgroup hierarchy
> with four levels. This tag can be added to the patches in this series.
>
> Tested-by: JP Kobryn <inwardvessel@gmail.com>

Thanks a lot.
Hello,

On Wed, Jun 11, 2025 at 03:15:28PM -0700, Shakeel Butt wrote:
> Shakeel Butt (4):
>   cgroup: support to enable nmi-safe css_rstat_updated
>   cgroup: make css_rstat_updated nmi safe
>   cgroup: remove per-cpu per-subsystem locks
>   memcg: cgroup: call css_rstat_updated irrespective of in_nmi()

The patches look good to me. How should they be routed? Should I take all four, just the first three, or would it be better to route all through -mm?

Thanks.

-- 
tejun
On Mon, Jun 16, 2025 at 08:15:17AM -1000, Tejun Heo wrote:
> The patches look good to me. How should they be routed? Should I take all
> four, just the first three, or would it be better to route all through -mm?

I would like all four to stay together, and since most of the code is in cgroup, the cgroup tree makes more sense unless Andrew has a different opinion.

thanks,
Shakeel
On Mon, Jun 16, 2025 at 12:20:28PM -0700, Shakeel Butt wrote:
> I would like all four to stay together, and since most of the code is in
> cgroup, the cgroup tree makes more sense unless Andrew has a different
> opinion.

Okay, I'll route them through cgroup. The patches don't apply cleanly on cgroup/for-6.17. Can you please send a refreshed set?

Thanks.

-- 
tejun
On Tue, Jun 17, 2025 at 09:06:01AM -1000, Tejun Heo wrote:
> Okay, I'll route them through cgroup. The patches don't apply cleanly on
> cgroup/for-6.17. Can you please send a refreshed set?

Yup, will do asap.