kernel/cgroup/cgroup-internal.h | 2 + kernel/cgroup/cgroup.c | 57 ++++++++++++++++++---------- kernel/cgroup/rstat.c | 67 +++++++++++++++++---------------- 3 files changed, 74 insertions(+), 52 deletions(-)
Within css_create() there are several error paths that lead to async cleanup of a given css. An warning was found [0] which appears to be the result of calling css_rstat_exit() on this cleanup path with a css having uninitialized rstat objects. This series makes adjustments to provide up a place in css_create() where css_rstat_init() can both 1) fail gracefully and 2) guarantee completion before async cleanup leading to css_rstat_exit() occurs. The core change is splitting init_and_link_css() into two separate functions and calling css_rstat_init() between them. The reason for this is that because of the refcount incrementing that occurs within, this function puts a constraint on the error handling that follows. See excerpt below: css_create(...) { ... init_and_link_css(css, ...); /* any subsequent error needs async css cleanup */ err = percpu_ref_init(...); if (err) goto err_free_css; err = cgroup_idr_alloc(...); if (err) goto err_free_css; err = css_rstat_init(css, ...); if (err) goto err_free_css; ... err_free_css: INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn); queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork); return ERR_PTR(err); } If any of the three goto jumps are taken, async cleanup will begin and css_rstat_exit() will be invoked. But since css_rstat_init() would not have succeeded, the warning will eventually be reached. In this series, the code above changes to resemble something like this: css_create(...) { ... init_css(css); if (css->css_rstat_flush) { err = css_rstat_init(css) if (err) { ss->css_free(css); return ERR_PTR(err); } } link_css(css, ...); ... } There was some refactoring done to eliminate self-imposed constraints on where css_rstat_init() may be called. For example, one previous constraint was that css_rstat_init() could only be called after init_and_link_css() since that is where it gets its subsystem and cgroup fields set, and css->ss was used to determine how to allocate rstat (base or subsystem). The changes in this series remove this constraint by eliminating the dependency of subsystem/cgroup information in rstat init/exit. The resulting init/exit routines can work exclusively on rstat entities and not be concerned with other types of entities. The refactoring may also improve maintainability. Existing code like this: css_rstat_init(&cgrp->self); /* for base state */ css_rstat_init(css); /* for subsystems */ ... will now look like this: cgroup_rstat_base_init(cgrp); /* for base stats */ css_rstat_init(css); /* for subsystems */ Finally, the number of nested conditionals in two functions above has been reduced compared to the amount previously in css_rstat_init/exit(). This could help improve code readability. [0] https://lore.kernel.org/all/684c5802.a00a0220.279073.0016.GAE@google.com/ JP Kobryn (5): cgroup: add exclusive css rstat init/exit api for base stats cgroup: check for rstat flush callback at css rstat init/exit call sites cgroup: split init_and_link_css() cgroup: initialize css rstat before linking to cgroups in css_create() cgroup: break up the internal rstat init/exit logic by subsys and base kernel/cgroup/cgroup-internal.h | 2 + kernel/cgroup/cgroup.c | 57 ++++++++++++++++++---------- kernel/cgroup/rstat.c | 67 +++++++++++++++++---------------- 3 files changed, 74 insertions(+), 52 deletions(-) -- 2.47.1
Hi. On Mon, Jul 21, 2025 at 06:40:25PM -0700, JP Kobryn <inwardvessel@gmail.com> wrote: ... Thanks for the instructive summary! > If any of the three goto jumps are taken, async cleanup will begin and > css_rstat_exit() will be invoked. But since css_rstat_init() would not have > succeeded, the warning will eventually be reached. First thought is why not simply add a flag that'd guide whether css_rstat_exit() has work to do. This is meant as a fix, so it should have some metadata, I'd consider this one: Fixes: 5da3bfa029d68 ("cgroup: use separate rstat trees for each subsystem") (that's when css_rstat_init was moved to css_create) and likely this Reported-by: syzbot+8d052e8b99e40bc625ed@syzkaller.appspotmail.com (Sorry for being such a bureaucrat.) It's most appropriate in your 4/5 but do you think it'd be possible to reshuffle the series to put the fix in front (to ease it for stable kernels) and refactorings after? Regards, Michal
Thanks for taking a look Michal. On 7/25/25 10:23 AM, Michal Koutný wrote: > Hi. > > On Mon, Jul 21, 2025 at 06:40:25PM -0700, JP Kobryn <inwardvessel@gmail.com> wrote: > ... > > Thanks for the instructive summary! > >> If any of the three goto jumps are taken, async cleanup will begin and >> css_rstat_exit() will be invoked. But since css_rstat_init() would not have >> succeeded, the warning will eventually be reached. > First thought is why not simply add a flag that'd guide whether > css_rstat_exit() has work to do. I did consider adding an "initialized" flag to the css but since there can be multiple css's per cgroup it felt like it would be adding overhead. So I went the path of getting the call sequence right. I'm open to feedback on this, though. > > This is meant as a fix, so it should have some metadata, I'd consider this one: > Fixes: 5da3bfa029d68 ("cgroup: use separate rstat trees for each subsystem") > > (that's when css_rstat_init was moved to css_create) > > and likely this > Reported-by: syzbot+8d052e8b99e40bc625ed@syzkaller.appspotmail.com > > (Sorry for being such a bureaucrat.) No problem, I overlooked that. > It's most appropriate in your 4/5 but do you think it'd be possible to > reshuffle the series to put the fix in front (to ease it for stable > kernels) and refactorings after? Let me give that a try. As it is right now, patches 1-3 are pre-reqs for 4. I can try to get the actual fix to the front and then add patches to additionally make nicer/refactor.
On Mon, Jul 28, 2025 at 11:04:56AM -0700, JP Kobryn <inwardvessel@gmail.com> wrote: > I did consider adding an "initialized" flag to the css but since there can > be multiple css's per > cgroup it felt like it would be adding overhead. So I went the path of > getting the call > sequence right. I'm open to feedback on this, though. An implicit flag that builds upon the assumption that css_rstat_init() must only succeed after it allocates ->rstat_cpu (didn't check gotchas of this approach with !CONFIG_SMP) --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -488,6 +488,10 @@ void css_rstat_exit(struct cgroup_subsys_state *css) if (!css_uses_rstat(css)) return; + /* Incomplete css whose css_rstat_init failed */ + if (!css->rstat_cpu) + return; + css_rstat_flush(css); /* sanity check */
On 7/29/25 2:42 AM, Michal Koutný wrote: > On Mon, Jul 28, 2025 at 11:04:56AM -0700, JP Kobryn <inwardvessel@gmail.com> wrote: >> I did consider adding an "initialized" flag to the css but since there can >> be multiple css's per >> cgroup it felt like it would be adding overhead. So I went the path of >> getting the call >> sequence right. I'm open to feedback on this, though. > > An implicit flag that builds upon the assumption that css_rstat_init() > must only succeed after it allocates ->rstat_cpu (didn't check gotchas > of this approach with !CONFIG_SMP) I think this can work. This can probably be the early fix and then the refactoring can follow.
© 2016 - 2025 Red Hat, Inc.