[PATCH 0/5 cgroup/for-6.16-fixes] harden css_create() for safe placement of call to css_rstat_init()

JP Kobryn posted 5 patches 2 months, 2 weeks ago
kernel/cgroup/cgroup-internal.h |  2 +
kernel/cgroup/cgroup.c          | 57 ++++++++++++++++++----------
kernel/cgroup/rstat.c           | 67 +++++++++++++++++----------------
3 files changed, 74 insertions(+), 52 deletions(-)
[PATCH 0/5 cgroup/for-6.16-fixes] harden css_create() for safe placement of call to css_rstat_init()
Posted by JP Kobryn 2 months, 2 weeks ago
Within css_create() there are several error paths that lead to async
cleanup of a given css. An warning was found [0] which appears to be the
result of calling css_rstat_exit() on this cleanup path with a css having
uninitialized rstat objects. This series makes adjustments to provide up a
place in css_create() where css_rstat_init() can both 1) fail gracefully
and 2) guarantee completion before async cleanup leading to
css_rstat_exit() occurs.

The core change is splitting init_and_link_css() into two separate
functions and calling css_rstat_init() between them. The reason for this is
that because of the refcount incrementing that occurs within, this function
puts a constraint on the error handling that follows. See excerpt below:

css_create(...)
{
	...
	init_and_link_css(css, ...);
	/* any subsequent error needs async css cleanup */

	err = percpu_ref_init(...);
	if (err)
		goto err_free_css;
	err = cgroup_idr_alloc(...);
	if (err)
		goto err_free_css;
	err = css_rstat_init(css, ...);
	if (err)
		goto err_free_css;
	...
err_free_css:
	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
	return ERR_PTR(err);
}

If any of the three goto jumps are taken, async cleanup will begin and
css_rstat_exit() will be invoked. But since css_rstat_init() would not have
succeeded, the warning will eventually be reached.

In this series, the code above changes to resemble something like this:

css_create(...)
{
	...
	init_css(css);
	if (css->css_rstat_flush) {
		err = css_rstat_init(css)
		if (err) {
			ss->css_free(css);
			return ERR_PTR(err);
		}
	}
	link_css(css, ...);
	...
}

There was some refactoring done to eliminate self-imposed constraints on
where css_rstat_init() may be called. For example, one previous constraint
was that css_rstat_init() could only be called after init_and_link_css()
since that is where it gets its subsystem and cgroup fields set, and
css->ss was used to determine how to allocate rstat (base or subsystem).
The changes in this series remove this constraint by eliminating the
dependency of subsystem/cgroup information in rstat init/exit. The
resulting init/exit routines can work exclusively on rstat entities and not
be concerned with other types of entities.

The refactoring may also improve maintainability. Existing code like this:

css_rstat_init(&cgrp->self); /* for base state */
css_rstat_init(css); /* for subsystems */

... will now look like this:

cgroup_rstat_base_init(cgrp); /* for base stats */
css_rstat_init(css); /* for subsystems */

Finally, the number of nested conditionals in two functions above has been
reduced compared to the amount previously in css_rstat_init/exit(). This
could help improve code readability.

[0] https://lore.kernel.org/all/684c5802.a00a0220.279073.0016.GAE@google.com/

JP Kobryn (5):
  cgroup: add exclusive css rstat init/exit api for base stats
  cgroup: check for rstat flush callback at css rstat init/exit call
    sites
  cgroup: split init_and_link_css()
  cgroup: initialize css rstat before linking to cgroups in css_create()
  cgroup: break up the internal rstat init/exit logic by subsys and base

 kernel/cgroup/cgroup-internal.h |  2 +
 kernel/cgroup/cgroup.c          | 57 ++++++++++++++++++----------
 kernel/cgroup/rstat.c           | 67 +++++++++++++++++----------------
 3 files changed, 74 insertions(+), 52 deletions(-)

-- 
2.47.1
Re: [PATCH 0/5 cgroup/for-6.16-fixes] harden css_create() for safe placement of call to css_rstat_init()
Posted by Michal Koutný 2 months, 1 week ago
Hi.

On Mon, Jul 21, 2025 at 06:40:25PM -0700, JP Kobryn <inwardvessel@gmail.com> wrote:
...

Thanks for the instructive summary!

> If any of the three goto jumps are taken, async cleanup will begin and
> css_rstat_exit() will be invoked. But since css_rstat_init() would not have
> succeeded, the warning will eventually be reached.

First thought is why not simply add a flag that'd guide whether
css_rstat_exit() has work to do.

This is meant as a fix, so it should have some metadata, I'd consider this one:
Fixes: 5da3bfa029d68 ("cgroup: use separate rstat trees for each subsystem")

(that's when css_rstat_init was moved to css_create)

and likely this
Reported-by: syzbot+8d052e8b99e40bc625ed@syzkaller.appspotmail.com

(Sorry for being such a bureaucrat.)

It's most appropriate in your 4/5 but do you think it'd be possible to
reshuffle the series to put the fix in front (to ease it for stable
kernels) and refactorings after?

Regards,
Michal
Re: [PATCH 0/5 cgroup/for-6.16-fixes] harden css_create() for safe placement of call to css_rstat_init()
Posted by JP Kobryn 2 months, 1 week ago
Thanks for taking a look Michal.

On 7/25/25 10:23 AM, Michal Koutný wrote:
> Hi.
>
> On Mon, Jul 21, 2025 at 06:40:25PM -0700, JP Kobryn <inwardvessel@gmail.com> wrote:
> ...
>
> Thanks for the instructive summary!
>
>> If any of the three goto jumps are taken, async cleanup will begin and
>> css_rstat_exit() will be invoked. But since css_rstat_init() would not have
>> succeeded, the warning will eventually be reached.
> First thought is why not simply add a flag that'd guide whether
> css_rstat_exit() has work to do.
I did consider adding an "initialized" flag to the css but since there 
can be multiple css's per
cgroup it felt like it would be adding overhead. So I went the path of 
getting the call
sequence right. I'm open to feedback on this, though.
>
> This is meant as a fix, so it should have some metadata, I'd consider this one:
> Fixes: 5da3bfa029d68 ("cgroup: use separate rstat trees for each subsystem")
>
> (that's when css_rstat_init was moved to css_create)
>
> and likely this
> Reported-by: syzbot+8d052e8b99e40bc625ed@syzkaller.appspotmail.com
>
> (Sorry for being such a bureaucrat.)
No problem, I overlooked that.
> It's most appropriate in your 4/5 but do you think it'd be possible to
> reshuffle the series to put the fix in front (to ease it for stable
> kernels) and refactorings after?
Let me give that a try. As it is right now, patches 1-3 are pre-reqs for 
4. I can try to get the
actual fix to the front and then add patches to additionally make 
nicer/refactor.
Re: [PATCH 0/5 cgroup/for-6.16-fixes] harden css_create() for safe placement of call to css_rstat_init()
Posted by Michal Koutný 2 months, 1 week ago
On Mon, Jul 28, 2025 at 11:04:56AM -0700, JP Kobryn <inwardvessel@gmail.com> wrote:
> I did consider adding an "initialized" flag to the css but since there can
> be multiple css's per
> cgroup it felt like it would be adding overhead. So I went the path of
> getting the call
> sequence right. I'm open to feedback on this, though.

An implicit flag that builds upon the assumption that css_rstat_init()
must only succeed after it allocates ->rstat_cpu (didn't check gotchas
of this approach with !CONFIG_SMP)

--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -488,6 +488,10 @@ void css_rstat_exit(struct cgroup_subsys_state *css)
        if (!css_uses_rstat(css))
                return;

+       /* Incomplete css whose css_rstat_init failed */
+       if (!css->rstat_cpu)
+               return;
+
        css_rstat_flush(css);

        /* sanity check */
Re: [PATCH 0/5 cgroup/for-6.16-fixes] harden css_create() for safe placement of call to css_rstat_init()
Posted by JP Kobryn 2 months, 1 week ago
On 7/29/25 2:42 AM, Michal Koutný wrote:
> On Mon, Jul 28, 2025 at 11:04:56AM -0700, JP Kobryn <inwardvessel@gmail.com> wrote:
>> I did consider adding an "initialized" flag to the css but since there can
>> be multiple css's per
>> cgroup it felt like it would be adding overhead. So I went the path of
>> getting the call
>> sequence right. I'm open to feedback on this, though.
> 
> An implicit flag that builds upon the assumption that css_rstat_init()
> must only succeed after it allocates ->rstat_cpu (didn't check gotchas
> of this approach with !CONFIG_SMP)

I think this can work. This can probably be the early fix and then the
refactoring can follow.