[PATCH v2 00/10] Add kernel cmdline option for rt_group_sched

Michal Koutný posted 10 patches 9 months, 1 week ago
Posted by Michal Koutný 9 months, 1 week ago
Although RT_GROUP_SCHED is only available on cgroup v1, there are still
some (v1-bound) users of this feature. General purpose distros (e.g.
[1][2][3][4]) cannot easily enable CONFIG_RT_GROUP_SCHED:
- it prevents creation of RT tasks unless RT runtime is determined
  and distributed into the cgroup tree,
- grouping of RT threads is not what is desired by default on such
  systems,
- it prevents use of cgroup v2 with RT tasks.
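
For illustration, the first point can be modeled in a few lines of
userspace C. This is a simplified sketch, not kernel code: all names are
hypothetical, and it assumes (as with the cgroup v1 cpu controller) that
a newly created group starts with an RT runtime of 0, that the root group
has the default 950000 us of runtime per 1000000 us period, and that the
children's combined runtime/period ratio must fit into the parent's.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a cgroup v1 RT bandwidth setting. */
struct rt_bw {
	uint64_t period_us;
	uint64_t runtime_us; /* assumed 0 for a freshly created group */
};

#define RATIO_SHIFT 20 /* fixed-point precision for runtime/period */

/* runtime/period as a fixed-point fraction scaled by 2^RATIO_SHIFT */
static uint64_t bw_ratio(const struct rt_bw *bw)
{
	assert(bw->period_us > 0);
	return (bw->runtime_us << RATIO_SHIFT) / bw->period_us;
}

/* A task may switch to SCHED_FIFO/SCHED_RR only if its group has runtime. */
static int rt_task_admit(const struct rt_bw *group)
{
	return group->runtime_us == 0 ? -EPERM : 0;
}

/* The children's summed bandwidth must not exceed the parent's. */
static int rt_children_fit(const struct rt_bw *parent,
			   const struct rt_bw *children, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++) {
		if (children[i].runtime_us > children[i].period_us)
			return -EINVAL;
		sum += bw_ratio(&children[i]);
	}
	return sum > bw_ratio(parent) ? -EINVAL : 0;
}
```

In this model an RT task in a fresh group is refused until runtime is
explicitly assigned to that group and fits within its parent's share,
which is the distribution step a general purpose distro would have to
configure up front.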

This changeset aims to defer the decision whether to have
CONFIG_RT_GROUP_SCHED or not until boot time.
By default, RT groups are available as before, but the user can
pass the rt_group_sched=0 kernel cmdline parameter, which disables the
grouping so that behavior is like with !CONFIG_RT_GROUP_SCHED (with some
runtime overhead).
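
A minimal userspace sketch of such a boot-time toggle follows. It
assumes a space-separated command line and a compile-time default
standing in for CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED; the macro
MODEL_RT_GROUP_SCHED_DEFAULT_DISABLED and the function
parse_rt_group_sched() are hypothetical, and the actual in-kernel
parameter handling is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED. */
#ifdef MODEL_RT_GROUP_SCHED_DEFAULT_DISABLED
#define RT_GROUP_SCHED_DEFAULT false
#else
#define RT_GROUP_SCHED_DEFAULT true
#endif

/*
 * Return whether RT group scheduling should be enabled for the given
 * space-separated command line. Only "rt_group_sched=0" and
 * "rt_group_sched=1" are honored; anything else keeps the default.
 * If the option is repeated, the last valid occurrence wins.
 */
static bool parse_rt_group_sched(const char *cmdline)
{
	static const char key[] = "rt_group_sched=";
	const size_t keylen = sizeof(key) - 1;
	bool enabled = RT_GROUP_SCHED_DEFAULT;
	const char *p;

	assert(cmdline != NULL);
	for (p = strstr(cmdline, key); p != NULL; p = strstr(p + keylen, key)) {
		const char *val = p + keylen;

		/* Match whole tokens only, e.g. not "foo_rt_group_sched=0". */
		if (p != cmdline && p[-1] != ' ')
			continue;
		if ((val[0] == '0' || val[0] == '1') &&
		    (val[1] == '\0' || val[1] == ' '))
			enabled = (val[0] == '1');
	}
	return enabled;
}
```

The result is computed once at boot and never changes afterwards, which
is the property a sched_feat() flag (flippable at any time) would not
provide.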

The series is organized as follows:

1) generic ifdefs cleanup, no functional changes,
2) preparing root_task_group to be used in places that take shortcuts in
   the case of !CONFIG_RT_GROUP_SCHED,
3) boot cmdline option that controls cgroup (v1) attributes,
4) conditional bypass of non-root task groups,
5) checks and comments refresh.

The crux of the series is in these patches:
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
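
The idea behind these two patches can be sketched as a runtime guard.
This is a hypothetical userspace model, not the patch code: when the
boot-time flag is off, every task is accounted to the root group and the
per-group runtime check is bypassed, mimicking !CONFIG_RT_GROUP_SCHED.
A plain global bool is used here; an in-kernel version might prefer a
static branch to keep the check cheap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task group; not the kernel's struct task_group. */
struct tg_model {
	struct tg_model *parent;     /* NULL for the root group */
	unsigned long rt_runtime_us; /* RT runtime assigned to this group */
};

/* Set once from the command line at boot, never flipped afterwards. */
static bool rt_group_sched_on = true;

static struct tg_model root_tg = { NULL, 950000 };

/* Group whose RT runqueue a task in @tg is effectively queued on. */
static struct tg_model *rt_effective_group(struct tg_model *tg)
{
	assert(tg != NULL);
	if (!rt_group_sched_on)
		return &root_tg; /* grouping disabled: everything lands in root */
	return tg;
}

/* Admission check for switching a task in @tg to an RT policy. */
static int rt_policy_allowed(const struct tg_model *tg)
{
	if (!rt_group_sched_on)
		return 0; /* bandwidth checks bypassed, as without RT groups */
	return tg->rt_runtime_us == 0 ? -EPERM : 0;
}
```

Keeping both paths in one function behind a runtime condition is also
the shape the "folded" variants mentioned below would take.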

Further notes:
- it is not a sched_feat() flag because those can be flipped at any time
- runtime disablement is not implemented as an infinite per-cgroup RT
  limit since that would still employ group scheduling, unlike
  !CONFIG_RT_GROUP_SCHED
- there remain two variants of various functions for
  CONFIG_RT_GROUP_SCHED and !CONFIG_RT_GROUP_SCHED; those could be
  folded into one, with runtime-evaluated guards in the folded
  functions (I haven't posted that yet because the performance benefit
  is unclear)
- I noticed some lockdep issues over rt_runtime_lock, but those are also
  present in an unpatched kernel (and they seem to have been present for
  a long time with CONFIG_RT_GROUP_SCHED)

Changes from RFC (https://lore.kernel.org/r/20241216201305.19761-1-mkoutny@suse.com/):
- fix macro CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED invocation
- rebase on torvalds/master

Changes from v1 (https://lore.kernel.org/all/20250210151239.50055-1-mkoutny@suse.com/):
- add runtime deprecation warning

[1] Debian (https://salsa.debian.org/kernel-team/linux/-/blob/debian/latest/debian/config/kernelarch-x86/config)
[2] ArchLinux (https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config)
[3] Fedora (https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel-x86_64-fedora.config)
[4] openSUSE TW (https://github.com/SUSE/kernel-source/blob/stable/config/x86_64/default)

Michal Koutný (10):
  sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions
  sched: Remove unneeded macro wrap
  sched: Always initialize rt_rq's task_group
  sched: Add cmdline option for RT_GROUP_SCHED toggling
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Bypass bandwidth checks with runtime disabled RT_GROUP_SCHED
  sched: Do not construct nor expose RT_GROUP_SCHED structures if
    disabled
  sched: Add RT_GROUP WARN checks for non-root task_groups
  sched: Add annotations to RT_GROUP_SCHED fields
  sched: Add deprecation warning for users of RT_GROUP_SCHED

 .../admin-guide/kernel-parameters.txt         |  5 ++
 init/Kconfig                                  | 11 +++
 kernel/sched/core.c                           | 70 +++++++++++++++----
 kernel/sched/rt.c                             | 51 +++++++++-----
 kernel/sched/sched.h                          | 34 +++++++--
 kernel/sched/syscalls.c                       |  5 +-
 6 files changed, 138 insertions(+), 38 deletions(-)


base-commit: 69e858e0b8b2ea07759e995aa383e8780d9d140c
-- 
2.48.1

Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
Posted by Peter Zijlstra 8 months, 2 weeks ago
On Mon, Mar 10, 2025 at 06:04:32PM +0100, Michal Koutný wrote:
> Although RT_GROUP_SCHED is only available on cgroup v1, there are still
> some (v1-bound) users of this feature. General purpose distros (e.g.
> [1][2][3][4]) cannot easily enable CONFIG_RT_GROUP_SCHED:
> - it prevents creation of RT tasks unless RT runtime is determined
>   and distributed into the cgroup tree,
> - grouping of RT threads is not what is desired by default on such
>   systems,
> - it prevents use of cgroup v2 with RT tasks.
> 
> This changeset aims to defer the decision whether to have
> CONFIG_RT_GROUP_SCHED or not until boot time.
> By default, RT groups are available as before, but the user can
> pass the rt_group_sched=0 kernel cmdline parameter, which disables the
> grouping so that behavior is like with !CONFIG_RT_GROUP_SCHED (with some
> runtime overhead).
> 
> The series is organized as follows:

Right, so at OSPM we had a proposal for a cgroup-v2 variant of all this
that's based on deadline servers. And I am hoping we can eventually
either fully deprecate the v1 thing or re-implement it closely enough
without breaking the interface.

But this is purely about enabling cgroup-v1 usage, right?

You mention some overhead of having this on; is that measured and in
the patches?

Anyway, I'll go have a peek now, finally :-)
Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
Posted by Michal Koutný 8 months, 2 weeks ago
On Tue, Apr 01, 2025 at 01:05:08PM +0200, Peter Zijlstra <peterz@infradead.org> wrote:
> > By default, RT groups are available as before, but the user can
> > pass the rt_group_sched=0 kernel cmdline parameter, which disables the
> > grouping so that behavior is like with !CONFIG_RT_GROUP_SCHED (with some
> > runtime overhead).
> > runtime overhead).
> > 
> ...
> 
> Right, so at OSPM we had a proposal for a cgroup-v2 variant of all this
> that's based on deadline servers.

Interesting, are there any slides or recording available?

> And I am hoping we can eventually either fully deprecate the v1 thing
> or re-implement it closely enough without breaking the interface.

I have come to discourage rt_groups for these reasons:
1) They aren't an RT guarantee for workloads
  - especially when it's possible to configure different periods
2) They don't provide containment of RT tasks
  - an RT task throttled in one group may hold a shared resource, so its
    issues propagate to RT tasks in other groups
3) The allocation model [1] is difficult to configure
  - to honor delegation and provide a reasonable default
  - for illustration, another resource with an allocation model is
    cpuset cpus, whose abstraction in cgroup v2 is quite sophisticated

Based on that, I'm not a proponent of any RT group support in cgroup v2
(I'd need to see a use case where it could be justified). IIUC, the
deadline servers could help with 1).

> But this is purely about enabling cgroup-v1 usage, right?

Yes, users need to be explicitly on cgroup v1 (IOW, they're stuck on v1
because they rely on RT groups).

> You mention some overhead of having this on; is that measured and in
> the patches?

I expect the most affected would be RT task users who go from
!CONFIG_RT_GROUP_SCHED to CONFIG_RT_GROUP_SCHED with
CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED. That's my impression from the
code I touched, but I haven't measured anything. Would that be an
interesting datum?

Thanks,
Michal

[1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#allocations
Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
Posted by Juri Lelli 8 months, 1 week ago
Hi Michal,

On 03/04/25 14:17, Michal Koutný wrote:
> On Tue, Apr 01, 2025 at 01:05:08PM +0200, Peter Zijlstra <peterz@infradead.org> wrote:
> > > By default, RT groups are available as before, but the user can
> > > pass the rt_group_sched=0 kernel cmdline parameter, which disables the
> > > grouping so that behavior is like with !CONFIG_RT_GROUP_SCHED (with some
> > > runtime overhead).
> > > 
> > ...
> > 
> > Right, so at OSPM we had a proposal for a cgroup-v2 variant of all this
> > that's based on deadline servers.
> 
> Interesting, are there any slides or recording available?

Yes, here (freshly uploaded :)

https://youtu.be/1-s8YU3Rzts?si=c4H0jZl4_5bq8pI9

Best,
Juri
Re: [PATCH v2 00/10] Add kernel cmdline option for rt_group_sched
Posted by Michal Koutný 8 months, 3 weeks ago
Hello.

On Mon, Mar 10, 2025 at 06:04:32PM +0100, Michal Koutný <mkoutny@suse.com> wrote:
...
> Changes from v1 (https://lore.kernel.org/all/20250210151239.50055-1-mkoutny@suse.com/)
> - add runtime deprecation warning

Peter, has this addition made the boot-time configurability less
dreadful (until legacy users can migrate to something better)?

Thanks,
Michal