[RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched

Michal Koutný posted 9 patches 1 year ago
There is a newer version of this series
[RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
Posted by Michal Koutný 1 year ago
Although RT_GROUP_SCHED is only available on cgroup v1, there are still
some users of this feature. General purpose distros (e.g. [1][2][3][4])
cannot enable CONFIG_RT_GROUP_SCHED easily:
- it prevents creation of RT tasks unless RT runtime is determined
  and distributed across the cgroup tree,
- grouping of RT threads is not what is desired by default on such
  systems,
- it prevents use of cgroup v2 with RT tasks.

This changeset aims to defer the decision whether to have
CONFIG_RT_GROUP_SCHED or not until boot time.
By default, RT groups are available as before, but the user can
pass the rt_group_sched=0 kernel cmdline parameter, which disables the
grouping so that behavior matches !CONFIG_RT_GROUP_SCHED (with some
runtime overhead).
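
As an illustration of the boot-time switch described above, here is a minimal userspace sketch in C. It is not code from the series: the helper names (parse_rt_group_sched, rt_group_sched_enabled) and the hand-rolled string scan are assumptions made for this example, and the kernel would use its own parameter-parsing infrastructure instead. The only things taken from the text are the parameter name rt_group_sched, the value 0 for disabling, and the enabled-by-default behavior.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed model state: RT groups stay available unless the user opts out. */
static bool rt_group_sched_on = true;

/*
 * Scan a space-separated command line for "rt_group_sched=<0|1>".
 * Hypothetical helper for illustration; the last valid occurrence wins,
 * malformed values are ignored and leave the current setting untouched.
 */
static void parse_rt_group_sched(const char *cmdline)
{
    static const char key[] = "rt_group_sched=";
    const size_t klen = sizeof(key) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept the key only at the start of a token. */
        if (p == cmdline || p[-1] == ' ') {
            char v = p[klen];
            char end = v ? p[klen + 1] : '\0';

            if ((v == '0' || v == '1') && (end == '\0' || end == ' '))
                rt_group_sched_on = (v == '1');
        }
        p += klen;
    }
}

/* Guard that callers would consult before taking any group-specific path. */
static bool rt_group_sched_enabled(void)
{
    return rt_group_sched_on;
}
```

In this model, calling parse_rt_group_sched("root=/dev/sda1 rt_group_sched=0 quiet") makes rt_group_sched_enabled() return false, while a command line without the parameter leaves the default in place.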

The series is organized as follows:

1) generic ifdefs cleanup, no functional changes,
2) preparing root_task_group to be used in places that take shortcuts in
   the case of !CONFIG_RT_GROUP_SCHED,
3) boot cmdline option that controls cgroup (v1) attributes,
4) conditional bypass of non-root task groups,
5) checks and comments refresh.
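
To make step 4 more concrete, the following is a rough userspace sketch in C of what a conditional bypass of non-root task groups could look like. It is only a model under stated assumptions: the struct layout, the rt_groups_enabled flag and the effective_rt_group() helper are hypothetical names for this example and are not taken from the patches; only root_task_group is a name that appears in the series description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the scheduler's task group; fields are assumed. */
struct task_group {
    const char *name;
};

static struct task_group root_task_group = { "root" };

/* Assumed runtime flag, set once at boot from the cmdline option. */
static bool rt_groups_enabled = true;

/*
 * Pick the group whose RT runqueue a task should use.
 * With grouping disabled at runtime, every task is funnelled to the
 * root group, which mimics the !CONFIG_RT_GROUP_SCHED behavior.
 */
static struct task_group *effective_rt_group(struct task_group *tg)
{
    if (!rt_groups_enabled)
        return &root_task_group;
    return tg;
}
```

The extra branch in effective_rt_group() is an example of the kind of runtime overhead that a boot-time switch adds compared with a compile-time decision.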

The crux is in these patches:
  sched: Skip non-root task_groups with disabled RT_GROUP
  sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED

Further notes:
- it is not a sched_feat() flag because that can be flipped at any time
- runtime disablement is not implemented as an infinite per-cgroup RT
  limit since that'd still employ group scheduling, which is unlike
  !CONFIG_RT_GROUP_SCHED

RFC notes:
- there remain two variants of various functions for
  CONFIG_RT_GROUP_SCHED and !CONFIG_RT_GROUP_SCHED; those could be
  folded into one, with runtime-evaluated guards in the folded
  functions (I haven't posted that yet due to unclear performance
  benefit)
- I noticed some lockdep issues around rt_runtime_lock, but those are
  also present in an unpatched kernel (and they seem to have been there
  for a long time without complications)

[1] Debian (https://salsa.debian.org/kernel-team/linux/-/blob/debian/latest/debian/config/kernelarch-x86/config)
[2] ArchLinux (https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config)
[3] Fedora (https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/kernel-x86_64-fedora.config)
[4] openSUSE TW (https://github.com/SUSE/kernel-source/blob/stable/config/x86_64/default)

Michal Koutný (9):
  sched: Convert CONFIG_RT_GROUP_SCHED macros to code conditions
  sched: Remove unneeed macro wrap
  sched: Always initialize rt_rq's task_group
  sched: Add commadline option for RT_GROUP_SCHED toggling
  sched: Skip non-root task_groups with disabled RT_GROUP_SCHED
  sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED
  sched: Do not construct nor expose RT_GROUP_SCHED structures if
    disabled
  sched: Add RT_GROUP WARN checks for non-root task_groups
  sched: Add annotations to RT_GROUP_SCHED fields

 .../admin-guide/kernel-parameters.txt         |  5 ++
 init/Kconfig                                  | 11 +++
 kernel/sched/core.c                           | 69 +++++++++++++++----
 kernel/sched/rt.c                             | 51 +++++++++-----
 kernel/sched/sched.h                          | 34 +++++++--
 kernel/sched/syscalls.c                       |  5 +-
 6 files changed, 137 insertions(+), 38 deletions(-)


base-commit: f92f4749861b06fed908d336b4dee1326003291b
-- 
2.47.1

Re: [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
Posted by Peter Zijlstra 11 months, 2 weeks ago
On Mon, Dec 16, 2024 at 09:12:56PM +0100, Michal Koutný wrote:
> Although RT_GROUP_SCHED is only available on cgroup v1, there are still
> some users of this feature. General purpose distros (e.g. [1][2][3][4])
> cannot enable CONFIG_RT_GROUP_SCHED easily:

We all hate this thing and want it to go away. So not being able to use
it is a pro from where I'm at.

Sadly the replacement isn't there yet either, which makes it all really
difficult.
Re: [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
Posted by Michal Koutný 11 months, 1 week ago
On Tue, Jan 07, 2025 at 08:41:06PM +0100, Peter Zijlstra <peterz@infradead.org> wrote:
> We all hate this thing and want it to go away. So not being able to use
> it is a pro from where I'm at.

I understand, and to some extent I am not a fan of it either (we
disabled it in SUSE quite some time ago). I'd consider the remaining
existing users legacy.

> Sadly the replacement isn't there yet either, which makes it all really
> difficult.

Exactly. Thus the runtime switch is meant as a bridge for general
purpose distros where a kernel is shipped pre-configured (i.e. a single
config whose default is non-grouped, so as not to hinder the majority
use cases).

Considering the legacy use cases on distribution kernels, do you oppose
the chosen approach? I can work on changes if you have comments on the
implementation itself.

Thanks,
Michal

Re: [RFC PATCH 0/9] Add kernel cmdline option for rt_group_sched
Posted by Michal Koutný 11 months, 2 weeks ago
On Mon, Dec 16, 2024 at 09:12:56PM +0100, Michal Koutný <mkoutny@suse.com> wrote:
> The series is organized as follows:

(I saw no replies; this may have slipped through over the turn of the
year.)

> RFC notes:

So I wonder if there are any initial comments on this change.

Thanks,
Michal