For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
in all the major Linux distributions:

   /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y

The reason is that while originally CONFIG_SCHED_DEBUG started
out as a debugging feature, over the years (decades ...) it has
grown various bits of statistics, instrumentation and
control knobs that are useful for sysadmin and general software
development purposes as well.

But within the kernel we still pretend that there's a choice,
and sometimes code that is seemingly 'debug only' creates overhead
that should be optimized in reality.

So make it all official and make CONFIG_SCHED_DEBUG unconditional.
This gets rid of a large amount of #ifdefs, so good riddance ...

Ingo Molnar (5):
  sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE()
  sched/debug: Make 'const_debug' tunables unconditional __read_mostly
  sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional
  sched/debug, Documentation: Remove (most) CONFIG_SCHED_DEBUG references from documentation
  sched/debug: Remove CONFIG_SCHED_DEBUG

 Documentation/scheduler/sched-debug.rst | 2 +-
 Documentation/scheduler/sched-design-CFS.rst | 2 +-
 Documentation/scheduler/sched-domains.rst | 5 +-
 Documentation/scheduler/sched-ext.rst | 3 +-
 Documentation/scheduler/sched-stats.rst | 2 +-
 Documentation/translations/sp_SP/scheduler/sched-design-CFS.rst | 2 +-
 fs/proc/base.c | 7 ---
 include/linux/energy_model.h | 2 -
 include/linux/sched/debug.h | 2 -
 include/linux/sched/topology.h | 4 --
 include/trace/events/sched.h | 2 -
 kernel/sched/build_utility.c | 4 +-
 kernel/sched/core.c | 46 ++++++----------
 kernel/sched/core_sched.c | 2 +-
 kernel/sched/deadline.c | 14 +++--
 kernel/sched/ext.c | 2 +-
 kernel/sched/fair.c | 64 +++++++++++-----------
 kernel/sched/rt.c | 7 +--
 kernel/sched/sched.h | 83 +++++------------------------
 kernel/sched/stats.h | 2 +-
 kernel/sched/topology.c | 13 -----
 lib/Kconfig.debug | 9 ----
 22 files changed, 79 insertions(+), 200 deletions(-)

-- 
2.45.2
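The distro check quoted in the cover letter can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `kconfig_state` is made up for illustration, and it assumes the usual kernel config text format (`CONFIG_FOO=y` lines and `# CONFIG_FOO is not set` comments).

```python
from typing import Optional


def kconfig_state(config_text: str, option: str) -> Optional[str]:
    """Return the value of a Kconfig option ("y", "m", ...), or None.

    None means the option is explicitly "not set" or does not appear
    in the text at all. (Hypothetical helper, not part of any kernel tool.)
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Disabled options are recorded as a comment of this exact form.
        if line == f"# {option} is not set":
            return None
        # Match "OPTION=value" exactly, so CONFIG_SCHED_DEBUG does not
        # accidentally match a longer name that merely shares the prefix.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


sample = "CONFIG_SCHED_INFO=y\nCONFIG_SCHED_DEBUG=y\nCONFIG_SCHEDSTATS=y\n"
print(kconfig_state(sample, "CONFIG_SCHED_DEBUG"))
```

To check a running system, the text would come from a file such as /boot/config-<release>, where the distribution ships one. On kernels that include this series the option should simply be absent, so the helper would return None there.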
On 17/03/25 11:42, Ingo Molnar wrote:
> For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> in all the major Linux distributions:
>
>    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
>
> The reason is that while originally CONFIG_SCHED_DEBUG started
> out as a debugging feature, over the years (decades ...) it has
> grown various bits of statistics, instrumentation and
> control knobs that are useful for sysadmin and general software
> development purposes as well.
>
> But within the kernel we still pretend that there's a choice,
> and sometimes code that is seemingly 'debug only' creates overhead
> that should be optimized in reality.
>
> So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> This gets rid of a large amount of #ifdefs, so good riddance ...
>

Pretty much every distro I'm aware of has CONFIG_SCHED_DEBUG=y; a quick check
tells me it's been like so for RHEL since at least 2013, and that's from a
commit copying configs from RHEL-6 to RHEL-7.

Two things however come to mind:

1) What does this mean for the debug stuff we've repeatedly said wasn't ABI
   because it was under CONFIG_SCHED_DEBUG? I've been burned by making
   sched_domain.flags read-only, and there's still writable stuff:

   # ls -al /sys/kernel/debug/sched/domains/cpu0/domain0/
   total 0
   drwxr-xr-x. 2 root root 0 Mar 19 04:36 .
   drwxr-xr-x. 3 root root 0 Mar 19 04:36 ..
   -rw-r--r--. 1 root root 0 Mar 19 04:36 busy_factor
   -rw-r--r--. 1 root root 0 Mar 19 04:36 cache_nice_tries
   -r--r--r--. 1 root root 0 Mar 19 04:36 flags
   -r--r--r--. 1 root root 0 Mar 19 04:36 groups_flags
   -rw-r--r--. 1 root root 0 Mar 19 04:36 imbalance_pct
   -r--r--r--. 1 root root 0 Mar 19 04:36 level
   -rw-r--r--. 1 root root 0 Mar 19 04:36 max_interval
   -rw-r--r--. 1 root root 0 Mar 19 04:36 max_newidle_lb_cost
   -rw-r--r--. 1 root root 0 Mar 19 04:36 min_interval
   -r--r--r--. 1 root root 0 Mar 19 04:36 name

   + all the non topology related debug knobs.

2) Peter mentioned a few times that, last time it was benchmarked, there
   were noticeable perf differences between CONFIG_SCHED_DEBUG=n and
   CONFIG_SCHED_DEBUG=y. This would be an occasion to re-measure that and
   potentially move (some of) these checks to e.g. a sched_debug_verbose
   static key.
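To make the "there's still writable stuff" point concrete, the listing above can be split into writable and read-only knobs. The following is a rough sketch that only parses the `ls -al` text as pasted in this thread; the function name `split_knobs` is hypothetical, and it assumes file names without spaces.

```python
LISTING = """\
total 0
drwxr-xr-x. 2 root root 0 Mar 19 04:36 .
drwxr-xr-x. 3 root root 0 Mar 19 04:36 ..
-rw-r--r--. 1 root root 0 Mar 19 04:36 busy_factor
-rw-r--r--. 1 root root 0 Mar 19 04:36 cache_nice_tries
-r--r--r--. 1 root root 0 Mar 19 04:36 flags
-r--r--r--. 1 root root 0 Mar 19 04:36 groups_flags
-rw-r--r--. 1 root root 0 Mar 19 04:36 imbalance_pct
-r--r--r--. 1 root root 0 Mar 19 04:36 level
-rw-r--r--. 1 root root 0 Mar 19 04:36 max_interval
-rw-r--r--. 1 root root 0 Mar 19 04:36 max_newidle_lb_cost
-rw-r--r--. 1 root root 0 Mar 19 04:36 min_interval
-r--r--r--. 1 root root 0 Mar 19 04:36 name
"""


def split_knobs(listing: str):
    """Split `ls -al` output into (writable, read_only) regular-file names."""
    writable, read_only = [], []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "total N" line, directories, and anything malformed.
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        mode, name = fields[0], fields[-1]
        # mode[2] is the owner's write bit: "-rw-r--r--." vs "-r--r--r--."
        if mode[2] == "w":
            writable.append(name)
        else:
            read_only.append(name)
    return writable, read_only


writable, read_only = split_knobs(LISTING)
print("writable: ", ", ".join(writable))
print("read-only:", ", ".join(read_only))
```

For this listing, six of the ten domain0 files are owner-writable, which is the set any ABI discussion would have to consider.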
* Valentin Schneider <vschneid@redhat.com> wrote:

> On 17/03/25 11:42, Ingo Molnar wrote:
> > For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> > in all the major Linux distributions:
> >
> >    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
> >
> > The reason is that while originally CONFIG_SCHED_DEBUG started
> > out as a debugging feature, over the years (decades ...) it has
> > grown various bits of statistics, instrumentation and
> > control knobs that are useful for sysadmin and general software
> > development purposes as well.
> >
> > But within the kernel we still pretend that there's a choice,
> > and sometimes code that is seemingly 'debug only' creates overhead
> > that should be optimized in reality.
> >
> > So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> > This gets rid of a large amount of #ifdefs, so good riddance ...
> >
>
> Pretty much every distro I'm aware of has CONFIG_SCHED_DEBUG=y; a quick check
> tells me it's been like so for RHEL since at least 2013, and that's from a
> commit copying configs from RHEL-6 to RHEL-7.
>
> Two things however come to mind:
>
> 1) What does this mean for the debug stuff we've repeatedly said wasn't ABI
>    because it was under CONFIG_SCHED_DEBUG? I've been burned by making
>    sched_domain.flags read-only, and there's still writable stuff:
>
>    # ls -al /sys/kernel/debug/sched/domains/cpu0/domain0/
>    total 0
>    drwxr-xr-x. 2 root root 0 Mar 19 04:36 .
>    drwxr-xr-x. 3 root root 0 Mar 19 04:36 ..
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 busy_factor
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 cache_nice_tries
>    -r--r--r--. 1 root root 0 Mar 19 04:36 flags
>    -r--r--r--. 1 root root 0 Mar 19 04:36 groups_flags
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 imbalance_pct
>    -r--r--r--. 1 root root 0 Mar 19 04:36 level
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 max_interval
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 max_newidle_lb_cost
>    -rw-r--r--. 1 root root 0 Mar 19 04:36 min_interval
>    -r--r--r--. 1 root root 0 Mar 19 04:36 name
>
>    + all the non topology related debug knobs.

Yeah, I don't think these or other sysctls are as contentious as
previously thought. We might want to put '/debug/' into the directory
name above, or we could move it over to debugfs entirely - but we
should make it clear via the name that these are debugging knobs in
essence.

> 2) Peter mentioned a few times that, last time it was benchmarked, there
>    were noticeable perf differences between CONFIG_SCHED_DEBUG=n and
>    CONFIG_SCHED_DEBUG=y. This would be an occasion to re-measure that and
>    potentially move (some of) these checks to e.g. a sched_debug_verbose
>    static key.

Yeah, and this is an argument strongly *in favor* of eliminating
CONFIG_SCHED_DEBUG: in a way the CONFIG_SCHED_DEBUG "option" created a
false sense of "it's only debug code". But it's not a genuine debug
option, it's actual overhead for the vast majority of Linux distros and
users.

So let's just eliminate SCHED_DEBUG, and fix any overhead. It's exactly
what we should do anyway - nothing changes IMHO, just the appearance of
urgency. :-)

Thanks,

        Ingo
Ingo, please fix your mail setup..
These were all in my spam-box, because you used
From: Ingo Molnar <mingo@kernel.org>
but sent it using gmail, so the DKIM signature looks like
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; [...]
and then that results in
dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE)
header.from=kernel.org;
because the DKIM signature - while a valid signature for gmail - does
not match the kernel.org signature.
So you need to use 'mail.kernel.org' to send the email to get the
right signature, as documented in
https://korg.docs.kernel.org/mail.html
otherwise any sane setup will mark all those things as spam.
Linus
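The failure described above comes down to identifier alignment: DMARC compares the domain in the From: header with the d= domain of the DKIM signature. Below is a simplified sketch of that comparison, not a real DMARC implementation. The function names are invented for illustration, the "relaxed" mode naively treats the last two labels as the organizational domain (a real checker consults the Public Suffix List), and no signature verification or SPF handling is done.

```python
from email.utils import parseaddr


def from_domain(from_header: str) -> str:
    """Extract the lowercase domain of the address in a From: header value."""
    _, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()


def dkim_domain(dkim_value: str) -> str:
    """Return the d= tag of a DKIM-Signature header value ('' if missing)."""
    for tag in dkim_value.split(";"):
        name, _, value = tag.strip().partition("=")
        if name.strip() == "d":
            return value.strip().lower()
    return ""


def naive_org_domain(domain: str) -> str:
    # ASSUMPTION: last two labels approximate the organizational domain.
    return ".".join(domain.split(".")[-2:])


def dkim_aligned(from_header: str, dkim_value: str, strict: bool = False) -> bool:
    """True if the DKIM d= domain aligns with the From: domain."""
    f, d = from_domain(from_header), dkim_domain(dkim_value)
    if not f or not d:
        return False
    if strict:
        return f == d
    return naive_org_domain(f) == naive_org_domain(d)


# Header values as quoted in the mail above.
sender = "Ingo Molnar <mingo@kernel.org>"
signature = "v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; [...]"
print(dkim_aligned(sender, signature))
```

With the quoted headers the check reports no alignment, because gmail.com and kernel.org share no organizational domain, which matches the dmarc=fail result shown. A signature carrying d=kernel.org would align.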
* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Ingo, please fix your mail setup..
>
> These were all in my spam-box, because you used
>
> From: Ingo Molnar <mingo@kernel.org>
>
> but sent it using gmail, so the DKIM signature looks like
>
> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
> d=gmail.com; [...]
>
> and then that results in
>
> dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE)
> header.from=kernel.org;
>
> because the DKIM signature - while a valid signature for gmail - does
> not match the kernel.org signature.
>
> So you need to use 'mail.kernel.org' to send the email to get the
> right signature, as documented in
>
> https://korg.docs.kernel.org/mail.html
>
> otherwise any sane setup will mark all those things as spam.

Sorry about that!

(And I just sent out another series with the same flawed script ...)

I thought I have fixed that all up, but apparently only for my main
Mutt setup, not for some of my older Git-patchbomb scripts that used
.gitconfig's [sendemail]. :-/

Thanks,

        Ingo
* Ingo Molnar <mingo@kernel.org> wrote:

>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > Ingo, please fix your mail setup..
> >
> > These were all in my spam-box, because you used
> >
> > From: Ingo Molnar <mingo@kernel.org>
> >
> > but sent it using gmail, so the DKIM signature looks like
> >
> > DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
> > d=gmail.com; [...]
> >
> > and then that results in
> >
> > dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=QUARANTINE)
> > header.from=kernel.org;
> >
> > because the DKIM signature - while a valid signature for gmail - does
> > not match the kernel.org signature.
> >
> > So you need to use 'mail.kernel.org' to send the email to get the
> > right signature, as documented in
> >
> > https://korg.docs.kernel.org/mail.html
> >
> > otherwise any sane setup will mark all those things as spam.
>
> Sorry about that!
>
> (And I just sent out another series with the same flawed script ...)
>
> I thought I have fixed that all up, but apparently only for my main
> Mutt setup, not for some of my older Git-patchbomb scripts that used
> .gitconfig's [sendemail]. :-/

BTW., the reason I didn't notice this sooner is because I read lkml via
a local maildir representation of the Lore Git archive (all hail
Konstantin), where these mails showed up just fine.

Thanks,

        Ingo
On 3/17/25 16:12, Ingo Molnar wrote:
> For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> in all the major Linux distributions:
>
>    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
>
> The reason is that while originally CONFIG_SCHED_DEBUG started
> out as a debugging feature, over the years (decades ...) it has
> grown various bits of statistics, instrumentation and
> control knobs that are useful for sysadmin and general software
> development purposes as well.

A tunable like base_slice which is the only tunable available for EEVDF is under the debug.
So an option is to get rid of CONFIG_SCHED_DEBUG and make it available to all.

We had seen performance regression when domains folder was built with cpu hotplug.
Later that was moved iff verbose was enabled. Maybe something like that can be done
if something is hurting performance.

>
> But within the kernel we still pretend that there's a choice,
> and sometimes code that is seemingly 'debug only' creates overhead
> that should be optimized in reality.
>
> So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> This gets rid of a large amount of #ifdefs, so good riddance ...
>

There are some references in selftest like these, maybe remove them as well?

tools/testing/selftests/sched_ext/config:CONFIG_SCHED_DEBUG=y
tools/testing/selftests/sched/config:CONFIG_SCHED_DEBUG=y
tools/testing/selftests/wireguard/qemu/debug.config:CONFIG_SCHED_DEBUG=y

Also ran unixbench and hackbench on 80 CPU system (1NUMA) with and
without CONFIG_SCHED_DEBUG. hackbench numbers are almost the same.

for unixbench, process creation/Context Switching show 1-2%
improvement with CONFIG_SCHED_DEBUG=n
* Shrikanth Hegde <sshegde@linux.ibm.com> wrote:

>
>
> On 3/17/25 16:12, Ingo Molnar wrote:
> > For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
> > in all the major Linux distributions:
> >
> >    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
> >
> > The reason is that while originally CONFIG_SCHED_DEBUG started
> > out as a debugging feature, over the years (decades ...) it has
> > grown various bits of statistics, instrumentation and
> > control knobs that are useful for sysadmin and general software
> > development purposes as well.
>
> A tunable like base_slice which is the only tunable available for EEVDF is under the debug.
> So an option is to get rid of CONFIG_SCHED_DEBUG and make it available to all.
>
> We had seen performance regression when domains folder was built with cpu hotplug.
> Later that was moved iff verbose was enabled. Maybe something like that can be done
> if something is hurting performance.
>
> >
> > But within the kernel we still pretend that there's a choice,
> > and sometimes code that is seemingly 'debug only' creates overhead
> > that should be optimized in reality.
> >
> > So make it all official and make CONFIG_SCHED_DEBUG unconditional.
> > This gets rid of a large amount of #ifdefs, so good riddance ...
> >
>
> There are some references in selftest like these, maybe remove them as well?
>
> tools/testing/selftests/sched_ext/config:CONFIG_SCHED_DEBUG=y
> tools/testing/selftests/sched/config:CONFIG_SCHED_DEBUG=y
> tools/testing/selftests/wireguard/qemu/debug.config:CONFIG_SCHED_DEBUG=y

Indeed - fixed.

I left out all the defconfigs from the patches, because there's a lot
of them (~79 reference CONFIG_SCHED_DEBUG ...) and they get refreshed
naturally in any case.

> Also ran unixbench and hackbench on 80 CPU system (1NUMA) with and
> without CONFIG_SCHED_DEBUG. hackbench numbers are almost the same.
>
> for unixbench, process creation/Context Switching show 1-2%
> improvement with CONFIG_SCHED_DEBUG=n

Thank you for the testing! I'll add:

  Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>

to the series if you don't mind.

And irrespectively of this series we should probably look at that 1-2%
overhead in unixbench context switching overhead, maybe there's a few
low hanging fruits in the debug code.

        Ingo
On 3/20/25 02:44, Ingo Molnar wrote:
>
> * Shrikanth Hegde <sshegde@linux.ibm.com> wrote:
>
>>
>>
>> On 3/17/25 16:12, Ingo Molnar wrote:
>>> For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
>>> in all the major Linux distributions:
>>>
>>>    /boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
>>>
>>> The reason is that while originally CONFIG_SCHED_DEBUG started
>>> out as a debugging feature, over the years (decades ...) it has
>>> grown various bits of statistics, instrumentation and
>>> control knobs that are useful for sysadmin and general software
>>> development purposes as well.
>>
>> A tunable like base_slice which is the only tunable available for EEVDF is under the debug.
>> So an option is to get rid of CONFIG_SCHED_DEBUG and make it available to all.
>>
>> We had seen performance regression when domains folder was built with cpu hotplug.
>> Later that was moved iff verbose was enabled. Maybe something like that can be done
>> if something is hurting performance.
>>
>>>
>>> But within the kernel we still pretend that there's a choice,
>>> and sometimes code that is seemingly 'debug only' creates overhead
>>> that should be optimized in reality.
>>>
>>> So make it all official and make CONFIG_SCHED_DEBUG unconditional.
>>> This gets rid of a large amount of #ifdefs, so good riddance ...
>>>
>>
>> There are some references in selftest like these, maybe remove them as well?
>>
>> tools/testing/selftests/sched_ext/config:CONFIG_SCHED_DEBUG=y
>> tools/testing/selftests/sched/config:CONFIG_SCHED_DEBUG=y
>> tools/testing/selftests/wireguard/qemu/debug.config:CONFIG_SCHED_DEBUG=y
>
> Indeed - fixed.
>
> I left out all the defconfigs from the patches, because there's a lot
> of them (~79 reference CONFIG_SCHED_DEBUG ...) and they get refreshed
> naturally in any case.
>
>> Also ran unixbench and hackbench on 80 CPU system (1NUMA) with and
>> without CONFIG_SCHED_DEBUG. hackbench numbers are almost the same.
>>
>> for unixbench, process creation/Context Switching show 1-2%
>> improvement with CONFIG_SCHED_DEBUG=n
>
> Thank you for the testing! I'll add:
>
>   Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
>
> to the series if you don't mind.

It was minimal testing. If that suffices i am okay with the tag.

>
> And irrespectively of this series we should probably look at that 1-2%
> overhead in unixbench context switching overhead, maybe there's a few
> low hanging fruits in the debug code.
>
> 	Ingo

ok. Let me see perf record and see what shows up.
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 14d281db78b2e5af1bdce793910ce1ea74520d05
Gitweb: https://git.kernel.org/tip/14d281db78b2e5af1bdce793910ce1ea74520d05
Author: Ingo Molnar <mingo@kernel.org>
AuthorDate: Wed, 19 Mar 2025 22:13:15 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 19 Mar 2025 22:23:24 +01:00
sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files
We leave most of the defconfigs alone (there's over 70 of them),
but let's remove CONFIG_SCHED_DEBUG from the scheduler self-test
Kconfig files.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/Z9szt3MpQmQ56TRd@gmail.com
---
tools/testing/selftests/sched/config | 2 +-
tools/testing/selftests/sched_ext/config | 1 -
tools/testing/selftests/wireguard/qemu/debug.config | 1 -
3 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/selftests/sched/config b/tools/testing/selftests/sched/config
index e8b09aa..1bb8bf6 100644
--- a/tools/testing/selftests/sched/config
+++ b/tools/testing/selftests/sched/config
@@ -1 +1 @@
-CONFIG_SCHED_DEBUG=y
+# empty
diff --git a/tools/testing/selftests/sched_ext/config b/tools/testing/selftests/sched_ext/config
index 0de9b4e..aa901b0 100644
--- a/tools/testing/selftests/sched_ext/config
+++ b/tools/testing/selftests/sched_ext/config
@@ -1,4 +1,3 @@
-CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_CLASS_EXT=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
diff --git a/tools/testing/selftests/wireguard/qemu/debug.config b/tools/testing/selftests/wireguard/qemu/debug.config
index 139fd9a..c305d2f 100644
--- a/tools/testing/selftests/wireguard/qemu/debug.config
+++ b/tools/testing/selftests/wireguard/qemu/debug.config
@@ -27,7 +27,6 @@ CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_WQ_WATCHDOG=y
-CONFIG_SCHED_DEBUG=y
CONFIG_SCHED_INFO=y
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y