From: Andrea Righi <andrea.righi@linux.dev>
As discussed during the distro-centric session within the sched_ext
Microconference at LPC 2024, introduce a sequence counter that is
incremented every time a BPF scheduler is loaded.
This feature can help distributions diagnose potential performance
regressions by identifying systems where users are running (or have run)
custom BPF schedulers.
Example:
arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq
0
arighi@virtme-ng~> sudo scx_simple
local=1 global=0
^CEXIT: unregistered from user space
arighi@virtme-ng~> cat /sys/kernel/sched_ext/enable_seq
1
In this way, user-space tools (such as Ubuntu's apport and similar) can
gather and include this information in bug reports.
Cc: Giovanni Gherdovich <giovanni.gherdovich@suse.com>
Cc: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Cc: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Cc: Phil Auld <pauld@redhat.com>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
---
Documentation/scheduler/sched-ext.rst | 10 ++++++++++
kernel/sched/ext.c | 17 +++++++++++++++++
tools/sched_ext/scx_show_state.py | 1 +
3 files changed, 28 insertions(+)
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index a707d2181a77..6c0d70e2e27d 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -83,6 +83,15 @@ The current status of the BPF scheduler can be determined as follows:
# cat /sys/kernel/sched_ext/root/ops
simple
+You can check if any BPF scheduler has ever been loaded since boot by examining
+this monotonically incrementing counter (a value of zero indicates that no BPF
+scheduler has been loaded):
+
+.. code-block:: none
+
+ # cat /sys/kernel/sched_ext/enable_seq
+ 1
+
``tools/sched_ext/scx_show_state.py`` is a drgn script which shows more
detailed information:
@@ -96,6 +105,7 @@ detailed information:
enable_state : enabled (2)
bypass_depth : 0
nr_rejected : 0
+ enable_seq : 1
If ``CONFIG_SCHED_DEBUG`` is set, whether a given task is on sched_ext can
be determined as follows:
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 9ee5a9a261cc..8057ab4c76da 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -874,6 +874,13 @@ static struct scx_exit_info *scx_exit_info;
static atomic_long_t scx_nr_rejected = ATOMIC_LONG_INIT(0);
static atomic_long_t scx_hotplug_seq = ATOMIC_LONG_INIT(0);
+/*
+ * A monotonically increasing sequence number that is incremented every time a
+ * scheduler is enabled. This can be used to check if any custom sched_ext
+ * scheduler has ever been used in the system.
+ */
+static atomic_long_t scx_enable_seq = ATOMIC_LONG_INIT(0);
+
/*
* The maximum amount of time in jiffies that a task may be runnable without
* being scheduled on a CPU. If this timeout is exceeded, it will trigger
@@ -4154,11 +4161,19 @@ static ssize_t scx_attr_hotplug_seq_show(struct kobject *kobj,
}
SCX_ATTR(hotplug_seq);
+static ssize_t scx_attr_enable_seq_show(struct kobject *kobj,
+ struct kobj_attribute *ka, char *buf)
+{
+ return sysfs_emit(buf, "%ld\n", atomic_long_read(&scx_enable_seq));
+}
+SCX_ATTR(enable_seq);
+
static struct attribute *scx_global_attrs[] = {
&scx_attr_state.attr,
&scx_attr_switch_all.attr,
&scx_attr_nr_rejected.attr,
&scx_attr_hotplug_seq.attr,
+ &scx_attr_enable_seq.attr,
NULL,
};
@@ -5176,6 +5191,8 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
kobject_uevent(scx_root_kobj, KOBJ_ADD);
mutex_unlock(&scx_ops_enable_mutex);
+ atomic_long_inc(&scx_enable_seq);
+
return 0;
err_del:
diff --git a/tools/sched_ext/scx_show_state.py b/tools/sched_ext/scx_show_state.py
index d457d2a74e1e..8bc626ede1c4 100644
--- a/tools/sched_ext/scx_show_state.py
+++ b/tools/sched_ext/scx_show_state.py
@@ -37,3 +37,4 @@ print(f'switched_all : {read_static_key("__scx_switched_all")}')
print(f'enable_state : {ops_state_str(enable_state)} ({enable_state})')
print(f'bypass_depth : {read_atomic("scx_ops_bypass_depth")}')
print(f'nr_rejected : {read_atomic("scx_nr_rejected")}')
+print(f'enable_seq : {read_atomic("scx_enable_seq")}')
--
2.46.0
On Sat, Sep 21, 2024 at 09:39:21PM +0200, andrea.righi@linux.dev wrote:
> From: Andrea Righi <andrea.righi@linux.dev>
>
> As discussed during the distro-centric session within the sched_ext
> Microconference at LPC 2024, introduce a sequence counter that is
> incremented every time a BPF scheduler is loaded.
...

Applied to sched_ext/for-6.12-fixes.

Thanks.

--
tejun
Hi Andrea,

On Sat, Sep 21, 2024 at 09:39:21PM +0200 andrea.righi@linux.dev wrote:
> From: Andrea Righi <andrea.righi@linux.dev>
>
> As discussed during the distro-centric session within the sched_ext
> Microconference at LPC 2024, introduce a sequence counter that is
> incremented every time a BPF scheduler is loaded.
...

Thanks for pulling this together. I am hopeful we can get it in a
6.12-rc.

Reviewed-by: Phil Auld <pauld@redhat.com>

Cheers,
Phil
Hello, Andrea.

On Sat, Sep 21, 2024 at 09:39:21PM +0200, andrea.righi@linux.dev wrote:
>  static struct attribute *scx_global_attrs[] = {
>  	&scx_attr_state.attr,
>  	&scx_attr_switch_all.attr,
>  	&scx_attr_nr_rejected.attr,
>  	&scx_attr_hotplug_seq.attr,
> +	&scx_attr_enable_seq.attr,
>  	NULL,
>  };

Can you put this in scx_sched_attrs instead, as it probably would make
sense to track this per-scheduler in the future when we support stacked
schedulers.

Thanks.

--
tejun
Hi Tejun,

On Sun, Sep 22, 2024 at 11:21:02AM -1000 Tejun Heo wrote:
> Can you put this in scx_sched_attrs instead as it probably would make
> sense to track this per-scheduler in the future when we support stacked
> schedulers.

It's not a per-scheduler counter, though. It's global. We want to know
that a (any) scx scheduler has been loaded at some time in the past. It's
really only interesting whether it is 0 or > 0. The actual non-zero number
and which scheduler(s) don't matter that much.

And it needs to persist when the scheduler is unloaded (I didn't look, but
I suspect the per-scheduler attrs come and go?).

Cheers,
Phil
Hello,

On Mon, Sep 23, 2024 at 12:45:48PM +0200, Phil Auld wrote:
...
> It's not a per-scheduler counter, though. It's global. We want to know

Yeah, the sequence is global, but we can report the sequence at which a
given scheduler is loaded on each scheduler. That way, e.g., you can tell
whether a particular scheduler instance is the same one you looked at the
last time too.

> that a (any) scx scheduler has been loaded at some time in the past. It's
> really only interesting when 0 or > 0. The actual non-zero number and which
> scheduler(s) don't matter that much.

Not necessarily, e.g. you can also detect a scheduler failing or being
updated for other reasons.

> And it needs to persist when the scheduler is unloaded (I didn't look but
> I suspect the per-scheduler attrs come and go?)

Yes, the load sequence number should stay persistent across all schedulers,
but each scheduler should report the sequence number at which *it* was
loaded. Note that this doesn't really change anything now. If you only care
whether any SCX scheduler has ever been loaded, you'd always look under
root.

Thanks.

--
tejun
On Mon, Sep 23, 2024 at 05:32:15AM -1000 Tejun Heo wrote:
...
> Not necessarily. e.g. You can also detect scheduler failing or being updated
> for other reasons.

Sure, but the primary purpose is practically boolean.

> Yes, the load sequence number should stay persistent across all schedulers,
> but each scheduler should report the sequence number at which *it* was
> loaded. Note that this doesn't really change anything now. If you only care
> whether any SCX scheduler has ever been loaded, you'd always look under
> root.

In my testing, root is not there if nothing is loaded.

Cheers,
Phil
On Mon, Sep 23, 2024 at 12:00:07PM -0400, Phil Auld wrote:
> On Mon, Sep 23, 2024 at 05:32:15AM -1000 Tejun Heo wrote:
> > Yes, the load sequence number should stay persistent across all schedulers,
> > but each scheduler should report the sequence number at which *it* was
> > loaded. Note that this doesn't really change anything now. If you only care
> > whether any SCX scheduler has ever been loaded, you'd always look under
> > root.
>
> In my testing, root is not there if nothing is loaded.

Right, there's no root if no sched_ext scheduler is loaded. Maybe we
should always keep root present, or have a global counter and one
per-sched?

-Andrea
Hello,

On Mon, Sep 23, 2024 at 06:34:20PM +0200, Andrea Righi wrote:
...
> Right, there's no root if no sched_ext scheduler is loaded. Maybe we
> should always keep root present, or have a global counter and one
> per-sched?

I'll apply as-is. Let's add per-scheduler load seq separately.

Thanks.

--
tejun
On Mon, Sep 23, 2024 at 06:47:12AM -1000 Tejun Heo wrote:
...
> I'll apply as-is. Let's add per-scheduler load seq separately.

Thanks! I was thinking that per-scheduler you could just snapshot the
global counter (either before or after increment) on load. Then you
could easily tell when each was loaded relative to each other, etc.
Especially while "per-scheduler" is defined by string comparison, I'd
prefer not to rely on that for this use case.

Cheers,
Phil
September 23, 2024 at 6:57 PM, "Phil Auld" <pauld@redhat.com> wrote:
> Thanks! I was thinking that per-scheduler you could just snapshot the
> global counter (either before or after increment) on load. Then you
> could easily tell when each was loaded relative to each other etc.
> Especially while "per-scheduler" is defined by string comparison I'd
> prefer not to rely on that for this use case.

Right, and it should be a trivial change. I'll work on that as soon as I
find a stable spot (still on the road for conferences) :)

-Andrea
Hello,

On Mon, Sep 23, 2024 at 12:57:49PM -0400, Phil Auld wrote:
...
> Thanks! I was thinking that per-scheduler you could just snapshot the
> global counter (either before or after increment) on load. Then you
> could easily tell when each was loaded relative to each other etc.

Yeap.

> Especially while "per-scheduler" is defined by string comparison I'd
> prefer not to rely on that for this use case.

Thanks.

--
tejun
On Mon, Sep 23, 2024 at 12:45:48PM +0200, Phil Auld wrote:
> It's not a per-scheduler counter, though. It's global. We want to know
> that a (any) scx scheduler has been loaded at some time in the past. It's
> really only interesting when 0 or > 0. The actual non-zero number and which
> scheduler(s) don't matter that much.
>
> And it needs to persist when the scheduler is unloaded (I didn't look but
> I suspect the per-scheduler attrs come and go?)

Correct, if we make the counter per-scheduler, we would lose this
information once the running scheduler is unloaded. Instead, we want to
keep this information persistent, so that user support can clearly see
whether any BPF scheduler has ever been used since boot.

Thanks,
-Andrea