While adding outer irqsave/restore locking, 0e7ffff1b811 ("scx: Fix raciness
in scx_ops_bypass()") forgot to convert an inner rq_unlock_irqrestore() to
rq_unlock() which could re-enable IRQ prematurely leading to the following
warning:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 96 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40
...
Sched_ext: create_dsq (enabling)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : warn_bogus_irq_restore+0x30/0x40
lr : warn_bogus_irq_restore+0x30/0x40
...
Call trace:
warn_bogus_irq_restore+0x30/0x40 (P)
warn_bogus_irq_restore+0x30/0x40 (L)
scx_ops_bypass+0x224/0x3b8
scx_ops_enable.isra.0+0x2c8/0xaa8
bpf_scx_reg+0x18/0x30
...
irq event stamp: 33739
hardirqs last enabled at (33739): [<ffff8000800b699c>] scx_ops_bypass+0x174/0x3b8
hardirqs last disabled at (33738): [<ffff800080d48ad4>] _raw_spin_lock_irqsave+0xb4/0xd8
Drop the stray _irqrestore().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Link: http://lkml.kernel.org/r/qC39k3UsonrBYD_SmuxHnZIQLsuuccoCrkiqb_BT7DvH945A1_LZwE4g-5Pu9FcCtqZt4lY1HhIPi0homRuNWxkgo1rgP3bkxa0donw8kV4=@pm.me
Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
Cc: stable@vger.kernel.org # v6.12
---
kernel/sched/ext.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7fff1d045477..98519e6d0dcd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4763,7 +4763,7 @@ static void scx_ops_bypass(bool bypass)
* sees scx_rq_bypassing() before moving tasks to SCX.
*/
if (!scx_enabled()) {
- rq_unlock_irqrestore(rq, &rf);
+ rq_unlock(rq, &rf);
continue;
}
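
With the fix, the pattern in scx_ops_bypass() is roughly the following - a
simplified sketch, not the actual kernel code (the lock name and the loop
body are illustrative):

static void scx_ops_bypass(bool bypass)
{
	static DEFINE_RAW_SPINLOCK(bypass_lock);	/* illustrative name */
	unsigned long flags;
	int cpu;

	/* The outer lock keeps IRQs disabled for the whole section. */
	raw_spin_lock_irqsave(&bypass_lock, flags);

	for_each_possible_cpu(cpu) {
		struct rq *rq = cpu_rq(cpu);
		struct rq_flags rf;

		/* Inner rq lock/unlock must be the non-irqsave variants. */
		rq_lock(rq, &rf);
		if (!scx_enabled()) {
			/*
			 * rq_unlock_irqrestore() here would re-enable IRQs
			 * while the outer irqsave section is still active,
			 * triggering the warn_bogus_irq_restore() warning
			 * quoted above.
			 */
			rq_unlock(rq, &rf);
			continue;
		}
		/* ... move tasks to/from SCX ... */
		rq_unlock(rq, &rf);
	}

	raw_spin_unlock_irqrestore(&bypass_lock, flags);
}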
On Wednesday, December 11th, 2024 at 1:01 PM, Tejun Heo <tj@kernel.org> wrote:
>
>
> While adding outer irqsave/restore locking, 0e7ffff1b811 ("scx: Fix raciness
> in scx_ops_bypass()") forgot to convert an inner rq_unlock_irqrestore() to
> rq_unlock() which could re-enable IRQ prematurely leading to the following
> warning:
>
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 1 PID: 96 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40
> ...
> Sched_ext: create_dsq (enabling)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : warn_bogus_irq_restore+0x30/0x40
> lr : warn_bogus_irq_restore+0x30/0x40
> ...
> Call trace:
> warn_bogus_irq_restore+0x30/0x40 (P)
> warn_bogus_irq_restore+0x30/0x40 (L)
> scx_ops_bypass+0x224/0x3b8
> scx_ops_enable.isra.0+0x2c8/0xaa8
> bpf_scx_reg+0x18/0x30
> ...
> irq event stamp: 33739
> hardirqs last enabled at (33739): [<ffff8000800b699c>] scx_ops_bypass+0x174/0x3b8
>
> hardirqs last disabled at (33738): [<ffff800080d48ad4>] _raw_spin_lock_irqsave+0xb4/0xd8
>
>
> Drop the stray _irqrestore().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
>
> Link: http://lkml.kernel.org/r/qC39k3UsonrBYD_SmuxHnZIQLsuuccoCrkiqb_BT7DvH945A1_LZwE4g-5Pu9FcCtqZt4lY1HhIPi0homRuNWxkgo1rgP3bkxa0donw8kV4=@pm.me
> Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
> Cc: stable@vger.kernel.org # v6.12
> ---
> kernel/sched/ext.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 7fff1d045477..98519e6d0dcd 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4763,7 +4763,7 @@ static void scx_ops_bypass(bool bypass)
> * sees scx_rq_bypassing() before moving tasks to SCX.
> */
> if (!scx_enabled()) {
> - rq_unlock_irqrestore(rq, &rf);
> + rq_unlock(rq, &rf);
> continue;
> }
Hi Tejun,
I tried this patch on BPF CI: the pipeline ran 3 times
successfully. That's 12 selftests/sched_ext runs in total.
https://github.com/kernel-patches/vmtest/actions/runs/12301284063
Tested-by: Ihor Solodrai <ihor.solodrai@pm.me>
Thanks for the fix!
Hi Tejun,

I re-enabled selftests/sched_ext on BPF CI today. The kernel on CI
includes this patch. Sometimes there is a failure on attempt to attach
a dsp_local_on scheduler.

Examples of failed jobs:

* https://github.com/kernel-patches/bpf/actions/runs/12379720791/job/34555104994
* https://github.com/kernel-patches/bpf/actions/runs/12382862660/job/34564648924
* https://github.com/kernel-patches/bpf/actions/runs/12381361846/job/34560047798

Here is a piece of log that is present in failed run, but not in
a successful run:

2024-12-17T19:30:12.9010943Z [ 5.285022] sched_ext: BPF scheduler "dsp_local_on" enabled
2024-12-17T19:30:13.9022892Z ERR: dsp_local_on.c:37
2024-12-17T19:30:13.9025841Z Expected skel->data->uei.kind == EXIT_KIND(SCX_EXIT_ERROR) (0 == 1024)
2024-12-17T19:30:13.9256108Z ERR: exit.c:30
2024-12-17T19:30:13.9256641Z Failed to attach scheduler
2024-12-17T19:30:13.9611443Z [ 6.345087] smpboot: CPU 1 is now offline

Could you please investigate? Thanks.
Hello,

On Tue, Dec 17, 2024 at 11:44:08PM +0000, Ihor Solodrai wrote:
> I re-enabled selftests/sched_ext on BPF CI today. The kernel on CI
> includes this patch. Sometimes there is a failure on attempt to attach
> a dsp_local_on scheduler.
>
> Examples of failed jobs:
>
> * https://github.com/kernel-patches/bpf/actions/runs/12379720791/job/34555104994
> * https://github.com/kernel-patches/bpf/actions/runs/12382862660/job/34564648924
> * https://github.com/kernel-patches/bpf/actions/runs/12381361846/job/34560047798
>
> Here is a piece of log that is present in failed run, but not in
> a successful run:
>
> 2024-12-17T19:30:12.9010943Z [ 5.285022] sched_ext: BPF scheduler "dsp_local_on" enabled
> 2024-12-17T19:30:13.9022892Z ERR: dsp_local_on.c:37
> 2024-12-17T19:30:13.9025841Z Expected skel->data->uei.kind == EXIT_KIND(SCX_EXIT_ERROR) (0 == 1024)
> 2024-12-17T19:30:13.9256108Z ERR: exit.c:30
> 2024-12-17T19:30:13.9256641Z Failed to attach scheduler
> 2024-12-17T19:30:13.9611443Z [ 6.345087] smpboot: CPU 1 is now offline
>
> Could you please investigate?

The test prog is wrong in assuming all possible CPUs to be consecutive and
online but I'm not sure whether that's what's making the test flaky. Do you
have dmesg from a failed run?

Thanks.

--
tejun
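
For illustration, a dispatch path that doesn't want to assume the possible
CPUs are consecutive and online could pick its target from the online mask
instead - a rough sketch using the scx_bpf_get_online_cpumask() and
bpf_cpumask_any_distribute() kfuncs, not the actual selftest code:

	/* Pick a target among currently online CPUs instead of 0..nr_cpus-1. */
	const struct cpumask *online = scx_bpf_get_online_cpumask();
	s32 target = bpf_cpumask_any_distribute(online);

	scx_bpf_put_cpumask(online);
	scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0);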
On Wednesday, December 18th, 2024 at 10:34 AM, Tejun Heo <tj@kernel.org> wrote:

> Hello,
>
> On Tue, Dec 17, 2024 at 11:44:08PM +0000, Ihor Solodrai wrote:
>
> > [...]
> >
> > Could you please investigate?
>
> The test prog is wrong in assuming all possible CPUs to be consecutive and
> online but I'm not sure whether that's what's making the test flaky. Do you
> have dmesg from a failed run?

Tejun, can you elaborate on what you're looking for in the logs?
My understanding is that QEMU prints some of the dmesg messages.
QEMU output is available in raw logs. Here is a link (you have to
login to github to open):

https://productionresultssa1.blob.core.windows.net/actions-results/99cd995e-679f-4180-872b-d31e1f564837/workflow-job-run-7216a7c9-5129-5959-a45a-28d6f9b737e2/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-12-19T22%3A57%3A01Z&sig=z%2B%2FUtIIhli4VG%2FCCVxawBnubNwfIIsl9Q2FlTVvM8q0%3D&ske=2024-12-20T07%3A00%3A35Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-12-19T19%3A00%3A35Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-11-04&sp=r&spr=https&sr=b&st=2024-12-19T22%3A46%3A56Z&sv=2024-11-04

Generally, you can access raw logs by going to the job, and clicking
the gear on the top right -> "View raw logs".

> Thanks.
>
> --
> tejun
On Thursday, December 19th, 2024 at 2:51 PM, Ihor Solodrai <ihor.solodrai@pm.me> wrote:

> [...]
>
> Tejun, can you elaborate on what you're looking for in the logs?
> My understanding is that QEMU prints some of the dmesg messages.
> QEMU output is available in raw logs.

I made changes to the CI scripts to explicitly dump dmesg in case of a
failure. It looks like most of that log was already printed.

Job:
https://github.com/kernel-patches/bpf/actions/runs/12436924307/job/34726070343

Raw log:
https://productionresultssa11.blob.core.windows.net/actions-results/a10f57cb-19e3-487a-9fb0-69742cfbef1b/workflow-job-run-4c580b44-6466-54d8-b922-6f707064e5ca/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-12-20T19%3A34%3A55Z&sig=kQ09k9r01VtP4p%2FgYvvCmm2FUuOHfsLjU3ARzks4xmE%3D&ske=2024-12-21T07%3A00%3A50Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-12-20T19%3A00%3A50Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-11-04&sp=r&spr=https&sr=b&st=2024-12-20T19%3A24%3A50Z&sv=2024-11-04

Search for "dmesg start".

> [...]
The dsp_local_on selftest expects the scheduler to fail by trying to
schedule a task (e.g. a CPU-affine one) to the wrong CPU. However, this isn't
guaranteed to happen in the 1 second window that the test is running.
Besides, it's odd to have this particular exception path tested when there
are no other tests that verify that the interface is working at all - e.g.
the test would pass if the dsp_local_on interface is completely broken and
fails on any attempt.
Flip the test so that it verifies that the feature works. While at it, fix a
typo in the info message.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Link: http://lkml.kernel.org/r/Z1n9v7Z6iNJ-wKmq@slm.duckdns.org
---
tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 5 ++++-
tools/testing/selftests/sched_ext/dsp_local_on.c | 5 +++--
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
index 6325bf76f47e..fbda6bf54671 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
@@ -43,7 +43,10 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
if (!p)
return;
- target = bpf_get_prandom_u32() % nr_cpus;
+ if (p->nr_cpus_allowed == nr_cpus)
+ target = bpf_get_prandom_u32() % nr_cpus;
+ else
+ target = scx_bpf_task_cpu(p);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0);
bpf_task_release(p);
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.c b/tools/testing/selftests/sched_ext/dsp_local_on.c
index 472851b56854..0ff27e57fe43 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.c
@@ -34,9 +34,10 @@ static enum scx_test_status run(void *ctx)
/* Just sleeping is fine, plenty of scheduling events happening */
sleep(1);
- SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_ERROR));
bpf_link__destroy(link);
+ SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
+
return SCX_TEST_PASS;
}
@@ -50,7 +51,7 @@ static void cleanup(void *ctx)
struct scx_test dsp_local_on = {
.name = "dsp_local_on",
.description = "Verify we can directly dispatch tasks to a local DSQs "
- "from osp.dispatch()",
+ "from ops.dispatch()",
.setup = setup,
.run = run,
.cleanup = cleanup,
On Tuesday, December 24th, 2024 at 4:09 PM, Tejun Heo <tj@kernel.org> wrote:
>
>
> The dsp_local_on selftest expects the scheduler to fail by trying to
> schedule a task (e.g. a CPU-affine one) to the wrong CPU. However, this isn't
> guaranteed to happen in the 1 second window that the test is running.
> Besides, it's odd to have this particular exception path tested when there
> are no other tests that verify that the interface is working at all - e.g.
> the test would pass if the dsp_local_on interface is completely broken and
> fails on any attempt.
>
> Flip the test so that it verifies that the feature works. While at it, fix a
> typo in the info message.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
>
> Link: http://lkml.kernel.org/r/Z1n9v7Z6iNJ-wKmq@slm.duckdns.org
> ---
> tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 5 ++++-
> tools/testing/selftests/sched_ext/dsp_local_on.c | 5 +++--
> 2 files changed, 7 insertions(+), 3 deletions(-)
Hi Tejun.
I've tried running sched_ext selftests on BPF CI today, applying a set
of patches from sched_ext/for-6.13-fixes, including this one.
You can see the list of patches I added here:
https://github.com/kernel-patches/vmtest/pull/332/files
With that, dsp_local_on has failed on x86_64 (llvm-18), although it
passed with other configurations:
https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/35683769806
Here is a piece of log that appears to be relevant:
2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]
2025-01-15T23:28:55.8246187Z [ 5.336139] dispatch_to_local_dsq+0x13e/0x1f0
2025-01-15T23:28:55.8249296Z [ 5.336474] flush_dispatch_buf+0x13d/0x170
2025-01-15T23:28:55.8252083Z [ 5.336793] balance_scx+0x225/0x3e0
2025-01-15T23:28:55.8254695Z [ 5.337065] __schedule+0x406/0xe80
2025-01-15T23:28:55.8257121Z [ 5.337330] schedule+0x41/0xb0
2025-01-15T23:28:55.8260146Z [ 5.337574] schedule_timeout+0xe5/0x160
2025-01-15T23:28:55.8263080Z [ 5.337875] rcu_tasks_kthread+0xb1/0xc0
2025-01-15T23:28:55.8265477Z [ 5.338169] kthread+0xfa/0x120
2025-01-15T23:28:55.8268202Z [ 5.338410] ret_from_fork+0x37/0x50
2025-01-15T23:28:55.8271272Z [ 5.338690] ret_from_fork_asm+0x1a/0x30
2025-01-15T23:28:56.7349562Z ERR: dsp_local_on.c:39
2025-01-15T23:28:56.7350182Z Expected skel->data->uei.kind == EXIT_KIND(SCX_EXIT_UNREG) (1024 == 64)
Could you please take a look?
Thank you.
>
> [...]
Hello, sorry about the delay.

On Wed, Jan 15, 2025 at 11:50:37PM +0000, Ihor Solodrai wrote:
...
> 2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
> 2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]

That's a head scratcher. It's a single node 2 cpu instance and all unbound
kworkers should be allowed on all CPUs. I'll update the test to test the
actual cpumask but can you see whether this failure is consistent or flaky?

Thanks.

--
tejun
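
For reference, "testing the actual cpumask" in the dispatch callback could
look roughly like the sketch below (illustrative only, not a posted patch):

	s32 target = bpf_get_prandom_u32() % nr_cpus;

	/* Only keep the random target if the task is actually allowed there. */
	if (!bpf_cpumask_test_cpu(target, p->cpus_ptr))
		target = scx_bpf_task_cpu(p);

	scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0);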
On Tuesday, January 21st, 2025 at 5:40 PM, Tejun Heo <tj@kernel.org> wrote:

> Hello, sorry about the delay.
>
> On Wed, Jan 15, 2025 at 11:50:37PM +0000, Ihor Solodrai wrote:
> ...
> > 2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
> > 2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]
>
> That's a head scratcher. It's a single node 2 cpu instance and all unbound
> kworkers should be allowed on all CPUs. I'll update the test to test the
> actual cpumask but can you see whether this failure is consistent or flaky?

I re-ran all the jobs, and all sched_ext jobs have failed (3/3).
Previous time only 1 of 3 runs failed.

https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/36016405680

> Thanks.
>
> --
> tejun
On Wed, Jan 22, 2025 at 07:10:00PM +0000, Ihor Solodrai wrote:
> On Tuesday, January 21st, 2025 at 5:40 PM, Tejun Heo <tj@kernel.org> wrote:
>
> > Hello, sorry about the delay.
> >
> > On Wed, Jan 15, 2025 at 11:50:37PM +0000, Ihor Solodrai wrote:
> > ...
> > > 2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
> > > 2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]
> >
> > That's a head scratcher. It's a single node 2 cpu instance and all unbound
> > kworkers should be allowed on all CPUs. I'll update the test to test the
> > actual cpumask but can you see whether this failure is consistent or flaky?
>
> I re-ran all the jobs, and all sched_ext jobs have failed (3/3).
> Previous time only 1 of 3 runs failed.
>
> https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/36016405680

Oh I see what happens, SCX_DSQ_LOCAL_ON is (incorrectly) resolved to 0.

More exactly, none of the enum values are being resolved correctly, likely
due to the CO-RE enum refactoring. There's probably something broken in
tools/testing/selftests/sched_ext/Makefile, I'll take a look.

Thanks,
-Andrea
On Thu, Jan 23, 2025 at 10:40:52AM +0100, Andrea Righi wrote:
> On Wed, Jan 22, 2025 at 07:10:00PM +0000, Ihor Solodrai wrote:
> > [...]
> >
> > I re-ran all the jobs, and all sched_ext jobs have failed (3/3).
> > Previous time only 1 of 3 runs failed.
> >
> > https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/36016405680
>
> Oh I see what happens, SCX_DSQ_LOCAL_ON is (incorrectly) resolved to 0.
>
> More exactly, none of the enum values are being resolved correctly, likely
> due to the CO-RE enum refactoring. There's probably something broken in
> tools/testing/selftests/sched_ext/Makefile, I'll take a look.

Yeah, we need to add SCX_ENUM_INIT() to each test. Will do that once the
pending pull request is merged. The original report is a separate issue tho.
I'm still a bit baffled by it.

Thanks.

--
tejun
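
A sketch of what adding SCX_ENUM_INIT() to a test's setup() could look like -
the surrounding helpers and fields here are assumptions, not the change that
was eventually merged:

static enum scx_test_status setup(void **ctx)
{
	struct dsp_local_on *skel;

	skel = dsp_local_on__open();
	SCX_FAIL_IF(!skel, "Failed to open");

	/* Resolve kernel-side enums (SCX_DSQ_LOCAL_ON etc.) before use. */
	SCX_ENUM_INIT(skel);

	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
	SCX_FAIL_IF(dsp_local_on__load(skel), "Failed to load skel");

	*ctx = skel;
	return SCX_TEST_PASS;
}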
On Thu, Jan 23, 2025 at 06:57:58AM -1000, Tejun Heo wrote:
> On Thu, Jan 23, 2025 at 10:40:52AM +0100, Andrea Righi wrote:
> > [...]
> >
> > Oh I see what happens, SCX_DSQ_LOCAL_ON is (incorrectly) resolved to 0.
> >
> > More exactly, none of the enum values are being resolved correctly, likely
> > due to the CO-RE enum refactoring. There's probably something broken in
> > tools/testing/selftests/sched_ext/Makefile, I'll take a look.
>
> Yeah, we need to add SCX_ENUM_INIT() to each test. Will do that once the
> pending pull request is merged. The original report is a separate issue tho.
> I'm still a bit baffled by it.

For the enum part:
https://lore.kernel.org/all/20250123124606.242115-1-arighi@nvidia.com/

And yeah, I missed that the original bug report was about the unbound
kworker not allowed to be dispatched on cpu 1. Weird... I'm wondering if we
need to do the cpumask_cnt / scx_bpf_dsq_cancel() game, like we did with
scx_rustland to handle concurrent affinity changes, but in this case the
kworker shouldn't have its affinity changed...

-Andrea
On Thu, Jan 23, 2025 at 07:45:08PM +0100, Andrea Righi wrote:
> On Thu, Jan 23, 2025 at 06:57:58AM -1000, Tejun Heo wrote:
> > [...]
> >
> > Yeah, we need to add SCX_ENUM_INIT() to each test. Will do that once the
> > pending pull request is merged. The original report is a separate issue tho.
> > I'm still a bit baffled by it.
>
> For the enum part:
> https://lore.kernel.org/all/20250123124606.242115-1-arighi@nvidia.com/
>
> And yeah, I missed that the original bug report was about the unbound
> kworker not allowed to be dispatched on cpu 1. Weird... I'm wondering if we
> need to do the cpumask_cnt / scx_bpf_dsq_cancel() game, like we did with
> scx_rustland to handle concurrent affinity changes, but in this case the
> kworker shouldn't have its affinity changed...

Thinking more about this, scx_bpf_task_cpu(p) returns the last known CPU
where the task p was running, but it doesn't necessarily give a CPU where
the task can run at any time. In general it's probably a safer choice to
rely on p->cpus_ptr, maybe doing bpf_cpumask_any_distribute(p->cpus_ptr)
for this test case.

However, I still don't see why the unbound kworker couldn't be dispatched
on cpu 1 in this particular case...

-Andrea
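
Applied to the test's dispatch path, that suggestion would look roughly like
this (a sketch, not a posted patch):

	if (p->nr_cpus_allowed == nr_cpus)
		target = bpf_get_prandom_u32() % nr_cpus;
	else
		/* any CPU the task is currently allowed on, not its last CPU */
		target = bpf_cpumask_any_distribute(p->cpus_ptr);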
From e9fe182772dcb2630964724fd93e9c90b68ea0fd Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 24 Jan 2025 10:48:25 -1000
dsp_local_on has several incorrect assumptions, one of which is that
p->nr_cpus_allowed always tracks p->cpus_ptr. This is not true when a task
is scheduled out while migration is disabled - p->cpus_ptr is temporarily
overridden to the previous CPU while p->nr_cpus_allowed remains unchanged.
This led to sporadic test failures when dsp_local_on_dispatch() tries to put
a migration disabled task to a different CPU. Fix it by keeping the previous
CPU when migration is disabled.
There are SCX schedulers that make use of p->nr_cpus_allowed. They should
also implement explicit handling for p->migration_disabled.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Changwoo Min <changwoo@igalia.com>
---
Applying to sched_ext/for-6.14-fixes. Thanks.
tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
index fbda6bf54671..758b479bd1ee 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
@@ -43,7 +43,7 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
if (!p)
return;
- if (p->nr_cpus_allowed == nr_cpus)
+ if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
target = bpf_get_prandom_u32() % nr_cpus;
else
target = scx_bpf_task_cpu(p);
--
2.48.1
On Fri, Jan 24, 2025 at 12:00:38PM -1000, Tejun Heo wrote:
> From e9fe182772dcb2630964724fd93e9c90b68ea0fd Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Fri, 24 Jan 2025 10:48:25 -1000
>
> dsp_local_on has several incorrect assumptions, one of which is that
> p->nr_cpus_allowed always tracks p->cpus_ptr. This is not true when a task
> is scheduled out while migration is disabled - p->cpus_ptr is temporarily
> overridden to the previous CPU while p->nr_cpus_allowed remains unchanged.
>
> This led to sporadic test failures when dsp_local_on_dispatch() tries to put
> a migration disabled task to a different CPU. Fix it by keeping the previous
> CPU when migration is disabled.
>
> There are SCX schedulers that make use of p->nr_cpus_allowed. They should
> also implement explicit handling for p->migration_disabled.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
> Cc: Andrea Righi <arighi@nvidia.com>
> Cc: Changwoo Min <changwoo@igalia.com>
> ---
> Applying to sched_ext/for-6.14-fixes. Thanks.
>
> tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
> index fbda6bf54671..758b479bd1ee 100644
> --- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
> +++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
> @@ -43,7 +43,7 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
> if (!p)
> return;
>
> - if (p->nr_cpus_allowed == nr_cpus)
> + if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
This doesn't work with !CONFIG_SMP, maybe we can introduce a helper like:
static bool is_migration_disabled(const struct task_struct *p)
{
	if (bpf_core_field_exists(p->migration_disabled))
		return p->migration_disabled;
	return false;
}
> target = bpf_get_prandom_u32() % nr_cpus;
> else
> target = scx_bpf_task_cpu(p);
> --
> 2.48.1
>
Thanks,
-Andrea
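
Put together with Tejun's fix, the dispatch-side check could end up looking
like the sketch below (illustrative; assumes bpf_core_read.h is included for
bpf_core_field_exists()):

static bool is_migration_disabled(const struct task_struct *p)
{
	/* p->migration_disabled only exists on CONFIG_SMP kernels */
	if (bpf_core_field_exists(p->migration_disabled))
		return p->migration_disabled;
	return false;
}

/* ...and in dsp_local_on_dispatch(): */
	if (p->nr_cpus_allowed == nr_cpus && !is_migration_disabled(p))
		target = bpf_get_prandom_u32() % nr_cpus;
	else
		target = scx_bpf_task_cpu(p);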
On Sat, Jan 25, 2025 at 05:54:23AM +0100, Andrea Righi wrote:
...
> > - if (p->nr_cpus_allowed == nr_cpus)
> > + if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
>
> This doesn't work with !CONFIG_SMP, maybe we can introduce a helper like:
>
> static bool is_migration_disabled(const struct task_struct *p)
> {
> 	if (bpf_core_field_exists(p->migration_disabled))
> 		return p->migration_disabled;
> 	return false;
> }
Ah, right. Would you care to send the patch?
Thanks.
--
tejun
On Tue, Dec 24, 2024 at 02:09:15PM -1000, Tejun Heo wrote:
> The dsp_local_on selftest expects the scheduler to fail by trying to
> schedule a task (e.g. a CPU-affine one) to the wrong CPU. However, this isn't
> guaranteed to happen in the 1 second window that the test is running.
> Besides, it's odd to have this particular exception path tested when there
> are no other tests that verify that the interface is working at all - e.g.
> the test would pass if the dsp_local_on interface is completely broken and
> fails on any attempt.
>
> Flip the test so that it verifies that the feature works. While at it, fix a
> typo in the info message.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
> Link: http://lkml.kernel.org/r/Z1n9v7Z6iNJ-wKmq@slm.duckdns.org

Applied to sched_ext/for-6.13-fixes.

Thanks.

--
tejun
On Wed, Dec 11, 2024 at 11:01:51AM -1000, Tejun Heo wrote:
> While adding outer irqsave/restore locking, 0e7ffff1b811 ("scx: Fix raciness
> in scx_ops_bypass()") forgot to convert an inner rq_unlock_irqrestore() to
> rq_unlock() which could re-enable IRQ prematurely leading to the following
> warning:
>
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 1 PID: 96 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40
> ...
> Sched_ext: create_dsq (enabling)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : warn_bogus_irq_restore+0x30/0x40
> lr : warn_bogus_irq_restore+0x30/0x40
> ...
> Call trace:
> warn_bogus_irq_restore+0x30/0x40 (P)
> warn_bogus_irq_restore+0x30/0x40 (L)
> scx_ops_bypass+0x224/0x3b8
> scx_ops_enable.isra.0+0x2c8/0xaa8
> bpf_scx_reg+0x18/0x30
> ...
> irq event stamp: 33739
> hardirqs last enabled at (33739): [<ffff8000800b699c>] scx_ops_bypass+0x174/0x3b8
> hardirqs last disabled at (33738): [<ffff800080d48ad4>] _raw_spin_lock_irqsave+0xb4/0xd8
>
> Drop the stray _irqrestore().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
> Link: http://lkml.kernel.org/r/qC39k3UsonrBYD_SmuxHnZIQLsuuccoCrkiqb_BT7DvH945A1_LZwE4g-5Pu9FcCtqZt4lY1HhIPi0homRuNWxkgo1rgP3bkxa0donw8kV4=@pm.me
> Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
> Cc: stable@vger.kernel.org # v6.12
Applying to sched_ext/for-6.13-fixes.
Thanks.
--
tejun