While adding outer irqsave/restore locking, 0e7ffff1b811 ("scx: Fix raciness
in scx_ops_bypass()") forgot to convert an inner rq_unlock_irqrestore() to
rq_unlock() which could re-enable IRQ prematurely leading to the following
warning:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 96 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40
...
Sched_ext: create_dsq (enabling)
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : warn_bogus_irq_restore+0x30/0x40
lr : warn_bogus_irq_restore+0x30/0x40
...
Call trace:
warn_bogus_irq_restore+0x30/0x40 (P)
warn_bogus_irq_restore+0x30/0x40 (L)
scx_ops_bypass+0x224/0x3b8
scx_ops_enable.isra.0+0x2c8/0xaa8
bpf_scx_reg+0x18/0x30
...
irq event stamp: 33739
hardirqs last enabled at (33739): [<ffff8000800b699c>] scx_ops_bypass+0x174/0x3b8
hardirqs last disabled at (33738): [<ffff800080d48ad4>] _raw_spin_lock_irqsave+0xb4/0xd8
Drop the stray _irqrestore().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Link: http://lkml.kernel.org/r/qC39k3UsonrBYD_SmuxHnZIQLsuuccoCrkiqb_BT7DvH945A1_LZwE4g-5Pu9FcCtqZt4lY1HhIPi0homRuNWxkgo1rgP3bkxa0donw8kV4=@pm.me
Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
Cc: stable@vger.kernel.org # v6.12
---
kernel/sched/ext.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7fff1d045477..98519e6d0dcd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4763,7 +4763,7 @@ static void scx_ops_bypass(bool bypass)
* sees scx_rq_bypassing() before moving tasks to SCX.
*/
if (!scx_enabled()) {
- rq_unlock_irqrestore(rq, &rf);
+ rq_unlock(rq, &rf);
continue;
}
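
With the fix, the pattern in scx_ops_bypass() is roughly the following - a
simplified sketch, not the actual kernel code (the lock name and the loop
body are illustrative):

static void scx_ops_bypass(bool bypass)
{
	static DEFINE_RAW_SPINLOCK(bypass_lock);	/* illustrative name */
	unsigned long flags;
	int cpu;

	/* The outer lock keeps IRQs disabled for the whole section. */
	raw_spin_lock_irqsave(&bypass_lock, flags);

	for_each_possible_cpu(cpu) {
		struct rq *rq = cpu_rq(cpu);
		struct rq_flags rf;

		/* Inner rq lock/unlock must be the non-irqsave variants. */
		rq_lock(rq, &rf);
		if (!scx_enabled()) {
			/*
			 * rq_unlock_irqrestore() here would re-enable IRQs
			 * while the outer irqsave section is still active,
			 * triggering the warn_bogus_irq_restore() warning
			 * quoted above.
			 */
			rq_unlock(rq, &rf);
			continue;
		}
		/* ... move tasks to/from SCX ... */
		rq_unlock(rq, &rf);
	}

	raw_spin_unlock_irqrestore(&bypass_lock, flags);
}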
On Wednesday, December 11th, 2024 at 1:01 PM, Tejun Heo <tj@kernel.org> wrote:
>
>
> While adding outer irqsave/restore locking, 0e7ffff1b811 ("scx: Fix raciness
> in scx_ops_bypass()") forgot to convert an inner rq_unlock_irqrestore() to
> rq_unlock() which could re-enable IRQ prematurely leading to the following
> warning:
>
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 1 PID: 96 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40
> ...
> Sched_ext: create_dsq (enabling)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : warn_bogus_irq_restore+0x30/0x40
> lr : warn_bogus_irq_restore+0x30/0x40
> ...
> Call trace:
> warn_bogus_irq_restore+0x30/0x40 (P)
> warn_bogus_irq_restore+0x30/0x40 (L)
> scx_ops_bypass+0x224/0x3b8
> scx_ops_enable.isra.0+0x2c8/0xaa8
> bpf_scx_reg+0x18/0x30
> ...
> irq event stamp: 33739
> hardirqs last enabled at (33739): [<ffff8000800b699c>] scx_ops_bypass+0x174/0x3b8
>
> hardirqs last disabled at (33738): [<ffff800080d48ad4>] _raw_spin_lock_irqsave+0xb4/0xd8
>
>
> Drop the stray _irqrestore().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
>
> Link: http://lkml.kernel.org/r/qC39k3UsonrBYD_SmuxHnZIQLsuuccoCrkiqb_BT7DvH945A1_LZwE4g-5Pu9FcCtqZt4lY1HhIPi0homRuNWxkgo1rgP3bkxa0donw8kV4=@pm.me
> Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
> Cc: stable@vger.kernel.org # v6.12
> ---
> kernel/sched/ext.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 7fff1d045477..98519e6d0dcd 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4763,7 +4763,7 @@ static void scx_ops_bypass(bool bypass)
> * sees scx_rq_bypassing() before moving tasks to SCX.
> */
> if (!scx_enabled()) {
> - rq_unlock_irqrestore(rq, &rf);
> + rq_unlock(rq, &rf);
> continue;
> }
Hi Tejun,
I tried this patch on BPF CI: the pipeline ran 3 times
successfully. That's 12 selftests/sched_ext runs in total.
https://github.com/kernel-patches/vmtest/actions/runs/12301284063
Tested-by: Ihor Solodrai <ihor.solodrai@pm.me>
Thanks for the fix!
Hi Tejun,

I re-enabled selftests/sched_ext on BPF CI today. The kernel on CI
includes this patch. Sometimes there is a failure on attempt to attach
a dsp_local_on scheduler.

Examples of failed jobs:

* https://github.com/kernel-patches/bpf/actions/runs/12379720791/job/34555104994
* https://github.com/kernel-patches/bpf/actions/runs/12382862660/job/34564648924
* https://github.com/kernel-patches/bpf/actions/runs/12381361846/job/34560047798

Here is a piece of log that is present in failed run, but not in
a successful run:

2024-12-17T19:30:12.9010943Z [ 5.285022] sched_ext: BPF scheduler "dsp_local_on" enabled
2024-12-17T19:30:13.9022892Z ERR: dsp_local_on.c:37
2024-12-17T19:30:13.9025841Z Expected skel->data->uei.kind == EXIT_KIND(SCX_EXIT_ERROR) (0 == 1024)
2024-12-17T19:30:13.9256108Z ERR: exit.c:30
2024-12-17T19:30:13.9256641Z Failed to attach scheduler
2024-12-17T19:30:13.9611443Z [ 6.345087] smpboot: CPU 1 is now offline

Could you please investigate? Thanks.
Hello,

On Tue, Dec 17, 2024 at 11:44:08PM +0000, Ihor Solodrai wrote:
> I re-enabled selftests/sched_ext on BPF CI today. The kernel on CI
> includes this patch. Sometimes there is a failure on attempt to attach
> a dsp_local_on scheduler.
>
> Examples of failed jobs:
>
> * https://github.com/kernel-patches/bpf/actions/runs/12379720791/job/34555104994
> * https://github.com/kernel-patches/bpf/actions/runs/12382862660/job/34564648924
> * https://github.com/kernel-patches/bpf/actions/runs/12381361846/job/34560047798
>
> Here is a piece of log that is present in failed run, but not in
> a successful run:
>
> 2024-12-17T19:30:12.9010943Z [ 5.285022] sched_ext: BPF scheduler "dsp_local_on" enabled
> 2024-12-17T19:30:13.9022892Z ERR: dsp_local_on.c:37
> 2024-12-17T19:30:13.9025841Z Expected skel->data->uei.kind == EXIT_KIND(SCX_EXIT_ERROR) (0 == 1024)
> 2024-12-17T19:30:13.9256108Z ERR: exit.c:30
> 2024-12-17T19:30:13.9256641Z Failed to attach scheduler
> 2024-12-17T19:30:13.9611443Z [ 6.345087] smpboot: CPU 1 is now offline
>
> Could you please investigate?

The test prog is wrong in assuming all possible CPUs to be consecutive and
online but I'm not sure whether that's what's making the test flaky. Do you
have dmesg from a failed run?

Thanks.

--
tejun
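
For illustration, a dispatch path that doesn't want to assume the possible
CPUs are consecutive and online could pick its target from the online mask
instead - a rough sketch using the scx_bpf_get_online_cpumask() and
bpf_cpumask_any_distribute() kfuncs, not the actual selftest code:

	/* Pick a target among currently online CPUs instead of 0..nr_cpus-1. */
	const struct cpumask *online = scx_bpf_get_online_cpumask();
	s32 target = bpf_cpumask_any_distribute(online);

	scx_bpf_put_cpumask(online);
	scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0);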
On Wednesday, December 18th, 2024 at 10:34 AM, Tejun Heo <tj@kernel.org> wrote:

> Hello,
>
> On Tue, Dec 17, 2024 at 11:44:08PM +0000, Ihor Solodrai wrote:
>
> > [...]
> >
> > Could you please investigate?
>
> The test prog is wrong in assuming all possible CPUs to be consecutive and
> online but I'm not sure whether that's what's making the test flaky. Do you
> have dmesg from a failed run?

Tejun, can you elaborate on what you're looking for in the logs?
My understanding is that QEMU prints some of the dmesg messages.
QEMU output is available in raw logs. Here is a link (you have to
login to github to open):

https://productionresultssa1.blob.core.windows.net/actions-results/99cd995e-679f-4180-872b-d31e1f564837/workflow-job-run-7216a7c9-5129-5959-a45a-28d6f9b737e2/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-12-19T22%3A57%3A01Z&sig=z%2B%2FUtIIhli4VG%2FCCVxawBnubNwfIIsl9Q2FlTVvM8q0%3D&ske=2024-12-20T07%3A00%3A35Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-12-19T19%3A00%3A35Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-11-04&sp=r&spr=https&sr=b&st=2024-12-19T22%3A46%3A56Z&sv=2024-11-04

Generally, you can access raw logs by going to the job, and clicking
the gear on the top right -> "View raw logs".

> Thanks.
>
> --
> tejun
On Thursday, December 19th, 2024 at 2:51 PM, Ihor Solodrai <ihor.solodrai@pm.me> wrote:

> [...]
>
> Tejun, can you elaborate on what you're looking for in the logs?
> My understanding is that QEMU prints some of the dmesg messages.
> QEMU output is available in raw logs.

I made changes to the CI scripts to explicitly dump dmesg in case of a
failure. It looks like most of that log was already printed.

Job:
https://github.com/kernel-patches/bpf/actions/runs/12436924307/job/34726070343

Raw log:
https://productionresultssa11.blob.core.windows.net/actions-results/a10f57cb-19e3-487a-9fb0-69742cfbef1b/workflow-job-run-4c580b44-6466-54d8-b922-6f707064e5ca/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-12-20T19%3A34%3A55Z&sig=kQ09k9r01VtP4p%2FgYvvCmm2FUuOHfsLjU3ARzks4xmE%3D&ske=2024-12-21T07%3A00%3A50Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2024-12-20T19%3A00%3A50Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2024-11-04&sp=r&spr=https&sr=b&st=2024-12-20T19%3A24%3A50Z&sv=2024-11-04

Search for "dmesg start".

> [...]
The dsp_local_on selftest expects the scheduler to fail by trying to
schedule a task (e.g. a CPU-affine one) to the wrong CPU. However, this isn't
guaranteed to happen in the 1 second window that the test is running.
Besides, it's odd to have this particular exception path tested when there
are no other tests that verify that the interface is working at all - e.g.
the test would pass if the dsp_local_on interface is completely broken and
fails on any attempt.
Flip the test so that it verifies that the feature works. While at it, fix a
typo in the info message.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Link: http://lkml.kernel.org/r/Z1n9v7Z6iNJ-wKmq@slm.duckdns.org
---
tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 5 ++++-
tools/testing/selftests/sched_ext/dsp_local_on.c | 5 +++--
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
index 6325bf76f47e..fbda6bf54671 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
@@ -43,7 +43,10 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
if (!p)
return;
- target = bpf_get_prandom_u32() % nr_cpus;
+ if (p->nr_cpus_allowed == nr_cpus)
+ target = bpf_get_prandom_u32() % nr_cpus;
+ else
+ target = scx_bpf_task_cpu(p);
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0);
bpf_task_release(p);
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.c b/tools/testing/selftests/sched_ext/dsp_local_on.c
index 472851b56854..0ff27e57fe43 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.c
@@ -34,9 +34,10 @@ static enum scx_test_status run(void *ctx)
/* Just sleeping is fine, plenty of scheduling events happening */
sleep(1);
- SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_ERROR));
bpf_link__destroy(link);
+ SCX_EQ(skel->data->uei.kind, EXIT_KIND(SCX_EXIT_UNREG));
+
return SCX_TEST_PASS;
}
@@ -50,7 +51,7 @@ static void cleanup(void *ctx)
struct scx_test dsp_local_on = {
.name = "dsp_local_on",
.description = "Verify we can directly dispatch tasks to a local DSQs "
- "from osp.dispatch()",
+ "from ops.dispatch()",
.setup = setup,
.run = run,
.cleanup = cleanup,
On Tuesday, December 24th, 2024 at 4:09 PM, Tejun Heo <tj@kernel.org> wrote:
>
>
> The dsp_local_on selftest expects the scheduler to fail by trying to
> schedule a task (e.g. a CPU-affine one) to the wrong CPU. However, this isn't
> guaranteed to happen in the 1 second window that the test is running.
> Besides, it's odd to have this particular exception path tested when there
> are no other tests that verify that the interface is working at all - e.g.
> the test would pass if the dsp_local_on interface is completely broken and
> fails on any attempt.
>
> Flip the test so that it verifies that the feature works. While at it, fix a
> typo in the info message.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
>
> Link: http://lkml.kernel.org/r/Z1n9v7Z6iNJ-wKmq@slm.duckdns.org
> ---
> tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 5 ++++-
> tools/testing/selftests/sched_ext/dsp_local_on.c | 5 +++--
> 2 files changed, 7 insertions(+), 3 deletions(-)
Hi Tejun.
I've tried running sched_ext selftests on BPF CI today, applying a set
of patches from sched_ext/for-6.13-fixes, including this one.
You can see the list of patches I added here:
https://github.com/kernel-patches/vmtest/pull/332/files
With that, dsp_local_on has failed on x86_64 (llvm-18), although it
passed with other configurations:
https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/35683769806
Here is a piece of log that appears to be relevant:
2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]
2025-01-15T23:28:55.8246187Z [ 5.336139] dispatch_to_local_dsq+0x13e/0x1f0
2025-01-15T23:28:55.8249296Z [ 5.336474] flush_dispatch_buf+0x13d/0x170
2025-01-15T23:28:55.8252083Z [ 5.336793] balance_scx+0x225/0x3e0
2025-01-15T23:28:55.8254695Z [ 5.337065] __schedule+0x406/0xe80
2025-01-15T23:28:55.8257121Z [ 5.337330] schedule+0x41/0xb0
2025-01-15T23:28:55.8260146Z [ 5.337574] schedule_timeout+0xe5/0x160
2025-01-15T23:28:55.8263080Z [ 5.337875] rcu_tasks_kthread+0xb1/0xc0
2025-01-15T23:28:55.8265477Z [ 5.338169] kthread+0xfa/0x120
2025-01-15T23:28:55.8268202Z [ 5.338410] ret_from_fork+0x37/0x50
2025-01-15T23:28:55.8271272Z [ 5.338690] ret_from_fork_asm+0x1a/0x30
2025-01-15T23:28:56.7349562Z ERR: dsp_local_on.c:39
2025-01-15T23:28:56.7350182Z Expected skel->data->uei.kind == EXIT_KIND(SCX_EXIT_UNREG) (1024 == 64)
Could you please take a look?
Thank you.
>
> [...]
Hello, sorry about the delay.

On Wed, Jan 15, 2025 at 11:50:37PM +0000, Ihor Solodrai wrote:
...
> 2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
> 2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]

That's a head scratcher. It's a single node 2 cpu instance and all unbound
kworkers should be allowed on all CPUs. I'll update the test to test the
actual cpumask but can you see whether this failure is consistent or flaky?

Thanks.

--
tejun
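
For reference, "testing the actual cpumask" in the dispatch callback could
look roughly like the sketch below (illustrative only, not a posted patch):

	s32 target = bpf_get_prandom_u32() % nr_cpus;

	/* Only keep the random target if the task is actually allowed there. */
	if (!bpf_cpumask_test_cpu(target, p->cpus_ptr))
		target = scx_bpf_task_cpu(p);

	scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | target, SCX_SLICE_DFL, 0);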
On Tuesday, January 21st, 2025 at 5:40 PM, Tejun Heo <tj@kernel.org> wrote:

> Hello, sorry about the delay.
>
> On Wed, Jan 15, 2025 at 11:50:37PM +0000, Ihor Solodrai wrote:
> ...
> > 2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
> > 2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]
>
> That's a head scratcher. It's a single node 2 cpu instance and all unbound
> kworkers should be allowed on all CPUs. I'll update the test to test the
> actual cpumask but can you see whether this failure is consistent or flaky?

I re-ran all the jobs, and all sched_ext jobs have failed (3/3).
Previous time only 1 of 3 runs failed.

https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/36016405680

> Thanks.
>
> --
> tejun
On Wed, Jan 22, 2025 at 07:10:00PM +0000, Ihor Solodrai wrote:
> On Tuesday, January 21st, 2025 at 5:40 PM, Tejun Heo <tj@kernel.org> wrote:
>
> > Hello, sorry about the delay.
> >
> > On Wed, Jan 15, 2025 at 11:50:37PM +0000, Ihor Solodrai wrote:
> > ...
> > > 2025-01-15T23:28:55.8238375Z [ 5.334631] sched_ext: BPF scheduler "dsp_local_on" disabled (runtime error)
> > > 2025-01-15T23:28:55.8243034Z [ 5.335420] sched_ext: dsp_local_on: SCX_DSQ_LOCAL[_ON] verdict target cpu 1 not allowed for kworker/u8:1[33]
> >
> > That's a head scratcher. It's a single node 2 cpu instance and all unbound
> > kworkers should be allowed on all CPUs. I'll update the test to test the
> > actual cpumask but can you see whether this failure is consistent or flaky?
>
> I re-ran all the jobs, and all sched_ext jobs have failed (3/3).
> Previous time only 1 of 3 runs failed.
>
> https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/36016405680

Oh I see what happens, SCX_DSQ_LOCAL_ON is (incorrectly) resolved to 0.

More exactly, none of the enum values are being resolved correctly, likely
due to the CO-RE enum refactoring. There's probably something broken in
tools/testing/selftests/sched_ext/Makefile, I'll take a look.

Thanks,
-Andrea
On Thu, Jan 23, 2025 at 10:40:52AM +0100, Andrea Righi wrote:
> On Wed, Jan 22, 2025 at 07:10:00PM +0000, Ihor Solodrai wrote:
> > [...]
> >
> > I re-ran all the jobs, and all sched_ext jobs have failed (3/3).
> > Previous time only 1 of 3 runs failed.
> >
> > https://github.com/kernel-patches/vmtest/actions/runs/12798804552/job/36016405680
>
> Oh I see what happens, SCX_DSQ_LOCAL_ON is (incorrectly) resolved to 0.
>
> More exactly, none of the enum values are being resolved correctly, likely
> due to the CO-RE enum refactoring. There's probably something broken in
> tools/testing/selftests/sched_ext/Makefile, I'll take a look.

Yeah, we need to add SCX_ENUM_INIT() to each test. Will do that once the
pending pull request is merged. The original report is a separate issue tho.
I'm still a bit baffled by it.

Thanks.

--
tejun
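
A sketch of what adding SCX_ENUM_INIT() to a test's setup() could look like -
the surrounding helpers and fields here are assumptions, not the change that
was eventually merged:

static enum scx_test_status setup(void **ctx)
{
	struct dsp_local_on *skel;

	skel = dsp_local_on__open();
	SCX_FAIL_IF(!skel, "Failed to open");

	/* Resolve kernel-side enums (SCX_DSQ_LOCAL_ON etc.) before use. */
	SCX_ENUM_INIT(skel);

	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
	SCX_FAIL_IF(dsp_local_on__load(skel), "Failed to load skel");

	*ctx = skel;
	return SCX_TEST_PASS;
}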
On Thu, Jan 23, 2025 at 06:57:58AM -1000, Tejun Heo wrote:
> On Thu, Jan 23, 2025 at 10:40:52AM +0100, Andrea Righi wrote:
> > [...]
> >
> > Oh I see what happens, SCX_DSQ_LOCAL_ON is (incorrectly) resolved to 0.
> >
> > More exactly, none of the enum values are being resolved correctly, likely
> > due to the CO-RE enum refactoring. There's probably something broken in
> > tools/testing/selftests/sched_ext/Makefile, I'll take a look.
>
> Yeah, we need to add SCX_ENUM_INIT() to each test. Will do that once the
> pending pull request is merged. The original report is a separate issue tho.
> I'm still a bit baffled by it.

For the enum part:
https://lore.kernel.org/all/20250123124606.242115-1-arighi@nvidia.com/

And yeah, I missed that the original bug report was about the unbound
kworker not allowed to be dispatched on cpu 1. Weird... I'm wondering if we
need to do the cpumask_cnt / scx_bpf_dsq_cancel() game, like we did with
scx_rustland to handle concurrent affinity changes, but in this case the
kworker shouldn't have its affinity changed...

-Andrea
On Thu, Jan 23, 2025 at 07:45:08PM +0100, Andrea Righi wrote:
> On Thu, Jan 23, 2025 at 06:57:58AM -1000, Tejun Heo wrote:
> > [...]
> >
> > Yeah, we need to add SCX_ENUM_INIT() to each test. Will do that once the
> > pending pull request is merged. The original report is a separate issue tho.
> > I'm still a bit baffled by it.
>
> For the enum part:
> https://lore.kernel.org/all/20250123124606.242115-1-arighi@nvidia.com/
>
> And yeah, I missed that the original bug report was about the unbound
> kworker not allowed to be dispatched on cpu 1. Weird... I'm wondering if we
> need to do the cpumask_cnt / scx_bpf_dsq_cancel() game, like we did with
> scx_rustland to handle concurrent affinity changes, but in this case the
> kworker shouldn't have its affinity changed...

Thinking more about this, scx_bpf_task_cpu(p) returns the last known CPU
where the task p was running, but it doesn't necessarily give a CPU where
the task can run at any time. In general it's probably a safer choice to
rely on p->cpus_ptr, maybe doing bpf_cpumask_any_distribute(p->cpus_ptr)
for this test case.

However, I still don't see why the unbound kworker couldn't be dispatched
on cpu 1 in this particular case...

-Andrea
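
Applied to the test's dispatch path, that suggestion would look roughly like
this (a sketch, not a posted patch):

	if (p->nr_cpus_allowed == nr_cpus)
		target = bpf_get_prandom_u32() % nr_cpus;
	else
		/* any CPU the task is currently allowed on, not its last CPU */
		target = bpf_cpumask_any_distribute(p->cpus_ptr);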
From e9fe182772dcb2630964724fd93e9c90b68ea0fd Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 24 Jan 2025 10:48:25 -1000
dsp_local_on has several incorrect assumptions, one of which is that
p->nr_cpus_allowed always tracks p->cpus_ptr. This is not true when a task
is scheduled out while migration is disabled - p->cpus_ptr is temporarily
overridden to the previous CPU while p->nr_cpus_allowed remains unchanged.
This led to sporadic test failures when dsp_local_on_dispatch() tries to put
a migration disabled task to a different CPU. Fix it by keeping the previous
CPU when migration is disabled.
There are SCX schedulers that make use of p->nr_cpus_allowed. They should
also implement explicit handling for p->migration_disabled.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Changwoo Min <changwoo@igalia.com>
---
Applying to sched_ext/for-6.14-fixes. Thanks.
tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
index fbda6bf54671..758b479bd1ee 100644
--- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
+++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
@@ -43,7 +43,7 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
if (!p)
return;
- if (p->nr_cpus_allowed == nr_cpus)
+ if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
target = bpf_get_prandom_u32() % nr_cpus;
else
target = scx_bpf_task_cpu(p);
--
2.48.1
On Fri, Jan 24, 2025 at 12:00:38PM -1000, Tejun Heo wrote:
> From e9fe182772dcb2630964724fd93e9c90b68ea0fd Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Fri, 24 Jan 2025 10:48:25 -1000
>
> dsp_local_on has several incorrect assumptions, one of which is that
> p->nr_cpus_allowed always tracks p->cpus_ptr. This is not true when a task
> is scheduled out while migration is disabled - p->cpus_ptr is temporarily
> overridden to the previous CPU while p->nr_cpus_allowed remains unchanged.
>
> This led to sporadic test failures when dsp_local_on_dispatch() tries to put
> a migration disabled task to a different CPU. Fix it by keeping the previous
> CPU when migration is disabled.
>
> There are SCX schedulers that make use of p->nr_cpus_allowed. They should
> also implement explicit handling for p->migration_disabled.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
> Cc: Andrea Righi <arighi@nvidia.com>
> Cc: Changwoo Min <changwoo@igalia.com>
> ---
> Applying to sched_ext/for-6.14-fixes. Thanks.
>
> tools/testing/selftests/sched_ext/dsp_local_on.bpf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
> index fbda6bf54671..758b479bd1ee 100644
> --- a/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
> +++ b/tools/testing/selftests/sched_ext/dsp_local_on.bpf.c
> @@ -43,7 +43,7 @@ void BPF_STRUCT_OPS(dsp_local_on_dispatch, s32 cpu, struct task_struct *prev)
> if (!p)
> return;
>
> - if (p->nr_cpus_allowed == nr_cpus)
> + if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
This doesn't work with !CONFIG_SMP, maybe we can introduce a helper like:
static bool is_migration_disabled(const struct task_struct *p)
{
	if (bpf_core_field_exists(p->migration_disabled))
		return p->migration_disabled;
	return false;
}
> target = bpf_get_prandom_u32() % nr_cpus;
> else
> target = scx_bpf_task_cpu(p);
> --
> 2.48.1
>
Thanks,
-Andrea
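
Put together with Tejun's fix, the dispatch-side check could end up looking
like the sketch below (illustrative; assumes bpf_core_read.h is included for
bpf_core_field_exists()):

static bool is_migration_disabled(const struct task_struct *p)
{
	/* p->migration_disabled only exists on CONFIG_SMP kernels */
	if (bpf_core_field_exists(p->migration_disabled))
		return p->migration_disabled;
	return false;
}

/* ...and in dsp_local_on_dispatch(): */
	if (p->nr_cpus_allowed == nr_cpus && !is_migration_disabled(p))
		target = bpf_get_prandom_u32() % nr_cpus;
	else
		target = scx_bpf_task_cpu(p);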
On Sat, Jan 25, 2025 at 05:54:23AM +0100, Andrea Righi wrote:
...
> > - if (p->nr_cpus_allowed == nr_cpus)
> > + if (p->nr_cpus_allowed == nr_cpus && !p->migration_disabled)
>
> This doesn't work with !CONFIG_SMP, maybe we can introduce a helper like:
>
> static bool is_migration_disabled(const struct task_struct *p)
> {
> 	if (bpf_core_field_exists(p->migration_disabled))
> 		return p->migration_disabled;
> 	return false;
> }
Ah, right. Would you care to send the patch?
Thanks.
--
tejun
On Tue, Dec 24, 2024 at 02:09:15PM -1000, Tejun Heo wrote:
> The dsp_local_on selftest expects the scheduler to fail by trying to
> schedule a task (e.g. a CPU-affine one) to the wrong CPU. However, this isn't
> guaranteed to happen in the 1 second window that the test is running.
> Besides, it's odd to have this particular exception path tested when there
> are no other tests that verify that the interface is working at all - e.g.
> the test would pass if the dsp_local_on interface is completely broken and
> fails on any attempt.
>
> Flip the test so that it verifies that the feature works. While at it, fix a
> typo in the info message.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
> Link: http://lkml.kernel.org/r/Z1n9v7Z6iNJ-wKmq@slm.duckdns.org

Applied to sched_ext/for-6.13-fixes.

Thanks.

--
tejun
On Wed, Dec 11, 2024 at 11:01:51AM -1000, Tejun Heo wrote:
> While adding outer irqsave/restore locking, 0e7ffff1b811 ("scx: Fix raciness
> in scx_ops_bypass()") forgot to convert an inner rq_unlock_irqrestore() to
> rq_unlock() which could re-enable IRQ prematurely leading to the following
> warning:
>
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 1 PID: 96 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x30/0x40
> ...
> Sched_ext: create_dsq (enabling)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : warn_bogus_irq_restore+0x30/0x40
> lr : warn_bogus_irq_restore+0x30/0x40
> ...
> Call trace:
> warn_bogus_irq_restore+0x30/0x40 (P)
> warn_bogus_irq_restore+0x30/0x40 (L)
> scx_ops_bypass+0x224/0x3b8
> scx_ops_enable.isra.0+0x2c8/0xaa8
> bpf_scx_reg+0x18/0x30
> ...
> irq event stamp: 33739
> hardirqs last enabled at (33739): [<ffff8000800b699c>] scx_ops_bypass+0x174/0x3b8
> hardirqs last disabled at (33738): [<ffff800080d48ad4>] _raw_spin_lock_irqsave+0xb4/0xd8
>
> Drop the stray _irqrestore().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Ihor Solodrai <ihor.solodrai@pm.me>
> Link: http://lkml.kernel.org/r/qC39k3UsonrBYD_SmuxHnZIQLsuuccoCrkiqb_BT7DvH945A1_LZwE4g-5Pu9FcCtqZt4lY1HhIPi0homRuNWxkgo1rgP3bkxa0donw8kV4=@pm.me
> Fixes: 0e7ffff1b811 ("scx: Fix raciness in scx_ops_bypass()")
> Cc: stable@vger.kernel.org # v6.12
Applying to sched_ext/for-6.13-fixes.
Thanks.
--
tejun