kernel/sched/ext.c | 9 +++++++++ 1 file changed, 9 insertions(+)
When getting an LLC CPU mask in the default CPU selection policy,
scx_select_cpu_dfl(), a pointer to the sched_domain is dereferenced
using rcu_read_lock() without holding rcu_read_lock(). Such an unprotected
dereference often causes the following warning and can cause an invalid
memory access in the worst case.
Therefore, protect dereference of a sched_domain pointer using a pair
of rcu_read_lock() and unlock().
[ 20.996135] =============================
[ 20.996345] WARNING: suspicious RCU usage
[ 20.996563] 6.11.0-virtme #17 Tainted: G W
[ 20.996576] -----------------------------
[ 20.996576] kernel/sched/ext.c:3323 suspicious rcu_dereference_check() usage!
[ 20.996576]
[ 20.996576] other info that might help us debug this:
[ 20.996576]
[ 20.996576]
[ 20.996576] rcu_scheduler_active = 2, debug_locks = 1
[ 20.996576] 4 locks held by kworker/8:1/140:
[ 20.996576] #0: ffff8b18c00dd348 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x4a0/0x590
[ 20.996576] #1: ffffb3da01f67e58 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1ba/0x590
[ 20.996576] #2: ffffffffa316f9f0 (&rcu_state.gp_wq){..-.}-{2:2}, at: swake_up_one+0x15/0x60
[ 20.996576] #3: ffff8b1880398a60 (&p->pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0x59/0x7d0
[ 20.996576]
[ 20.996576] stack backtrace:
[ 20.996576] CPU: 8 UID: 0 PID: 140 Comm: kworker/8:1 Tainted: G W 6.11.0-virtme #17
[ 20.996576] Tainted: [W]=WARN
[ 20.996576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 20.996576] Workqueue: pm pm_runtime_work
[ 20.996576] Sched_ext: simple (disabling+all), task: runnable_at=-6ms
[ 20.996576] Call Trace:
[ 20.996576] <IRQ>
[ 20.996576] dump_stack_lvl+0x6f/0xb0
[ 20.996576] lockdep_rcu_suspicious.cold+0x4e/0x96
[ 20.996576] scx_select_cpu_dfl+0x234/0x260
[ 20.996576] select_task_rq_scx+0xfb/0x190
[ 20.996576] select_task_rq+0x47/0x110
[ 20.996576] try_to_wake_up+0x110/0x7d0
[ 20.996576] swake_up_one+0x39/0x60
[ 20.996576] rcu_core+0xb08/0xe50
[ 20.996576] ? srso_alias_return_thunk+0x5/0xfbef5
[ 20.996576] ? mark_held_locks+0x40/0x70
[ 20.996576] handle_softirqs+0xd3/0x410
[ 20.996576] irq_exit_rcu+0x78/0xa0
[ 20.996576] sysvec_apic_timer_interrupt+0x73/0x80
[ 20.996576] </IRQ>
[ 20.996576] <TASK>
[ 20.996576] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 20.996576] RIP: 0010:_raw_spin_unlock_irqrestore+0x36/0x70
[ 20.996576] Code: f5 53 48 8b 74 24 10 48 89 fb 48 83 c7 18 e8 11 b4 36 ff 48 89 df e8 99 0d 37 ff f7 c5 00 02 00 00 75 17 9c 58 f6 c4 02 75 2b <65> ff 0d 5b 55 3c 5e 74 16 5b 5d e9 95 8e 28 00 e8 a5 ee 44 ff 9c
[ 20.996576] RSP: 0018:ffffb3da01f67d20 EFLAGS: 00000246
[ 20.996576] RAX: 0000000000000002 RBX: ffffffffa4640220 RCX: 0000000000000040
[ 20.996576] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa1c7b27b
[ 20.996576] RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000
[ 20.996576] R10: 0000000000000001 R11: 000000000000021c R12: 0000000000000246
[ 20.996576] R13: ffff8b1881363958 R14: 0000000000000000 R15: ffff8b1881363800
[ 20.996576] ? _raw_spin_unlock_irqrestore+0x4b/0x70
[ 20.996576] serial_port_runtime_resume+0xd4/0x1a0
[ 20.996576] ? __pfx_serial_port_runtime_resume+0x10/0x10
[ 20.996576] __rpm_callback+0x44/0x170
[ 20.996576] ? __pfx_serial_port_runtime_resume+0x10/0x10
[ 20.996576] rpm_callback+0x55/0x60
[ 20.996576] ? __pfx_serial_port_runtime_resume+0x10/0x10
[ 20.996576] rpm_resume+0x582/0x7b0
[ 20.996576] pm_runtime_work+0x7c/0xb0
[ 20.996576] process_one_work+0x1fb/0x590
[ 20.996576] worker_thread+0x18e/0x350
[ 20.996576] ? __pfx_worker_thread+0x10/0x10
[ 20.996576] kthread+0xe2/0x110
[ 20.996576] ? __pfx_kthread+0x10/0x10
[ 20.996576] ret_from_fork+0x34/0x50
[ 20.996576] ? __pfx_kthread+0x10/0x10
[ 20.996576] ret_from_fork_asm+0x1a/0x30
[ 20.996576] </TASK>
[ 21.056592] sched_ext: BPF scheduler "simple" disabled (unregistered from user space)
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
ChangeLog v1 -> v2:
- Extend the RCU critical section to cover the use of llc_cpus.
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 6ff501a18b88..f0e9c77eb33b 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3302,6 +3302,12 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
*found = false;
+
+ /*
+ * This is necessary to protect llc_cpus.
+ */
+ rcu_read_lock();
+
/*
* Determine the scheduling domain only if the task is allowed to run
* on all CPUs.
@@ -3436,9 +3442,12 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
if (cpu >= 0)
goto cpu_found;
+ rcu_read_unlock();
return prev_cpu;
cpu_found:
+ rcu_read_unlock();
+
*found = true;
return cpu;
}
--
2.47.0
On Sat, Nov 09, 2024 at 03:29:05PM +0900, Changwoo Min wrote: > When getting an LLC CPU mask in the default CPU selection policy, > scx_select_cpu_dfl(), a pointer to the sched_domain is dereferenced > using rcu_read_lock() without holding rcu_read_lock(). Such an unprotected > dereference often causes the following warning and can cause an invalid > memory access in the worst case. > > Therefore, protect dereference of a sched_domain pointer using a pair > of rcu_read_lock() and unlock(). ... > Signed-off-by: Changwoo Min <changwoo@igalia.com> Applied to sched_ext/for-6.13. Thanks. -- tejun
Hi Changwoo, On Sat, Nov 09, 2024 at 03:29:05PM +0900, Changwoo Min wrote: > When getting an LLC CPU mask in the default CPU selection policy, > scx_select_cpu_dfl(), a pointer to the sched_domain is dereferenced > using rcu_read_lock() without holding rcu_read_lock(). Such an unprotected > dereference often causes the following warning and can cause an invalid > memory access in the worst case. Thanks for catching this! I'm wondering why we didn't hit this in our CI... I'll investigate. Just a minor nit below, apart than that looks good to me. Acked-by: Andrea Righi <arighi@nvidia.com> > > Therefore, protect dereference of a sched_domain pointer using a pair > of rcu_read_lock() and unlock(). > > [ 20.996135] ============================= > [ 20.996345] WARNING: suspicious RCU usage > [ 20.996563] 6.11.0-virtme #17 Tainted: G W > [ 20.996576] ----------------------------- > [ 20.996576] kernel/sched/ext.c:3323 suspicious rcu_dereference_check() usage! > [ 20.996576] > [ 20.996576] other info that might help us debug this: > [ 20.996576] > [ 20.996576] > [ 20.996576] rcu_scheduler_active = 2, debug_locks = 1 > [ 20.996576] 4 locks held by kworker/8:1/140: > [ 20.996576] #0: ffff8b18c00dd348 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x4a0/0x590 > [ 20.996576] #1: ffffb3da01f67e58 ((work_completion)(&dev->power.work)){+.+.}-{0:0}, at: process_one_work+0x1ba/0x590 > [ 20.996576] #2: ffffffffa316f9f0 (&rcu_state.gp_wq){..-.}-{2:2}, at: swake_up_one+0x15/0x60 > [ 20.996576] #3: ffff8b1880398a60 (&p->pi_lock){-.-.}-{2:2}, at: try_to_wake_up+0x59/0x7d0 > [ 20.996576] > [ 20.996576] stack backtrace: > [ 20.996576] CPU: 8 UID: 0 PID: 140 Comm: kworker/8:1 Tainted: G W 6.11.0-virtme #17 > [ 20.996576] Tainted: [W]=WARN > [ 20.996576] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > [ 20.996576] Workqueue: pm pm_runtime_work > [ 20.996576] Sched_ext: simple (disabling+all), task: runnable_at=-6ms > [ 20.996576] Call Trace: > [ 20.996576] <IRQ> > [ 20.996576] dump_stack_lvl+0x6f/0xb0 > [ 20.996576] lockdep_rcu_suspicious.cold+0x4e/0x96 > [ 20.996576] scx_select_cpu_dfl+0x234/0x260 > [ 20.996576] select_task_rq_scx+0xfb/0x190 > [ 20.996576] select_task_rq+0x47/0x110 > [ 20.996576] try_to_wake_up+0x110/0x7d0 > [ 20.996576] swake_up_one+0x39/0x60 > [ 20.996576] rcu_core+0xb08/0xe50 > [ 20.996576] ? srso_alias_return_thunk+0x5/0xfbef5 > [ 20.996576] ? mark_held_locks+0x40/0x70 > [ 20.996576] handle_softirqs+0xd3/0x410 > [ 20.996576] irq_exit_rcu+0x78/0xa0 > [ 20.996576] sysvec_apic_timer_interrupt+0x73/0x80 > [ 20.996576] </IRQ> > [ 20.996576] <TASK> > [ 20.996576] asm_sysvec_apic_timer_interrupt+0x1a/0x20 > [ 20.996576] RIP: 0010:_raw_spin_unlock_irqrestore+0x36/0x70 > [ 20.996576] Code: f5 53 48 8b 74 24 10 48 89 fb 48 83 c7 18 e8 11 b4 36 ff 48 89 df e8 99 0d 37 ff f7 c5 00 02 00 00 75 17 9c 58 f6 c4 02 75 2b <65> ff 0d 5b 55 3c 5e 74 16 5b 5d e9 95 8e 28 00 e8 a5 ee 44 ff 9c > [ 20.996576] RSP: 0018:ffffb3da01f67d20 EFLAGS: 00000246 > [ 20.996576] RAX: 0000000000000002 RBX: ffffffffa4640220 RCX: 0000000000000040 > [ 20.996576] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa1c7b27b > [ 20.996576] RBP: 0000000000000246 R08: 0000000000000001 R09: 0000000000000000 > [ 20.996576] R10: 0000000000000001 R11: 000000000000021c R12: 0000000000000246 > [ 20.996576] R13: ffff8b1881363958 R14: 0000000000000000 R15: ffff8b1881363800 > [ 20.996576] ? _raw_spin_unlock_irqrestore+0x4b/0x70 > [ 20.996576] serial_port_runtime_resume+0xd4/0x1a0 > [ 20.996576] ? __pfx_serial_port_runtime_resume+0x10/0x10 > [ 20.996576] __rpm_callback+0x44/0x170 > [ 20.996576] ? __pfx_serial_port_runtime_resume+0x10/0x10 > [ 20.996576] rpm_callback+0x55/0x60 > [ 20.996576] ? __pfx_serial_port_runtime_resume+0x10/0x10 > [ 20.996576] rpm_resume+0x582/0x7b0 > [ 20.996576] pm_runtime_work+0x7c/0xb0 > [ 20.996576] process_one_work+0x1fb/0x590 > [ 20.996576] worker_thread+0x18e/0x350 > [ 20.996576] ? __pfx_worker_thread+0x10/0x10 > [ 20.996576] kthread+0xe2/0x110 > [ 20.996576] ? __pfx_kthread+0x10/0x10 > [ 20.996576] ret_from_fork+0x34/0x50 > [ 20.996576] ? __pfx_kthread+0x10/0x10 > [ 20.996576] ret_from_fork_asm+0x1a/0x30 > [ 20.996576] </TASK> > [ 21.056592] sched_ext: BPF scheduler "simple" disabled (unregistered from user space) > > Signed-off-by: Changwoo Min <changwoo@igalia.com> > --- > kernel/sched/ext.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > ChangeLog v1 -> v2: > - Extend the RCU critical section to cover the use of llc_cpus. > > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c > index 6ff501a18b88..f0e9c77eb33b 100644 > --- a/kernel/sched/ext.c > +++ b/kernel/sched/ext.c > @@ -3302,6 +3302,12 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, > > *found = false; > > + ^ Can we remove this empty line? > + /* > + * This is necessary to protect llc_cpus. > + */ > + rcu_read_lock(); > + > /* > * Determine the scheduling domain only if the task is allowed to run > * on all CPUs. > @@ -3436,9 +3442,12 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, > if (cpu >= 0) > goto cpu_found; > > + rcu_read_unlock(); > return prev_cpu; > > cpu_found: > + rcu_read_unlock(); > + > *found = true; > return cpu; > } > -- > 2.47.0 >
Hi Andrea, On 24. 11. 9. 22:47, Andrea Righi wrote: > Thanks for catching this! I'm wondering why we didn't hit this in our > CI... I'll investigate. Yes, that's what I am also curious about. I triggered the warning on a 2-socket virtual machine using virtme-ng. Regards, Changwoo Min
© 2016 - 2024 Red Hat, Inc.