[PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()

Posted by Jonas Jelonek 3 days, 13 hours ago
smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
marks the CPU offline for the scheduler via set_cpu_online(false) but
never informs RCU, so RCU keeps expecting a quiescent state from CPUs
that are now spinning forever with interrupts disabled.

As long as nothing waits for an RCU grace period after smp_send_stop()
this is harmless, which is why it went unnoticed. Since commit
91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
however, irq_work_sync() calls synchronize_rcu() on architectures without
an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
false. That is the asm-generic default used by MIPS. Any irq_work_sync()
issued in the reboot/shutdown path after smp_send_stop() then blocks on
a grace period that can never complete, hanging the reboot:

  WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
  ...
  rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
  rcu: Offline CPU 1 blocking current GP.
  rcu: Offline CPU 2 blocking current GP.
  rcu: Offline CPU 3 blocking current GP.

This issue popped up during kernel bump downstream in OpenWrt from
6.18.33 to 6.18.34, since the suspected change has been backported to
6.18 stable branch [1].

Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
waiting on the parked CPUs and grace periods can still complete.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6

Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
CC: stable@vger.kernel.org
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>

diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 4868e79f3b30..0f28b4a62e72 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/sched/mm.h>
 #include <linux/cpumask.h>
 #include <linux/cpu.h>
+#include <linux/rcupdate.h>
 #include <linux/err.h>
 #include <linux/ftrace.h>
 #include <linux/irqdomain.h>
@@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
 	set_cpu_online(smp_processor_id(), false);
 	calculate_cpu_foreign_map();
 	local_irq_disable();
+	rcutree_report_cpu_dead();
 	while (1);
 }
 
-- 
2.51.0
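
The hang described in the commit message can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: the toy_* names and the flat bitmask are hypothetical, and real RCU tracks this state in a tree of rcu_node masks. It shows a grace period waiting on every CPU that RCU still expects to report, and how dropping the parked CPUs from that set lets the grace period finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Toy model only: real RCU keeps this state in a tree of rcu_node masks. */
struct toy_rcu {
	uint32_t expected;	/* CPUs RCU still expects quiescent states from */
	uint32_t pending;	/* CPUs the current grace period is waiting on */
};

static void toy_start_gp(struct toy_rcu *rcu)
{
	rcu->pending = rcu->expected;
}

static void toy_report_qs(struct toy_rcu *rcu, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	rcu->pending &= ~(UINT32_C(1) << cpu);
}

/* Hypothetical stand-in for the effect of rcutree_report_cpu_dead(). */
static void toy_report_cpu_dead(struct toy_rcu *rcu, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	rcu->expected &= ~(UINT32_C(1) << cpu);
	rcu->pending &= ~(UINT32_C(1) << cpu);
}

/* Park CPUs 1..3, then let CPU 0 wait for one grace period. */
static bool toy_reboot_gp_completes(bool report_dead)
{
	struct toy_rcu rcu = {
		.expected = (UINT32_C(1) << TOY_NR_CPUS) - 1,
		.pending = 0,
	};
	int cpu;

	for (cpu = 1; cpu < TOY_NR_CPUS; cpu++) {
		if (report_dead)
			toy_report_cpu_dead(&rcu, cpu);
		/* The parked CPU now spins and never reports anything again. */
	}

	toy_start_gp(&rcu);
	toy_report_qs(&rcu, 0);	/* CPU 0 is alive and passes a quiescent state */
	return rcu.pending == 0;
}
```

In this model toy_reboot_gp_completes(false) corresponds to the unpatched stop_this_cpu(), where the grace period never finishes, and toy_reboot_gp_completes(true) corresponds to the patched one.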
Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
Posted by Sebastian Andrzej Siewior 3 days ago
On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
…
> This issue popped up during kernel bump downstream in OpenWrt from
> 6.18.33 to 6.18.34, since the suspected change has been backported to
> 6.18 stable branch [1].

I would avoid the link and simply write "after the backport of the patch"
or so.

> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> waiting on the parked CPUs and grace periods can still complete.

This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
something else missing/ different? 

Sebastian
Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
Posted by Jonas Jelonek 3 days ago
Hi Sebastian,

I'm not an expert in this area so please correct me if some claim or
explanation is wrong.

On 05.06.26 08:42, Sebastian Andrzej Siewior wrote:
> On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
> …
>> This issue popped up during kernel bump downstream in OpenWrt from
>> 6.18.33 to 6.18.34, since the suspected change has been backported to
>> 6.18 stable branch [1].
> I would avoid the link and simply write "after the backport of the patch"
> or so.

Fine with that, I can adjust that in a v2.

>> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
>> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
>> waiting on the parked CPUs and grace periods can still complete.
> This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> something else missing/ different? 

Those seem to be two different paths. To be honest I'm not confident
under which circumstances which of those paths is used to take down
a CPU. In my case, issuing a reboot command reaches smp_send_stop()
where the issue explained in the patch message then happens.

> Sebastian

Best,
Jonas
Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
Posted by Huacai Chen 2 days, 17 hours ago
On Fri, Jun 5, 2026 at 3:28 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> Hi Sebastian,
>
> I'm not an expert in this area so please correct me if some claim or
> explanation is wrong.
>
> On 05.06.26 08:42, Sebastian Andrzej Siewior wrote:
> > On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
> > …
> >> This issue popped up during kernel bump downstream in OpenWrt from
> >> 6.18.33 to 6.18.34, since the suspected change has been backported to
> >> 6.18 stable branch [1].
> > I would avoid the link and simply write "after the backport of the patch"
> > or so.
>
> Fine with that, I can adjust that in a v2.
>
> >> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> >> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> >> waiting on the parked CPUs and grace periods can still complete.
> > This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> > something else missing/ different?
>
> Those seem to be two different paths. To be honest I'm not confident
> under which circumstances which of those paths is used to take down
> a CPU. In my case, issuing a reboot command reaches smp_send_stop()
> where the issue explained in the patch message then happens.
I think I know the reason. Halt/poweroff/reboot doesn't call the CPU
hotplug functions to disable the non-boot CPUs; instead it only calls
migrate_to_reboot_cpu() and then goes to the arch-specific code. The
arch-specific code doesn't call the CPU hotplug functions either; it only
calls smp_send_stop() to send IPIs to the non-boot CPUs, which then
call stop_this_cpu(). This is why stop_this_cpu() needs
rcutree_report_cpu_dead().

Huacai


>
> > Sebastian
>
> Best,
> Jonas
>
Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
Posted by Sebastian Andrzej Siewior 2 days, 21 hours ago
On 2026-06-05 09:12:09 [+0200], Jonas Jelonek wrote:
> Hi Sebastian,
Hi,

> >> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> >> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> >> waiting on the parked CPUs and grace periods can still complete.
> > This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> > something else missing/ different? 
> 
> Those seem to be two different paths. To be honest I'm not confident
> under which circumstances which of those paths is used to take down
> a CPU. In my case, issuing a reboot command reaches smp_send_stop()
> where the issue explained in the patch message then happens.
> 

Does 
	echo 0 > /sys/devices/system/cpu/cpu1/online

lead to the same problem?

I missed that arm64 also has this, but only if the online path fails
fairly early, see
	04e613ded8c26 ("arm64: smp: Tell RCU about CPUs that fail to come online")

so this is not the "normal" case but an exception. MIPS seems to be doing
something different here. I am not sure if this is the only thing that
is missing.

> Best,
> Jonas

Sebastian
Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
Posted by Jonas Jelonek 2 days, 20 hours ago
Hi,

On 05.06.26 12:34, Sebastian Andrzej Siewior wrote:
> Does 
> 	echo 0 > /sys/devices/system/cpu/cpu1/online
>
> lead to the same problem?

Funny, my device doesn't have this 'online' file, neither for cpu1 nor
for the other CPUs. So it seems CPU hotplug isn't supported/used here?
Or am I missing a Kconfig option for that?
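
A quick way to check this across all CPUs is a small helper like the one below. It is only a sketch: cpu_has_online_file() is a hypothetical name, and the sysfs root is passed in so the check can be pointed at /sys/devices/system/cpu or at a test directory. As an assumption to verify against the target's kernel config, the per-CPU 'online' attribute is normally only present for CPUs the kernel registers as hotpluggable, which usually requires CONFIG_HOTPLUG_CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return true if <sysfs_cpu_root>/cpu<N>/online exists.
 * The root is a parameter (normally "/sys/devices/system/cpu") so that
 * the helper can be exercised against any directory.
 */
static bool cpu_has_online_file(const char *sysfs_cpu_root, unsigned int cpu)
{
	char path[256];
	int n;

	assert(sysfs_cpu_root != NULL);

	n = snprintf(path, sizeof(path), "%s/cpu%u/online", sysfs_cpu_root, cpu);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;	/* path did not fit, treat as missing */

	return access(path, F_OK) == 0;
}
```

Calling cpu_has_online_file("/sys/devices/system/cpu", 1) on the device answers the same question as looking for the file by hand. Note that cpu0 often has no 'online' file even on hotplug-capable systems.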

I'm working on a Realtek RTL931x SoC here, it has MIPS interAptiv
cores. I can provide more information if needed.

> I missed that arm64 also has this, but only if the online path fails
> fairly early, see
> 	04e613ded8c26 ("arm64: smp: Tell RCU about CPUs that fail to come online")
>
> so this is not the "normal" case but an exception. MIPS seems to be doing
> something different here. I am not sure if this is the only thing that
> is missing.
>
>> Best,
>> Jonas
> Sebastian

Best,
Jonas
Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
Posted by Huacai Chen 3 days, 4 hours ago
Hi, Jonas,

On Fri, Jun 5, 2026 at 2:25 AM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
> marks the CPU offline for the scheduler via set_cpu_online(false) but
> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
> that are now spinning forever with interrupts disabled.
>
> As long as nothing waits for an RCU grace period after smp_send_stop()
> this is harmless, which is why it went unnoticed. Since commit
> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> however, irq_work_sync() calls synchronize_rcu() on architectures without
> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
> issued in the reboot/shutdown path after smp_send_stop() then blocks on
> a grace period that can never complete, hanging the reboot:
>
>   WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
>   ...
>   rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>   rcu: Offline CPU 1 blocking current GP.
>   rcu: Offline CPU 2 blocking current GP.
>   rcu: Offline CPU 3 blocking current GP.
>
> This issue popped up during kernel bump downstream in OpenWrt from
> 6.18.33 to 6.18.34, since the suspected change has been backported to
> 6.18 stable branch [1].
Now 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single()
on PREEMPT_RT") has been backported as far back as 6.1 LTS.

>
> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> waiting on the parked CPUs and grace periods can still complete.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
>
> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> CC: stable@vger.kernel.org
> Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
>
> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
> index 4868e79f3b30..0f28b4a62e72 100644
> --- a/arch/mips/kernel/smp.c
> +++ b/arch/mips/kernel/smp.c
> @@ -20,6 +20,7 @@
>  #include <linux/sched/mm.h>
>  #include <linux/cpumask.h>
>  #include <linux/cpu.h>
> +#include <linux/rcupdate.h>
>  #include <linux/err.h>
>  #include <linux/ftrace.h>
>  #include <linux/irqdomain.h>
> @@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
>         set_cpu_online(smp_processor_id(), false);
>         calculate_cpu_foreign_map();
>         local_irq_disable();
> +       rcutree_report_cpu_dead();
I'm not sure, but maybe it is better to call it before local_irq_disable()?

Huacai
>         while (1);
>  }
>
> --
> 2.51.0
>
>
Re: [PATCH] MIPS: smp: report dying CPU to RCU in stop_this_cpu()
Posted by Jonas Jelonek 3 days ago
Hi Huacai,

On 05.06.26 05:01, Huacai Chen wrote:
> Hi, Jonas,
>
> On Fri, Jun 5, 2026 at 2:25 AM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
>> marks the CPU offline for the scheduler via set_cpu_online(false) but
>> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
>> that are now spinning forever with interrupts disabled.
>>
>> As long as nothing waits for an RCU grace period after smp_send_stop()
>> this is harmless, which is why it went unnoticed. Since commit
>> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
>> however, irq_work_sync() calls synchronize_rcu() on architectures without
>> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
>> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
>> issued in the reboot/shutdown path after smp_send_stop() then blocks on
>> a grace period that can never complete, hanging the reboot:
>>
>>   WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
>>   ...
>>   rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>>   rcu: Offline CPU 1 blocking current GP.
>>   rcu: Offline CPU 2 blocking current GP.
>>   rcu: Offline CPU 3 blocking current GP.
>>
>> This issue popped up during kernel bump downstream in OpenWrt from
>> 6.18.33 to 6.18.34, since the suspected change has been backported to
>> 6.18 stable branch [1].
> Now 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single()
> on PREEMPT_RT") has been backported as far back as 6.1 LTS.

Yes, as also pointed out by Sebastian I should adjust this paragraph
to be more accurate.

>> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
>> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
>> waiting on the parked CPUs and grace periods can still complete.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
>>
>> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
>> CC: stable@vger.kernel.org
>> Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
>>
>> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
>> index 4868e79f3b30..0f28b4a62e72 100644
>> --- a/arch/mips/kernel/smp.c
>> +++ b/arch/mips/kernel/smp.c
>> @@ -20,6 +20,7 @@
>>  #include <linux/sched/mm.h>
>>  #include <linux/cpumask.h>
>>  #include <linux/cpu.h>
>> +#include <linux/rcupdate.h>
>>  #include <linux/err.h>
>>  #include <linux/ftrace.h>
>>  #include <linux/irqdomain.h>
>> @@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
>>         set_cpu_online(smp_processor_id(), false);
>>         calculate_cpu_foreign_map();
>>         local_irq_disable();
>> +       rcutree_report_cpu_dead();
> I'm not sure, but maybe it is better to call it before local_irq_disable()?

rcutree_report_cpu_dead() starts with lockdep_assert_irqs_disabled(), so
IRQs must already be disabled when it is called.
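
The ordering constraint can be sketched with a small userspace model. All model_* names are hypothetical, and the model is stricter than the kernel: here the report is refused outright when IRQs are still enabled, whereas the real lockdep_assert_irqs_disabled() only warns, and only when lockdep is configured.

```c
#include <assert.h>
#include <stdbool.h>

struct model_cpu {
	bool irqs_disabled;
	bool reported_dead;
};

static void model_local_irq_disable(struct model_cpu *cpu)
{
	cpu->irqs_disabled = true;
}

/*
 * Stand-in for rcutree_report_cpu_dead(): refuses to run with IRQs
 * enabled, where the real function would trip its lockdep assertion.
 */
static bool model_report_cpu_dead(struct model_cpu *cpu)
{
	if (!cpu->irqs_disabled)
		return false;
	cpu->reported_dead = true;
	return true;
}

/* Return true if the report was made under the required precondition. */
static bool model_stop_this_cpu(bool report_before_irq_disable)
{
	struct model_cpu cpu = { .irqs_disabled = false, .reported_dead = false };
	bool ok;

	if (report_before_irq_disable) {
		ok = model_report_cpu_dead(&cpu);
		model_local_irq_disable(&cpu);
	} else {
		model_local_irq_disable(&cpu);
		ok = model_report_cpu_dead(&cpu);
	}

	/* The report can only have succeeded if it was actually recorded. */
	assert(ok == cpu.reported_dead);
	return ok;
}
```

model_stop_this_cpu(false) mirrors the order used in the patch (disable IRQs, then report), while model_stop_this_cpu(true) mirrors the suggested reordering and violates the precondition.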

> Huacai
>>         while (1);
>>  }
>>
>> --
>> 2.51.0
>>
>>

Best,
Jonas