smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
marks the CPU offline for the scheduler via set_cpu_online(false) but
never informs RCU, so RCU keeps expecting a quiescent state from CPUs
that are now spinning forever with interrupts disabled.
As long as nothing waits for an RCU grace period after smp_send_stop()
this is harmless, which is why it went unnoticed. Since commit
91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
however, irq_work_sync() calls synchronize_rcu() on architectures without
an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
false. That is the asm-generic default used by MIPS. Any irq_work_sync()
issued in the reboot/shutdown path after smp_send_stop() then blocks on
a grace period that can never complete, hanging the reboot:
WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
...
rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
rcu: Offline CPU 1 blocking current GP.
rcu: Offline CPU 2 blocking current GP.
rcu: Offline CPU 3 blocking current GP.
This issue popped up during kernel bump downstream in OpenWrt from
6.18.33 to 6.18.34, since the suspected change has been backported to
6.18 stable branch [1].
Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
waiting on the parked CPUs and grace periods can still complete.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
CC: stable@vger.kernel.org
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
index 4868e79f3b30..0f28b4a62e72 100644
--- a/arch/mips/kernel/smp.c
+++ b/arch/mips/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/sched/mm.h>
 #include <linux/cpumask.h>
 #include <linux/cpu.h>
+#include <linux/rcupdate.h>
 #include <linux/err.h>
 #include <linux/ftrace.h>
 #include <linux/irqdomain.h>
@@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
 	set_cpu_online(smp_processor_id(), false);
 	calculate_cpu_foreign_map();
 	local_irq_disable();
+	rcutree_report_cpu_dead();
 	while (1);
 }
--
2.51.0
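The stall described in the patch message can be pictured with a small userspace toy model. This is only a sketch under loud assumptions: `struct toy_rcu` and every `toy_*` name are hypothetical and bear no relation to the kernel's real RCU data structures. It shows just the bookkeeping idea that a grace period cannot end while a CPU that RCU still tracks has not reported a quiescent state, and that reporting the CPU dead removes it from that set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical toy state; not the kernel's rcu_node/rcu_data layout. */
struct toy_rcu {
	bool tracked[TOY_NR_CPUS]; /* RCU still expects a quiescent state */
	bool qs_seen[TOY_NR_CPUS]; /* quiescent state seen in this grace period */
};

/* Start a grace period with every CPU tracked and no quiescent state seen. */
void toy_rcu_init(struct toy_rcu *r)
{
	for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		r->tracked[cpu] = true;
		r->qs_seen[cpu] = false;
	}
}

/* A running CPU passes through a quiescent state. */
void toy_report_qs(struct toy_rcu *r, size_t cpu)
{
	assert(cpu < TOY_NR_CPUS);
	r->qs_seen[cpu] = true;
}

/* Stand-in for the effect of rcutree_report_cpu_dead(): stop tracking the CPU. */
void toy_report_cpu_dead(struct toy_rcu *r, size_t cpu)
{
	assert(cpu < TOY_NR_CPUS);
	r->tracked[cpu] = false;
}

/* The grace period can end once no tracked CPU still owes a quiescent state. */
bool toy_gp_can_complete(const struct toy_rcu *r)
{
	for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (r->tracked[cpu] && !r->qs_seen[cpu])
			return false;
	}
	return true;
}
```

In this model, if CPU 0 reports a quiescent state while CPUs 1 to 3 are parked and never report, `toy_gp_can_complete()` stays false, which corresponds to the "Offline CPU N blocking current GP" lines in the log. Calling `toy_report_cpu_dead()` for CPUs 1 to 3 lets it return true.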
On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
…
> This issue popped up during kernel bump downstream in OpenWrt from
> 6.18.33 to 6.18.34, since the suspected change has been backported to
> 6.18 stable branch [1].
I would avoid the link and simply write after the backport of the patch
or so.
> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> waiting on the parked CPUs and grace periods can still complete.
This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
something else missing/ different?
Sebastian
Hi Sebastian,
I'm not an expert in this area so please correct me if some claim or
explanation is wrong.
On 05.06.26 08:42, Sebastian Andrzej Siewior wrote:
> On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
> …
>> This issue popped up during kernel bump downstream in OpenWrt from
>> 6.18.33 to 6.18.34, since the suspected change has been backported to
>> 6.18 stable branch [1].
> I would avoid the link and simply write after the backport of the patch
> or so.
Fine with that, I can adjust that in a v2.
>> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
>> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
>> waiting on the parked CPUs and grace periods can still complete.
> This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> something else missing/ different?
Those seem to be two different paths. To be honest I'm not confident
under which circumstances which of those paths is used to take down
a CPU. In my case, issuing a reboot command reaches smp_send_stop()
where the issue explained in the patch message then happens.
> Sebastian
Best,
Jonas
On Fri, Jun 5, 2026 at 3:28 PM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> Hi Sebastian,
>
> I'm not an expert in this area so please correct me if some claim or
> explanation is wrong.
>
> On 05.06.26 08:42, Sebastian Andrzej Siewior wrote:
> > On 2026-06-04 18:24:07 [+0000], Jonas Jelonek wrote:
> > …
> >> This issue popped up during kernel bump downstream in OpenWrt from
> >> 6.18.33 to 6.18.34, since the suspected change has been backported to
> >> 6.18 stable branch [1].
> > I would avoid the link and simply write after the backport of the patch
> > or so.
>
> Fine with that, I can adjust that in a v2.
>
> >> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> >> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> >> waiting on the parked CPUs and grace periods can still complete.
> > This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> > something else missing/ different?
>
> Those seem to be two different paths. To be honest I'm not confident
> under which circumstances which of those paths is used to take down
> a CPU. In my case, issuing a reboot command reaches smp_send_stop()
> where the issue explained in the patch message then happens.
I think I know the reason. Halt/poweroff/reboot doesn't call the CPU
hotplug functions to disable the non-boot CPUs; instead it only calls
migrate_to_reboot_cpu() and then goes to the arch-specific code. The
arch-specific code doesn't call the CPU hotplug functions either; it only
calls smp_send_stop() to send IPIs to the non-boot CPUs, which then call
stop_this_cpu(). This is why stop_this_cpu() needs
rcutree_report_cpu_dead().
Huacai
>
> > Sebastian
>
> Best,
> Jonas
>
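The two paths discussed here can be written down as a tiny lookup sketch. The function names in the tables are the ones mentioned in this thread, but the tables themselves are a deliberate simplification and an assumption: they list only the steps named in the discussion, not the real call chains, and the `toy_*` identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Steps as named in the thread; real call chains contain far more. */
const char *const toy_hotplug_path[] = {
	"cpuhp_report_idle_dead",
	"rcutree_report_cpu_dead",
};

const char *const toy_reboot_path_before[] = {
	"migrate_to_reboot_cpu",
	"smp_send_stop",
	"stop_this_cpu",
};

const char *const toy_reboot_path_after[] = {
	"migrate_to_reboot_cpu",
	"smp_send_stop",
	"stop_this_cpu",
	"rcutree_report_cpu_dead", /* added by the patch */
};

#define TOY_PATH_LEN(p) (sizeof(p) / sizeof((p)[0]))

/* True if the listed path ever tells RCU that the CPU is gone. */
bool toy_path_informs_rcu(const char *const *path, size_t len)
{
	assert(path);
	for (size_t i = 0; i < len; i++) {
		if (strcmp(path[i], "rcutree_report_cpu_dead") == 0)
			return true;
	}
	return false;
}
```

With these tables, `toy_path_informs_rcu()` is true for the hotplug path, false for the reboot path before the patch, and true after it, which also suggests why the call would not be issued twice for the same CPU: a CPU goes down through one path or the other.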
On 2026-06-05 09:12:09 [+0200], Jonas Jelonek wrote:
> Hi Sebastian,
Hi,
> >> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> >> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> >> waiting on the parked CPUs and grace periods can still complete.
> > This is part of cpuhp_report_idle_dead(). Is it now invoked twice? Or is
> > something else missing/ different?
>
> Those seem to be two different paths. To be honest I'm not confident
> under which circumstances which of those paths is used to take down
> a CPU. In my case, issuing a reboot command reaches smp_send_stop()
> where the issue explained in the patch message then happens.
>
Does
echo 0 > /sys/devices/system/cpu/cpu1/online
lead to the same problem?
I missed that arm64 also has this, but only if the online path fails kind
of early, see
04e613ded8c26 ("arm64: smp: Tell RCU about CPUs that fail to come online")
so this is not the "normal" case but an exception. MIPS seems to be doing
something different here. I am not sure if this is the only thing that
is missing.
> Best,
> Jonas
Sebastian
Hi,
On 05.06.26 12:34, Sebastian Andrzej Siewior wrote:
> Does
> echo 0 > /sys/devices/system/cpu/cpu1/online
>
> lead to the same problem?
Funny, my device doesn't have this 'online' file for cpu1, nor for any
of the other CPUs. So it seems CPU hotplug isn't supported/used here?
Or am I missing a Kconfig option for that?
I'm working on a Realtek RTL931x SoC here, it has MIPS interAptiv
cores. I can provide more information if needed.
> I missed that arm64 also has this, but only if the online path fails kind
> of early, see
> 04e613ded8c26 ("arm64: smp: Tell RCU about CPUs that fail to come online")
>
> so this is not the "normal" case but an exception. MIPS seems to be doing
> something different here. I am not sure if this is the only thing that
> is missing.
>
>> Best,
>> Jonas
> Sebastian
Best,
Jonas
Hi, Jonas,
On Fri, Jun 5, 2026 at 2:25 AM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>
> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
> marks the CPU offline for the scheduler via set_cpu_online(false) but
> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
> that are now spinning forever with interrupts disabled.
>
> As long as nothing waits for an RCU grace period after smp_send_stop()
> this is harmless, which is why it went unnoticed. Since commit
> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> however, irq_work_sync() calls synchronize_rcu() on architectures without
> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
> issued in the reboot/shutdown path after smp_send_stop() then blocks on
> a grace period that can never complete, hanging the reboot:
>
> WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
> ...
> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> rcu: Offline CPU 1 blocking current GP.
> rcu: Offline CPU 2 blocking current GP.
> rcu: Offline CPU 3 blocking current GP.
>
> This issue popped up during kernel bump downstream in OpenWrt from
> 6.18.33 to 6.18.34, since the suspected change has been backported to
> 6.18 stable branch [1].
Now 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single()
on PREEMPT_RT") has been backported to as early as 6.1 LTS.
>
> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
> waiting on the parked CPUs and grace periods can still complete.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
>
> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
> CC: stable@vger.kernel.org
> Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
>
> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
> index 4868e79f3b30..0f28b4a62e72 100644
> --- a/arch/mips/kernel/smp.c
> +++ b/arch/mips/kernel/smp.c
> @@ -20,6 +20,7 @@
> #include <linux/sched/mm.h>
> #include <linux/cpumask.h>
> #include <linux/cpu.h>
> +#include <linux/rcupdate.h>
> #include <linux/err.h>
> #include <linux/ftrace.h>
> #include <linux/irqdomain.h>
> @@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
> set_cpu_online(smp_processor_id(), false);
> calculate_cpu_foreign_map();
> local_irq_disable();
> + rcutree_report_cpu_dead();
I'm not sure, but maybe it is better to call it before local_irq_disable()?
Huacai
> while (1);
> }
>
> --
> 2.51.0
>
>
Hi Huacai,
On 05.06.26 05:01, Huacai Chen wrote:
> Hi, Jonas,
>
> On Fri, Jun 5, 2026 at 2:25 AM Jonas Jelonek <jelonek.jonas@gmail.com> wrote:
>> smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
>> marks the CPU offline for the scheduler via set_cpu_online(false) but
>> never informs RCU, so RCU keeps expecting a quiescent state from CPUs
>> that are now spinning forever with interrupts disabled.
>>
>> As long as nothing waits for an RCU grace period after smp_send_stop()
>> this is harmless, which is why it went unnoticed. Since commit
>> 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
>> however, irq_work_sync() calls synchronize_rcu() on architectures without
>> an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
>> false. That is the asm-generic default used by MIPS. Any irq_work_sync()
>> issued in the reboot/shutdown path after smp_send_stop() then blocks on
>> a grace period that can never complete, hanging the reboot:
>>
>> WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
>> ...
>> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>> rcu: Offline CPU 1 blocking current GP.
>> rcu: Offline CPU 2 blocking current GP.
>> rcu: Offline CPU 3 blocking current GP.
>>
>> This issue popped up during kernel bump downstream in OpenWrt from
>> 6.18.33 to 6.18.34, since the suspected change has been backported to
>> 6.18 stable branch [1].
> Now 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single()
> on PREEMPT_RT") has been backported to as early as 6.1 LTS.
Yes, as also pointed out by Sebastian I should adjust this paragraph
to be more accurate.
>> Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
>> generic CPU-hotplug offline path (and arm64's stop handling), so RCU stops
>> waiting on the parked CPUs and grace periods can still complete.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.18.y&id=18c0456ea2615b1a743a6db739c74411c3b42bc6
>>
>> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
>> CC: stable@vger.kernel.org
>> Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
>>
>> diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c
>> index 4868e79f3b30..0f28b4a62e72 100644
>> --- a/arch/mips/kernel/smp.c
>> +++ b/arch/mips/kernel/smp.c
>> @@ -20,6 +20,7 @@
>> #include <linux/sched/mm.h>
>> #include <linux/cpumask.h>
>> #include <linux/cpu.h>
>> +#include <linux/rcupdate.h>
>> #include <linux/err.h>
>> #include <linux/ftrace.h>
>> #include <linux/irqdomain.h>
>> @@ -422,6 +423,7 @@ static void stop_this_cpu(void *dummy)
>> set_cpu_online(smp_processor_id(), false);
>> calculate_cpu_foreign_map();
>> local_irq_disable();
>> + rcutree_report_cpu_dead();
> I'm not sure, but maybe it is better to call it before local_irq_disable()?
rcutree_report_cpu_dead() starts with lockdep_assert_irqs_disabled() so
it needs IRQs disabled already.
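The ordering constraint can be illustrated with a userspace toy. This is a sketch under assumptions: `struct toy_cpu` and the `toy_*` helpers are hypothetical, and where the real function checks lockdep_assert_irqs_disabled() (which warns), the toy simply refuses and returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for illustration only. */
struct toy_cpu {
	bool irqs_enabled;
	bool online;
	bool rcu_told_dead;
};

void toy_local_irq_disable(struct toy_cpu *c)
{
	assert(c);
	c->irqs_enabled = false;
}

/*
 * Stand-in for rcutree_report_cpu_dead(). The real function asserts via
 * lockdep that IRQs are off; this toy refuses to proceed instead.
 */
bool toy_irq_report_cpu_dead(struct toy_cpu *c)
{
	assert(c);
	if (c->irqs_enabled)
		return false;
	c->rcu_told_dead = true;
	return true;
}

/* Mirrors the order used in the patched stop_this_cpu(), minus the spin. */
bool toy_stop_this_cpu(struct toy_cpu *c)
{
	assert(c);
	c->online = false;
	toy_local_irq_disable(c);
	return toy_irq_report_cpu_dead(c);
}
```

Calling `toy_irq_report_cpu_dead()` on a CPU whose interrupts are still enabled fails in this model, while `toy_stop_this_cpu()` succeeds because it disables interrupts first, matching the order in the patch.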
> Huacai
>> while (1);
>> }
>>
>> --
>> 2.51.0
>>
>>
Best,
Jonas