Hello!

Running rcutorture on v7.0-rc3 results in spurious CPU-hotplug failures,
most frequently on the TREE03 scenario, which suffers about ten such
failures per hundred hours of test time. Repeat-by is as follows:

	tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 80 --duration 100h --configs "100*TREE03" --trust-make

A faster repeat-by instead uses kvm-remote.sh and lots of systems.

Bisection converges here:

	6df415aa46ec ("cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue")

Reverting this commit gets rid of the spurious CPU-hotplug failures.
Of course, this also gets rid of some ability to do dynamic nohz_full
processing.

Now, the problem might be that the workqueue handler is still in flight
by the time that rcutorture fires up the next CPU-hotplug operation,
especially given that the TREE03 scenario waits only 200 milliseconds
between these operations. This suggests waiting for this handler before
ending each CPU-hotplug operation. And the crude patch below does make
the problem go away.

This alleged fix is quite heavy-handed, and also fragile in that it
breaks if hk_sd_workfn() is ever queued on a workqueue other than
system_unbound_wq. It might be better to call into the cgroups/cpusets
code and to use flush_work() to wait only on hk_sd_workfn() and nothing
else. But it seemed best to keep things trivial to start with.
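
The suspected race can be modeled in userspace. The sketch below is only
an illustration under stated assumptions, not kernel code: struct
model_work, model_queue_work(), model_flush_work() and model_hotplug_op()
are hypothetical names, a pthread stands in for the workqueue handler, and
a mutex "gate" stands in for an arbitrarily long handler delay. It shows
that an operation which returns without waiting can leave its deferred
handler unfinished, while one that waits cannot.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for one deferred work item. */
struct model_work {
	pthread_t thread;
	pthread_mutex_t gate;	/* held by the queuer until the handler may run */
	atomic_bool done;	/* set once the handler's update is complete */
};

/* Stand-in for the deferred handler: blocked until the gate is opened. */
static void *model_handler(void *arg)
{
	struct model_work *w = arg;

	pthread_mutex_lock(&w->gate);	/* models an arbitrarily delayed handler */
	atomic_store(&w->done, true);
	pthread_mutex_unlock(&w->gate);
	return NULL;
}

/* Queue the work.  This returns before the handler has done anything. */
static int model_queue_work(struct model_work *w)
{
	atomic_init(&w->done, false);
	pthread_mutex_init(&w->gate, NULL);
	pthread_mutex_lock(&w->gate);
	if (pthread_create(&w->thread, NULL, model_handler, w) != 0) {
		pthread_mutex_unlock(&w->gate);
		pthread_mutex_destroy(&w->gate);
		return -1;
	}
	return 0;
}

/* Let the handler run and wait for it, loosely analogous to a flush. */
static void model_flush_work(struct model_work *w)
{
	pthread_mutex_unlock(&w->gate);
	pthread_join(w->thread, NULL);
	pthread_mutex_destroy(&w->gate);
}

/*
 * Model one hotplug operation.  Returns true if the deferred handler had
 * completed by the time the operation would have returned to its caller.
 */
static bool model_hotplug_op(bool flush_before_return)
{
	struct model_work w;
	bool done_at_return;

	if (model_queue_work(&w) != 0)
		return false;
	if (flush_before_return)
		model_flush_work(&w);
	done_at_return = atomic_load(&w.done);
	if (!flush_before_return)
		model_flush_work(&w);	/* cleanup only, after sampling */
	return done_at_return;
}
```

In this model, model_hotplug_op(false) reports that the handler was still
pending when the operation ended, which is the window in which the next
operation could start, and model_hotplug_op(true) reports that it had
finished. The model waits on a single work item rather than on a whole
queue, which is the narrower of the two approaches discussed above.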

Either way, please consider the patch below to be part of this bug report
rather than a proper fix.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------
diff --git a/kernel/cpu.c b/kernel/cpu.c
index bc4f7a9ba64e6..36a9399be331d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1514,6 +1514,7 @@ int remove_cpu(unsigned int cpu)
 
 	lock_device_hotplug();
 	ret = device_offline(get_cpu_device(cpu));
+	flush_workqueue(system_unbound_wq);
 	unlock_device_hotplug();
 
 	return ret;
@@ -1730,6 +1731,7 @@ int add_cpu(unsigned int cpu)
 
 	lock_device_hotplug();
 	ret = device_online(get_cpu_device(cpu));
+	flush_workqueue(system_unbound_wq);
 	unlock_device_hotplug();
 
 	return ret;

On 3/18/26 8:53 AM, Paul E. McKenney wrote:
> Thoughts?
>
> Thanx, Paul
There is a fix commit ca174c705db5 ("cgroup/cpuset: Call
rebuild_sched_domains() directly in hotplug") in rc4 that may help. Could
you try out the rc4 kernel to see if that can resolve the problem that
you have?
Thanks,
Longman
On Wed, Mar 18, 2026 at 11:02:16AM -0400, Waiman Long wrote:
> There is a fix commit ca174c705db5 ("cgroup/cpuset: Call
> rebuild_sched_domains() directly in hotplug") in rc4 that may help. Could
> you try out the rc4 kernel to see if that can resolve the problem that you
> have?
It does, thank you!
Tested-by: Paul E. McKenney <paulmck@kernel.org>
On Wed, Mar 18, 2026 at 11:43:37AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 18, 2026 at 11:02:16AM -0400, Waiman Long wrote:
> > There is a fix commit ca174c705db5 ("cgroup/cpuset: Call
> > rebuild_sched_domains() directly in hotplug") in rc4 that may help. Could
> > you try out the rc4 kernel to see if that can resolve the problem that you
> > have?
>
> It does, thank you!
>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
This did fix the problem, except for PREEMPT_RT kernels (which I have
not yet bisected). If there is another patch for that configuration,
could you please let me know?
Thanx, Paul
On Tue, Mar 24, 2026 at 02:41:16AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 18, 2026 at 11:43:37AM -0700, Paul E. McKenney wrote:
> > On Wed, Mar 18, 2026 at 11:02:16AM -0400, Waiman Long wrote:
> > > There is a fix commit ca174c705db5 ("cgroup/cpuset: Call
> > > rebuild_sched_domains() directly in hotplug") in rc4 that may help. Could
> > > you try out the rc4 kernel to see if that can resolve the problem that you
> > > have?
> >
> > It does, thank you!
> >
> > Tested-by: Paul E. McKenney <paulmck@kernel.org>
>
> This did fix the problem, except for PREEMPT_RT kernels (which I have
> not yet bisected). If there is another patch for that configuration,
> could you please let me know?
And which I am having a hard time reproducing. :-(
Thanx, Paul
On 3/24/26 5:41 AM, Paul E. McKenney wrote:
> On Wed, Mar 18, 2026 at 11:43:37AM -0700, Paul E. McKenney wrote:
>> On Wed, Mar 18, 2026 at 11:02:16AM -0400, Waiman Long wrote:
>>> There is a fix commit ca174c705db5 ("cgroup/cpuset: Call
>>> rebuild_sched_domains() directly in hotplug") in rc4 that may help. Could
>>> you try out the rc4 kernel to see if that can resolve the problem that you
>>> have?
>> It does, thank you!
>>
>> Tested-by: Paul E. McKenney <paulmck@kernel.org>
> This did fix the problem, except for PREEMPT_RT kernels (which I have
> not yet bisected). If there is another patch for that configuration,
> could you please let me know?
Thanks for the notice. I haven't done much testing with PREEMPT_RT
kernels. I will try to run some tests on a PREEMPT_RT kernel and see if
there is any problem. Please let me know if bisection shows that the new
cpuset code is at fault.
Cheers,
Longman
On 3/18/26 2:43 PM, Paul E. McKenney wrote:
> On Wed, Mar 18, 2026 at 11:02:16AM -0400, Waiman Long wrote:
>> There is a fix commit ca174c705db5 ("cgroup/cpuset: Call
>> rebuild_sched_domains() directly in hotplug") in rc4 that may help. Could
>> you try out the rc4 kernel to see if that can resolve the problem that you
>> have?
> It does, thank you!
>
> Tested-by: Paul E. McKenney <paulmck@kernel.org>
>
Thanks for the confirmation.
Cheers,
Longman