Here are a few patches to fix a lockup caused by very slow progress due
to a scalability problem in workqueue watchdog touch being hammered by
thousands of CPUs in multi_cpu_stop. Patch 2 is the fix.

I did notice when making a microbenchmark reproducer that the RCU call
was actually also causing slowdowns. Not nearly so bad as the workqueue
touch, but workqueue queueing of dummy jobs slowed down by a factor of
several times when lots of other CPUs were making
rcu_momentary_dyntick_idle() calls. So I did the stop_machine patches to
reduce that. So those patches 3,4 are independent of the first two and
can go in any order.

Thanks,
Nick

Nicholas Piggin (4):
  workqueue: wq_watchdog_touch is always called with valid CPU
  workqueue: Improve scalability of workqueue watchdog touch
  stop_machine: Rearrange multi_cpu_stop state machine loop
  stop_machine: Add a delay between multi_cpu_stop touching watchdogs

 kernel/stop_machine.c | 31 +++++++++++++++++++++++--------
 kernel/workqueue.c    | 12 ++++++++++--
 2 files changed, 33 insertions(+), 10 deletions(-)

--
2.45.1
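To make the cacheline contention concrete, here is a minimal sketch of
the rate-limiting idea in patch 2 (a simplified sketch, not the actual
diff; wq_watchdog_touched, wq_watchdog_touched_cpu and wq_watchdog_thresh
are the existing workqueue.c symbols, and the thresh/4 store interval is
the one discussed later in this thread):

void wq_watchdog_touch(int cpu)
{
	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
	unsigned long touch_ts = READ_ONCE(wq_watchdog_touched);
	unsigned long now = jiffies;

	/* The per-CPU stamp is private to this CPU: cheap to store every time. */
	per_cpu(wq_watchdog_touched_cpu, cpu) = now;

	/*
	 * Only store to the shared stamp when it is at least thresh/4
	 * stale; thousands of concurrent touchers then mostly take a
	 * shared (read-only) cache hit instead of bouncing the
	 * cacheline with a store on every call.
	 */
	if (time_after(now, touch_ts + thresh / 4))
		WRITE_ONCE(wq_watchdog_touched, now);
}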
Hello Nicholas.
On Tue, Jun 25, 2024 at 09:42:43PM GMT, Nicholas Piggin <npiggin@gmail.com> wrote:
> Here are a few patches to fix a lockup caused by very slow progress due
> to a scalability problem in workqueue watchdog touch being hammered by
> thousands of CPUs in multi_cpu_stop. Patch 2 is the fix.
Is this something you noticed with stop_machine alone or in some broader
context? I see you mention CPU hotplug in patch 2. Was it a single
CPU or SMT offlining?
Good job tracking it down to everyone touching the same cacheline.
I had a suspicion [1] back in the day that cpuhp_smt_disable() would
scale in O(nr_cpus^2) but I didn't dedicate time to verifying that. Has
something similar popped up in your examination?
Also, I think your change in patch 2 effectively reduces
wq_watchdog_thresh to 3/4 of the configured value. (Not sure if the
default should be scaled accordingly.)
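To spell that out (assuming the detector still checks
time_after(jiffies, touched + thresh)): with stores suppressed while the
stamp is less than thresh/4 old, the recorded stamp can lag the last
real touch by up to thresh/4, so in the worst case:

  t0                   stamp stored
  t0 + thresh/4 - eps  last real touch (store suppressed, stamp == t0)
                       stall begins
  t0 + thresh          detector fires

which is only thresh - thresh/4 = 3/4 * thresh of actual stall time.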
Thanks,
Michal
[1]
cpuhp_smt_disable
  mutex_lock(&cpu_add_remove_lock);        // cpu_maps_update_begin()
  for_each_online_cpu                      // <-- nr_cpus
    takedown_cpu(cpu)
      stop_machine_cpuslocked              // cpu_hotplug_lock
        active_cpus = cpumask_of(cpu)
        for_each_cpu(cpu, cpu_online_mask) // <-- nr_cpus
          cpu_stop_queue_work(cpu, work)   // this is serial
        multi_cpu_stop                     // this runs nr_cpus in parallel
          take_cpu_down                    // runs on downed cpu only
          // other cpus are spinning in multi_cpu_stop() until take_cpu_down
          // above is finished on downed cpu, with interrupts disabled
          do { cpu_relax() } while (curstate != MULTI_STOP_EXIT)
  lockup_detector_cleanup()                // not a touch
  mutex_unlock(&cpu_add_remove_lock);
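For reference, the loop the spinning CPUs sit in looks roughly like this
(condensed from multi_cpu_stop() in kernel/stop_machine.c, details
elided); before patches 3-4, every iteration touched the NMI watchdog
(which reaches wq_watchdog_touch() via touch_softlockup_watchdog()) and
entered a momentary RCU quiescent state, which is the traffic the cover
letter measures:

do {
	stop_machine_yield(cpumask);	/* cpu_relax() flavour */
	newstate = READ_ONCE(msdata->state);
	if (newstate != curstate) {
		curstate = newstate;
		/* perform this CPU's step of the state machine */
		ack_state(msdata);
	} else if (curstate > MULTI_STOP_PREPARE) {
		touch_nmi_watchdog();	/* ends up in wq_watchdog_touch() */
		rcu_momentary_dyntick_idle();
	}
} while (curstate != MULTI_STOP_EXIT);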
On Tue, Jun 25, 2024 at 09:42:43PM +1000, Nicholas Piggin wrote:
> Here are a few patches to fix a lockup caused by very slow progress due
> to a scalability problem in workqueue watchdog touch being hammered by
> thousands of CPUs in multi_cpu_stop. Patch 2 is the fix.
>
> I did notice when making a microbenchmark reproducer that the RCU call
> was actually also causing slowdowns. Not nearly so bad as the workqueue
> touch, but workqueue queueing of dummy jobs slowed down by a factor of
> several times when lots of other CPUs were making
> rcu_momentary_dyntick_idle() calls. So I did the stop_machine patches to
> reduce that. So those patches 3,4 are independent of the first two and
> can go in any order.

For the series:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> Thanks,
> Nick
>
> Nicholas Piggin (4):
>   workqueue: wq_watchdog_touch is always called with valid CPU
>   workqueue: Improve scalability of workqueue watchdog touch
>   stop_machine: Rearrange multi_cpu_stop state machine loop
>   stop_machine: Add a delay between multi_cpu_stop touching watchdogs
>
>  kernel/stop_machine.c | 31 +++++++++++++++++++++++--------
>  kernel/workqueue.c    | 12 ++++++++++--
>  2 files changed, 33 insertions(+), 10 deletions(-)
>
> --
> 2.45.1
>
On Wed Jun 26, 2024 at 12:53 AM AEST, Paul E. McKenney wrote:
> On Tue, Jun 25, 2024 at 09:42:43PM +1000, Nicholas Piggin wrote:
> > Here are a few patches to fix a lockup caused by very slow progress due
> > to a scalability problem in workqueue watchdog touch being hammered by
> > thousands of CPUs in multi_cpu_stop. Patch 2 is the fix.
> >
> > I did notice when making a microbenchmark reproducer that the RCU call
> > was actually also causing slowdowns. Not nearly so bad as the workqueue
> > touch, but workqueue queueing of dummy jobs slowed down by a factor of
> > several times when lots of other CPUs were making
> > rcu_momentary_dyntick_idle() calls. So I did the stop_machine patches to
> > reduce that. So those patches 3,4 are independent of the first two and
> > can go in any order.
>
> For the series:
>
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

Oh, it did get a comment :) Thanks Paul. Not sure who owns the
multi_cpu_stop loop, Tejun and Peter I guess but that was 10+
years ago :P

I might ask Andrew if he would take patches 3-4, if there are
no objections.

Thanks,
Nick
* Nicholas Piggin <npiggin@gmail.com> [2024-06-26 10:57:36]:
> On Wed Jun 26, 2024 at 12:53 AM AEST, Paul E. McKenney wrote:
> > On Tue, Jun 25, 2024 at 09:42:43PM +1000, Nicholas Piggin wrote:
> > > Here are a few patches to fix a lockup caused by very slow progress due
> > > to a scalability problem in workqueue watchdog touch being hammered by
> > > thousands of CPUs in multi_cpu_stop. Patch 2 is the fix.
> > >
> > > I did notice when making a microbenchmark reproducer that the RCU call
> > > was actually also causing slowdowns. Not nearly so bad as the workqueue
> > > touch, but workqueue queueing of dummy jobs slowed down by a factor of
> > > several times when lots of other CPUs were making
> > > rcu_momentary_dyntick_idle() calls. So I did the stop_machine patches to
> > > reduce that. So those patches 3,4 are independent of the first two and
> > > can go in any order.
> >
> > For the series:
> >
> > Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>
> Oh, it did get a comment :) Thanks Paul. Not sure who owns the
> multi_cpu_stop loop, Tejun and Peter I guess but that was 10+
> years ago :P
>
> I might ask Andrew if he would take patches 3-4, if there are
> no objections.
>
Patches 3 and 4 are still not part of any tree.
Can we please include them, or are there any reservations about them?
The patches still seem to apply on top of Linus' tree, except for one
line where rcu_momentary_dyntick_idle() has been renamed to
rcu_momentary_eqs() by commit 32a9f26e5e26 ("rcu: Rename
rcu_momentary_dyntick_idle() into rcu_momentary_eqs()").
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=32a9f26e5e26
--
Thanks and Regards
Srikar Dronamraju