Fix scalability problem in workqueue watchdog touch caused by stop_machine

[PATCH 2/4] workqueue: Improve scalability of workqueue watchdog touch

Posted by Nicholas Piggin 1 year, 7 months ago

On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
which can share a cacheline with other important workqueue data. The
constant stores from every spinning CPU make that line bounce between
CPUs, which slows down operations to the point of lockups.

In the case of the following abridged trace, worker_pool_idr was in
the hot line, causing the lockups to always appear at idr_find.

  watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
  Call Trace:
  get_work_pool
  __queue_work
  call_timer_fn
  run_timer_softirq
  __do_softirq
  do_softirq_own_stack
  irq_exit
  timer_interrupt
  decrementer_common_virt
  * interrupt: 900 (timer) at multi_cpu_stop
  multi_cpu_stop
  cpu_stopper_thread
  smpboot_thread_fn
  kthread

Fix this by having wq_watchdog_touch() only write to the line if more
than 1/4 of the watchdog threshold has elapsed since the last recorded
touch.

Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/workqueue.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0954b778b315..f60886782f31 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -7560,12 +7560,18 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 
 notrace void wq_watchdog_touch(int cpu)
 {
+	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
+	unsigned long touch_ts = READ_ONCE(wq_watchdog_touched);
+	unsigned long now = jiffies;
+
 	if (cpu >= 0)
-		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
+		per_cpu(wq_watchdog_touched_cpu, cpu) = now;
 	else
 		WARN_ONCE(1, "%s should be called with valid CPU", __func__);
 
-	wq_watchdog_touched = jiffies;
+	/* Don't unnecessarily store to global cacheline */
+	if (time_after(now, touch_ts + thresh / 4))
+		WRITE_ONCE(wq_watchdog_touched, jiffies);
 }
 
 static void wq_watchdog_set_thresh(unsigned long thresh)
-- 
2.45.1
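
The effect of the throttled store can be sketched in userspace. This is
only an illustration under stated assumptions, not kernel code: the
sim_* names are hypothetical, the "CPUs" are simulated serially (so the
races that READ_ONCE/WRITE_ONCE tolerate in the real function do not
occur), and sim_time_after() mirrors the wrap-safe comparison that
time_after() performs.

```c
#include <assert.h>

/* Wrap-safe "a is later than b", same idea as the kernel's time_after(). */
static int sim_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-ins for wq_watchdog_touched and a store counter. */
static unsigned long sim_touched;
static unsigned long sim_global_stores;

/* Throttled touch: store only if more than thresh / 4 ticks have elapsed. */
static void sim_touch(unsigned long now, unsigned long thresh)
{
	unsigned long touch_ts = sim_touched;

	if (sim_time_after(now, touch_ts + thresh / 4)) {
		sim_touched = now;
		sim_global_stores++;
	}
}

/*
 * Simulate ncpus callers each touching once per tick for nticks ticks.
 * Returns how many stores actually reached the shared variable.
 */
static unsigned long sim_run(unsigned long ncpus, unsigned long nticks,
			     unsigned long thresh)
{
	unsigned long now, cpu;

	assert(ncpus > 0);
	sim_touched = 0;
	sim_global_stores = 0;
	for (now = 1; now <= nticks; now++)
		for (cpu = 0; cpu < ncpus; cpu++)
			sim_touch(now, thresh);
	return sim_global_stores;
}
```

Assuming HZ=100 and a 30 second threshold (thresh = 3000 jiffies),
sim_run(2000, 3000, 3000) returns 3: out of 6,000,000 simulated touches
only three store to the shared variable, at ticks 751, 1502 and 2253.
The per-CPU timestamp is still written on every touch in the real
function; only the global store is rate limited.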

Re: [PATCH 2/4] workqueue: Improve scalability of workqueue watchdog touch

Posted by Hillf Danton 1 year, 7 months ago

On Tue, Jun 25, 2024 at 09:42:45PM +1000, Nicholas Piggin wrote:
> On a ~2000 CPU powerpc system, hard lockups have been observed in the
> workqueue code when stop_machine runs (in this case due to CPU hotplug).
> This is due to lots of CPUs spinning in multi_cpu_stop, calling
> touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
> wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
> and that can find itself in the same cacheline as other important
> workqueue data, which slows down operations to the point of lockups.
> 
> In the case of the following abridged trace, worker_pool_idr was in
> the hot line, causing the lockups to always appear at idr_find.
> 
I wonder whether the MCS lock would help in this case.

Re: [PATCH 2/4] workqueue: Improve scalability of workqueue watchdog touch

Posted by Waiman Long 1 year, 7 months ago

On 6/27/24 08:16, Hillf Danton wrote:
> On Tue, Jun 25, 2024 at 09:42:45PM +1000, Nicholas Piggin wrote:
>> On a ~2000 CPU powerpc system, hard lockups have been observed in the
>> workqueue code when stop_machine runs (in this case due to CPU hotplug).
>> This is due to lots of CPUs spinning in multi_cpu_stop, calling
>> touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
>> wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
>> and that can find itself in the same cacheline as other important
>> workqueue data, which slows down operations to the point of lockups.
>>
>> In the case of the following abridged trace, worker_pool_idr was in
>> the hot line, causing the lockups to always appear at idr_find.
>>
> Wonder if the MCS lock does not help in this case.

This patch just tries to avoid polluting the shared cacheline leading to 
excessive cacheline bouncing. No locking is involved. I am not sure what 
you are thinking about using MCS lock for.

Regards,
Longman

Re: [PATCH 2/4] workqueue: Improve scalability of workqueue watchdog touch

Posted by Tejun Heo 1 year, 7 months ago

On Tue, Jun 25, 2024 at 09:42:45PM +1000, Nicholas Piggin wrote:
> On a ~2000 CPU powerpc system, hard lockups have been observed in the
> workqueue code when stop_machine runs (in this case due to CPU hotplug).
> This is due to lots of CPUs spinning in multi_cpu_stop, calling
> touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
> wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
> and that can find itself in the same cacheline as other important
> workqueue data, which slows down operations to the point of lockups.
> 
> In the case of the following abridged trace, worker_pool_idr was in
> the hot line, causing the lockups to always appear at idr_find.
> 
>   watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
>   Call Trace:
>   get_work_pool
>   __queue_work
>   call_timer_fn
>   run_timer_softirq
>   __do_softirq
>   do_softirq_own_stack
>   irq_exit
>   timer_interrupt
>   decrementer_common_virt
>   * interrupt: 900 (timer) at multi_cpu_stop
>   multi_cpu_stop
>   cpu_stopper_thread
>   smpboot_thread_fn
>   kthread
> 
> Fix this by having wq_watchdog_touch() only write to the line if the
> last time a touch was recorded exceeds 1/4 of the watchdog threshold.
> 
> Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied 1-2 to wq/for-6.11. I think 3 and 4 should probably be routed
through either tip or Andrew?

Thanks.

-- 
tejun

Re: [PATCH 2/4] workqueue: Improve scalability of workqueue watchdog touch

Posted by Nicholas Piggin 1 year, 7 months ago

On Wed Jun 26, 2024 at 2:57 AM AEST, Tejun Heo wrote:
> On Tue, Jun 25, 2024 at 09:42:45PM +1000, Nicholas Piggin wrote:
> > On a ~2000 CPU powerpc system, hard lockups have been observed in the
> > workqueue code when stop_machine runs (in this case due to CPU hotplug).
> > This is due to lots of CPUs spinning in multi_cpu_stop, calling
> > touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
> > wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
> > and that can find itself in the same cacheline as other important
> > workqueue data, which slows down operations to the point of lockups.
> > 
> > In the case of the following abridged trace, worker_pool_idr was in
> > the hot line, causing the lockups to always appear at idr_find.
> > 
> >   watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
> >   Call Trace:
> >   get_work_pool
> >   __queue_work
> >   call_timer_fn
> >   run_timer_softirq
> >   __do_softirq
> >   do_softirq_own_stack
> >   irq_exit
> >   timer_interrupt
> >   decrementer_common_virt
> >   * interrupt: 900 (timer) at multi_cpu_stop
> >   multi_cpu_stop
> >   cpu_stopper_thread
> >   smpboot_thread_fn
> >   kthread
> > 
> > Fix this by having wq_watchdog_touch() only write to the line if the
> > last time a touch was recorded exceeds 1/4 of the watchdog threshold.
> > 
> > Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> Applied 1-2 to wq/for-6.11.

Thanks Tejun.

> I think 3 and 4 should probably be routed
> through either tip or Andrew?

Yeah, let's see if it gets any comments.

Thanks,
Nick