[RESEND PATCH 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination.

Imran Khan posted 2 patches 2 years, 9 months ago
Posted by Imran Khan 2 years, 9 months ago
On systems with hundreds of CPUs, if a few hundred (or most) of the CPUs
detect a CSD hang, all of these waiters end up sending an NMI to the
destination CPU to dump its backtrace.
Depending on the number of such NMIs, the destination CPU can spend
a significant amount of time handling them, which makes it more
difficult for this CPU to process the pending CSDs in a timely manner.
In the worst case, by the time the destination CPU is done handling
all of these backtrace NMIs, the CSD wait time may have elapsed, so
all of the waiters send backtrace NMIs again, and this behaviour
continues in a loop.

To avoid this scenario, issue the backtrace NMI only from the first
waiter. The other waiters to the same CSD destination can make use
of the backtrace obtained via the first waiter's NMI.

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
---
 kernel/smp.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b7ccba677a0a0..a1cd21ea8b308 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -43,6 +43,8 @@ static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data);
 
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue);
 
+static DEFINE_PER_CPU(atomic_t, trigger_backtrace) = ATOMIC_INIT(1);
+
 static void __flush_smp_call_function_queue(bool warn_cpu_offline);
 
 int smpcfd_prepare_cpu(unsigned int cpu)
@@ -242,7 +244,8 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
 			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
 	}
 	if (cpu >= 0) {
-		dump_cpu_task(cpu);
+		if (atomic_cmpxchg_acquire(&per_cpu(trigger_backtrace, cpu), 1, 0))
+			dump_cpu_task(cpu);
 		if (!cpu_cur_csd) {
 			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
 			arch_send_call_function_single_ipi(cpu);
@@ -423,9 +426,14 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 	struct llist_node *entry, *prev;
 	struct llist_head *head;
 	static bool warned;
+	atomic_t *tbt;
 
 	lockdep_assert_irqs_disabled();
 
+	/* Allow waiters to send backtrace NMI from here onwards */
+	tbt = this_cpu_ptr(&trigger_backtrace);
+	atomic_set_release(tbt, 1);
+
 	head = this_cpu_ptr(&call_single_queue);
 	entry = llist_del_all(head);
 	entry = llist_reverse_order(entry);
-- 
2.34.1
Re: [RESEND PATCH 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination.
Posted by Paul E. McKenney 2 years, 9 months ago
On Tue, May 09, 2023 at 08:31:24AM +1000, Imran Khan wrote:
> On systems with hundreds of CPUs, if a few hundred (or most) of the CPUs
> detect a CSD hang, all of these waiters end up sending an NMI to the
> destination CPU to dump its backtrace.
> Depending on the number of such NMIs, the destination CPU can spend
> a significant amount of time handling them, which makes it more
> difficult for this CPU to process the pending CSDs in a timely manner.
> In the worst case, by the time the destination CPU is done handling
> all of these backtrace NMIs, the CSD wait time may have elapsed, so
> all of the waiters send backtrace NMIs again, and this behaviour
> continues in a loop.
> 
> To avoid this scenario, issue the backtrace NMI only from the first
> waiter. The other waiters to the same CSD destination can make use
> of the backtrace obtained via the first waiter's NMI.
> 
> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

Re: [RESEND PATCH 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination.
Posted by Imran Khan 2 years, 8 months ago
Hello Paul,

On 16/5/2023 10:09 pm, Paul E. McKenney wrote:
> On Tue, May 09, 2023 at 08:31:24AM +1000, Imran Khan wrote:
>> On systems with hundreds of CPUs, if a few hundred (or most) of the CPUs
>> detect a CSD hang, all of these waiters end up sending an NMI to the
>> destination CPU to dump its backtrace.
>> Depending on the number of such NMIs, the destination CPU can spend
>> a significant amount of time handling them, which makes it more
>> difficult for this CPU to process the pending CSDs in a timely manner.
>> In the worst case, by the time the destination CPU is done handling
>> all of these backtrace NMIs, the CSD wait time may have elapsed, so
>> all of the waiters send backtrace NMIs again, and this behaviour
>> continues in a loop.
>>
>> To avoid this scenario, issue the backtrace NMI only from the first
>> waiter. The other waiters to the same CSD destination can make use
>> of the backtrace obtained via the first waiter's NMI.
>>
>> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> 

Thanks a lot for reviewing this and [1]. Could you kindly let me know
whether you plan to pick these up in your tree at some point?

Thanks,
Imran

[1]:
https://lore.kernel.org/all/088edfa0-c1b7-407f-8b20-caf0fecfbb79@paulmck-laptop/

Re: [RESEND PATCH 2/2] smp: Reduce NMI traffic from CSD waiters to CSD destination.
Posted by Paul E. McKenney 2 years, 8 months ago
On Tue, May 30, 2023 at 11:24:00AM +1000, Imran Khan wrote:
> Hello Paul,
> 
> On 16/5/2023 10:09 pm, Paul E. McKenney wrote:
> > On Tue, May 09, 2023 at 08:31:24AM +1000, Imran Khan wrote:
> >> On systems with hundreds of CPUs, if a few hundred (or most) of the CPUs
> >> detect a CSD hang, all of these waiters end up sending an NMI to the
> >> destination CPU to dump its backtrace.
> >> Depending on the number of such NMIs, the destination CPU can spend
> >> a significant amount of time handling them, which makes it more
> >> difficult for this CPU to process the pending CSDs in a timely manner.
> >> In the worst case, by the time the destination CPU is done handling
> >> all of these backtrace NMIs, the CSD wait time may have elapsed, so
> >> all of the waiters send backtrace NMIs again, and this behaviour
> >> continues in a loop.
> >>
> >> To avoid this scenario, issue the backtrace NMI only from the first
> >> waiter. The other waiters to the same CSD destination can make use
> >> of the backtrace obtained via the first waiter's NMI.
> >>
> >> Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
> > 
> > Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Thanks a lot for reviewing this and [1]. Could you kindly let me know
> whether you plan to pick these up in your tree at some point?

I have done so, and they should make it to -next early next week,
assuming testing goes well.

							Thanx, Paul

> Thanks,
> Imran
> 
> [1]:
> https://lore.kernel.org/all/088edfa0-c1b7-407f-8b20-caf0fecfbb79@paulmck-laptop/
> 