[patch 0/4] genirq: Prevent migration live lock in handle_edge_irq()

Thomas Gleixner posted 4 patches 2 months, 2 weeks ago
Posted by Thomas Gleixner 2 months, 2 weeks ago
Yicon reported and Liangyan debugged a live lock in handle_edge_irq()
related to interrupt migration.

If the affinity of an edge type interrupt is moved to a new target CPU while
the interrupt is being handled on the previous target CPU, the handler
might get stuck on the previous target:

CPU 0 (previous target)		CPU 1 (new target)

  handle_edge_irq()
   repeat:
	handle_event()		handle_edge_irq()
			        if (INPROGRESS) {
				  set(PENDING);
				  mask();
				  return;
				}
	if (PENDING) {
	  clear(PENDING);
	  unmask();
	  goto repeat;
	}

The migration in software never completes and CPU0 continues to handle the
pending events forever. This happens when the device raises interrupts at
a high rate, so that each new interrupt arrives before handle_event()
completes and before the CPU0 handler can clear INPROGRESS. CPU1 then sets
the PENDING flag over and over. This has been observed in virtual machines.
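
To make the sequence concrete, here is a minimal single-threaded user space
sketch of the flow in the diagram. It is not the kernel code: the struct and
function names (sim_desc, sim_stats, cpu1_entry_old, simulate_old_flow) are
hypothetical, and the concurrency is modelled by assuming that every further
device interrupt arrives on CPU1 while CPU0 is still inside handle_event().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the interrupt descriptor state. */
struct sim_desc {
	bool inprogress;	/* a handler is running on some CPU */
	bool pending;		/* an event arrived while the handler ran */
	bool masked;		/* interrupt masked at the chip */
};

struct sim_stats {
	int handled_cpu0;	/* events handled on the previous target */
	int handled_cpu1;	/* events handled on the new target */
};

/* Entry on CPU1 (new target), following the right column of the diagram. */
static void cpu1_entry_old(struct sim_desc *d, struct sim_stats *s)
{
	if (d->inprogress) {
		d->pending = true;
		d->masked = true;
		return;
	}
	s->handled_cpu1++;
}

/*
 * CPU0 is handling one event already. 'events' is the number of further
 * interrupts the device raises; by assumption each one hits CPU1 before
 * CPU0 returns from handle_event().
 */
static struct sim_stats simulate_old_flow(int events)
{
	struct sim_desc d = { .inprogress = true };
	struct sim_stats s = { 0 };

	for (;;) {
		/* handle_event() on CPU0; the device fires again meanwhile */
		if (events > 0) {
			events--;
			cpu1_entry_old(&d, &s);
		}
		s.handled_cpu0++;

		if (!d.pending)
			break;
		d.pending = false;	/* clear(PENDING) */
		d.masked = false;	/* unmask() */
	}
	d.inprogress = false;

	/* The loop only ends once the device stops raising interrupts */
	assert(!d.pending && !d.masked);
	return s;
}
```

In this model simulate_old_flow(1000) reports 1001 events handled on CPU0
and none on CPU1. CPU0 only leaves the loop once the device goes quiet, which
is the live lock described above when the device never does.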

The following series addresses this by making the new target CPU wait
for the handler to complete on CPU0, thereby completing the software
migration.
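
The idea of waiting on the new target can be sketched in the same kind of
single-threaded user space model. This is an illustration only and does not
reproduce the patches: all names (wait_desc, wait_stats, cpu0_finish_handler,
cpu1_entry_wait, simulate_wait_flow) are hypothetical, and the busy wait is
represented by letting the CPU0 handler run to completion before CPU1
proceeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the interrupt descriptor state. */
struct wait_desc {
	bool inprogress;	/* a handler is running on some CPU */
	bool pending;		/* never set in this flow, kept for contrast */
};

struct wait_stats {
	int handled_cpu0;	/* events handled on the previous target */
	int handled_cpu1;	/* events handled on the new target */
	int waits;		/* times the new target had to wait */
};

/* CPU0 finishes its handle_event() and leaves: nobody set PENDING. */
static void cpu0_finish_handler(struct wait_desc *d, struct wait_stats *s)
{
	s->handled_cpu0++;
	assert(!d->pending);
	d->inprogress = false;
}

/* Entry on CPU1 (new target): wait instead of setting PENDING. */
static void cpu1_entry_wait(struct wait_desc *d, struct wait_stats *s)
{
	if (d->inprogress) {
		/* Stand-in for the wait: CPU0 runs to completion */
		s->waits++;
		cpu0_finish_handler(d, s);
	}

	/* CPU1 now handles the event itself */
	d->inprogress = true;
	s->handled_cpu1++;
	d->inprogress = false;
}

/*
 * CPU0 is handling one event already. 'events' further interrupts are
 * then delivered to CPU1, the new target.
 */
static struct wait_stats simulate_wait_flow(int events)
{
	struct wait_desc d = { .inprogress = true };
	struct wait_stats s = { 0 };

	while (events-- > 0)
		cpu1_entry_wait(&d, &s);

	/* No further interrupt arrived: CPU0 simply finishes */
	if (d.inprogress)
		cpu0_finish_handler(&d, &s);
	return s;
}
```

In this model simulate_wait_flow(1000) reports one event handled on CPU0,
1000 on CPU1 and a single wait: CPU0 finishes the event it was already
handling and every later event is handled on the new target.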

A draft combo patch of this has been tested by Liangyan:

  https://lore.kernel.org/all/87o6u0rpaa.ffs@tglx

The series splits up the draft patch and has proper changelogs.

Thanks,

	tglx
---
 chip.c      |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 internals.h |    6 ++---
 pm.c        |   16 +++++---------
 spurious.c  |   37 --------------------------------
 4 files changed, 69 insertions(+), 58 deletions(-)
Re: [External] [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq()
Posted by Liangyan 2 months, 2 weeks ago

On 2025/7/19 02:54, Thomas Gleixner wrote:
> Yicon reported and Liangyan debugged a live lock in handle_edge_irq()
> related to interrupt migration.
> 
> If the affinity of an edge type interrupt is moved to a new target CPU while
> the interrupt is being handled on the previous target CPU, the handler
> might get stuck on the previous target:
> 
> CPU 0 (previous target)		CPU 1 (new target)
> 
>   handle_edge_irq()
>    repeat:
> 	handle_event()		handle_edge_irq()
> 			        if (INPROGRESS) {
> 				  set(PENDING);
> 				  mask();
> 				  return;
> 				}
> 	if (PENDING) {
> 	  clear(PENDING);
> 	  unmask();
> 	  goto repeat;
> 	}
> 
> The migration in software never completes and CPU0 continues to handle the
> pending events forever. This happens when the device raises interrupts at
> a high rate, so that each new interrupt arrives before handle_event()
> completes and before the CPU0 handler can clear INPROGRESS. CPU1 then sets
> the PENDING flag over and over. This has been observed in virtual machines.
> 
> The following series addresses this by making the new target CPU wait
> for the handler to complete on CPU0, thereby completing the software
> migration.
> 
> A draft combo patch of this has been tested by Liangyan:
> 
>   https://lore.kernel.org/all/87o6u0rpaa.ffs@tglx
> 
> The series splits up the draft patch and has proper changelogs.
> 
> Thanks,
> 
> 	tglx
> ---
>  chip.c      |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
>  internals.h |    6 ++---
>  pm.c        |   16 +++++---------
>  spurious.c  |   37 --------------------------------
>  4 files changed, 69 insertions(+), 58 deletions(-)
> 
> 

Tested-by: Liangyan <liangyan.peng@bytedance.com>

Regards,
Liangyan