Hello,

The following series aims to fix interrupt handling when doing CPU
plug/unplug operations. Without this series, running:

    cpus=`xl info max_cpu_id`
    while [ 1 ]; do
        for i in `seq 1 $cpus`; do
            xen-hptool cpu-offline $i;
            xen-hptool cpu-online $i;
        done
    done

quite quickly results in interrupts getting lost and "No irq handler for
vector" messages on the Xen console. Drivers in dom0 also start getting
interrupt timeouts and the system becomes unusable.

After applying the series, running the loop overnight still results in a
fully usable system: no "No irq handler for vector" messages at all, and no
interrupt losses reported by dom0. Tested with x2apic-mode={mixed,cluster}.

I've attempted to document all the code as well as I could; interrupt
handling has some unexpected corner cases that are hard to diagnose and
reason about.

Some XenRT testing is ongoing to ensure no breakages.

Thanks, Roger.

Roger Pau Monne (3):
  x86/irq: deal with old_cpu_mask for interrupts in movement in
    fixup_irqs()
  x86/irq: handle moving interrupts in _assign_irq_vector()
  x86/irq: forward pending interrupts to new destination in fixup_irqs()

 xen/arch/x86/include/asm/apic.h |   5 +
 xen/arch/x86/irq.c              | 163 +++++++++++++++++++++++++-------
 2 files changed, 132 insertions(+), 36 deletions(-)

-- 
2.45.2
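For anyone wanting to repeat the test without leaving it running forever, the
reproduction loop could be bounded and made to check the console itself. The
following is only a sketch under stated assumptions, not part of the series:
the helper names (count_lost_vectors, hotplug_stress) and the XL/HPTOOL
override variables are hypothetical, it assumes `xl dmesg` is how the Xen
console ring is read, and `xl info max_cpu_id` is copied from the loop in the
cover letter. The console ring can wrap, so the count is a rough indicator
rather than an exact tally.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.
# XL and HPTOOL may be overridden, e.g. to stub the tools for a dry run.
XL=${XL:-xl}
HPTOOL=${HPTOOL:-xen-hptool}

# Count "No irq handler for vector" lines currently in the Xen console ring.
count_lost_vectors() {
    "$XL" dmesg | grep -c "No irq handler for vector"
}

# Offline and online every CPU except CPU 0, repeated $1 times.
# Returns non-zero as soon as the message count grows.
hotplug_stress() {
    rounds=$1
    max_cpu=$("$XL" info max_cpu_id)
    before=$(count_lost_vectors)
    round=1
    while [ "$round" -le "$rounds" ]; do
        cpu=1
        while [ "$cpu" -le "$max_cpu" ]; do
            "$HPTOOL" cpu-offline "$cpu"
            "$HPTOOL" cpu-online "$cpu"
            cpu=$((cpu + 1))
        done
        after=$(count_lost_vectors)
        if [ "$after" -gt "$before" ]; then
            echo "round $round: $((after - before)) new 'No irq handler for vector' messages"
            return 1
        fi
        round=$((round + 1))
    done
    echo "completed $rounds rounds with no new messages"
}
```

Sourcing the file and running `hotplug_stress 100` as root in dom0 would then
stop at the first round that produces a new message. Dom0 interrupt timeouts
are not detected by this sketch and would still need to be checked in the
dom0 kernel log.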
Sorry, forgot to add the for-4.19 tag and Cc Oleksii.

Since we have taken the start of the series, we might as well take the
remaining patches (if other x86 maintainers agree) and attempt to
hopefully fix all the interrupt issues with CPU hotplug/unplug.

FTR: there are further issues when doing CPU hotplug/unplug from a PVH
dom0, but those are out of the scope for 4.19, as I haven't even
started to diagnose what's going on.

Thanks, Roger.

On Thu, Jun 13, 2024 at 06:56:14PM +0200, Roger Pau Monne wrote:
> [...]
On Fri, 2024-06-14 at 09:28 +0200, Roger Pau Monné wrote:
> [...]
> FTR: there are further issues when doing CPU hotplug/unplug from a PVH
> dom0, but those are out of the scope for 4.19, as I haven't even
> started to diagnose what's going on.

And did these issues exist before the current patch series was introduced?

~ Oleksii
On Fri, Jun 14, 2024 at 01:52:59PM +0200, Oleksii K. wrote:
> On Fri, 2024-06-14 at 09:28 +0200, Roger Pau Monné wrote:
> > [...]
> > FTR: there are further issues when doing CPU hotplug/unplug from a PVH
> > dom0, but those are out of the scope for 4.19, as I haven't even
> > started to diagnose what's going on.
> And did these issues exist before the current patch series was introduced?

Yes, the issues with PVH dom0 CPU hotplug/unplug are in addition to the
ones fixed here.

Thanks, Roger.