Hello,
The following series aims to fix interrupt handling when doing CPU
plug/unplug operations. Without this series running:
cpus=`xl info max_cpu_id`
while [ 1 ]; do
    for i in `seq 1 $cpus`; do
        xen-hptool cpu-offline $i;
        xen-hptool cpu-online $i;
    done
done
Quite quickly results in interrupts getting lost and "No irq handler for
vector" messages on the Xen console. Drivers in dom0 also start getting
interrupt timeouts and the system becomes unusable.
After applying the series, running the loop overnight still results in a
fully usable system: no "No irq handler for vector" messages at all and no
interrupt losses reported by dom0. Tested with x2apic-mode={mixed,cluster}.
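As a rough sketch of how the console check could be automated after a run (the helper name below is hypothetical and not part of the series; it assumes `xl dmesg` is usable from dom0 and that the console ring has not wrapped):

```shell
# Hypothetical helper: count lines on stdin that contain the
# "No irq handler for vector" message quoted above.
count_missing_handlers() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -c 'No irq handler for vector' || true
}

# Possible usage from dom0 (assumption: run as a user allowed to call xl):
#   xl dmesg | count_missing_handlers
```

A count of 0 after the loop has run for a while would match the behaviour described above with the series applied.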
I've attempted to document all the code as well as I could; interrupt
handling has some unexpected corner cases that are hard to diagnose and
reason about.
I'm currently also doing some extra testing with XenRT in case I've
missed something.
Thanks, Roger.
Roger Pau Monne (7):
  x86/smp: do not use shorthand IPI destinations in CPU hot{,un}plug
    contexts
  x86/irq: describe how the interrupt CPU movement works
  x86/irq: limit interrupt movement done by fixup_irqs()
  x86/irq: restrict CPU movement in set_desc_affinity()
  x86/irq: deal with old_cpu_mask for interrupts in movement in
    fixup_irqs()
  x86/irq: handle moving interrupts in _assign_irq_vector()
  x86/irq: forward pending interrupts to new destination in fixup_irqs()
 xen/arch/x86/include/asm/apic.h |   5 +
 xen/arch/x86/include/asm/irq.h  |  40 ++++++-
 xen/arch/x86/irq.c              | 197 ++++++++++++++++++++++++--------
 xen/arch/x86/smp.c              |   2 +-
 xen/common/cpu.c                |   5 +
 xen/include/xen/cpu.h           |  10 ++
 xen/include/xen/rwlock.h        |   2 +
 7 files changed, 214 insertions(+), 47 deletions(-)
--
2.44.0