[PATCH 06/12] x86/i8259: redo workaround for AMD spurious PIC interrupts

Roger Pau Monne posted 12 patches 3 weeks, 2 days ago
[PATCH 06/12] x86/i8259: redo workaround for AMD spurious PIC interrupts
Posted by Roger Pau Monne 3 weeks, 2 days ago
Do not set the legacy IRQs' target CPU mask to all CPUs, as each interrupt
strictly targets a single CPU, even if a spurious interrupt can be
delivered to other CPUs.

Instead set the target CPU mask correctly, as do_IRQ() will always deal
with spurious interrupts generated on AMD hardware.  Otherwise fixup_irqs()
would assume the IRQ is targeting all CPUs, and would simply not perform
the required affinity move if the target CPU goes offline.

Most of this is unlikely to make any difference in practice, as the IO-APIC
setup will take over those entries and program the proper destination as
part of startup.

Fixes: 87f37449d586 ("x86/i8259: do not assume interrupts always target CPU0")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/i8259.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/i8259.c b/xen/arch/x86/i8259.c
index 5c7e21a7515c..41f949a36531 100644
--- a/xen/arch/x86/i8259.c
+++ b/xen/arch/x86/i8259.c
@@ -407,21 +407,14 @@ void __init init_IRQ(void)
         per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
 
         /*
-         * The interrupt affinity logic never targets interrupts to offline
-         * CPUs, hence it's safe to use cpumask_all here.
-         *
          * Legacy PIC interrupts are only targeted to CPU0, but depending on
          * the platform they can be distributed to any online CPU in hardware.
-         * Note this behavior has only been observed on AMD hardware. In order
-         * to cope install all active legacy vectors on all CPUs.
-         *
-         * IO-APIC will change the destination mask if/when taking ownership of
-         * the interrupt.
+         * Note this behavior has only been observed on AMD hardware. Set the
+         * target CPU as expected here, and cope with the possibly spurious
+         * interrupts in do_IRQ().  This behavior has only been observed
+         * during AP bringup.
          */
-        cpumask_copy(desc->arch.cpu_mask,
-                     (boot_cpu_data.x86_vendor &
-                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
-                                                          : cpumask_of(cpu)));
+        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
         desc->arch.vector = LEGACY_VECTOR(irq);
     }
     
-- 
2.51.0


Re: [PATCH 06/12] x86/i8259: redo workaround for AMD spurious PIC interrupts
Posted by Jan Beulich 3 weeks, 2 days ago
On 20.11.2025 10:58, Roger Pau Monne wrote:
> --- a/xen/arch/x86/i8259.c
> +++ b/xen/arch/x86/i8259.c
> @@ -407,21 +407,14 @@ void __init init_IRQ(void)
>          per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
>  
>          /*
> -         * The interrupt affinity logic never targets interrupts to offline
> -         * CPUs, hence it's safe to use cpumask_all here.
> -         *
>           * Legacy PIC interrupts are only targeted to CPU0, but depending on
>           * the platform they can be distributed to any online CPU in hardware.
> -         * Note this behavior has only been observed on AMD hardware. In order
> -         * to cope install all active legacy vectors on all CPUs.
> -         *
> -         * IO-APIC will change the destination mask if/when taking ownership of
> -         * the interrupt.
> +         * Note this behavior has only been observed on AMD hardware. Set the
> +         * target CPU as expected here, and cope with the possibly spurious
> +         * interrupts in do_IRQ().  This behavior has only been observed
> +         * during AP bringup.
>           */
> -        cpumask_copy(desc->arch.cpu_mask,
> -                     (boot_cpu_data.x86_vendor &
> -                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
> -                                                          : cpumask_of(cpu)));
> +        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
>          desc->arch.vector = LEGACY_VECTOR(irq);
>      }

Doesn't this collide with what setup_vector_irq() does (see also patch 04)? If
you don't set all bits here, not all CPUs will have the vector_irq[] slot set
correctly for do_IRQ() to actually be able to associate the vector with the
right IRQ.

Jan
Re: [PATCH 06/12] x86/i8259: redo workaround for AMD spurious PIC interrupts
Posted by Roger Pau Monné 3 weeks, 1 day ago
On Thu, Nov 20, 2025 at 04:05:38PM +0100, Jan Beulich wrote:
> On 20.11.2025 10:58, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/i8259.c
> > +++ b/xen/arch/x86/i8259.c
> > @@ -407,21 +407,14 @@ void __init init_IRQ(void)
> >          per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
> >  
> >          /*
> > -         * The interrupt affinity logic never targets interrupts to offline
> > -         * CPUs, hence it's safe to use cpumask_all here.
> > -         *
> >           * Legacy PIC interrupts are only targeted to CPU0, but depending on
> >           * the platform they can be distributed to any online CPU in hardware.
> > -         * Note this behavior has only been observed on AMD hardware. In order
> > -         * to cope install all active legacy vectors on all CPUs.
> > -         *
> > -         * IO-APIC will change the destination mask if/when taking ownership of
> > -         * the interrupt.
> > +         * Note this behavior has only been observed on AMD hardware. Set the
> > +         * target CPU as expected here, and cope with the possibly spurious
> > +         * interrupts in do_IRQ().  This behavior has only been observed
> > +         * during AP bringup.
> >           */
> > -        cpumask_copy(desc->arch.cpu_mask,
> > -                     (boot_cpu_data.x86_vendor &
> > -                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
> > -                                                          : cpumask_of(cpu)));
> > +        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
> >          desc->arch.vector = LEGACY_VECTOR(irq);
> >      }
> 
> Doesn't this collide with what setup_vector_irq() does (see also patch 04)? If
> you don't set all bits here, not all CPUs will have the vector_irq[] slot set
> correctly for do_IRQ() to actually be able to associate the vector with the
> right IRQ.

For the AMD workaround I've only ever seen the spurious vector
triggering on CPUs other than the BSP at bringup; I don't think we
strictly need to bind all legacy vectors on all possible CPUs.  Well-behaved
PIC interrupts will only target the BSP, and that's properly
set up.

Thanks, Roger.
Re: [PATCH 06/12] x86/i8259: redo workaround for AMD spurious PIC interrupts
Posted by Jan Beulich 2 weeks, 5 days ago
On 21.11.2025 09:35, Roger Pau Monné wrote:
> On Thu, Nov 20, 2025 at 04:05:38PM +0100, Jan Beulich wrote:
>> On 20.11.2025 10:58, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/i8259.c
>>> +++ b/xen/arch/x86/i8259.c
>>> @@ -407,21 +407,14 @@ void __init init_IRQ(void)
>>>          per_cpu(vector_irq, cpu)[LEGACY_VECTOR(irq)] = irq;
>>>  
>>>          /*
>>> -         * The interrupt affinity logic never targets interrupts to offline
>>> -         * CPUs, hence it's safe to use cpumask_all here.
>>> -         *
>>>           * Legacy PIC interrupts are only targeted to CPU0, but depending on
>>>           * the platform they can be distributed to any online CPU in hardware.
>>> -         * Note this behavior has only been observed on AMD hardware. In order
>>> -         * to cope install all active legacy vectors on all CPUs.
>>> -         *
>>> -         * IO-APIC will change the destination mask if/when taking ownership of
>>> -         * the interrupt.
>>> +         * Note this behavior has only been observed on AMD hardware. Set the
>>> +         * target CPU as expected here, and cope with the possibly spurious
>>> +         * interrupts in do_IRQ().  This behavior has only been observed
>>> +         * during AP bringup.
>>>           */
>>> -        cpumask_copy(desc->arch.cpu_mask,
>>> -                     (boot_cpu_data.x86_vendor &
>>> -                      (X86_VENDOR_AMD | X86_VENDOR_HYGON) ? &cpumask_all
>>> -                                                          : cpumask_of(cpu)));
>>> +        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
>>>          desc->arch.vector = LEGACY_VECTOR(irq);
>>>      }
>>
>> Doesn't this collide with what setup_vector_irq() does (see also patch 04)? If
>> you don't set all bits here, not all CPUs will have the vector_irq[] slot set
>> correctly for do_IRQ() to actually be able to associate the vector with the
>> right IRQ.
> 
> For the AMD workaround I've only ever seen the spurious vector
> triggering on CPUs other than the BSP at bringup,

Besides this possibly being an artifact (the issue occurring once during boot
may hide it otherwise potentially also occurring later), I think we want to be
conservative and (continue to) cover all possible cases unless we have an
explanation for why the issue would be AP-bringup-time-only.

Jan

> I don't think we
> strictly need to bind all legacy vectors on all possible CPUs.  Well-behaved
> PIC interrupts will only target the BSP, and that's properly
> set up.
> 
> Thanks, Roger.


[PATCH 07/12] x86/io-apic: fix usage of setup_ioapic_dest()
Posted by Roger Pau Monne 3 weeks, 2 days ago
Attempt to clarify the purpose of setup_ioapic_dest(), as the comment ahead
of it is outdated, and looks to be a verbatim copy from Linux made in one of
the code imports.

The function serves two purposes: shuffling the interrupts across CPUs
after SMP bringup or re-assigning all interrupts to CPU#0 if no IRQ
balancing is set at run time.  However, the function won't perform either of
those tasks correctly, as it was unconditionally using
desc->arch.cpu_mask as the target CPU mask for interrupts (which is the
current target anyway).

Fix by adding a new `shuffle` parameter that's used to signal whether the
function is intended to balance interrupts across CPUs, or to re-assign all
interrupts to the BSP.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I couldn't find a specific Fixes tag to use here; I think this has been
broken all along.
---
 xen/arch/x86/include/asm/irq.h    |  2 +-
 xen/arch/x86/io_apic.c            | 13 +++++++------
 xen/arch/x86/platform_hypercall.c |  2 +-
 xen/arch/x86/smpboot.c            |  3 ++-
 4 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 7315150b66b4..df7b48c8653e 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -137,7 +137,7 @@ int i8259A_resume(void);
 void enable_IO_APIC(void);
 void setup_IO_APIC(void);
 void disable_IO_APIC(void);
-void setup_ioapic_dest(void);
+void setup_ioapic_dest(bool shuffle);
 vmask_t *io_apic_get_used_vector_map(unsigned int irq);
 
 extern unsigned int io_apic_irqs;
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 3c59bf0e15da..19960d291c47 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -717,12 +717,14 @@ static int __init find_isa_irq_apic(int irq, int type)
 static int pin_2_irq(int idx, int apic, int pin);
 
 /*
- * This function currently is only a helper for the i386 smp boot process where 
- * we need to reprogram the ioredtbls to cater for the cpus which have come online
- * so mask in all cases should simply be TARGET_CPUS
+ * This function serves two different purposes: shuffling the IO-APIC
+ * interrupts across CPUs after SMP bringup, or re-assigning all interrupts to
+ * the BSP if IRQ balancing is disabled at runtime.  Such functional
+ * distinction is signaled by the `shuffle` parameter.
  */
-void /*__init*/ setup_ioapic_dest(void)
+void setup_ioapic_dest(bool shuffle)
 {
+    const cpumask_t *mask = shuffle ? TARGET_CPUS : cpumask_of(0);
     int pin, ioapic, irq, irq_entry;
 
     if (skip_ioapic_setup)
@@ -740,8 +742,7 @@ void /*__init*/ setup_ioapic_dest(void)
             desc = irq_to_desc(irq);
 
             spin_lock_irqsave(&desc->lock, flags);
-            BUG_ON(!cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map));
-            set_ioapic_affinity_irq(desc, desc->arch.cpu_mask);
+            set_ioapic_affinity_irq(desc, mask);
             spin_unlock_irqrestore(&desc->lock, flags);
         }
     }
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
index 79bb99e0b602..22fd65a6aa7e 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
@@ -337,7 +337,7 @@ ret_t do_platform_op(
         case QUIRK_NOIRQBALANCING:
             printk("Platform quirk -- Disabling IRQ balancing/affinity.\n");
             opt_noirqbalance = 1;
-            setup_ioapic_dest();
+            setup_ioapic_dest(false);
             break;
         case QUIRK_IOAPIC_BAD_REGSEL:
             dprintk(XENLOG_WARNING,
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 69cc9bbc6e2c..55ea62d7d67f 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1433,7 +1433,8 @@ void __init smp_cpus_done(void)
         check_nmi_watchdog();
     }
 
-    setup_ioapic_dest();
+    if ( !opt_noirqbalance )
+        setup_ioapic_dest(true);
 
     mtrr_save_state();
     mtrr_aps_sync_end();
-- 
2.51.0


Re: [PATCH 07/12] x86/io-apic: fix usage of setup_ioapic_dest()
Posted by Jan Beulich 3 weeks, 2 days ago
On 20.11.2025 10:58, Roger Pau Monne wrote:
> Attempt to clarify the purpose of setup_ioapic_dest(), as the comment ahead
> of it is outdated, and looks to be a verbatim copy from Linux from one of
> the code imports.
> 
> The function serves two purposes: shuffling the interrupts across CPUs
> after SMP bringup or re-assigning all interrupts to CPU#0 if no IRQ
> balancing is set at run time.  However the function won't perform any of
> those functions correctly, as it was unconditionally using
> desc->arch.cpu_mask as the target CPU mask for interrupts (which is the
> current target anyway).
> 
> Fix by adding a new `shuffle` parameter that's used to signal whether the
> function is intended to balance interrupts across CPUs, or to re-assign all
> interrupts to the BSP.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
> I couldn't find a specific Fixes tag to use here, I think this has been
> broken all along.

Perhaps dddd88c891af ("Auto-disable IRQ balancing/affinity on buggy chipsets"),
which is where the 2nd use of the function was introduced?

> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -717,12 +717,14 @@ static int __init find_isa_irq_apic(int irq, int type)
>  static int pin_2_irq(int idx, int apic, int pin);
>  
>  /*
> - * This function currently is only a helper for the i386 smp boot process where 
> - * we need to reprogram the ioredtbls to cater for the cpus which have come online
> - * so mask in all cases should simply be TARGET_CPUS
> + * This function serves two different purposes: shuffling the IO-APIC
> + * interrupts across CPUs after SMP bringup, or re-assigning all interrupts to
> + * the BSP if IRQ balancing is disabled at runtime.  Such functional
> + * distinction is signaled by the `shuffle` parameter.
>   */
> -void /*__init*/ setup_ioapic_dest(void)
> +void setup_ioapic_dest(bool shuffle)
>  {
> +    const cpumask_t *mask = shuffle ? TARGET_CPUS : cpumask_of(0);

Don't we want to aim at getting rid of TARGET_CPUS, which is now only hiding
something invariant? IOW should we perhaps avoid introducing new uses?

Jan

[PATCH 08/12] x86/irq: adjust bind_irq_vector() to take a single CPU as parameter
Posted by Roger Pau Monne 3 weeks, 2 days ago
The vector will be targeting a single CPU at a time, so passing a mask is
not needed.  Simplify the interface and adjust callers to make use of it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hpet.c            |  2 +-
 xen/arch/x86/include/asm/irq.h |  2 +-
 xen/arch/x86/io_apic.c         |  2 +-
 xen/arch/x86/irq.c             | 23 ++++++++++-------------
 4 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index a69abe2650a8..abf4eaf86db1 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -352,7 +352,7 @@ static int __init hpet_setup_msi_irq(struct hpet_event_channel *ch)
      * Technically we don't want to bind the IRQ to any CPU yet, but we need to
      * specify at least one online one here.  Use the BSP.
      */
-    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, cpumask_of(0));
+    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, 0);
     if ( ret )
         return ret;
     cpumask_setall(desc->affinity);
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index df7b48c8653e..355332188932 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -199,7 +199,7 @@ void setup_vector_irq(unsigned int cpu);
 void move_native_irq(struct irq_desc *desc);
 void move_masked_irq(struct irq_desc *desc);
 
-int bind_irq_vector(int irq, int vector, const cpumask_t *mask);
+int bind_irq_vector(int irq, int vector, unsigned int cpu);
 
 void cf_check end_nonmaskable_irq(struct irq_desc *desc, uint8_t vector);
 void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask);
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 19960d291c47..dfbe27b12d54 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1859,7 +1859,7 @@ static void __init check_timer(void)
     vector = IRQ0_VECTOR;
     clear_irq_vector(0);
 
-    if ((ret = bind_irq_vector(0, vector, &cpumask_all)))
+    if ((ret = bind_irq_vector(0, vector, smp_processor_id())))
         printk(KERN_ERR"..IRQ0 is not set correctly with ioapic!!!, err:%d\n", ret);
     
     irq_desc[0].status &= ~IRQ_DISABLED;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 5cd934ea2a32..e09559fce856 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -150,26 +150,23 @@ static void trace_irq_mask(uint32_t event, int irq, int vector,
 }
 
 static int __init _bind_irq_vector(struct irq_desc *desc, int vector,
-                                   const cpumask_t *cpu_mask)
+                                   unsigned int cpu)
 {
-    cpumask_t online_mask;
-    int cpu;
-
     BUG_ON((unsigned)vector >= X86_IDT_VECTORS);
 
-    cpumask_and(&online_mask, cpu_mask, &cpu_online_map);
-    if (cpumask_empty(&online_mask))
+    if ( !cpu_online(cpu) )
         return -EINVAL;
     if ( (desc->arch.vector == vector) &&
-         cpumask_equal(desc->arch.cpu_mask, &online_mask) )
+         cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
         return 0;
     if ( desc->arch.vector != IRQ_VECTOR_UNASSIGNED )
         return -EBUSY;
-    trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, desc->irq, vector, &online_mask);
-    for_each_cpu(cpu, &online_mask)
-        per_cpu(vector_irq, cpu)[vector] = desc->irq;
+
+    trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, desc->irq, vector, cpumask_of(cpu));
+    per_cpu(vector_irq, cpu)[vector] = desc->irq;
     desc->arch.vector = vector;
-    cpumask_copy(desc->arch.cpu_mask, &online_mask);
+    cpumask_clear(desc->arch.cpu_mask);
+    cpumask_set_cpu(cpu, desc->arch.cpu_mask);
     if ( desc->arch.used_vectors )
     {
         ASSERT(!test_bit(vector, desc->arch.used_vectors));
@@ -179,7 +176,7 @@ static int __init _bind_irq_vector(struct irq_desc *desc, int vector,
     return 0;
 }
 
-int __init bind_irq_vector(int irq, int vector, const cpumask_t *mask)
+int __init bind_irq_vector(int irq, int vector, unsigned int cpu)
 {
     struct irq_desc *desc = irq_to_desc(irq);
     unsigned long flags;
@@ -189,7 +186,7 @@ int __init bind_irq_vector(int irq, int vector, const cpumask_t *mask)
 
     spin_lock_irqsave(&desc->lock, flags);
     spin_lock(&vector_lock);
-    ret = _bind_irq_vector(desc, vector, mask);
+    ret = _bind_irq_vector(desc, vector, cpu);
     spin_unlock(&vector_lock);
     spin_unlock_irqrestore(&desc->lock, flags);
 
-- 
2.51.0


Re: [PATCH 08/12] x86/irq: adjust bind_irq_vector() to take a single CPU as parameter
Posted by Jan Beulich 2 weeks, 5 days ago
On 20.11.2025 10:58, Roger Pau Monne wrote:
> The vector will be targeting a single CPU at a time, so passing a mask is
> not needed.  Simplify the interface and adjust callers to make use of it.

Considering the two callers we have, and considering the function is __init,
do we need the function parameter at all? Can't we uniformly use the BSP?

> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -352,7 +352,7 @@ static int __init hpet_setup_msi_irq(struct hpet_event_channel *ch)
>       * Technically we don't want to bind the IRQ to any CPU yet, but we need to
>       * specify at least one online one here.  Use the BSP.
>       */
> -    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, cpumask_of(0));
> +    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, 0);

Irrespective of the remark above, the comment then will want re-wording some,
too.

Jan
[PATCH 09/12] x86/irq: convert cpumask parameter to integer in {,p}irq_set_affinity()
Posted by Roger Pau Monne 3 weeks, 2 days ago
Existing callers were already generating the passed cpumask using
cpumask_of() to contain a single target CPU.  Reduce complexity by passing
the single target CPU as an integer parameter.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
The function names are misleading, as {,p}irq_set_affinity() doesn't adjust
the affinity of the interrupt (desc->affinity) but the interrupt target
itself.  Further cleanup might be helpful to correctly differentiate
between setting interrupt affinity vs setting interrupt target.
---
 xen/arch/x86/hvm/hvm.c         | 2 +-
 xen/arch/x86/include/asm/irq.h | 2 +-
 xen/arch/x86/irq.c             | 8 ++++----
 xen/common/event_channel.c     | 6 ++----
 xen/include/xen/irq.h          | 3 +--
 5 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0ff242d4a0d6..33521599a844 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -485,7 +485,7 @@ void hvm_migrate_pirq(struct hvm_pirq_dpci *pirq_dpci, const struct vcpu *v)
         if ( !desc )
             return;
         ASSERT(MSI_IRQ(desc - irq_desc));
-        irq_set_affinity(desc, cpumask_of(v->processor));
+        irq_set_affinity(desc, v->processor);
         spin_unlock_irq(&desc->lock);
     }
 }
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 355332188932..73abc8323a8d 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -202,7 +202,7 @@ void move_masked_irq(struct irq_desc *desc);
 int bind_irq_vector(int irq, int vector, unsigned int cpu);
 
 void cf_check end_nonmaskable_irq(struct irq_desc *desc, uint8_t vector);
-void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask);
+void irq_set_affinity(struct irq_desc *desc, unsigned int cpu);
 
 int init_domain_irq_mapping(struct domain *d);
 void cleanup_domain_irq_mapping(struct domain *d);
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index e09559fce856..bfb94852a6dc 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -947,7 +947,7 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
 }
 
 /* For re-setting irq interrupt affinity for specific irq */
-void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
+void irq_set_affinity(struct irq_desc *desc, unsigned int cpu)
 {
     if (!desc->handler->set_affinity)
         return;
@@ -955,19 +955,19 @@ void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
     ASSERT(spin_is_locked(&desc->lock));
     desc->status &= ~IRQ_MOVE_PENDING;
     smp_wmb();
-    cpumask_copy(desc->arch.pending_mask, mask);
+    cpumask_copy(desc->arch.pending_mask, cpumask_of(cpu));
     smp_wmb();
     desc->status |= IRQ_MOVE_PENDING;
 }
 
-void pirq_set_affinity(struct domain *d, int pirq, const cpumask_t *mask)
+void pirq_set_affinity(struct domain *d, int pirq, unsigned int cpu)
 {
     unsigned long flags;
     struct irq_desc *desc = domain_spin_lock_irq_desc(d, pirq, &flags);
 
     if ( !desc )
         return;
-    irq_set_affinity(desc, mask);
+    irq_set_affinity(desc, cpu);
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 67700b050ad1..8e155649b171 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1226,8 +1226,7 @@ int evtchn_bind_vcpu(evtchn_port_t port, unsigned int vcpu_id)
             break;
         unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]);
         chn->notify_vcpu_id = v->vcpu_id;
-        pirq_set_affinity(d, chn->u.pirq.irq,
-                          cpumask_of(v->processor));
+        pirq_set_affinity(d, chn->u.pirq.irq, v->processor);
         link_pirq_port(port, chn, v);
         break;
 #endif
@@ -1712,7 +1711,6 @@ void evtchn_destroy_final(struct domain *d)
 void evtchn_move_pirqs(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    const cpumask_t *mask = cpumask_of(v->processor);
     unsigned int port;
     struct evtchn *chn;
 
@@ -1720,7 +1718,7 @@ void evtchn_move_pirqs(struct vcpu *v)
     for ( port = v->pirq_evtchn_head; port; port = chn->u.pirq.next_port )
     {
         chn = evtchn_from_port(d, port);
-        pirq_set_affinity(d, chn->u.pirq.irq, mask);
+        pirq_set_affinity(d, chn->u.pirq.irq, v->processor);
     }
     read_unlock(&d->event_lock);
 }
diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
index 95034c0d6bb5..f0b119d23521 100644
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -193,8 +193,7 @@ extern void desc_guest_eoi(struct irq_desc *desc, struct pirq *pirq);
 extern int pirq_guest_unmask(struct domain *d);
 extern int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share);
 extern void pirq_guest_unbind(struct domain *d, struct pirq *pirq);
-extern void pirq_set_affinity(struct domain *d, int pirq,
-                              const cpumask_t *mask);
+extern void pirq_set_affinity(struct domain *d, int pirq, unsigned int cpu);
 extern struct irq_desc *domain_spin_lock_irq_desc(
     struct domain *d, int pirq, unsigned long *pflags);
 extern struct irq_desc *pirq_spin_lock_irq_desc(
-- 
2.51.0


Re: [PATCH 09/12] x86/irq: convert cpumask parameter to integer in {,p}irq_set_affinity()
Posted by Jan Beulich 2 weeks, 5 days ago
On 20.11.2025 10:58, Roger Pau Monne wrote:
> Existing callers were already generating the passed cpumask using
> cpumask_of() to contain a single target CPU.  Reduce complexity by passing
> the single target CPU as an integer parameter.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> The function names are misleading, as {,p}irq_set_affinity() doesn't adjust
> the affinity of the interrupt (desc->affinity) but the interrupt target
> itself.  Further cleanup might be helpful to correctly differentiate
> between setting interrupt affinity vs setting interrupt target.

Indeed, I wanted to mention this as well. As you touch all callers anyway,
how about doing a rename (whether to "target" or "binding" or yet something
else) right here?

Jan

[PATCH 10/12] x86/irq: convert irq_desc cpu_mask field to integer
Posted by Roger Pau Monne 3 weeks, 2 days ago
With the dropping of logical target mode for interrupts, Xen can now
guarantee that an irq_desc can only target a single CPU.  As such, simplify
the fields and convert the cpu_mask that used to contain the set of target
CPUs into an integer that will contain the single target CPU.

No functional change intended in how interrupts are assigned or distributed
across CPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/hpet.c                      |  9 ++-
 xen/arch/x86/i8259.c                     |  2 +-
 xen/arch/x86/include/asm/irq.h           | 10 +--
 xen/arch/x86/include/asm/msi.h           |  3 +-
 xen/arch/x86/io_apic.c                   | 27 +++----
 xen/arch/x86/irq.c                       | 97 ++++++++++--------------
 xen/arch/x86/msi.c                       | 17 +----
 xen/arch/x86/smpboot.c                   |  2 +-
 xen/common/cpu.c                         |  1 +
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c      |  2 +-
 xen/include/xen/cpumask.h                |  1 +
 12 files changed, 70 insertions(+), 103 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index abf4eaf86db1..168675420f06 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -310,9 +310,9 @@ static void cf_check hpet_msi_set_affinity(
     struct msi_msg msg = ch->msi.msg;
 
     /* This really is only for dump_irqs(). */
-    cpumask_copy(desc->arch.cpu_mask, mask);
+    desc->arch.cpu = cpumask_any(mask);
 
-    msg.dest32 = cpu_mask_to_apicid(mask);
+    msg.dest32 = cpu_physical_id(desc->arch.cpu);
     msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
     msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
     if ( msg.dest32 != ch->msi.msg.dest32 )
@@ -337,7 +337,8 @@ static int __hpet_setup_msi_irq(struct irq_desc *desc)
 {
     struct msi_msg msg;
 
-    msi_compose_msg(desc->arch.vector, desc->arch.cpu_mask, &msg);
+    msg.dest32 = cpu_physical_id(desc->arch.cpu);
+    msi_compose_msg(desc->arch.vector, &msg);
     return hpet_msi_write(desc->action->dev_id, &msg);
 }
 
@@ -501,7 +502,7 @@ static void set_channel_irq_affinity(struct hpet_event_channel *ch)
      * we're no longer at risk of missing IRQs (provided hpet_broadcast_enter()
      * keeps setting the new deadline only afterwards).
      */
-    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
+    desc->arch.cpu = ch->cpu;
 
     spin_unlock(&desc->lock);
 
diff --git a/xen/arch/x86/i8259.c b/xen/arch/x86/i8259.c
index 41f949a36531..5ad5e622c0e1 100644
--- a/xen/arch/x86/i8259.c
+++ b/xen/arch/x86/i8259.c
@@ -414,7 +414,7 @@ void __init init_IRQ(void)
          * interrupts in do_IRQ().  This behavior has only been observed
          * during AP bringup.
          */
-        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
+        desc->arch.cpu = cpu;
         desc->arch.vector = LEGACY_VECTOR(irq);
     }
     
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 73abc8323a8d..97c706acebf2 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -69,13 +69,9 @@ struct irq_desc;
 struct arch_irq_desc {
         int16_t vector;                  /* vector itself is only 8 bits, */
         int16_t old_vector;              /* but we use -1 for unassigned  */
-        /*
-         * Except for high priority interrupts @cpu_mask may have bits set for
-         * offline CPUs.  Consumers need to be careful to mask this down to
-         * online ones as necessary.  There is supposed to always be a non-
-         * empty intersection with cpu_online_map.
-         */
-        cpumask_var_t cpu_mask;
+/* Special target CPU values. */
+#define CPU_INVALID  ~0U
+        unsigned int cpu;                /* Target CPU of the interrupt. */
         cpumask_var_t old_cpu_mask;
         cpumask_var_t pending_mask;
         vmask_t *used_vectors;
diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
index 00059d4a3a9d..2f91294105be 100644
--- a/xen/arch/x86/include/asm/msi.h
+++ b/xen/arch/x86/include/asm/msi.h
@@ -235,8 +235,7 @@ struct arch_msix {
 };
 
 void early_msi_init(void);
-void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask,
-                     struct msi_msg *msg);
+void msi_compose_msg(unsigned vector, struct msi_msg *msg);
 void __msi_set_enable(pci_sbdf_t sbdf, int pos, int enable);
 void cf_check mask_msi_irq(struct irq_desc *desc);
 void cf_check unmask_msi_irq(struct irq_desc *desc);
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index dfbe27b12d54..a50f7061161a 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1112,8 +1112,7 @@ static void __init setup_IO_APIC_irqs(void)
             if (platform_legacy_irq(irq))
                 disable_8259A_irq(irq_to_desc(irq));
 
-            set_entry_dest(&entry,
-                           cpu_mask_to_apicid(irq_to_desc(irq)->arch.cpu_mask));
+            set_entry_dest(&entry, cpu_physical_id(irq_to_desc(irq)->arch.cpu));
             spin_lock_irqsave(&ioapic_lock, flags);
             __ioapic_write_entry(apic, pin, false, entry);
             spin_unlock_irqrestore(&ioapic_lock, flags);
@@ -2137,14 +2136,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
         return vector;
     entry.vector = vector;
 
-    if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
-
-        cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
-        set_entry_dest(&entry, cpu_mask_to_apicid(mask));
+    if (cpu_online(desc->arch.cpu)) {
+        set_entry_dest(&entry, cpu_physical_id(desc->arch.cpu));
     } else {
-        printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
-               irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
+        printk(XENLOG_ERR "IRQ%d: target CPU %u is offline\n",
+               irq, desc->arch.cpu);
         desc->status |= IRQ_DISABLED;
     }
 
@@ -2333,17 +2329,12 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
     /* Set the vector field to the real vector! */
     rte.vector = desc->arch.vector;
 
-    if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
-    {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
-
-        cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
-        set_entry_dest(&rte, cpu_mask_to_apicid(mask));
-    }
+    if ( cpu_online(desc->arch.cpu) )
+        set_entry_dest(&rte, cpu_physical_id(desc->arch.cpu));
     else
     {
-        gprintk(XENLOG_ERR, "IRQ%d: no target CPU (%*pb vs %*pb)\n",
-               irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
+        gprintk(XENLOG_ERR, "IRQ%d: target CPU %u offline\n",
+                irq, desc->arch.cpu);
         desc->status |= IRQ_DISABLED;
         rte.mask = 1;
     }
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index bfb94852a6dc..a56d1e8fc267 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -156,8 +156,7 @@ static int __init _bind_irq_vector(struct irq_desc *desc, int vector,
 
     if ( !cpu_online(cpu) )
         return -EINVAL;
-    if ( (desc->arch.vector == vector) &&
-         cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
+    if ( (desc->arch.vector == vector) && cpu == desc->arch.cpu )
         return 0;
     if ( desc->arch.vector != IRQ_VECTOR_UNASSIGNED )
         return -EBUSY;
@@ -165,8 +164,7 @@ static int __init _bind_irq_vector(struct irq_desc *desc, int vector,
     trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, desc->irq, vector, cpumask_of(cpu));
     per_cpu(vector_irq, cpu)[vector] = desc->irq;
     desc->arch.vector = vector;
-    cpumask_clear(desc->arch.cpu_mask);
-    cpumask_set_cpu(cpu, desc->arch.cpu_mask);
+    desc->arch.cpu = cpu;
     if ( desc->arch.used_vectors )
     {
         ASSERT(!test_bit(vector, desc->arch.used_vectors));
@@ -195,23 +193,21 @@ int __init bind_irq_vector(int irq, int vector, unsigned int cpu)
 
 static void _clear_irq_vector(struct irq_desc *desc)
 {
-    unsigned int cpu, old_vector, irq = desc->irq;
+    unsigned int cpu = desc->arch.cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
     cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
 
     BUG_ON(!valid_irq_vector(vector));
 
     /* Always clear desc->arch.vector */
-    cpumask_and(tmp_mask, desc->arch.cpu_mask, &cpu_online_map);
-
-    for_each_cpu(cpu, tmp_mask)
+    if ( cpu_online(cpu) )
     {
         ASSERT(per_cpu(vector_irq, cpu)[vector] == irq);
         per_cpu(vector_irq, cpu)[vector] = ~irq;
     }
 
     desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
-    cpumask_clear(desc->arch.cpu_mask);
+    desc->arch.cpu = CPU_INVALID;
 
     if ( desc->arch.used_vectors )
     {
@@ -219,7 +215,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         clear_bit(vector, desc->arch.used_vectors);
     }
 
-    trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
+    trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, cpumask_of(cpu));
 
     if ( unlikely(desc->arch.move_in_progress) )
     {
@@ -392,22 +388,16 @@ int irq_to_vector(int irq)
 
 int arch_init_one_irq_desc(struct irq_desc *desc)
 {
-    if ( !zalloc_cpumask_var(&desc->arch.cpu_mask) )
-        return -ENOMEM;
-
     if ( !alloc_cpumask_var(&desc->arch.old_cpu_mask) )
-    {
-        free_cpumask_var(desc->arch.cpu_mask);
         return -ENOMEM;
-    }
 
     if ( !alloc_cpumask_var(&desc->arch.pending_mask) )
     {
         free_cpumask_var(desc->arch.old_cpu_mask);
-        free_cpumask_var(desc->arch.cpu_mask);
         return -ENOMEM;
     }
 
+    desc->arch.cpu = CPU_INVALID;
     desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
     desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
     desc->arch.creator_domid = DOMID_INVALID;
@@ -553,12 +543,12 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
         if ( old_vector >= FIRST_HIPRIORITY_VECTOR &&
              old_vector <= LAST_HIPRIORITY_VECTOR )
         {
-            cpumask_copy(desc->arch.cpu_mask,
-                         cpumask_of(cpumask_any(&tmp_mask)));
+            desc->arch.cpu = cpumask_any(&tmp_mask);
             return 0;
         }
 
-        if (cpumask_intersects(&tmp_mask, desc->arch.cpu_mask)) {
+        if ( cpumask_test_cpu(desc->arch.cpu, &tmp_mask) )
+        {
             desc->arch.vector = old_vector;
             return 0;
         }
@@ -570,7 +560,7 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
          * If the current destination is online refuse to shuffle.  Retry after
          * the in-progress movement has finished.
          */
-        if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) )
+        if ( cpu_online(desc->arch.cpu) )
             return -EAGAIN;
 
         /*
@@ -591,11 +581,10 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
              * in the 'mask' parameter.
              */
             desc->arch.vector = desc->arch.old_vector;
-            cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, mask);
+            desc->arch.cpu = cpumask_any(desc->arch.old_cpu_mask);
 
             /* Undo any possibly done cleanup. */
-            for_each_cpu(cpu, desc->arch.cpu_mask)
-                per_cpu(vector_irq, cpu)[desc->arch.vector] = irq;
+            per_cpu(vector_irq, desc->arch.cpu)[desc->arch.vector] = irq;
 
             /* Cancel the pending move and release the current vector. */
             desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
@@ -669,7 +658,7 @@ next:
 
         if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
         {
-            ASSERT(!cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map));
+            ASSERT(!cpu_online(desc->arch.cpu));
             /*
              * Special case when evacuating an interrupt from a CPU to be
              * offlined and the interrupt was already in the process of being
@@ -684,8 +673,9 @@ next:
         }
         else if ( valid_irq_vector(old_vector) )
         {
-            cpumask_and(desc->arch.old_cpu_mask, desc->arch.cpu_mask,
-                        &cpu_online_map);
+            cpumask_clear(desc->arch.old_cpu_mask);
+            if ( cpu_online(desc->arch.cpu) )
+                cpumask_set_cpu(desc->arch.cpu, desc->arch.old_cpu_mask);
             desc->arch.old_vector = desc->arch.vector;
             if ( !cpumask_empty(desc->arch.old_cpu_mask) )
                 desc->arch.move_in_progress = 1;
@@ -698,8 +688,7 @@ next:
 
         per_cpu(vector_irq, cpu)[vector] = irq;
         desc->arch.vector = vector;
-        cpumask_clear(desc->arch.cpu_mask);
-        cpumask_set_cpu(cpu, desc->arch.cpu_mask);
+        desc->arch.cpu = cpu;
 
         desc->arch.used = IRQ_USED;
         ASSERT((desc->arch.used_vectors == NULL)
@@ -762,14 +751,14 @@ void setup_vector_irq(unsigned int cpu)
     /* Mark the inuse vectors */
     for ( irq = 0; irq < nr_irqs; ++irq )
     {
-        struct irq_desc *desc = irq_to_desc(irq);
+        const struct irq_desc *desc = irq_to_desc(irq);
 
         if ( !irq_desc_initialized(desc) )
             continue;
         vector = irq_to_vector(irq);
         if ( (vector >= FIRST_HIPRIORITY_VECTOR &&
               vector <= LAST_HIPRIORITY_VECTOR) ||
-             cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
+             cpu == desc->arch.cpu )
             per_cpu(vector_irq, cpu)[vector] = irq;
     }
 }
@@ -847,8 +836,7 @@ void cf_check irq_move_cleanup_interrupt(void)
         if (!desc->arch.move_cleanup_count)
             goto unlock;
 
-        if ( vector == desc->arch.vector &&
-             cpumask_test_cpu(me, desc->arch.cpu_mask) )
+        if ( vector == desc->arch.vector && me == desc->arch.cpu )
             goto unlock;
 
         irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
@@ -910,8 +898,7 @@ void cf_check irq_complete_move(struct irq_desc *desc)
     vector = (u8)get_irq_regs()->entry_vector;
     me = smp_processor_id();
 
-    if ( vector == desc->arch.vector &&
-         cpumask_test_cpu(me, desc->arch.cpu_mask) )
+    if ( vector == desc->arch.vector && me == desc->arch.cpu )
         send_cleanup_vector(desc);
 }
 
@@ -919,7 +906,6 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
 {
     int ret;
     unsigned long flags;
-    cpumask_t dest_mask;
 
     if ( mask && !cpumask_intersects(mask, &cpu_online_map) )
         return BAD_APICID;
@@ -932,18 +918,12 @@ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask)
         return BAD_APICID;
 
     if ( mask )
-    {
         cpumask_copy(desc->affinity, mask);
-        cpumask_and(&dest_mask, mask, desc->arch.cpu_mask);
-    }
     else
-    {
         cpumask_setall(desc->affinity);
-        cpumask_copy(&dest_mask, desc->arch.cpu_mask);
-    }
-    cpumask_and(&dest_mask, &dest_mask, &cpu_online_map);
 
-    return cpu_mask_to_apicid(&dest_mask);
+    ASSERT(cpu_online(desc->arch.cpu));
+    return cpu_physical_id(desc->arch.cpu);
 }
 
 /* For re-setting irq interrupt affinity for specific irq */
@@ -1687,8 +1667,7 @@ int pirq_guest_bind(struct vcpu *v, struct pirq *pirq, int will_share)
                 cpumask_setall(desc->affinity);
                 affinity = &cpumask_all;
             }
-            else if ( !cpumask_intersects(desc->arch.cpu_mask,
-                                          &cpu_online_map) )
+            else if ( !cpu_online(desc->arch.cpu) )
                 affinity = desc->affinity;
             if ( affinity )
                 desc->handler->set_affinity(desc, affinity);
@@ -1966,6 +1945,15 @@ static void do_IRQ_guest(struct irq_desc *desc, unsigned int vector)
     }
 }
 
+static const cpumask_t *get_cpumask(unsigned int cpu)
+{
+    if ( cpu < nr_cpu_ids )
+        return cpumask_of(cpu);
+
+    ASSERT(cpu == CPU_INVALID);
+    return &cpumask_none;
+}
+
 void do_IRQ(struct cpu_user_regs *regs)
 {
     struct irqaction *action;
@@ -2013,7 +2001,8 @@ void do_IRQ(struct cpu_user_regs *regs)
                     spin_lock(&desc->lock);
                     printk("IRQ%d a={%*pbl}[{%*pbl},{%*pbl}] v=%02x[%02x] t=%s s=%08x\n",
                            ~irq, CPUMASK_PR(desc->affinity),
-                           CPUMASK_PR(desc->arch.cpu_mask),
+                           /* TODO: handle hipri vectors nicely. */
+                           CPUMASK_PR(get_cpumask(desc->arch.cpu)),
                            CPUMASK_PR(desc->arch.old_cpu_mask),
                            desc->arch.vector, desc->arch.old_vector,
                            desc->handler->typename, desc->status);
@@ -2545,7 +2534,8 @@ static void cf_check dump_irqs(unsigned char key)
 
         printk("   IRQ:%4d vec:%02x %-15s status=%03x aff:{%*pbl}/{%*pbl} ",
                irq, desc->arch.vector, desc->handler->typename, desc->status,
-               CPUMASK_PR(desc->affinity), CPUMASK_PR(desc->arch.cpu_mask));
+               CPUMASK_PR(desc->affinity),
+               CPUMASK_PR(get_cpumask(desc->arch.cpu)));
 
         if ( ssid )
             printk("Z=%-25s ", ssid);
@@ -2683,8 +2673,7 @@ void fixup_irqs(void)
              * interrupts.
              */
             if ( apic_irr_read(desc->arch.old_vector) )
-                send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
-                              desc->arch.vector);
+                send_IPI_mask(cpumask_of(desc->arch.cpu), desc->arch.vector);
 
             /*
              * This CPU is going offline, remove it from ->arch.old_cpu_mask
@@ -2709,8 +2698,7 @@ void fixup_irqs(void)
          * are a subset of the online mask.  What fixup_irqs() cares about is
          * evacuating interrupts from CPUs not in the online mask.
          */
-        if ( !desc->action || cpumask_subset(desc->arch.cpu_mask,
-                                             &cpu_online_map) )
+        if ( !desc->action || cpu_online(desc->arch.cpu) )
         {
             spin_unlock(&desc->lock);
             continue;
@@ -2732,7 +2720,7 @@ void fixup_irqs(void)
          * the interrupt, signal to check whether there are any pending vectors
          * to be handled in the local APIC after the interrupt has been moved.
          */
-        if ( cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
+        if ( cpu == desc->arch.cpu )
             check_irr = true;
 
         if ( desc->handler->set_affinity )
@@ -2754,8 +2742,7 @@ void fixup_irqs(void)
              * desc in order for any in-flight interrupts to be delivered to
              * the lapic.
              */
-            send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
-                          desc->arch.vector);
+            send_IPI_mask(cpumask_of(desc->arch.cpu), desc->arch.vector);
 
         spin_unlock(&desc->lock);
 
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 5389bc08674a..f70abac687e4 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -154,24 +154,13 @@ static bool msix_memory_decoded(const struct pci_dev *dev, unsigned int pos)
 /*
  * MSI message composition
  */
-void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg *msg)
+void msi_compose_msg(unsigned vector, struct msi_msg *msg)
 {
     memset(msg, 0, sizeof(*msg));
 
     if ( vector < FIRST_DYNAMIC_VECTOR )
         return;
 
-    if ( cpu_mask )
-    {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
-
-        if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
-            return;
-
-        cpumask_and(mask, cpu_mask, &cpu_online_map);
-        msg->dest32 = cpu_mask_to_apicid(mask);
-    }
-
     msg->address_hi = MSI_ADDR_BASE_HI;
     msg->address_lo = MSI_ADDR_BASE_LO |
                       MSI_ADDR_DESTMODE_PHYS |
@@ -531,7 +520,9 @@ int __setup_msi_irq(struct irq_desc *desc, struct msi_desc *msidesc,
 
     desc->msi_desc = msidesc;
     desc->handler = handler;
-    msi_compose_msg(desc->arch.vector, desc->arch.cpu_mask, &msg);
+    msg.dest32 = cpu_physical_id(desc->arch.cpu);
+
+    msi_compose_msg(desc->arch.vector, &msg);
     ret = write_msi_msg(msidesc, &msg, true);
     if ( unlikely(ret) )
     {
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 55ea62d7d67f..3492073d5c8c 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1465,7 +1465,7 @@ void __init smp_intr_init(void)
 
         desc = irq_to_desc(irq);
         desc->arch.vector = vector;
-        cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));
+        desc->arch.cpu = cpu;
         cpumask_setall(desc->affinity);
     }
 
diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index f09af0444b6a..f52f02e2a9d6 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -16,6 +16,7 @@ unsigned int __read_mostly nr_cpumask_bits
 const cpumask_t cpumask_all = {
     .bits[0 ... (BITS_TO_LONGS(NR_CPUS) - 1)] = ~0UL
 };
+const cpumask_t cpumask_none;
 
 /*
  * cpu_bit_bitmap[] is a special, "compressed" data structure that
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index 00d2c46cbcd5..0d03b9d86254 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -507,7 +507,7 @@ static void cf_check set_x2apic_affinity(
     if ( dest == BAD_APICID )
         return;
 
-    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
+    msi_compose_msg(desc->arch.vector, &iommu->msi.msg);
     iommu->msi.msg.dest32 = dest;
 
     ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 90f36ac22b8a..d2dbf74bc8bb 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1194,7 +1194,7 @@ static void cf_check dma_msi_set_affinity(
         return;
     }
 
-    msi_compose_msg(desc->arch.vector, NULL, &msg);
+    msi_compose_msg(desc->arch.vector, &msg);
     msg.dest32 = dest;
     if (x2apic_enabled)
         msg.address_hi = dest & 0xFFFFFF00;
diff --git a/xen/include/xen/cpumask.h b/xen/include/xen/cpumask.h
index b713bb69a9cb..1d72c179d7a4 100644
--- a/xen/include/xen/cpumask.h
+++ b/xen/include/xen/cpumask.h
@@ -289,6 +289,7 @@ static inline const cpumask_t *cpumask_of(unsigned int cpu)
 #define cpumask_bits(maskp) ((maskp)->bits)
 
 extern const cpumask_t cpumask_all;
+extern const cpumask_t cpumask_none;
 
 /*
  * cpumask_var_t: struct cpumask for stack usage.
-- 
2.51.0


Re: [PATCH 10/12] x86/irq: convert irq_desc cpu_mask field to integer
Posted by Jan Beulich 2 weeks, 5 days ago
On 20.11.2025 10:58, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -310,9 +310,9 @@ static void cf_check hpet_msi_set_affinity(
>      struct msi_msg msg = ch->msi.msg;
>  
>      /* This really is only for dump_irqs(). */
> -    cpumask_copy(desc->arch.cpu_mask, mask);
> +    desc->arch.cpu = cpumask_any(mask);

Going from the comment, couldn't you use CPU_INVALID here? Then again, see
"x86/HPET: drop .set_affinity hook", where the function goes away anyway.

> @@ -337,7 +337,8 @@ static int __hpet_setup_msi_irq(struct irq_desc *desc)
>  {
>      struct msi_msg msg;
>  
> -    msi_compose_msg(desc->arch.vector, desc->arch.cpu_mask, &msg);
> +    msg.dest32 = cpu_physical_id(desc->arch.cpu);
> +    msi_compose_msg(desc->arch.vector, &msg);
>      return hpet_msi_write(desc->action->dev_id, &msg);
>  }

Setting msg.dest32 ahead of calling msi_compose_msg() feels odd. It makes things
look as if this was an input to the function, when by its name it rather would
want to be an output. Furthermore this is dead code right now, as the function
clears the entire structure first thing. Imo it being the function to fill the
field should be retained; instead of the CPU mask you'd once again make it a
scalar parameter. For the case where NULL was passed before, ...

> --- a/xen/arch/x86/include/asm/irq.h
> +++ b/xen/arch/x86/include/asm/irq.h
> @@ -69,13 +69,9 @@ struct irq_desc;
>  struct arch_irq_desc {
>          int16_t vector;                  /* vector itself is only 8 bits, */
>          int16_t old_vector;              /* but we use -1 for unassigned  */
> -        /*
> -         * Except for high priority interrupts @cpu_mask may have bits set for
> -         * offline CPUs.  Consumers need to be careful to mask this down to
> -         * online ones as necessary.  There is supposed to always be a non-
> -         * empty intersection with cpu_online_map.
> -         */
> -        cpumask_var_t cpu_mask;
> +/* Special target CPU values. */
> +#define CPU_INVALID  ~0U

... you already make a suitable constant available. (Nit: The expansion wants
parenthesizing.)
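A minimal sketch of the suggested shape — the parenthesized constant plus a scalar-CPU `msi_compose_msg()` that fills `dest32` itself. The types and encoding below are illustrative stand-ins, not the real Xen definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Parenthesized, as the review asks for. */
#define CPU_INVALID  (~0U)

/* Simplified stand-in for Xen's struct msi_msg. */
struct msi_msg {
    uint32_t dest32;
    uint32_t address_lo;
};

/*
 * Sketch of the reviewer's suggestion: take the target CPU as a scalar
 * parameter and let msi_compose_msg() remain the function that fills
 * dest32, skipping the CPU_INVALID case (the old NULL-mask case).
 */
static void msi_compose_msg_sketch(unsigned int vector, unsigned int cpu,
                                   struct msi_msg *msg)
{
    memset(msg, 0, sizeof(*msg));
    msg->address_lo = vector;     /* placeholder for the real encoding */
    if ( cpu != CPU_INVALID )
        msg->dest32 = cpu;        /* placeholder for cpu_physical_id() */
}
```

With this shape the callers that previously passed NULL would pass `CPU_INVALID` and keep setting `.dest32` themselves, while the common path needs no explicit assignment.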

> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -1112,8 +1112,7 @@ static void __init setup_IO_APIC_irqs(void)
>              if (platform_legacy_irq(irq))
>                  disable_8259A_irq(irq_to_desc(irq));
>  
> -            set_entry_dest(&entry,
> -                           cpu_mask_to_apicid(irq_to_desc(irq)->arch.cpu_mask));
> +            set_entry_dest(&entry, cpu_physical_id(irq_to_desc(irq)->arch.cpu));

I may as well mention this here: Looks like this patch removes all call sites
of cpu_mask_to_apicid(). That would leave the function unreachable, i.e. violating
a Misra rule, so I think the function needs dropping right here.

> @@ -2137,14 +2136,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
>          return vector;
>      entry.vector = vector;
>  
> -    if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
> -        cpumask_t *mask = this_cpu(scratch_cpumask);
> -
> -        cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
> -        set_entry_dest(&entry, cpu_mask_to_apicid(mask));
> +    if (cpu_online(desc->arch.cpu)) {

Can CPU_INVALID make it here? If so, it needs guarding against. If not, an
assertion may be nice. (Same possibly elsewhere.)
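The hazard being pointed out can be shown with a small stand-alone sketch (stand-in types, not the real Xen `cpu_online()`): passing `CPU_INVALID` to an unguarded online check would index past the CPU map, so the bound check has to come first:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins, not the real Xen implementation. */
#define NR_CPUS      16U
#define CPU_INVALID  (~0U)

static bool online[NR_CPUS] = { [0] = true, [1] = true };

/*
 * cpu_online(CPU_INVALID) would read past the map; checking the bound
 * first makes the guard safe for the "no target CPU" sentinel too.
 */
static bool cpu_online_checked(unsigned int cpu)
{
    return cpu < NR_CPUS && online[cpu];
}
```

The `get_cpumask()` helper added later in the patch follows the same pattern (bound check, then `ASSERT(cpu == CPU_INVALID)` on the out-of-range path).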

> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -156,8 +156,7 @@ static int __init _bind_irq_vector(struct irq_desc *desc, int vector,
>  
>      if ( !cpu_online(cpu) )
>          return -EINVAL;
> -    if ( (desc->arch.vector == vector) &&
> -         cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
> +    if ( (desc->arch.vector == vector) && cpu == desc->arch.cpu )

Please can you be consistent with parentheses on both sides of the &&?
(I'd prefer the excess ones to be dropped, but the alternative is also
okay.)

> @@ -684,8 +673,9 @@ next:
>          }
>          else if ( valid_irq_vector(old_vector) )
>          {
> -            cpumask_and(desc->arch.old_cpu_mask, desc->arch.cpu_mask,
> -                        &cpu_online_map);
> +            cpumask_clear(desc->arch.old_cpu_mask);
> +            if ( cpu_online(desc->arch.cpu) )
> +                cpumask_set_cpu(desc->arch.cpu, desc->arch.old_cpu_mask);

As mentioned for an earlier patch, to avoid the LOCK-ed update
cpumask_copy() may be better to use. Yet iirc like there, likely this
goes away again later in the series (by the title right in the next patch),
so perhaps not a big deal.

> --- a/xen/common/cpu.c
> +++ b/xen/common/cpu.c
> @@ -16,6 +16,7 @@ unsigned int __read_mostly nr_cpumask_bits
>  const cpumask_t cpumask_all = {
>      .bits[0 ... (BITS_TO_LONGS(NR_CPUS) - 1)] = ~0UL
>  };
> +const cpumask_t cpumask_none;

This feels wasteful at least for larger NR_CPUS. And it's likely going to
violate some Misra rule on non-x86, for having no user. On x86, as long as
NR_CPUS <= 8 * PAGE_SIZE, you could (re-)use zero_page[] instead.
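The footprint concern is easy to quantify: an all-zero `cpumask_t` costs `BITS_TO_LONGS(NR_CPUS)` longs of static storage. A quick sketch of the arithmetic (for illustration only; e.g. `NR_CPUS = 16384` on LP64 is 2 KiB):

```c
#include <assert.h>

/* Storage needed for a cpumask_t covering nr_cpus bits. */
static unsigned long cpumask_bytes(unsigned int nr_cpus)
{
    unsigned int bits_per_long = 8 * sizeof(unsigned long);

    /* BITS_TO_LONGS(nr_cpus) longs, rounded up. */
    return ((nr_cpus + bits_per_long - 1) / bits_per_long)
           * sizeof(unsigned long);
}
```

Hence the suggestion to alias an existing zeroed region such as `zero_page[]` instead of dedicating a new object, as long as the mask fits.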

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -507,7 +507,7 @@ static void cf_check set_x2apic_affinity(
>      if ( dest == BAD_APICID )
>          return;
>  
> -    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
> +    msi_compose_msg(desc->arch.vector, &iommu->msi.msg);
>      iommu->msi.msg.dest32 = dest;

With the outlined adjustment above, it looks like the explicit setting
of .dest32 would then not be needed here (and perhaps elsewhere) anymore.

Jan
[PATCH 11/12] x86/irq: convert irq_desc old_cpu_mask field to integer
Posted by Roger Pau Monne 3 weeks, 2 days ago
With the cpu_mask field already converted to an integer, propagate that
change to the field that stores the previous target CPU and convert it to
an integer as well.

Also convert the move_cleanup_count field into a boolean, since the
previous target will always be a single CPU.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/include/asm/irq.h |  4 +-
 xen/arch/x86/irq.c             | 90 +++++++++++++---------------------
 2 files changed, 35 insertions(+), 59 deletions(-)

diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 97c706acebf2..bc59ce7c3ffb 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -72,10 +72,10 @@ struct arch_irq_desc {
 /* Special target CPU values. */
 #define CPU_INVALID  ~0U
         unsigned int cpu;                /* Target CPU of the interrupt. */
-        cpumask_var_t old_cpu_mask;
+        unsigned int old_cpu;
         cpumask_var_t pending_mask;
         vmask_t *used_vectors;
-        unsigned move_cleanup_count;
+        bool move_cleanup : 1;
         bool move_in_progress : 1;
         int8_t used;
         /*
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index a56d1e8fc267..680f190da065 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -115,7 +115,7 @@ static void release_old_vec(struct irq_desc *desc)
     unsigned int vector = desc->arch.old_vector;
 
     desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
-    cpumask_clear(desc->arch.old_cpu_mask);
+    desc->arch.old_cpu = CPU_INVALID;
 
     if ( !valid_irq_vector(vector) )
         ASSERT_UNREACHABLE();
@@ -195,7 +195,6 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
     unsigned int cpu = desc->arch.cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
-    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
 
     BUG_ON(!valid_irq_vector(vector));
 
@@ -221,10 +220,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
     {
         /* If we were in motion, also clear desc->arch.old_vector */
         old_vector = desc->arch.old_vector;
-        cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
 
-        for_each_cpu(cpu, tmp_mask)
+        if ( cpu_online(desc->arch.old_cpu) )
         {
+            cpu = desc->arch.old_cpu;
             ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
             TRACE_TIME(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
             per_cpu(vector_irq, cpu)[old_vector] = ~irq;
@@ -388,16 +387,11 @@ int irq_to_vector(int irq)
 
 int arch_init_one_irq_desc(struct irq_desc *desc)
 {
-    if ( !alloc_cpumask_var(&desc->arch.old_cpu_mask) )
-        return -ENOMEM;
-
     if ( !alloc_cpumask_var(&desc->arch.pending_mask) )
-    {
-        free_cpumask_var(desc->arch.old_cpu_mask);
         return -ENOMEM;
-    }
 
     desc->arch.cpu = CPU_INVALID;
+    desc->arch.old_cpu = CPU_INVALID;
     desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
     desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
     desc->arch.creator_domid = DOMID_INVALID;
@@ -554,7 +548,7 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
         }
     }
 
-    if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
+    if ( desc->arch.move_in_progress || desc->arch.move_cleanup )
     {
         /*
          * If the current destination is online refuse to shuffle.  Retry after
@@ -570,9 +564,9 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
          * ->arch.old_cpu_mask.
          */
         ASSERT(valid_irq_vector(desc->arch.old_vector));
-        ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, &cpu_online_map));
+        ASSERT(cpu_online(desc->arch.old_cpu));
 
-        if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
+        if ( cpumask_test_cpu(desc->arch.old_cpu, mask) )
         {
             /*
              * Fallback to the old destination if moving is in progress and the
@@ -581,16 +575,16 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
              * in the 'mask' parameter.
              */
             desc->arch.vector = desc->arch.old_vector;
-            desc->arch.cpu = cpumask_any(desc->arch.old_cpu_mask);
+            desc->arch.cpu = desc->arch.old_cpu;
 
             /* Undo any possibly done cleanup. */
             per_cpu(vector_irq, desc->arch.cpu)[desc->arch.vector] = irq;
 
             /* Cancel the pending move and release the current vector. */
             desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
-            cpumask_clear(desc->arch.old_cpu_mask);
+            desc->arch.old_cpu = CPU_INVALID;
             desc->arch.move_in_progress = 0;
-            desc->arch.move_cleanup_count = 0;
+            desc->arch.move_cleanup = false;
             if ( desc->arch.used_vectors )
             {
                 ASSERT(test_bit(old_vector, desc->arch.used_vectors));
@@ -656,7 +650,7 @@ next:
         current_vector = vector;
         current_offset = offset;
 
-        if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
+        if ( desc->arch.move_in_progress || desc->arch.move_cleanup )
         {
             ASSERT(!cpu_online(desc->arch.cpu));
             /*
@@ -673,12 +667,13 @@ next:
         }
         else if ( valid_irq_vector(old_vector) )
         {
-            cpumask_clear(desc->arch.old_cpu_mask);
-            if ( cpu_online(desc->arch.cpu) )
-                cpumask_set_cpu(desc->arch.cpu, desc->arch.old_cpu_mask);
+            desc->arch.old_cpu = CPU_INVALID;
             desc->arch.old_vector = desc->arch.vector;
-            if ( !cpumask_empty(desc->arch.old_cpu_mask) )
+            if ( cpu_online(desc->arch.cpu) )
+            {
+                desc->arch.old_cpu = desc->arch.cpu;
                 desc->arch.move_in_progress = 1;
+            }
             else
                 /* This can happen while offlining a CPU. */
                 release_old_vec(desc);
@@ -833,7 +828,7 @@ void cf_check irq_move_cleanup_interrupt(void)
         if (desc->handler->enable == enable_8259A_irq)
             goto unlock;
 
-        if (!desc->arch.move_cleanup_count)
+        if ( !desc->arch.move_cleanup )
             goto unlock;
 
         if ( vector == desc->arch.vector && me == desc->arch.cpu )
@@ -862,13 +857,10 @@ void cf_check irq_move_cleanup_interrupt(void)
         TRACE_TIME(TRC_HW_IRQ_MOVE_CLEANUP, irq, vector, me);
 
         per_cpu(vector_irq, me)[vector] = ~irq;
-        desc->arch.move_cleanup_count--;
+        desc->arch.move_cleanup = false;
 
-        if ( desc->arch.move_cleanup_count == 0 )
-        {
-            ASSERT(vector == desc->arch.old_vector);
-            release_old_vec(desc);
-        }
+        ASSERT(vector == desc->arch.old_vector);
+        release_old_vec(desc);
 unlock:
         spin_unlock(&desc->lock);
     }
@@ -876,12 +868,11 @@ unlock:
 
 static void send_cleanup_vector(struct irq_desc *desc)
 {
-    cpumask_and(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
-                &cpu_online_map);
-    desc->arch.move_cleanup_count = cpumask_weight(desc->arch.old_cpu_mask);
-
-    if ( desc->arch.move_cleanup_count )
-        send_IPI_mask(desc->arch.old_cpu_mask, IRQ_MOVE_CLEANUP_VECTOR);
+    if ( cpu_online(desc->arch.old_cpu) )
+    {
+        desc->arch.move_cleanup = true;
+        send_IPI_mask(cpumask_of(desc->arch.old_cpu), IRQ_MOVE_CLEANUP_VECTOR);
+    }
     else
         release_old_vec(desc);
 
@@ -2003,7 +1994,7 @@ void do_IRQ(struct cpu_user_regs *regs)
                            ~irq, CPUMASK_PR(desc->affinity),
                            /* TODO: handle hipri vectors nicely. */
                            CPUMASK_PR(get_cpumask(desc->arch.cpu)),
-                           CPUMASK_PR(desc->arch.old_cpu_mask),
+                           CPUMASK_PR(get_cpumask(desc->arch.old_cpu)),
                            desc->arch.vector, desc->arch.old_vector,
                            desc->handler->typename, desc->status);
                     spin_unlock(&desc->lock);
@@ -2636,26 +2627,14 @@ void fixup_irqs(void)
             continue;
         }
 
-        if ( desc->arch.move_cleanup_count )
+        if ( desc->arch.move_cleanup && !cpu_online(desc->arch.old_cpu) )
         {
             /* The cleanup IPI may have got sent while we were still online. */
-            cpumask_andnot(affinity, desc->arch.old_cpu_mask,
-                           &cpu_online_map);
-            desc->arch.move_cleanup_count -= cpumask_weight(affinity);
-            if ( !desc->arch.move_cleanup_count )
-                release_old_vec(desc);
-            else
-                /*
-                 * Adjust old_cpu_mask to account for the offline CPUs,
-                 * otherwise further calls to fixup_irqs() could subtract those
-                 * again and possibly underflow the counter.
-                 */
-                cpumask_andnot(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
-                               affinity);
+            desc->arch.move_cleanup = false;
+            release_old_vec(desc);
         }
 
-        if ( desc->arch.move_in_progress &&
-             cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
+        if ( desc->arch.move_in_progress && cpu == desc->arch.old_cpu )
         {
             /*
              * This to be offlined CPU was the target of an interrupt that's
@@ -2685,12 +2664,9 @@ void fixup_irqs(void)
              * per-cpu vector table will no longer have ->arch.old_vector
              * setup, and hence ->arch.old_cpu_mask would be stale.
              */
-            cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
-            if ( cpumask_empty(desc->arch.old_cpu_mask) )
-            {
-                desc->arch.move_in_progress = 0;
-                release_old_vec(desc);
-            }
+            desc->arch.old_cpu = CPU_INVALID;
+            desc->arch.move_in_progress = 0;
+            release_old_vec(desc);
         }
 
         /*
-- 
2.51.0
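[Editor's note: a minimal, self-contained model of the state machine the hunks above rework — not Xen code. `cpu_online()`, the descriptor, and `release_old_vec()` are stubs; the point is that once the old target is a single CPU, the `move_cleanup_count` counter collapses into a boolean, and `send_cleanup_vector()` either IPIs that one CPU or releases the vector immediately.]

```c
#include <assert.h>
#include <stdbool.h>

#define CPU_INVALID  (~0U)
#define NR_CPUS      8U

/* Stub of Xen's cpu_online(): a plain bitmap of online CPUs. */
static unsigned long online_map = 0x3;   /* CPUs 0 and 1 online */
static bool cpu_online(unsigned int cpu)
{
    return cpu < NR_CPUS && (online_map & (1UL << cpu));
}

/* Reduced irq_desc state: one old CPU plus a boolean, as in the patch. */
struct demo_desc {
    unsigned int old_cpu;
    bool move_cleanup;
    bool old_vec_released;
};

static void release_old_vec(struct demo_desc *d)
{
    d->old_cpu = CPU_INVALID;
    d->old_vec_released = true;
}

/*
 * Mirrors the reworked send_cleanup_vector(): IPI only if the single old
 * CPU is still online, otherwise release the old vector right away.
 * Returns whether an IPI would have been sent.
 */
static bool send_cleanup(struct demo_desc *d)
{
    if ( cpu_online(d->old_cpu) )
    {
        d->move_cleanup = true;
        return true;   /* would send_IPI_mask(cpumask_of(old_cpu), ...) */
    }
    release_old_vec(d);
    return false;
}
```

With a single target there is never a partial cleanup, which is what lets the `fixup_irqs()` hunk drop the counter-underflow bookkeeping around `cpumask_andnot()`.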


Re: [PATCH 11/12] x86/irq: convert irq_desc old_cpu_mask field to integer
Posted by Jan Beulich 2 weeks, 5 days ago
On 20.11.2025 10:58, Roger Pau Monne wrote:
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -115,7 +115,7 @@ static void release_old_vec(struct irq_desc *desc)
>      unsigned int vector = desc->arch.old_vector;
>  
>      desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> -    cpumask_clear(desc->arch.old_cpu_mask);
> +    desc->arch.old_cpu = CPU_INVALID;

With this, ...

> @@ -221,10 +220,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
>      {
>          /* If we were in motion, also clear desc->arch.old_vector */
>          old_vector = desc->arch.old_vector;
> -        cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
>  
> -        for_each_cpu(cpu, tmp_mask)
> +        if ( cpu_online(desc->arch.old_cpu) )

... you pretty certainly want to guard against the value making it here (even
if just accidentally), and thus triggering the assertion in cpumask_check().
(Again possibly elsewhere as well.)

> @@ -581,16 +575,16 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
>               * in the 'mask' parameter.
>               */
>              desc->arch.vector = desc->arch.old_vector;
> -            desc->arch.cpu = cpumask_any(desc->arch.old_cpu_mask);
> +            desc->arch.cpu = desc->arch.old_cpu;
>  
>              /* Undo any possibly done cleanup. */
>              per_cpu(vector_irq, desc->arch.cpu)[desc->arch.vector] = irq;
>  
>              /* Cancel the pending move and release the current vector. */
>              desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
> -            cpumask_clear(desc->arch.old_cpu_mask);
> +            desc->arch.old_cpu = CPU_INVALID;
>              desc->arch.move_in_progress = 0;
> -            desc->arch.move_cleanup_count = 0;
> +            desc->arch.move_cleanup =  false;

Nit: Excess blank.

> @@ -2003,7 +1994,7 @@ void do_IRQ(struct cpu_user_regs *regs)
>                             ~irq, CPUMASK_PR(desc->affinity),
>                             /* TODO: handle hipri vectors nicely. */
>                             CPUMASK_PR(get_cpumask(desc->arch.cpu)),
> -                           CPUMASK_PR(desc->arch.old_cpu_mask),
> +                           CPUMASK_PR(get_cpumask(desc->arch.old_cpu)),

I should have asked on the previous patch already: Does it actually make sense
to still print these in mask form? Without that you wouldn't need get_cpumask(),
and as a result you also wouldn't need cpumask_none.

> @@ -2685,12 +2664,9 @@ void fixup_irqs(void)
>               * per-cpu vector table will no longer have ->arch.old_vector
>               * setup, and hence ->arch.old_cpu_mask would be stale.
>               */
> -            cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
> -            if ( cpumask_empty(desc->arch.old_cpu_mask) )
> -            {
> -                desc->arch.move_in_progress = 0;
> -                release_old_vec(desc);
> -            }
> +            desc->arch.old_cpu = CPU_INVALID;
> +            desc->arch.move_in_progress = 0;

As you touch the line anyway, switch to using "false"?

Jan
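[Editor's note: a sketch of one way to address the guarding concern raised above — hypothetical helper name, with `cpu_online()` and `cpumask_check()` stubbed. In Xen, passing `CPU_INVALID` (`~0U`) to `cpu_online()` would reach the `ASSERT` in `cpumask_check()`, so the sentinel has to be filtered first.]

```c
#include <assert.h>
#include <stdbool.h>

#define CPU_INVALID  (~0U)
#define NR_CPUS      8U

static unsigned long online_map = 0x1;   /* only CPU 0 online */

/*
 * Stand-in for cpu_online(): the real helper would trip the ASSERT in
 * cpumask_check() when handed CPU_INVALID, modeled by the assert here.
 */
static bool cpu_online(unsigned int cpu)
{
    assert(cpu < NR_CPUS);               /* models cpumask_check() */
    return online_map & (1UL << cpu);
}

/*
 * Hypothetical guard along the lines the review suggests: short-circuit
 * on the sentinel so it can never reach cpumask_check().
 */
static bool old_cpu_online(unsigned int cpu)
{
    return cpu != CPU_INVALID && cpu_online(cpu);
}
```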
[PATCH 12/12] x86/irq: convert irq_desc pending_mask field to integer
Posted by Roger Pau Monne 3 weeks, 2 days ago
Much like the rest of the fields that relate to the current or old CPU
target, the pending_mask can be converted into an integer field.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/include/asm/irq.h |  2 +-
 xen/arch/x86/irq.c             | 14 +++++---------
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index bc59ce7c3ffb..55047772eb46 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -73,7 +73,7 @@ struct arch_irq_desc {
 #define CPU_INVALID  ~0U
         unsigned int cpu;                /* Target CPU of the interrupt. */
         unsigned int old_cpu;
-        cpumask_var_t pending_mask;
+        unsigned int pending_cpu;
         vmask_t *used_vectors;
         bool move_cleanup : 1;
         bool move_in_progress : 1;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 680f190da065..8d7947116e33 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -387,11 +387,9 @@ int irq_to_vector(int irq)
 
 int arch_init_one_irq_desc(struct irq_desc *desc)
 {
-    if ( !alloc_cpumask_var(&desc->arch.pending_mask) )
-        return -ENOMEM;
-
     desc->arch.cpu = CPU_INVALID;
     desc->arch.old_cpu = CPU_INVALID;
+    desc->arch.pending_cpu = CPU_INVALID;
     desc->arch.vector = IRQ_VECTOR_UNASSIGNED;
     desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
     desc->arch.creator_domid = DOMID_INVALID;
@@ -760,8 +758,6 @@ void setup_vector_irq(unsigned int cpu)
 
 void move_masked_irq(struct irq_desc *desc)
 {
-    cpumask_t *pending_mask = desc->arch.pending_mask;
-
     if (likely(!(desc->status & IRQ_MOVE_PENDING)))
         return;
     
@@ -779,10 +775,10 @@ void move_masked_irq(struct irq_desc *desc)
      *
      * For correct operation this depends on the caller masking the irqs.
      */
-    if ( likely(cpumask_intersects(pending_mask, &cpu_online_map)) )
-        desc->handler->set_affinity(desc, pending_mask);
+    if ( likely(cpu_online(desc->arch.pending_cpu)) )
+        desc->handler->set_affinity(desc, cpumask_of(desc->arch.pending_cpu));
 
-    cpumask_clear(pending_mask);
+    desc->arch.pending_cpu = CPU_INVALID;
 }
 
 void move_native_irq(struct irq_desc *desc)
@@ -926,7 +922,7 @@ void irq_set_affinity(struct irq_desc *desc, unsigned int cpu)
     ASSERT(spin_is_locked(&desc->lock));
     desc->status &= ~IRQ_MOVE_PENDING;
     smp_wmb();
-    cpumask_copy(desc->arch.pending_mask, cpumask_of(cpu));
+    desc->arch.pending_cpu = cpu;
     smp_wmb();
     desc->status |= IRQ_MOVE_PENDING;
 }
-- 
2.51.0
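[Editor's note: a reduced, single-file model of the `pending_cpu` publication protocol visible in the `irq_set_affinity()` and `move_masked_irq()` hunks above — not Xen code. C11 release/acquire ordering stands in for Xen's `smp_wmb()`: the flag is cleared, the new single target is stored, then the flag is set again, so a consumer that observes `IRQ_MOVE_PENDING` also observes the matching `pending_cpu`.]

```c
#include <assert.h>
#include <stdatomic.h>

#define CPU_INVALID       (~0U)
#define IRQ_MOVE_PENDING  0x1U

struct demo_desc {
    _Atomic unsigned int status;
    _Atomic unsigned int pending_cpu;
};

/* Producer, modeled on irq_set_affinity(): clear flag, publish CPU, set flag. */
static void set_affinity(struct demo_desc *d, unsigned int cpu)
{
    atomic_fetch_and_explicit(&d->status, ~IRQ_MOVE_PENDING,
                              memory_order_relaxed);
    atomic_store_explicit(&d->pending_cpu, cpu, memory_order_release);
    atomic_fetch_or_explicit(&d->status, IRQ_MOVE_PENDING,
                             memory_order_release);
}

/*
 * Consumer, modeled on move_masked_irq(): act only if the flag is set,
 * then read the single pending CPU and reset the sentinel.
 */
static unsigned int consume_pending(struct demo_desc *d)
{
    if ( !(atomic_load_explicit(&d->status, memory_order_acquire) &
           IRQ_MOVE_PENDING) )
        return CPU_INVALID;
    atomic_fetch_and_explicit(&d->status, ~IRQ_MOVE_PENDING,
                              memory_order_relaxed);
    return atomic_exchange_explicit(&d->pending_cpu, CPU_INVALID,
                                    memory_order_acq_rel);
}
```

This also illustrates why an integer is sufficient here: `irq_set_affinity()` only ever stored `cpumask_of(cpu)`, a single-bit mask, so the cpumask carried no extra information.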


Re: [PATCH 12/12] x86/irq: convert irq_desc pending_mask field to integer
Posted by Jan Beulich 2 weeks, 5 days ago
On 20.11.2025 10:58, Roger Pau Monne wrote:
> @@ -779,10 +775,10 @@ void move_masked_irq(struct irq_desc *desc)
>       *
>       * For correct operation this depends on the caller masking the irqs.
>       */
> -    if ( likely(cpumask_intersects(pending_mask, &cpu_online_map)) )
> -        desc->handler->set_affinity(desc, pending_mask);
> +    if ( likely(cpu_online(desc->arch.pending_cpu)) )

Same remark again regarding the guarding against hitting ...

> +        desc->handler->set_affinity(desc, cpumask_of(desc->arch.pending_cpu));
>  
> -    cpumask_clear(pending_mask);
> +    desc->arch.pending_cpu = CPU_INVALID;

... the value stored here.

Jan