[PATCH 05/12] x86/io-apic: purge usage of logical mode

Roger Pau Monne posted 12 patches 3 weeks, 2 days ago
Only 6 patches received!
[PATCH 05/12] x86/io-apic: purge usage of logical mode
Posted by Roger Pau Monne 3 weeks, 2 days ago
The IO-APIC RTEs are unconditionally programmed with physical destination
mode, and hence the field to set in the RTE is always physical_dest.

Remove the mode parameter from SET_DEST() and take the opportunity to
convert it into a function, there's no need for it to be a macro.

This is a benign fix, because due to the endianness of x86 the start of the
physical_dest and logical_dest fields on the RTE overlap.

While there also adjust setup_IO_APIC_irqs() to use the target CPU set by
the assign_irq_vector(), rather than using TARGET_CPUS and possibly
creating a mismatch between the target CPU in desc->arch.cpu_mask and the
one programmed in the IO-APIC RTE.

Fixes: 032d6733a458 ("x86/apic: remove delivery and destination mode fields from drivers")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/io_apic.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 46c2a43dac6d..3c59bf0e15da 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1037,12 +1037,14 @@ static hw_irq_controller ioapic_edge_type;
 #define IOAPIC_EDGE	0
 #define IOAPIC_LEVEL	1
 
-#define SET_DEST(ent, mode, val) do { \
-    if (x2apic_enabled && iommu_intremap) \
-        (ent).dest.dest32 = (val); \
-    else \
-        (ent).dest.mode.mode##_dest = (val); \
-} while (0)
+static void set_entry_dest(struct IO_APIC_route_entry *entry,
+                           unsigned int apic_id)
+{
+    if ( x2apic_enabled && iommu_intremap )
+        entry->dest.dest32 = apic_id;
+    else
+        entry->dest.physical.physical_dest = apic_id;
+}
 
 static inline void ioapic_register_intr(int irq, unsigned long trigger)
 {
@@ -1109,7 +1111,8 @@ static void __init setup_IO_APIC_irqs(void)
             if (platform_legacy_irq(irq))
                 disable_8259A_irq(irq_to_desc(irq));
 
-            SET_DEST(entry, logical, cpu_mask_to_apicid(TARGET_CPUS));
+            set_entry_dest(&entry,
+                           cpu_mask_to_apicid(irq_to_desc(irq)->arch.cpu_mask));
             spin_lock_irqsave(&ioapic_lock, flags);
             __ioapic_write_entry(apic, pin, false, entry);
             spin_unlock_irqrestore(&ioapic_lock, flags);
@@ -1140,7 +1143,7 @@ static void __init setup_ExtINT_IRQ0_pin(unsigned int apic, unsigned int pin, in
      */
     entry.dest_mode = 0; /* physical delivery */
     entry.mask = 0;					/* unmask IRQ now */
-    SET_DEST(entry, logical, cpu_mask_to_apicid(TARGET_CPUS));
+    set_entry_dest(&entry, cpu_mask_to_apicid(TARGET_CPUS));
     entry.delivery_mode = dest_Fixed;
     entry.polarity = 0;
     entry.trigger = 0;
@@ -1432,7 +1435,7 @@ void disable_IO_APIC(void)
         entry.dest_mode       = 0; /* Physical */
         entry.delivery_mode   = dest_ExtINT; /* ExtInt */
         entry.vector          = 0;
-        SET_DEST(entry, physical, get_apic_id());
+        set_entry_dest(&entry, get_apic_id());
 
         /*
          * Add it to the IO-APIC irq-routing table:
@@ -1806,7 +1809,7 @@ static void __init unlock_ExtINT_logic(void)
 
     entry1.dest_mode = 0;			/* physical delivery */
     entry1.mask = 0;			/* unmask IRQ now */
-    SET_DEST(entry1, physical, get_apic_id());
+    set_entry_dest(&entry1, get_apic_id());
     entry1.delivery_mode = dest_ExtINT;
     entry1.polarity = entry0.polarity;
     entry1.trigger = 0;
@@ -2137,7 +2140,7 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
         cpumask_t *mask = this_cpu(scratch_cpumask);
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
-        SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+        set_entry_dest(&entry, cpu_mask_to_apicid(mask));
     } else {
         printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
                irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2334,7 +2337,7 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
         cpumask_t *mask = this_cpu(scratch_cpumask);
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
-        SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+        set_entry_dest(&rte, cpu_mask_to_apicid(mask));
     }
     else
     {
-- 
2.51.0


Re: [PATCH 05/12] x86/io-apic: purge usage of logical mode
Posted by Andrew Cooper 3 weeks, 2 days ago
On 20/11/2025 9:06 am, Roger Pau Monne wrote:
> The IO-APIC RTEs are unconditionally programmed with physical destination
> mode, and hence the field to set in the RTE is always physical_dest.
>
> Remove the mode parameter from SET_DEST() and take the opportunity to
> convert it into a function, there's no need for it to be a macro.
>
> This is a benign fix, because due to the endianness of x86 the start of the
> physical_dest and logical_dest fields on the RTE overlap.

RTEs do not have overlapping fields; it's Xen's abstraction of the
IO-APIC which is buggy.

For starters, Xen's IO_APIC_route_entry still only has a 4-bit
physical_dest field which hasn't been true since the Pentium 4 days. 
This might explain some of the interrupt bugs we see.

~Andrew

Re: [PATCH 05/12] x86/io-apic: purge usage of logical mode
Posted by Jan Beulich 3 weeks, 2 days ago
On 20.11.2025 14:18, Andrew Cooper wrote:
> On 20/11/2025 9:06 am, Roger Pau Monne wrote:
>> The IO-APIC RTEs are unconditionally programmed with physical destination
>> mode, and hence the field to set in the RTE is always physical_dest.
>>
>> Remove the mode parameter from SET_DEST() and take the opportunity to
>> convert it into a function, there's no need for it to be a macro.
>>
>> This is a benign fix, because due to the endianness of x86 the start of the
>> physical_dest and logical_dest fields on the RTE overlap.
> 
> RTEs do not have overlapping fields; it's Xen's abstraction of the
> IO-APIC which is buggy.

I wouldn't put it this negatively. In the old days, ...

> For starters, Xen's IO_APIC_route_entry still only has a 4-bit
> physical_dest field which hasn't been true since the Pentium 4 days. 
> This might explain some of the interrupt bugs we see.

... as you mention here, the two fields were distinct (and hence overlapping).
In a number of places we passed "logical" to SET_DEST() as the middle argument,
thus covering for the too narrow field width of physical_dest. Dropping that
parameter and always using physical_dest requires that field to be widened,
though (or else we'll end up chopping off the top 4 bits, as we already do in
disable_IO_APIC() and unlock_ExtINT_logic() - both benign as long as the CPU
used always has APIC ID 0, which will at least typically be the case, I think).

Jan

Re: [PATCH 05/12] x86/io-apic: purge usage of logical mode
Posted by Andrew Cooper 3 weeks, 2 days ago
On 20/11/2025 2:27 pm, Jan Beulich wrote:
> On 20.11.2025 14:18, Andrew Cooper wrote:
>> On 20/11/2025 9:06 am, Roger Pau Monne wrote:
>>> The IO-APIC RTEs are unconditionally programmed with physical destination
>>> mode, and hence the field to set in the RTE is always physical_dest.
>>>
>>> Remove the mode parameter from SET_DEST() and take the opportunity to
>>> convert it into a function, there's no need for it to be a macro.
>>>
>>> This is a benign fix, because due to the endianness of x86 the start of the
>>> physical_dest and logical_dest fields on the RTE overlap.
>> RTEs do not have overlapping fields; it's Xen's abstraction of the
>> IO-APIC which is buggy.
> I wouldn't put it this negatively. In the old days, ...
>
>> For starters, Xen's IO_APIC_route_entry still only has a 4-bit
>> physical_dest field which hasn't been true since the Pentium 4 days. 
>> This might explain some of the interrupt bugs we see.
> ... as you mention here, the two fields were distinct (and hence overlapping).

Since when?

Even in the oldest manuals I can find, it's a single field called
destination, and who's contents is interpreted differently depending on
the logical mode bit.

From a hardware point of view, there will either be 4 or 8 bits of
storage, and it will have nothing to do with the lower bits.

> In a number of places we passed "logical" to SET_DEST() as the middle argument,
> thus covering for the too narrow field width of physical_dest. Dropping that
> parameter and always using physical_dest requires that field to be widened,
> though (or else we'll end up chopping off the top 4 bits, as we already do in
> disable_IO_APIC() and unlock_ExtINT_logic() - both benign as long as the CPU
> used always has APIC ID 0, which will at least typically be the case, I think).

Latent or not, it's still in need of fixing.

It looks like the code Xen inherited was added to Linux in e1d919786
(Jan 2008, even then only x86_32) and deleted by d83e94acd957 (August 2008).

It looks like we're 17 years late undoing this...

~Andrew

Re: [PATCH 05/12] x86/io-apic: purge usage of logical mode
Posted by Jan Beulich 3 weeks, 1 day ago
On 20.11.2025 19:30, Andrew Cooper wrote:
> On 20/11/2025 2:27 pm, Jan Beulich wrote:
>> On 20.11.2025 14:18, Andrew Cooper wrote:
>>> On 20/11/2025 9:06 am, Roger Pau Monne wrote:
>>>> The IO-APIC RTEs are unconditionally programmed with physical destination
>>>> mode, and hence the field to set in the RTE is always physical_dest.
>>>>
>>>> Remove the mode parameter from SET_DEST() and take the opportunity to
>>>> convert it into a function, there's no need for it to be a macro.
>>>>
>>>> This is a benign fix, because due to the endianness of x86 the start of the
>>>> physical_dest and logical_dest fields on the RTE overlap.
>>> RTEs do not have overlapping fields; it's Xen's abstraction of the
>>> IO-APIC which is buggy.
>> I wouldn't put it this negatively. In the old days, ...
>>
>>> For starters, Xen's IO_APIC_route_entry still only has a 4-bit
>>> physical_dest field which hasn't been true since the Pentium 4 days. 
>>> This might explain some of the interrupt bugs we see.
>> ... as you mention here, the two fields were distinct (and hence overlapping).
> 
> Since when?
> 
> Even in the oldest manuals I can find, it's a single field called
> destination, and who's contents is interpreted differently depending on
> the logical mode bit.

And these two different meanings are reflected in the union-ized definition
in Xen. Which I still think is fair to describe as "overlapping". And the
use of a union itself isn't buggy imo; it's some of the code using it which
is.

Jan