[PATCH v2 5/5] amd/iommu: force atomic updates of remapping entries

Posted by Roger Pau Monne 2 years, 6 months ago
Disable XT (x2APIC) support, and thus 128-bit IRTE entries, if cmpxchg16b
is not available, in order to avoid having to disable the IRTE (and
possibly lose interrupts) when updating the entry.  Note this also
removes the while loop used to disable the entry, since RemapEn is no
longer toggled when updating.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
x2APIC support was added late enough on AMD that I believe all models
that have x2APIC must also have CX16.

The AMD-Vi manual contains the following advice in the "2.3.2 Making
Guest Interrupt Virtualization Changes" section:

"""
If RemapEn=1 before the change, the following steps may be followed to
change interrupt virtualization information:
 * Set RemapEn=0 in the entry to disable interrupt virtualization;
   device-initiated interrupt requests for the DeviceID and vector are
   processed as IO_PAGE_FAULT events.
 * Update the fields in the IRTE and invalidate the interrupt table
   (see Section 2.4.5 [INVALIDATE_INTERRUPT_TABLE]).
 * Set RemapEn=1 to virtualize interrupts from the device.
"""

However, if the update of the IRTE is done atomically, I assume that
setting RemapEn=0 is no longer necessary, and that the
INVALIDATE_INTERRUPT_TABLE command can be executed after the entry has
been (atomically) updated.

Note that on VT-d IRTEs always have a size of 128b, so it's not
possible to use a smaller entry size if CX16 is not available in order
to do atomic updates.
---
 xen/drivers/passthrough/amd/iommu_init.c | 10 +++++
 xen/drivers/passthrough/amd/iommu_intr.c | 57 +++++++-----------------
 2 files changed, 26 insertions(+), 41 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index af6713d2fc02..e276856507a1 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1417,6 +1417,16 @@ int __init amd_iommu_prepare(bool xt)
             has_xt = false;
     }
 
+    /*
+     * Prevent using 128bit IRTE format if there's no support for cmpxchg16b
+     * to update the entry atomically.
+     */
+    if ( xt && has_xt && !cpu_has_cx16 )
+    {
+        has_xt = false;
+        printk(XENLOG_WARNING "AMD-Vi: x2APIC not supported without CX16\n");
+    }
+
     if ( ivhd_type != ACPI_IVRS_TYPE_HARDWARE )
         iommuv2_enabled = true;
 
diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
index bb324eb16da1..4c6122a099f2 100644
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -39,6 +39,7 @@ union irte32 {
 };
 
 union irte128 {
+    __uint128_t raw128;
     uint64_t raw[2];
     struct {
         bool remap_en:1;
@@ -221,15 +222,16 @@ static void update_intremap_entry(const struct amd_iommu *iommu,
                 .vector = vector,
             },
         };
-
-        ASSERT(!entry.ptr128->full.remap_en);
-        entry.ptr128->raw[1] = irte.raw[1];
-        /*
-         * High half needs to be set before low one (containing RemapEn).  See
-         * comment in free_intremap_entry() regarding the choice of barrier.
+        union irte128 old_irte = *entry.ptr128;
+        __uint128_t ret = cmpxchg16b(entry.ptr128, &old_irte, &irte);
+
+        /*
+         * cmpxchg16b atomically updates the 128-bit IRTE, and the hardware
+         * cannot modify the entry behind our back, so the value returned by
+         * cmpxchg16b must match old_irte.raw128.  The ASSERT below validates
+         * this.
          */
-        smp_wmb();
-        ACCESS_ONCE(entry.ptr128->raw[0]) = irte.raw[0];
+        ASSERT(ret == old_irte.raw128);
     }
     else
     {
@@ -298,21 +300,12 @@ static int update_intremap_entry_from_ioapic(
 
     entry = get_intremap_entry(iommu, req_id, offset);
 
-    /* The RemapEn fields match for all formats. */
-    while ( iommu->enabled && entry.ptr32->flds.remap_en )
-    {
-        entry.ptr32->flds.remap_en = false;
-        spin_unlock(lock);
-
-        amd_iommu_flush_intremap(iommu, req_id);
-
-        spin_lock(lock);
-    }
-
     update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
 
     spin_unlock_irqrestore(lock, flags);
 
+    amd_iommu_flush_intremap(iommu, req_id);
+
     set_rte_index(rte, offset);
 
     return 0;
@@ -321,7 +314,6 @@ static int update_intremap_entry_from_ioapic(
 void cf_check amd_iommu_ioapic_update_ire(
     unsigned int apic, unsigned int pin, uint64_t raw)
 {
-    struct IO_APIC_route_entry old_rte = { };
     struct IO_APIC_route_entry new_rte = { .raw = raw };
     int seg, bdf, rc;
     struct amd_iommu *iommu;
@@ -343,14 +335,6 @@ void cf_check amd_iommu_ioapic_update_ire(
         return;
     }
 
-    old_rte = __ioapic_read_entry(apic, pin, true);
-    /* mask the interrupt while we change the intremap table */
-    if ( !old_rte.mask )
-    {
-        old_rte.mask = 1;
-        __ioapic_write_entry(apic, pin, true, old_rte);
-    }
-
     /* Update interrupt remapping entry */
     rc = update_intremap_entry_from_ioapic(
              bdf, iommu, &new_rte,
@@ -469,22 +453,13 @@ static int update_intremap_entry_from_msi_msg(
 
     entry = get_intremap_entry(iommu, req_id, offset);
 
-    /* The RemapEn fields match for all formats. */
-    while ( iommu->enabled && entry.ptr32->flds.remap_en )
-    {
-        entry.ptr32->flds.remap_en = false;
-        spin_unlock(lock);
-
-        amd_iommu_flush_intremap(iommu, req_id);
-        if ( alias_id != req_id )
-            amd_iommu_flush_intremap(iommu, alias_id);
-
-        spin_lock(lock);
-    }
-
     update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
     spin_unlock_irqrestore(lock, flags);
 
+    amd_iommu_flush_intremap(iommu, req_id);
+    if ( alias_id != req_id )
+        amd_iommu_flush_intremap(iommu, alias_id);
+
     *data = (msg->data & ~(INTREMAP_MAX_ENTRIES - 1)) | offset;
 
     /*
-- 
2.41.0


Re: [PATCH v2 5/5] amd/iommu: force atomic updates of remapping entries
Posted by Jan Beulich 2 years, 6 months ago
On 18.07.2023 14:43, Roger Pau Monne wrote:
> Disable XT (x2APIC) support, and thus 128-bit IRTE entries, if cmpxchg16b
> is not available, in order to avoid having to disable the IRTE (and
> possibly lose interrupts) when updating the entry.  Note this also
> removes the while loop used to disable the entry, since RemapEn is no
> longer toggled when updating.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> x2APIC support was added late enough on AMD that I believe all models
> that have x2APIC must also have CX16.
> 
> The AMD-Vi manual contains the following advice in the "2.3.2 Making
> Guest Interrupt Virtualization Changes" section:
> 
> """
> If RemapEn=1 before the change, the following steps may be followed to
> change interrupt virtualization information:
>  * Set RemapEn=0 in the entry to disable interrupt virtualization;
>    device-initiated interrupt requests for the DeviceID and vector are
>    processed as IO_PAGE_FAULT events.
>  * Update the fields in the IRTE and invalidate the interrupt table
>    (see Section 2.4.5 [INVALIDATE_INTERRUPT_TABLE]).
>  * Set RemapEn=1 to virtualize interrupts from the device.
> """
> 
> However, if the update of the IRTE is done atomically, I assume that
> setting RemapEn=0 is no longer necessary, and that the
> INVALIDATE_INTERRUPT_TABLE command can be executed after the entry has
> been (atomically) updated.

There's one additional prereq to this: the IOMMU also needs to read
128-bit IRTEs atomically. I'm afraid we can't take this for granted
without it being stated explicitly in the spec (I've just gone through
and looked at all occurrences of "atomic", without finding anything
applicable to IRTEs). If this can be resolved, I think I'm fine with
the patch.

Jan

Re: [PATCH v2 5/5] amd/iommu: force atomic updates of remapping entries
Posted by Roger Pau Monné 2 years, 6 months ago
On Wed, Jul 19, 2023 at 02:46:49PM +0200, Jan Beulich wrote:
> On 18.07.2023 14:43, Roger Pau Monne wrote:
> > Disable XT (x2APIC) support, and thus 128-bit IRTE entries, if cmpxchg16b
> > is not available, in order to avoid having to disable the IRTE (and
> > possibly lose interrupts) when updating the entry.  Note this also
> > removes the while loop used to disable the entry, since RemapEn is no
> > longer toggled when updating.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > x2APIC support was added late enough on AMD that I believe all models
> > that have x2APIC must also have CX16.
> > 
> > The AMD-Vi manual contains the following advice in the "2.3.2 Making
> > Guest Interrupt Virtualization Changes" section:
> > 
> > """
> > If RemapEn=1 before the change, the following steps may be followed to
> > change interrupt virtualization information:
> >  * Set RemapEn=0 in the entry to disable interrupt virtualization;
> >    device-initiated interrupt requests for the DeviceID and vector are
> >    processed as IO_PAGE_FAULT events.
> >  * Update the fields in the IRTE and invalidate the interrupt table
> >    (see Section 2.4.5 [INVALIDATE_INTERRUPT_TABLE]).
> >  * Set RemapEn=1 to virtualize interrupts from the device.
> > """
> > 
> > However, if the update of the IRTE is done atomically, I assume that
> > setting RemapEn=0 is no longer necessary, and that the
> > INVALIDATE_INTERRUPT_TABLE command can be executed after the entry has
> > been (atomically) updated.
> 
> There's one additional prereq to this: the IOMMU also needs to read
> 128-bit IRTEs atomically. I'm afraid we can't take this for granted
> without it being stated explicitly in the spec (I've just gone through
> and looked at all occurrences of "atomic", without finding anything
> applicable to IRTEs). If this can be resolved, I think I'm fine with
> the patch.

Hm, indeed I was taking for granted that the IOMMU will read the entry
atomically.

I was also worried about the IOMMU caching only parts of the IRTE: even
when doing an atomic read of the full entry, it could still end up with
inconsistent data if previously cached parts of the entry are used.
The text in INVALIDATE_INTERRUPT_TABLE seems to suggest that IRTE
fields could be cached on a field by field basis.

I will raise a question with AMD in order to try to figure out whether
the IOMMU will do atomic reads, and also cache the full entry.

Thanks, Roger.