[Xen-devel] [PATCH v3 00/14] x86: AMD x2APIC support

Posted by Jan Beulich 4 years, 9 months ago
Despite the title, this is actually all AMD IOMMU side work; all x86
side adjustments have already been carried out.

The first and last patches aren't really x2APIC related, but were found
helpful in the course of the re-work done for this version. The first
one lives in its place for easy backporting.

Note that this series now depends on the v4 "x86: IRQ management
adjustments" one, in particular on at least "x86/IOMMU: don't restrict
IRQ affinities to online CPUs".

See individual patches for changes from v2.

01: free more memory when cleaning up after error
02: use bit field for extended feature register
03: use bit field for control register
04: use bit field for IRTE
05: pass IOMMU to iterate_ivrs_entries() callback
06: pass IOMMU to amd_iommu_alloc_intremap_table()
07: pass IOMMU to {get,free,update}_intremap_entry()
08: introduce 128-bit IRTE non-guest-APIC IRTE format
09: split amd_iommu_init_one()
10: allow enabling with IRQ not yet set up
11: adjust setup of internal interrupt for x2APIC mode
12: enable x2APIC mode when available
13: correct IRTE updating
14: process softirqs while dumping IRTs

The full set of patches is once again attached here, due to still
unresolved email issues on my end.

Jan
AMD/IOMMU: free more memory when cleaning up after error

The interrupt remapping in-use bitmaps were leaked in all cases. The
ring buffers and the mapping of the MMIO space were leaked for any IOMMU
that hadn't been enabled yet.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1070,13 +1070,12 @@ static void __init amd_iommu_init_cleanu
     {
         list_del(&iommu->list);
         if ( iommu->enabled )
-        {
             disable_iommu(iommu);
-            deallocate_ring_buffer(&iommu->cmd_buffer);
-            deallocate_ring_buffer(&iommu->event_log);
-            deallocate_ring_buffer(&iommu->ppr_log);
-            unmap_iommu_mmio_region(iommu);
-        }
+
+        deallocate_ring_buffer(&iommu->cmd_buffer);
+        deallocate_ring_buffer(&iommu->event_log);
+        deallocate_ring_buffer(&iommu->ppr_log);
+        unmap_iommu_mmio_region(iommu);
         xfree(iommu);
     }
 
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -610,6 +610,8 @@ int __init amd_iommu_free_intremap_table
 {
     void *tb = ivrs_mapping->intremap_table;
 
+    XFREE(ivrs_mapping->intremap_inuse);
+
     if ( tb )
     {
         __free_amd_iommu_tables(tb, INTREMAP_TABLE_ORDER);
AMD/IOMMU: use bit field for extended feature register

This also takes care of several of the shift values wrongly having been
specified in hex rather than decimal.

Take the opportunity and
- replace a readl() pair by a single readq(),
- add further fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Another attempt at deriving masks from bitfields, hopefully better
    liked by clang (mine was fine even with the v2 variant).
v2: Correct sats_sup position and name. Re-base over new earlier patch.
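
As an aside, the mask-derivation trick in the FEAT() macro below may be
worth spelling out (a standalone sketch, not part of the patch; it
assumes GCC/clang, where the empty {} initializer is accepted as an
extension prior to C23): a compound literal is a modifiable lvalue, so
decrementing its zero-initialized bit field wraps around to the field's
maximum value, which lets the macro tell single-bit fields from
multi-bit ones.

    union example {
        struct {
            unsigned int single:1;
            unsigned int multi:3;
        } flds;
    };

    /* --((union example){}).flds.single == 1: 1-bit field, printed as a flag */
    /* --((union example){}).flds.multi  == 7: > 1, i.e. multi-bit, value printed */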

--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -60,49 +60,77 @@ static int __init get_iommu_capabilities
 
 void __init get_iommu_features(struct amd_iommu *iommu)
 {
-    u32 low, high;
-    int i = 0 ;
     const struct amd_iommu *first;
-    static const char *__initdata feature_str[] = {
-        "- Prefetch Pages Command", 
-        "- Peripheral Page Service Request", 
-        "- X2APIC Supported", 
-        "- NX bit Supported", 
-        "- Guest Translation", 
-        "- Reserved bit [5]",
-        "- Invalidate All Command", 
-        "- Guest APIC supported", 
-        "- Hardware Error Registers", 
-        "- Performance Counters", 
-        NULL
-    };
-
     ASSERT( iommu->mmio_base );
 
     if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
     {
-        iommu->features = 0;
+        iommu->features.raw = 0;
         return;
     }
 
-    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
-    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
-
-    iommu->features = ((u64)high << 32) | low;
+    iommu->features.raw =
+        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
 
     /* Don't log the same set of features over and over. */
     first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
-    if ( iommu != first && iommu->features == first->features )
+    if ( iommu != first && iommu->features.raw == first->features.raw )
         return;
 
     printk("AMD-Vi: IOMMU Extended Features:\n");
 
-    while ( feature_str[i] )
+#define FEAT(fld, str) do {                                    \
+    if ( --((union amd_iommu_ext_features){}).flds.fld > 1 )   \
+        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
+    else if ( iommu->features.flds.fld )                       \
+        printk( "- " str "\n");                                \
+} while ( false )
+
+    FEAT(pref_sup,           "Prefetch Pages Command");
+    FEAT(ppr_sup,            "Peripheral Page Service Request");
+    FEAT(xt_sup,             "x2APIC");
+    FEAT(nx_sup,             "NX bit");
+    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
+    FEAT(ia_sup,             "Invalidate All Command");
+    FEAT(ga_sup,             "Guest APIC");
+    FEAT(he_sup,             "Hardware Error Registers");
+    FEAT(pc_sup,             "Performance Counters");
+    FEAT(hats,               "Host Address Translation Size");
+
+    if ( iommu->features.flds.gt_sup )
     {
-        if ( amd_iommu_has_feature(iommu, i) )
-            printk( " %s\n", feature_str[i]);
-        i++;
+        FEAT(gats,           "Guest Address Translation Size");
+        FEAT(glx_sup,        "Guest CR3 Root Table Level");
+        FEAT(pas_max,        "Maximum PASID");
     }
+
+    FEAT(smif_sup,           "SMI Filter Register");
+    FEAT(smif_rc,            "SMI Filter Register Count");
+    FEAT(gam_sup,            "Guest Virtual APIC Modes");
+    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
+    FEAT(dual_event_log_sup, "Dual Event Log");
+    FEAT(sats_sup,           "Secure ATS");
+    FEAT(us_sup,             "User / Supervisor Page Protection");
+    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
+    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
+    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
+    FEAT(marc_sup,           "Memory Access Routing and Control");
+    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
+    FEAT(perf_opt_sup,       "Performance Optimization");
+    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
+    FEAT(gio_sup,            "Guest I/O Protection");
+    FEAT(ha_sup,             "Host Access");
+    FEAT(eph_sup,            "Enhanced PPR Handling");
+    FEAT(attr_fw_sup,        "Attribute Forward");
+    FEAT(hd_sup,             "Host Dirty");
+    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
+    FEAT(viommu_sup,         "Virtualized IOMMU");
+    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
+    FEAT(vm_table_size,      "VM Table Size");
+    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
+
+#undef FEAT
 }
 
 int __init amd_iommu_detect_one_acpi(
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -638,7 +638,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->reg_status);
         break;
     case IOMMU_EXT_FEATURE_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ext_feature);
+        val = iommu->reg_ext_feature.raw;
         break;
 
     default:
@@ -802,39 +802,26 @@ int guest_iommu_set_base(struct domain *
 /* Initialize mmio read only bits */
 static void guest_iommu_reg_init(struct guest_iommu *iommu)
 {
-    uint32_t lower, upper;
+    union amd_iommu_ext_features ef = {
+        /* Support prefetch */
+        .flds.pref_sup = 1,
+        /* Support PPR log */
+        .flds.ppr_sup = 1,
+        /* Support guest translation */
+        .flds.gt_sup = 1,
+        /* Support invalidate all command */
+        .flds.ia_sup = 1,
+        /* Host translation size has 6 levels */
+        .flds.hats = HOST_ADDRESS_SIZE_6_LEVEL,
+        /* Guest translation size has 6 levels */
+        .flds.gats = GUEST_ADDRESS_SIZE_6_LEVEL,
+        /* Single level gCR3 */
+        .flds.glx_sup = GUEST_CR3_1_LEVEL,
+        /* 9 bit PASID */
+        .flds.pas_max = PASMAX_9_bit,
+    };
 
-    lower = upper = 0;
-    /* Support prefetch */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PREFSUP_SHIFT);
-    /* Support PPR log */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PPRSUP_SHIFT);
-    /* Support guest translation */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_GTSUP_SHIFT);
-    /* Support invalidate all command */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_IASUP_SHIFT);
-
-    /* Host translation size has 6 levels */
-    set_field_in_reg_u32(HOST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_HATS_MASK,
-                         IOMMU_EXT_FEATURE_HATS_SHIFT,
-                         &lower);
-    /* Guest translation size has 6 levels */
-    set_field_in_reg_u32(GUEST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GATS_MASK,
-                         IOMMU_EXT_FEATURE_GATS_SHIFT,
-                         &lower);
-    /* Single level gCR3 */
-    set_field_in_reg_u32(GUEST_CR3_1_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GLXSUP_MASK,
-                         IOMMU_EXT_FEATURE_GLXSUP_SHIFT, &lower);
-    /* 9 bit PASID */
-    set_field_in_reg_u32(PASMAX_9_bit, upper,
-                         IOMMU_EXT_FEATURE_PASMAX_MASK,
-                         IOMMU_EXT_FEATURE_PASMAX_SHIFT, &upper);
-
-    iommu->reg_ext_feature.lo = lower;
-    iommu->reg_ext_feature.hi = upper;
+    iommu->reg_ext_feature = ef;
 }
 
 static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -883,7 +883,7 @@ static void enable_iommu(struct amd_iomm
     register_iommu_event_log_in_mmio_space(iommu);
     register_iommu_exclusion_range(iommu);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
     desc = irq_to_desc(iommu->msi.irq);
@@ -897,15 +897,15 @@ static void enable_iommu(struct amd_iomm
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( iommu->features.flds.ia_sup )
         amd_iommu_flush_all_caches(iommu);
 
     iommu->enabled = 1;
@@ -928,10 +928,10 @@ static void disable_iommu(struct amd_iom
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
@@ -1027,7 +1027,7 @@ static int __init amd_iommu_init_one(str
 
     get_iommu_features(iommu);
 
-    if ( iommu->features )
+    if ( iommu->features.raw )
         iommuv2_enabled = 1;
 
     if ( allocate_cmd_buffer(iommu) == NULL )
@@ -1036,9 +1036,8 @@ static int __init amd_iommu_init_one(str
     if ( allocate_event_log(iommu) == NULL )
         goto error_out;
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
-        if ( allocate_ppr_log(iommu) == NULL )
-            goto error_out;
+    if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
+        goto error_out;
 
     if ( !set_iommu_interrupt_handler(iommu) )
         goto error_out;
@@ -1388,7 +1387,7 @@ void amd_iommu_resume(void)
     }
 
     /* flush all cache entries after iommu re-enabled */
-    if ( !amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( !iommu->features.flds.ia_sup )
     {
         invalidate_all_devices();
         invalidate_all_domain_pages();
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -83,7 +83,7 @@ struct amd_iommu {
     iommu_cap_t cap;
 
     u8 ht_flags;
-    u64 features;
+    union amd_iommu_ext_features features;
 
     void *mmio_base;
     unsigned long mmio_base_phys;
@@ -174,7 +174,7 @@ struct guest_iommu {
     /* MMIO regs */
     struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
-    struct mmio_reg         reg_ext_feature;       /* MMIO offset 0030h */
+    union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
     /* guest interrupt settings */
     struct guest_iommu_msi  msi;
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -346,26 +346,57 @@ struct amd_iommu_dte {
 #define IOMMU_EXCLUSION_LIMIT_HIGH_MASK		0xFFFFFFFF
 #define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT	0
 
-/* Extended Feature Register*/
+/* Extended Feature Register */
 #define IOMMU_EXT_FEATURE_MMIO_OFFSET                   0x30
-#define IOMMU_EXT_FEATURE_PREFSUP_SHIFT                 0x0
-#define IOMMU_EXT_FEATURE_PPRSUP_SHIFT                  0x1
-#define IOMMU_EXT_FEATURE_XTSUP_SHIFT                   0x2
-#define IOMMU_EXT_FEATURE_NXSUP_SHIFT                   0x3
-#define IOMMU_EXT_FEATURE_GTSUP_SHIFT                   0x4
-#define IOMMU_EXT_FEATURE_IASUP_SHIFT                   0x6
-#define IOMMU_EXT_FEATURE_GASUP_SHIFT                   0x7
-#define IOMMU_EXT_FEATURE_HESUP_SHIFT                   0x8
-#define IOMMU_EXT_FEATURE_PCSUP_SHIFT                   0x9
-#define IOMMU_EXT_FEATURE_HATS_SHIFT                    0x10
-#define IOMMU_EXT_FEATURE_HATS_MASK                     0x00000C00
-#define IOMMU_EXT_FEATURE_GATS_SHIFT                    0x12
-#define IOMMU_EXT_FEATURE_GATS_MASK                     0x00003000
-#define IOMMU_EXT_FEATURE_GLXSUP_SHIFT                  0x14
-#define IOMMU_EXT_FEATURE_GLXSUP_MASK                   0x0000C000
 
-#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
-#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
+union amd_iommu_ext_features {
+    uint64_t raw;
+    struct {
+        unsigned int pref_sup:1;
+        unsigned int ppr_sup:1;
+        unsigned int xt_sup:1;
+        unsigned int nx_sup:1;
+        unsigned int gt_sup:1;
+        unsigned int gappi_sup:1;
+        unsigned int ia_sup:1;
+        unsigned int ga_sup:1;
+        unsigned int he_sup:1;
+        unsigned int pc_sup:1;
+        unsigned int hats:2;
+        unsigned int gats:2;
+        unsigned int glx_sup:2;
+        unsigned int smif_sup:2;
+        unsigned int smif_rc:3;
+        unsigned int gam_sup:3;
+        unsigned int dual_ppr_log_sup:2;
+        unsigned int :2;
+        unsigned int dual_event_log_sup:2;
+        unsigned int :1;
+        unsigned int sats_sup:1;
+        unsigned int pas_max:5;
+        unsigned int us_sup:1;
+        unsigned int dev_tbl_seg_sup:2;
+        unsigned int ppr_early_of_sup:1;
+        unsigned int ppr_auto_rsp_sup:1;
+        unsigned int marc_sup:2;
+        unsigned int blk_stop_mrk_sup:1;
+        unsigned int perf_opt_sup:1;
+        unsigned int msi_cap_mmio_sup:1;
+        unsigned int :1;
+        unsigned int gio_sup:1;
+        unsigned int ha_sup:1;
+        unsigned int eph_sup:1;
+        unsigned int attr_fw_sup:1;
+        unsigned int hd_sup:1;
+        unsigned int :1;
+        unsigned int inv_iotlb_type_sup:1;
+        unsigned int viommu_sup:1;
+        unsigned int vm_guard_io_sup:1;
+        unsigned int vm_table_size:4;
+        unsigned int ga_update_dis_sup:1;
+        unsigned int :2;
+    } flds;
+};
 
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -219,13 +219,6 @@ static inline int iommu_has_cap(struct a
     return !!(iommu->cap.header & (1u << bit));
 }
 
-static inline int amd_iommu_has_feature(struct amd_iommu *iommu, uint32_t bit)
-{
-    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
-        return 0;
-    return !!(iommu->features & (1U << bit));
-}
-
 /* access tail or head pointer of ring buffer */
 static inline uint32_t iommu_get_rb_pointer(uint32_t reg)
 {
AMD/IOMMU: use bit field for control register

Also introduce a field in struct amd_iommu caching the most recently
written control register. All writes should now happen exclusively from
that cached value, such that it is guaranteed to be up to date.

Take the opportunity and add further fields. Also convert a few boolean
function parameters to bool, such that use of !! can be avoided.

Because of there now being definitions beyond bit 31, writel() also gets
replaced by writeq() when updating hardware.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Switch boolean bitfields to bool.
v2: Add domain_id_pne field. Mention writel() -> writeq() change.
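
The resulting update pattern, in sketch form (using the names this patch
introduces): every control register change modifies the cached value and
then writes that full cached value out, so the cache cannot go stale
relative to the hardware.

    iommu->ctrl.iommu_en = enable;    /* update the cached value ... */
    writeq(iommu->ctrl.raw,           /* ... then write it out in full */
           iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);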

--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -317,7 +317,7 @@ static int do_invalidate_iotlb_pages(str
 
 static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
 {
-    bool_t com_wait_int_en, com_wait_int, i, s;
+    bool com_wait_int, i, s;
     struct guest_iommu *iommu;
     unsigned long gfn;
     p2m_type_t p2mt;
@@ -354,12 +354,10 @@ static int do_completion_wait(struct dom
         unmap_domain_page(vaddr);
     }
 
-    com_wait_int_en = iommu_get_bit(iommu->reg_ctrl.lo,
-                                    IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
     com_wait_int = iommu_get_bit(iommu->reg_status.lo,
                                  IOMMU_STATUS_COMP_WAIT_INT_SHIFT);
 
-    if ( com_wait_int_en && com_wait_int )
+    if ( iommu->reg_ctrl.com_wait_int_en && com_wait_int )
         guest_iommu_deliver_msi(d);
 
     return 0;
@@ -521,40 +519,17 @@ static void guest_iommu_process_command(
     return;
 }
 
-static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t newctrl)
+static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t val)
 {
-    bool_t cmd_en, event_en, iommu_en, ppr_en, ppr_log_en;
-    bool_t cmd_en_old, event_en_old, iommu_en_old;
-    bool_t cmd_run;
-
-    iommu_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-    iommu_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    cmd_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                               IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_run = iommu_get_bit(iommu->reg_status.lo,
-                            IOMMU_STATUS_CMD_BUFFER_RUN_SHIFT);
-    event_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    event_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-
-    ppr_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-    ppr_log_en = iommu_get_bit(newctrl,
-                               IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
+    union amd_iommu_control newctrl = { .raw = val };
 
-    if ( iommu_en )
+    if ( newctrl.iommu_en )
     {
         guest_iommu_enable(iommu);
         guest_iommu_enable_dev_table(iommu);
     }
 
-    if ( iommu_en && cmd_en )
+    if ( newctrl.iommu_en && newctrl.cmd_buf_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->cmd_buffer,
                                        sizeof(cmd_entry_t));
@@ -562,7 +537,7 @@ static int guest_iommu_write_ctrl(struct
         tasklet_schedule(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( iommu_en && event_en )
+    if ( newctrl.iommu_en && newctrl.event_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->event_log,
                                        sizeof(event_entry_t));
@@ -570,7 +545,7 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && ppr_en && ppr_log_en )
+    if ( newctrl.iommu_en && newctrl.ppr_en && newctrl.ppr_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->ppr_log,
                                        sizeof(ppr_entry_t));
@@ -578,19 +553,21 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_PPR_LOG_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && cmd_en_old && !cmd_en )
+    if ( newctrl.iommu_en && iommu->reg_ctrl.cmd_buf_en &&
+         !newctrl.cmd_buf_en )
     {
         /* Disable iommu command processing */
         tasklet_kill(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( event_en_old && !event_en )
+    if ( iommu->reg_ctrl.event_log_en && !newctrl.event_log_en )
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_LOG_RUN_SHIFT);
 
-    if ( iommu_en_old && !iommu_en )
+    if ( iommu->reg_ctrl.iommu_en && !newctrl.iommu_en )
         guest_iommu_disable(iommu);
 
-    u64_to_reg(&iommu->reg_ctrl, newctrl);
+    iommu->reg_ctrl = newctrl;
+
     return 0;
 }
 
@@ -632,7 +609,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->ppr_log.reg_tail);
         break;
     case IOMMU_CONTROL_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ctrl);
+        val = iommu->reg_ctrl.raw;
         break;
     case IOMMU_STATUS_MMIO_OFFSET:
         val = reg_to_u64(iommu->reg_status);
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -41,7 +41,7 @@ LIST_HEAD_READ_MOSTLY(amd_iommu_head);
 struct table_struct device_table;
 bool_t iommuv2_enabled;
 
-static int iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
+static bool iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
 {
     return iommu->ht_flags & mask;
 }
@@ -69,31 +69,18 @@ static void __init unmap_iommu_mmio_regi
 
 static void set_iommu_ht_flags(struct amd_iommu *iommu)
 {
-    u32 entry;
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
     /* Setup HT flags */
     if ( iommu_has_cap(iommu, PCI_CAP_HT_TUNNEL_SHIFT) )
-        iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE) ?
-            iommu_set_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT) :
-            iommu_clear_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT);
+        iommu->ctrl.ht_tun_en = iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE);
+
+    iommu->ctrl.pass_pw     = iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW);
+    iommu->ctrl.res_pass_pw = iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW);
+    iommu->ctrl.isoc        = iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC);
 
     /* Force coherent */
-    iommu_set_bit(&entry, IOMMU_CONTROL_COHERENT_SHIFT);
+    iommu->ctrl.coherent = true;
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_dev_table_in_mmio_space(struct amd_iommu *iommu)
@@ -205,55 +192,37 @@ static void register_iommu_ppr_log_in_mm
 
 
 static void set_iommu_translation_control(struct amd_iommu *iommu,
-                                                 int enable)
+                                          bool enable)
 {
-    u32 entry;
+    iommu->ctrl.iommu_en = enable;
 
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_guest_translation_control(struct amd_iommu *iommu,
-                                                int enable)
+                                                bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.gt_en = enable;
 
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     if ( enable )
         AMD_IOMMU_DEBUG("Guest Translation Enabled.\n");
 }
 
 static void set_iommu_command_buffer_control(struct amd_iommu *iommu,
-                                                    int enable)
+                                             bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
     }
-    else
-        iommu_clear_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.cmd_buf_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_exclusion_range(struct amd_iommu *iommu)
@@ -295,57 +264,38 @@ static void register_iommu_exclusion_ran
 }
 
 static void set_iommu_event_log_control(struct amd_iommu *iommu,
-            int enable)
+                                        bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
     }
 
-    iommu_clear_bit(&entry, IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
+    iommu->ctrl.event_int_en = enable;
+    iommu->ctrl.event_log_en = enable;
+    iommu->ctrl.com_wait_int_en = false;
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_ppr_log_control(struct amd_iommu *iommu,
-                                      int enable)
+                                      bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
     }
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.ppr_en = enable;
+    iommu->ctrl.ppr_int_en = enable;
+    iommu->ctrl.ppr_log_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+
     if ( enable )
         AMD_IOMMU_DEBUG("PPR Log Enabled.\n");
 }
@@ -398,7 +348,7 @@ static int iommu_read_log(struct amd_iom
 /* reset event log or ppr log when overflow */
 static void iommu_reset_log(struct amd_iommu *iommu,
                             struct ring_buffer *log,
-                            void (*ctrl_func)(struct amd_iommu *iommu, int))
+                            void (*ctrl_func)(struct amd_iommu *iommu, bool))
 {
     u32 entry;
     int log_run, run_bit;
@@ -615,11 +565,11 @@ static void iommu_check_event_log(struct
         iommu_reset_log(iommu, &iommu->event_log, set_iommu_event_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_EVENT_LOG_INT_MASK) )
+        if ( !iommu->ctrl.event_int_en )
         {
-            entry |= IOMMU_CONTROL_EVENT_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.event_int_en = true;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -704,11 +654,11 @@ static void iommu_check_ppr_log(struct a
         iommu_reset_log(iommu, &iommu->ppr_log, set_iommu_ppr_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_PPR_LOG_INT_MASK) )
+        if ( !iommu->ctrl.ppr_int_en )
         {
-            entry |= IOMMU_CONTROL_PPR_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.ppr_int_en = true;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -754,7 +704,6 @@ static void do_amd_iommu_irq(unsigned lo
 static void iommu_interrupt_handler(int irq, void *dev_id,
                                     struct cpu_user_regs *regs)
 {
-    u32 entry;
     unsigned long flags;
     struct amd_iommu *iommu = dev_id;
 
@@ -764,10 +713,9 @@ static void iommu_interrupt_handler(int
      * Silence interrupts from both event and PPR by clearing the
      * enable logging bits in the control register
      */
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.event_int_en = false;
+    iommu->ctrl.ppr_int_en = false;
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -88,6 +88,8 @@ struct amd_iommu {
     void *mmio_base;
     unsigned long mmio_base_phys;
 
+    union amd_iommu_control ctrl;
+
     struct table_struct dev_table;
     struct ring_buffer cmd_buffer;
     struct ring_buffer event_log;
@@ -172,7 +174,7 @@ struct guest_iommu {
     uint64_t                mmio_base;             /* MMIO base address */
 
     /* MMIO regs */
-    struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
+    union amd_iommu_control reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
     union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -295,38 +295,56 @@ struct amd_iommu_dte {
 
 /* Control Register */
 #define IOMMU_CONTROL_MMIO_OFFSET			0x18
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_MASK		0x00000001
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT		0
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_MASK	0x00000002
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT	1
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_MASK		0x00000004
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT		2
-#define IOMMU_CONTROL_EVENT_LOG_INT_MASK		0x00000008
-#define IOMMU_CONTROL_EVENT_LOG_INT_SHIFT		3
-#define IOMMU_CONTROL_COMP_WAIT_INT_MASK		0x00000010
-#define IOMMU_CONTROL_COMP_WAIT_INT_SHIFT		4
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_MASK		0x000000E0
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_SHIFT	5
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_MASK		0x00000100
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT		8
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_MASK	0x00000200
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT	9
-#define IOMMU_CONTROL_COHERENT_MASK			0x00000400
-#define IOMMU_CONTROL_COHERENT_SHIFT			10
-#define IOMMU_CONTROL_ISOCHRONOUS_MASK			0x00000800
-#define IOMMU_CONTROL_ISOCHRONOUS_SHIFT			11
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_MASK	0x00001000
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT	12
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_MASK		0x00002000
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT		13
-#define IOMMU_CONTROL_PPR_LOG_INT_MASK			0x00004000
-#define IOMMU_CONTROL_PPR_LOG_INT_SHIFT			14
-#define IOMMU_CONTROL_PPR_ENABLE_MASK			0x00008000
-#define IOMMU_CONTROL_PPR_ENABLE_SHIFT			15
-#define IOMMU_CONTROL_GT_ENABLE_MASK			0x00010000
-#define IOMMU_CONTROL_GT_ENABLE_SHIFT			16
-#define IOMMU_CONTROL_RESTART_MASK			0x80000000
-#define IOMMU_CONTROL_RESTART_SHIFT			31
+
+union amd_iommu_control {
+    uint64_t raw;
+    struct {
+        bool iommu_en:1;
+        bool ht_tun_en:1;
+        bool event_log_en:1;
+        bool event_int_en:1;
+        bool com_wait_int_en:1;
+        unsigned int inv_timeout:3;
+        bool pass_pw:1;
+        bool res_pass_pw:1;
+        bool coherent:1;
+        bool isoc:1;
+        bool cmd_buf_en:1;
+        bool ppr_log_en:1;
+        bool ppr_int_en:1;
+        bool ppr_en:1;
+        bool gt_en:1;
+        bool ga_en:1;
+        unsigned int crw:4;
+        bool smif_en:1;
+        bool slf_wb_dis:1;
+        bool smif_log_en:1;
+        unsigned int gam_en:3;
+        bool ga_log_en:1;
+        bool ga_int_en:1;
+        unsigned int dual_ppr_log_en:2;
+        unsigned int dual_event_log_en:2;
+        unsigned int dev_tbl_seg_en:3;
+        unsigned int priv_abrt_en:2;
+        bool ppr_auto_rsp_en:1;
+        bool marc_en:1;
+        bool blk_stop_mrk_en:1;
+        bool ppr_auto_rsp_aon:1;
+        bool domain_id_pne:1;
+        unsigned int :1;
+        bool eph_en:1;
+        unsigned int had_update:2;
+        bool gd_update_dis:1;
+        unsigned int :1;
+        bool xt_en:1;
+        bool int_cap_xt_en:1;
+        bool vcmd_en:1;
+        bool viommu_en:1;
+        bool ga_update_dis:1;
+        bool gappi_en:1;
+        unsigned int :8;
+    };
+};
 
 /* Exclusion Register */
 #define IOMMU_EXCLUSION_BASE_LOW_OFFSET		0x20
AMD/IOMMU: use bit field for IRTE

At the same time restrict its scope to just the single source file
actually using it, and abstract accesses by introducing a union of
pointers. (A union of the actual table entries is deliberately not used,
so as to make it impossible to [wrongly, once the 128-bit form gets
added] perform pointer arithmetic / array accesses on derived types.)

Also move away from updating the entries piecemeal: Construct a full new
entry, and write it out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Switch boolean bitfields to bool.
v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
    "offset". Introduce union irte32.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -23,6 +23,28 @@
 #include <asm/io_apic.h>
 #include <xen/keyhandler.h>
 
+struct irte_basic {
+    bool remap_en:1;
+    bool sup_io_pf:1;
+    unsigned int int_type:3;
+    bool rq_eoi:1;
+    bool dm:1;
+    bool guest_mode:1; /* MBZ */
+    unsigned int dest:8;
+    unsigned int vector:8;
+    unsigned int :8;
+};
+
+union irte32 {
+    uint32_t raw[1];
+    struct irte_basic basic;
+};
+
+union irte_ptr {
+    void *ptr;
+    union irte32 *ptr32;
+};
+
 #define INTREMAP_TABLE_ORDER    1
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
@@ -101,47 +123,44 @@ static unsigned int alloc_intremap_entry
     return slot;
 }
 
-static u32 *get_intremap_entry(int seg, int bdf, int offset)
+static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
+                                         unsigned int index)
 {
-    u32 *table = get_ivrs_mappings(seg)[bdf].intremap_table;
+    union irte_ptr table = {
+        .ptr = get_ivrs_mappings(seg)[bdf].intremap_table
+    };
+
+    ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
 
-    ASSERT( (table != NULL) && (offset < INTREMAP_ENTRIES) );
+    table.ptr32 += index;
 
-    return table + offset;
+    return table;
 }
 
-static void free_intremap_entry(int seg, int bdf, int offset)
-{
-    u32 *entry = get_intremap_entry(seg, bdf, offset);
-
-    memset(entry, 0, sizeof(u32));
-    __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
-}
-
-static void update_intremap_entry(u32* entry, u8 vector, u8 int_type,
-    u8 dest_mode, u8 dest)
-{
-    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
-                            INT_REMAP_ENTRY_REMAPEN_MASK,
-                            INT_REMAP_ENTRY_REMAPEN_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_SUPIOPF_MASK,
-                            INT_REMAP_ENTRY_SUPIOPF_SHIFT, entry);
-    set_field_in_reg_u32(int_type, *entry,
-                            INT_REMAP_ENTRY_INTTYPE_MASK,
-                            INT_REMAP_ENTRY_INTTYPE_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_REQEOI_MASK,
-                            INT_REMAP_ENTRY_REQEOI_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest_mode, *entry,
-                            INT_REMAP_ENTRY_DM_MASK,
-                            INT_REMAP_ENTRY_DM_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest, *entry,
-                            INT_REMAP_ENTRY_DEST_MAST,
-                            INT_REMAP_ENTRY_DEST_SHIFT, entry);
-    set_field_in_reg_u32((u32)vector, *entry,
-                            INT_REMAP_ENTRY_VECTOR_MASK,
-                            INT_REMAP_ENTRY_VECTOR_SHIFT, entry);
+static void free_intremap_entry(unsigned int seg, unsigned int bdf,
+                                unsigned int index)
+{
+    union irte_ptr entry = get_intremap_entry(seg, bdf, index);
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+
+    __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
+}
+
+static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
+                                  unsigned int int_type,
+                                  unsigned int dest_mode, unsigned int dest)
+{
+    struct irte_basic basic = {
+        .remap_en = true,
+        .int_type = int_type,
+        .dm = dest_mode,
+        .dest = dest,
+        .vector = vector,
+    };
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) =
+        container_of(&basic, union irte32, basic)->raw[0];
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -163,7 +182,7 @@ static int update_intremap_entry_from_io
     u16 *index)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u8 delivery_mode, dest, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
@@ -201,12 +220,8 @@ static int update_intremap_entry_from_io
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
-        delivery_mode = get_field_from_reg_u32(*entry,
-                                               INT_REMAP_ENTRY_INTTYPE_MASK,
-                                               INT_REMAP_ENTRY_INTTYPE_SHIFT);
+        vector = entry.ptr32->basic.vector;
+        delivery_mode = entry.ptr32->basic.int_type;
     }
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
 
@@ -228,7 +243,7 @@ int __init amd_iommu_setup_ioapic_remapp
 {
     struct IO_APIC_route_entry rte;
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     int apic, pin;
     u8 delivery_mode, dest, vector, dest_mode;
     u16 seg, bdf, req_id;
@@ -407,16 +422,14 @@ unsigned int amd_iommu_read_ioapic_from_
         u16 bdf = ioapic_sbdf[idx].bdf;
         u16 seg = ioapic_sbdf[idx].seg;
         u16 req_id = get_intremap_requestor_id(seg, bdf);
-        const u32 *entry = get_intremap_entry(seg, req_id, offset);
+        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
 
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_INTTYPE_MASK,
-                                      INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_VECTOR_MASK,
-                                      INT_REMAP_ENTRY_VECTOR_SHIFT);
+        val |= MASK_INSR(entry.ptr32->basic.int_type,
+                         IO_APIC_REDIR_DELIV_MODE_MASK);
+        val |= MASK_INSR(entry.ptr32->basic.vector,
+                         IO_APIC_REDIR_VECTOR_MASK);
     }
 
     return val;
@@ -427,7 +440,7 @@ static int update_intremap_entry_from_ms
     int *remap_index, const struct msi_msg *msg, u32 *data)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u16 req_id, alias_id;
     u8 delivery_mode, dest, vector, dest_mode;
     spinlock_t *lock;
@@ -581,7 +594,7 @@ void amd_iommu_read_msi_from_ire(
     const struct pci_dev *pdev = msi_desc->dev;
     u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
     u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
-    const u32 *entry;
+    union irte_ptr entry;
 
     if ( IS_ERR_OR_NULL(_find_iommu_for_device(seg, bdf)) )
         return;
@@ -597,12 +610,10 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_INTTYPE_MASK,
-                                        INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
+    msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
+                           MSI_DATA_DELIVERY_MODE_MASK);
+    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                           MSI_DATA_VECTOR_MASK);
 }
 
 int __init amd_iommu_free_intremap_table(
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -469,22 +469,6 @@ struct amd_iommu_pte {
 #define IOMMU_CONTROL_DISABLED	0
 #define IOMMU_CONTROL_ENABLED	1
 
-/* interrupt remapping table */
-#define INT_REMAP_ENTRY_REMAPEN_MASK    0x00000001
-#define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
-#define INT_REMAP_ENTRY_SUPIOPF_MASK    0x00000002
-#define INT_REMAP_ENTRY_SUPIOPF_SHIFT   1
-#define INT_REMAP_ENTRY_INTTYPE_MASK    0x0000001C
-#define INT_REMAP_ENTRY_INTTYPE_SHIFT   2
-#define INT_REMAP_ENTRY_REQEOI_MASK     0x00000020
-#define INT_REMAP_ENTRY_REQEOI_SHIFT    5
-#define INT_REMAP_ENTRY_DM_MASK         0x00000040
-#define INT_REMAP_ENTRY_DM_SHIFT        6
-#define INT_REMAP_ENTRY_DEST_MAST       0x0000FF00
-#define INT_REMAP_ENTRY_DEST_SHIFT      8
-#define INT_REMAP_ENTRY_VECTOR_MASK     0x00FF0000
-#define INT_REMAP_ENTRY_VECTOR_SHIFT    16
-
 #define INV_IOMMU_ALL_PAGES_ADDRESS      ((1ULL << 63) - 1)
 
 #define IOMMU_RING_BUFFER_PTR_MASK                  0x0007FFF0
AMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback

Both users will want to know IOMMU properties (specifically the IRTE
size) subsequently. Leverage this to avoid pointless calls to the
callback when IVRS mapping table entries are unpopulated. To avoid
leaking interrupt remapping tables (bogusly) allocated for IOMMUs
themselves, this requires suppressing their allocation in the first
place, taking a step further what commit 757122c0cf ('AMD/IOMMU: don't
"add" IOMMUs') had done.

Additionally suppress the call for alias entries, as, again, neither
user cares about these anyway. In fact this eliminates a fair bit of
redundancy from the dump output.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.
---
TBD: Along the lines of avoiding the IRT allocation for the IOMMUs, is
     there a way to recognize the many CPU-provided devices many of
     which can't generate interrupts anyway, and avoid allocations for
     them as well? It's 32k per device, after all. Another option might
     be on-demand allocation of the tables, but quite possibly we'd get
     into trouble with error handling there.

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -65,7 +65,11 @@ static void __init add_ivrs_mapping_entr
     /* override flags for range of devices */
     ivrs_mappings[bdf].device_flags = flags;
 
-    if (ivrs_mappings[alias_id].intremap_table == NULL )
+    /* Don't map an IOMMU by itself. */
+    if ( iommu->bdf == bdf )
+        return;
+
+    if ( !ivrs_mappings[alias_id].intremap_table )
     {
          /* allocate per-device interrupt remapping table */
          if ( amd_iommu_perdev_intremap )
@@ -81,8 +85,9 @@ static void __init add_ivrs_mapping_entr
              ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
          }
     }
-    /* Assign IOMMU hardware, but don't map an IOMMU by itself. */
-    ivrs_mappings[bdf].iommu = iommu->bdf != bdf ? iommu : NULL;
+
+    /* Assign IOMMU hardware. */
+    ivrs_mappings[bdf].iommu = iommu;
 }
 
 static struct amd_iommu * __init find_iommu_from_bdf_cap(
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1069,7 +1069,8 @@ int iterate_ivrs_mappings(int (*handler)
     return rc;
 }
 
-int iterate_ivrs_entries(int (*handler)(u16 seg, struct ivrs_mappings *))
+int iterate_ivrs_entries(int (*handler)(const struct amd_iommu *,
+                                        struct ivrs_mappings *))
 {
     u16 seg = 0;
     int rc = 0;
@@ -1082,7 +1083,12 @@ int iterate_ivrs_entries(int (*handler)(
             break;
         seg = IVRS_MAPPINGS_SEG(map);
         for ( bdf = 0; !rc && bdf < ivrs_bdf_entries; ++bdf )
-            rc = handler(seg, map + bdf);
+        {
+            const struct amd_iommu *iommu = map[bdf].iommu;
+
+            if ( iommu && map[bdf].dte_requestor_id == bdf )
+                rc = handler(iommu, &map[bdf]);
+        }
     } while ( !rc && ++seg );
 
     return rc;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -617,7 +617,7 @@ void amd_iommu_read_msi_from_ire(
 }
 
 int __init amd_iommu_free_intremap_table(
-    u16 seg, struct ivrs_mappings *ivrs_mapping)
+    const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
 {
     void *tb = ivrs_mapping->intremap_table;
 
@@ -693,14 +693,15 @@ static void dump_intremap_table(const u3
     }
 }
 
-static int dump_intremap_mapping(u16 seg, struct ivrs_mappings *ivrs_mapping)
+static int dump_intremap_mapping(const struct amd_iommu *iommu,
+                                 struct ivrs_mappings *ivrs_mapping)
 {
     unsigned long flags;
 
     if ( !ivrs_mapping )
         return 0;
 
-    printk("  %04x:%02x:%02x:%u:\n", seg,
+    printk("  %04x:%02x:%02x:%u:\n", iommu->seg,
            PCI_BUS(ivrs_mapping->dte_requestor_id),
            PCI_SLOT(ivrs_mapping->dte_requestor_id),
            PCI_FUNC(ivrs_mapping->dte_requestor_id));
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -129,7 +129,8 @@ extern u8 ivhd_type;
 
 struct ivrs_mappings *get_ivrs_mappings(u16 seg);
 int iterate_ivrs_mappings(int (*)(u16 seg, struct ivrs_mappings *));
-int iterate_ivrs_entries(int (*)(u16 seg, struct ivrs_mappings *));
+int iterate_ivrs_entries(int (*)(const struct amd_iommu *,
+                                 struct ivrs_mappings *));
 
 /* iommu tables in guest space */
 struct mmio_reg {
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -98,7 +98,8 @@ struct amd_iommu *find_iommu_for_device(
 /* interrupt remapping */
 int amd_iommu_setup_ioapic_remapping(void);
 void *amd_iommu_alloc_intremap_table(unsigned long **);
-int amd_iommu_free_intremap_table(u16 seg, struct ivrs_mappings *);
+int amd_iommu_free_intremap_table(
+    const struct amd_iommu *, struct ivrs_mappings *);
 void amd_iommu_ioapic_update_ire(
     unsigned int apic, unsigned int reg, unsigned int value);
 unsigned int amd_iommu_read_ioapic_from_ire(
AMD/IOMMU: pass IOMMU to amd_iommu_alloc_intremap_table()

The function will want to know IOMMU properties (specifically the IRTE
size) subsequently.

Correct indentation of one of the call sites at this occasion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -74,12 +74,14 @@ static void __init add_ivrs_mapping_entr
          /* allocate per-device interrupt remapping table */
          if ( amd_iommu_perdev_intremap )
              ivrs_mappings[alias_id].intremap_table =
-                amd_iommu_alloc_intremap_table(
-                    &ivrs_mappings[alias_id].intremap_inuse);
+                 amd_iommu_alloc_intremap_table(
+                     iommu,
+                     &ivrs_mappings[alias_id].intremap_inuse);
          else
          {
              if ( shared_intremap_table == NULL  )
                  shared_intremap_table = amd_iommu_alloc_intremap_table(
+                     iommu,
                      &shared_intremap_inuse);
              ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
              ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -632,7 +632,8 @@ int __init amd_iommu_free_intremap_table
     return 0;
 }
 
-void* __init amd_iommu_alloc_intremap_table(unsigned long **inuse_map)
+void *__init amd_iommu_alloc_intremap_table(
+    const struct amd_iommu *iommu, unsigned long **inuse_map)
 {
     void *tb;
     tb = __alloc_amd_iommu_tables(INTREMAP_TABLE_ORDER);
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -97,7 +97,8 @@ struct amd_iommu *find_iommu_for_device(
 
 /* interrupt remapping */
 int amd_iommu_setup_ioapic_remapping(void);
-void *amd_iommu_alloc_intremap_table(unsigned long **);
+void *amd_iommu_alloc_intremap_table(
+    const struct amd_iommu *, unsigned long **);
 int amd_iommu_free_intremap_table(
     const struct amd_iommu *, struct ivrs_mappings *);
 void amd_iommu_ioapic_update_ire(
AMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()

The functions will want to know IOMMU properties (specifically the IRTE
size) subsequently.

Rather than introducing a second error path bogusly returning -E... from
amd_iommu_read_ioapic_from_ire(), also change the existing one to follow
VT-d in returning the raw (untranslated) IO-APIC RTE.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -123,11 +123,11 @@ static unsigned int alloc_intremap_entry
     return slot;
 }
 
-static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
-                                         unsigned int index)
+static union irte_ptr get_intremap_entry(const struct amd_iommu *iommu,
+                                         unsigned int bdf, unsigned int index)
 {
     union irte_ptr table = {
-        .ptr = get_ivrs_mappings(seg)[bdf].intremap_table
+        .ptr = get_ivrs_mappings(iommu->seg)[bdf].intremap_table
     };
 
     ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
@@ -137,18 +137,19 @@ static union irte_ptr get_intremap_entry
     return table;
 }
 
-static void free_intremap_entry(unsigned int seg, unsigned int bdf,
-                                unsigned int index)
+static void free_intremap_entry(const struct amd_iommu *iommu,
+                                unsigned int bdf, unsigned int index)
 {
-    union irte_ptr entry = get_intremap_entry(seg, bdf, index);
+    union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
 
     ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
 
-    __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
+    __clear_bit(index, get_ivrs_mappings(iommu->seg)[bdf].intremap_inuse);
 }
 
-static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
-                                  unsigned int int_type,
+static void update_intremap_entry(const struct amd_iommu *iommu,
+                                  union irte_ptr entry,
+                                  unsigned int vector, unsigned int int_type,
                                   unsigned int dest_mode, unsigned int dest)
 {
     struct irte_basic basic = {
@@ -212,7 +213,7 @@ static int update_intremap_entry_from_io
         lo_update = 1;
     }
 
-    entry = get_intremap_entry(iommu->seg, req_id, offset);
+    entry = get_intremap_entry(iommu, req_id, offset);
     if ( !lo_update )
     {
         /*
@@ -223,7 +224,7 @@ static int update_intremap_entry_from_io
         vector = entry.ptr32->basic.vector;
         delivery_mode = entry.ptr32->basic.int_type;
     }
-    update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+    update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
 
     spin_unlock_irqrestore(lock, flags);
 
@@ -288,8 +289,8 @@ int __init amd_iommu_setup_ioapic_remapp
             spin_lock_irqsave(lock, flags);
             offset = alloc_intremap_entry(seg, req_id, 1);
             BUG_ON(offset >= INTREMAP_ENTRIES);
-            entry = get_intremap_entry(iommu->seg, req_id, offset);
-            update_intremap_entry(entry, vector,
+            entry = get_intremap_entry(iommu, req_id, offset);
+            update_intremap_entry(iommu, entry, vector,
                                   delivery_mode, dest_mode, dest);
             spin_unlock_irqrestore(lock, flags);
 
@@ -413,7 +414,7 @@ unsigned int amd_iommu_read_ioapic_from_
 
     idx = ioapic_id_to_index(IO_APIC_ID(apic));
     if ( idx == MAX_IO_APICS )
-        return -EINVAL;
+        return val;
 
     offset = ioapic_sbdf[idx].pin_2_idx[pin];
 
@@ -422,9 +423,13 @@ unsigned int amd_iommu_read_ioapic_from_
         u16 bdf = ioapic_sbdf[idx].bdf;
         u16 seg = ioapic_sbdf[idx].seg;
         u16 req_id = get_intremap_requestor_id(seg, bdf);
-        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
+        const struct amd_iommu *iommu = find_iommu_for_device(seg, bdf);
+        union irte_ptr entry;
 
+        if ( !iommu )
+            return val;
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
+        entry = get_intremap_entry(iommu, req_id, offset);
         val &= ~(INTREMAP_ENTRIES - 1);
         val |= MASK_INSR(entry.ptr32->basic.int_type,
                          IO_APIC_REDIR_DELIV_MODE_MASK);
@@ -454,7 +459,7 @@ static int update_intremap_entry_from_ms
         lock = get_intremap_lock(iommu->seg, req_id);
         spin_lock_irqsave(lock, flags);
         for ( i = 0; i < nr; ++i )
-            free_intremap_entry(iommu->seg, req_id, *remap_index + i);
+            free_intremap_entry(iommu, req_id, *remap_index + i);
         spin_unlock_irqrestore(lock, flags);
         goto done;
     }
@@ -479,8 +484,8 @@ static int update_intremap_entry_from_ms
         *remap_index = offset;
     }
 
-    entry = get_intremap_entry(iommu->seg, req_id, offset);
-    update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+    entry = get_intremap_entry(iommu, req_id, offset);
+    update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
     spin_unlock_irqrestore(lock, flags);
 
     *data = (msg->data & ~(INTREMAP_ENTRIES - 1)) | offset;
@@ -594,12 +599,13 @@ void amd_iommu_read_msi_from_ire(
     const struct pci_dev *pdev = msi_desc->dev;
     u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
     u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
+    const struct amd_iommu *iommu = _find_iommu_for_device(seg, bdf);
     union irte_ptr entry;
 
-    if ( IS_ERR_OR_NULL(_find_iommu_for_device(seg, bdf)) )
+    if ( IS_ERR_OR_NULL(iommu) )
         return;
 
-    entry = get_intremap_entry(seg, get_dma_requestor_id(seg, bdf), offset);
+    entry = get_intremap_entry(iommu, get_dma_requestor_id(seg, bdf), offset);
 
     if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
     {
AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format

This is in preparation for actually enabling x2APIC mode, which requires
this wider IRTE format to be used.

A specific remark regarding the first hunk changing
amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
tables when creating new one"). Other code introduced by that change has
since disappeared or been further changed, and I wonder whether - rather
than adding an x2apic_enabled check to the conditional - the bypass
couldn't be deleted altogether. For now the goal is to affect the non-x2APIC
paths as little as possible.

Take the liberty and use the new "fresh" flag to suppress an unneeded
flush in update_intremap_entry_from_ioapic().
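
Since the two IRTE halves can't be written atomically here (cmpxchg16b
isn't used), RemapEn gates the entry's validity. A minimal sketch of the
ordering enforced below, with barrier() acting as a compiler barrier and
"irte" standing for a local union irte128 holding the intended contents:

    ACCESS_ONCE(entry.ptr128->raw[0]) = 0;      /* clear RemapEn first */
    barrier();
    entry.ptr128->raw[1] = irte.raw[1];         /* high half while disabled */
    barrier();
    ACCESS_ONCE(entry.ptr128->raw[0]) = irte.raw[0]; /* set RemapEn last */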

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Avoid unrelated type changes in update_intremap_entry_from_ioapic().
    Drop irte_mode enum and variable. Convert INTREMAP_TABLE_ORDER into
    a static helper. Comment barrier() uses. Switch boolean bitfields to
    bool.
v2: Add cast in get_full_dest(). Re-base over changes earlier in the
    series. Don't use cmpxchg16b. Use barrier() instead of wmb().
---
Note that AMD's doc says Lowest Priority ("Arbitrated" by their naming)
mode is unavailable in x2APIC mode, but they've confirmed this to be a
mistake on their part.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -40,12 +40,38 @@ union irte32 {
     struct irte_basic basic;
 };
 
+struct irte_full {
+    bool remap_en:1;
+    bool sup_io_pf:1;
+    unsigned int int_type:3;
+    bool rq_eoi:1;
+    bool dm:1;
+    bool guest_mode:1; /* MBZ */
+    unsigned int dest_lo:24;
+    unsigned int :32;
+    unsigned int vector:8;
+    unsigned int :24;
+    unsigned int :24;
+    unsigned int dest_hi:8;
+};
+
+union irte128 {
+    uint64_t raw[2];
+    struct irte_full full;
+};
+
 union irte_ptr {
     void *ptr;
     union irte32 *ptr32;
+    union irte128 *ptr128;
 };
 
-#define INTREMAP_TABLE_ORDER    1
+union irte_cptr {
+    const void *ptr;
+    const union irte32 *ptr32;
+    const union irte128 *ptr128;
+} __transparent__;
+
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
 
@@ -58,6 +84,13 @@ unsigned int nr_ioapic_sbdf;
 
 static void dump_intremap_tables(unsigned char key);
 
+static unsigned int __init intremap_table_order(const struct amd_iommu *iommu)
+{
+    return iommu->ctrl.ga_en
+           ? get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte128))
+           : get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte32));
+}
+
 unsigned int ioapic_id_to_index(unsigned int apic_id)
 {
     unsigned int idx;
@@ -132,7 +165,10 @@ static union irte_ptr get_intremap_entry
 
     ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
 
-    table.ptr32 += index;
+    if ( iommu->ctrl.ga_en )
+        table.ptr128 += index;
+    else
+        table.ptr32 += index;
 
     return table;
 }
@@ -142,7 +178,15 @@ static void free_intremap_entry(const st
 {
     union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
 
-    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+    if ( iommu->ctrl.ga_en )
+    {
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        /* Low half (containing RemapEn) needs to be cleared first. */
+        barrier();
+        entry.ptr128->raw[1] = 0;
+    }
+    else
+        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
 
     __clear_bit(index, get_ivrs_mappings(iommu->seg)[bdf].intremap_inuse);
 }
@@ -152,16 +196,40 @@ static void update_intremap_entry(const
                                   unsigned int vector, unsigned int int_type,
                                   unsigned int dest_mode, unsigned int dest)
 {
-    struct irte_basic basic = {
-        .remap_en = true,
-        .int_type = int_type,
-        .dm = dest_mode,
-        .dest = dest,
-        .vector = vector,
-    };
+    if ( iommu->ctrl.ga_en )
+    {
+        struct irte_full full = {
+            .remap_en = true,
+            .int_type = int_type,
+            .dm = dest_mode,
+            .dest_lo = dest,
+            .dest_hi = dest >> 24,
+            .vector = vector,
+        };
+
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        /* Low half, in particular RemapEn, needs to be cleared first. */
+        barrier();
+        entry.ptr128->raw[1] =
+            container_of(&full, union irte128, full)->raw[1];
+        /* High half needs to be set before low one (containing RemapEn). */
+        barrier();
+        ACCESS_ONCE(entry.ptr128->raw[0]) =
+            container_of(&full, union irte128, full)->raw[0];
+    }
+    else
+    {
+        struct irte_basic basic = {
+            .remap_en = true,
+            .int_type = int_type,
+            .dm = dest_mode,
+            .dest = dest,
+            .vector = vector,
+        };
 
-    ACCESS_ONCE(entry.ptr32->raw[0]) =
-        container_of(&basic, union irte32, basic)->raw[0];
+        ACCESS_ONCE(entry.ptr32->raw[0]) =
+            container_of(&basic, union irte32, basic)->raw[0];
+    }
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -175,6 +243,11 @@ static inline void set_rte_index(struct
     rte->delivery_mode = offset >> 8;
 }
 
+static inline unsigned int get_full_dest(const union irte128 *entry)
+{
+    return entry->full.dest_lo | ((unsigned int)entry->full.dest_hi << 24);
+}
+
 static int update_intremap_entry_from_ioapic(
     int bdf,
     struct amd_iommu *iommu,
@@ -184,10 +257,11 @@ static int update_intremap_entry_from_io
 {
     unsigned long flags;
     union irte_ptr entry;
-    u8 delivery_mode, dest, vector, dest_mode;
+    uint8_t delivery_mode, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
-    unsigned int offset;
+    unsigned int dest, offset;
+    bool fresh = false;
 
     req_id = get_intremap_requestor_id(iommu->seg, bdf);
     lock = get_intremap_lock(iommu->seg, req_id);
@@ -195,7 +269,7 @@ static int update_intremap_entry_from_io
     delivery_mode = rte->delivery_mode;
     vector = rte->vector;
     dest_mode = rte->dest_mode;
-    dest = rte->dest.logical.logical_dest;
+    dest = x2apic_enabled ? rte->dest.dest32 : rte->dest.logical.logical_dest;
 
     spin_lock_irqsave(lock, flags);
 
@@ -210,25 +284,40 @@ static int update_intremap_entry_from_io
             return -ENOSPC;
         }
         *index = offset;
-        lo_update = 1;
+        fresh = true;
     }
 
     entry = get_intremap_entry(iommu, req_id, offset);
-    if ( !lo_update )
+    if ( fresh )
+        /* nothing */;
+    else if ( !lo_update )
     {
         /*
          * Low half of incoming RTE is already in remapped format,
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = entry.ptr32->basic.vector;
+        if ( iommu->ctrl.ga_en )
+            vector = entry.ptr128->full.vector;
+        else
+            vector = entry.ptr32->basic.vector;
+        /* The IntType fields match for both formats. */
         delivery_mode = entry.ptr32->basic.int_type;
     }
+    else if ( x2apic_enabled )
+    {
+        /*
+         * High half of incoming RTE was read from the I/O APIC and hence may
+         * not hold the full destination, so need to recover full destination
+         * from IRTE.
+         */
+        dest = get_full_dest(entry.ptr128);
+    }
     update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled )
+    if ( iommu->enabled && !fresh )
     {
         spin_lock_irqsave(&iommu->lock, flags);
         amd_iommu_flush_intremap(iommu, req_id);
@@ -286,6 +375,18 @@ int __init amd_iommu_setup_ioapic_remapp
             dest_mode = rte.dest_mode;
             dest = rte.dest.logical.logical_dest;
 
+            if ( iommu->ctrl.xt_en )
+            {
+                /*
+                 * In x2APIC mode we have no way of discovering the high 24
+                 * bits of the destination of an already enabled interrupt.
+                 * We come here earlier than for xAPIC mode, so no interrupts
+                 * should have been set up before.
+                 */
+                AMD_IOMMU_DEBUG("Unmasked IO-APIC#%u entry %u in x2APIC mode\n",
+                                IO_APIC_ID(apic), pin);
+            }
+
             spin_lock_irqsave(lock, flags);
             offset = alloc_intremap_entry(seg, req_id, 1);
             BUG_ON(offset >= INTREMAP_ENTRIES);
@@ -320,7 +421,8 @@ void amd_iommu_ioapic_update_ire(
     struct IO_APIC_route_entry new_rte = { 0 };
     unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
     unsigned int pin = (reg - 0x10) / 2;
-    int saved_mask, seg, bdf, rc;
+    int seg, bdf, rc;
+    bool saved_mask, fresh = false;
     struct amd_iommu *iommu;
     unsigned int idx;
 
@@ -362,12 +464,22 @@ void amd_iommu_ioapic_update_ire(
         *(((u32 *)&new_rte) + 1) = value;
     }
 
-    if ( new_rte.mask &&
-         ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
+    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
     {
         ASSERT(saved_mask);
-        __io_apic_write(apic, reg, value);
-        return;
+
+        /*
+         * There's nowhere except the IRTE to store a full 32-bit destination,
+         * so we may not bypass entry allocation and updating of the low RTE
+         * half in the (usual) case of the high RTE half getting written first.
+         */
+        if ( new_rte.mask && !x2apic_enabled )
+        {
+            __io_apic_write(apic, reg, value);
+            return;
+        }
+
+        fresh = true;
     }
 
     /* mask the interrupt while we change the intremap table */
@@ -396,8 +508,12 @@ void amd_iommu_ioapic_update_ire(
     if ( reg == rte_lo )
         return;
 
-    /* unmask the interrupt after we have updated the intremap table */
-    if ( !saved_mask )
+    /*
+     * Unmask the interrupt after we have updated the intremap table. Also
+     * write the low half if a fresh entry was allocated for a high half
+     * update in x2APIC mode.
+     */
+    if ( !saved_mask || (x2apic_enabled && fresh) )
     {
         old_rte.mask = saved_mask;
         __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
@@ -411,31 +527,40 @@ unsigned int amd_iommu_read_ioapic_from_
     unsigned int offset;
     unsigned int val = __io_apic_read(apic, reg);
     unsigned int pin = (reg - 0x10) / 2;
+    uint16_t seg, bdf, req_id;
+    const struct amd_iommu *iommu;
+    union irte_ptr entry;
 
     idx = ioapic_id_to_index(IO_APIC_ID(apic));
     if ( idx == MAX_IO_APICS )
         return val;
 
     offset = ioapic_sbdf[idx].pin_2_idx[pin];
+    if ( offset >= INTREMAP_ENTRIES )
+        return val;
 
-    if ( !(reg & 1) && offset < INTREMAP_ENTRIES )
-    {
-        u16 bdf = ioapic_sbdf[idx].bdf;
-        u16 seg = ioapic_sbdf[idx].seg;
-        u16 req_id = get_intremap_requestor_id(seg, bdf);
-        const struct amd_iommu *iommu = find_iommu_for_device(seg, bdf);
-        union irte_ptr entry;
+    seg = ioapic_sbdf[idx].seg;
+    bdf = ioapic_sbdf[idx].bdf;
+    iommu = find_iommu_for_device(seg, bdf);
+    if ( !iommu )
+        return val;
+    req_id = get_intremap_requestor_id(seg, bdf);
+    entry = get_intremap_entry(iommu, req_id, offset);
 
-        if ( !iommu )
-            return val;
+    if ( !(reg & 1) )
+    {
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
-        entry = get_intremap_entry(iommu, req_id, offset);
         val &= ~(INTREMAP_ENTRIES - 1);
+        /* The IntType fields match for both formats. */
         val |= MASK_INSR(entry.ptr32->basic.int_type,
                          IO_APIC_REDIR_DELIV_MODE_MASK);
-        val |= MASK_INSR(entry.ptr32->basic.vector,
+        val |= MASK_INSR(iommu->ctrl.ga_en
+                         ? entry.ptr128->full.vector
+                         : entry.ptr32->basic.vector,
                          IO_APIC_REDIR_VECTOR_MASK);
     }
+    else if ( x2apic_enabled )
+        val = get_full_dest(entry.ptr128);
 
     return val;
 }
@@ -447,9 +572,9 @@ static int update_intremap_entry_from_ms
     unsigned long flags;
     union irte_ptr entry;
     u16 req_id, alias_id;
-    u8 delivery_mode, dest, vector, dest_mode;
+    uint8_t delivery_mode, vector, dest_mode;
     spinlock_t *lock;
-    unsigned int offset, i;
+    unsigned int dest, offset, i;
 
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
@@ -470,7 +595,12 @@ static int update_intremap_entry_from_ms
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
-    dest = (msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff;
+
+    if ( x2apic_enabled )
+        dest = msg->dest32;
+    else
+        dest = MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK);
+
     offset = *remap_index;
     if ( offset >= INTREMAP_ENTRIES )
     {
@@ -616,10 +746,21 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
+    /* The IntType fields match for both formats. */
     msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
                            MSI_DATA_DELIVERY_MODE_MASK);
-    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
-                           MSI_DATA_VECTOR_MASK);
+    if ( iommu->ctrl.ga_en )
+    {
+        msg->data |= MASK_INSR(entry.ptr128->full.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = get_full_dest(entry.ptr128);
+    }
+    else
+    {
+        msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = entry.ptr32->basic.dest;
+    }
 }
 
 int __init amd_iommu_free_intremap_table(
@@ -631,7 +772,7 @@ int __init amd_iommu_free_intremap_table
 
     if ( tb )
     {
-        __free_amd_iommu_tables(tb, INTREMAP_TABLE_ORDER);
+        __free_amd_iommu_tables(tb, intremap_table_order(iommu));
         ivrs_mapping->intremap_table = NULL;
     }
 
@@ -641,10 +782,10 @@ int __init amd_iommu_free_intremap_table
 void *__init amd_iommu_alloc_intremap_table(
     const struct amd_iommu *iommu, unsigned long **inuse_map)
 {
-    void *tb;
-    tb = __alloc_amd_iommu_tables(INTREMAP_TABLE_ORDER);
+    void *tb = __alloc_amd_iommu_tables(intremap_table_order(iommu));
+
     BUG_ON(tb == NULL);
-    memset(tb, 0, PAGE_SIZE * (1UL << INTREMAP_TABLE_ORDER));
+    memset(tb, 0, PAGE_SIZE << intremap_table_order(iommu));
     *inuse_map = xzalloc_array(unsigned long, BITS_TO_LONGS(INTREMAP_ENTRIES));
     BUG_ON(*inuse_map == NULL);
     return tb;
@@ -685,18 +826,29 @@ int __init amd_setup_hpet_msi(struct msi
     return rc;
 }
 
-static void dump_intremap_table(const u32 *table)
+static void dump_intremap_table(const struct amd_iommu *iommu,
+                                union irte_cptr tbl)
 {
-    u32 count;
+    unsigned int count;
 
-    if ( !table )
+    if ( !tbl.ptr )
         return;
 
     for ( count = 0; count < INTREMAP_ENTRIES; count++ )
     {
-        if ( !table[count] )
-            continue;
-        printk("    IRTE[%03x] %08x\n", count, table[count]);
+        if ( iommu->ctrl.ga_en )
+        {
+            if ( !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1] )
+                continue;
+            printk("    IRTE[%03x] %016lx_%016lx\n",
+                   count, tbl.ptr128[count].raw[1], tbl.ptr128[count].raw[0]);
+        }
+        else
+        {
+            if ( !tbl.ptr32[count].raw[0] )
+                continue;
+            printk("    IRTE[%03x] %08x\n", count, tbl.ptr32[count].raw[0]);
+        }
     }
 }
 
@@ -714,7 +866,7 @@ static int dump_intremap_mapping(const s
            PCI_FUNC(ivrs_mapping->dte_requestor_id));
 
     spin_lock_irqsave(&(ivrs_mapping->intremap_lock), flags);
-    dump_intremap_table(ivrs_mapping->intremap_table);
+    dump_intremap_table(iommu, ivrs_mapping->intremap_table);
     spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
 
     return 0;
@@ -731,6 +883,8 @@ static void dump_intremap_tables(unsigne
     printk("--- Dumping Shared IOMMU Interrupt Remapping Table ---\n");
 
     spin_lock_irqsave(&shared_intremap_lock, flags);
-    dump_intremap_table(shared_intremap_table);
+    dump_intremap_table(list_first_entry(&amd_iommu_head, struct amd_iommu,
+                                         list),
+                        shared_intremap_table);
     spin_unlock_irqrestore(&shared_intremap_lock, flags);
 }
AMD/IOMMU: split amd_iommu_init_one()

Mapping the MMIO space and obtaining feature information need to happen
slightly earlier, such that for x2APIC support we can set XTEn prior to
calling amd_iommu_update_ivrs_mapping_acpi() and
amd_iommu_setup_ioapic_remapping().
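
In rough call-order terms the split yields (illustrative sketch only;
amd_iommu_prepare_one() is introduced below):

    /* Early, for every IOMMU: map MMIO space and read features, ... */
    for_each_amd_iommu ( iommu )
        amd_iommu_prepare_one(iommu);
    /* ... so XTEn can be set before interrupt remapping gets set up. */
    amd_iommu_update_ivrs_mapping_acpi();
    amd_iommu_setup_ioapic_remapping();
    /* The remaining per-IOMMU setup can then happen later. */
    for_each_amd_iommu ( iommu )
        amd_iommu_init_one(iommu);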

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -970,14 +970,6 @@ static void * __init allocate_ppr_log(st
 
 static int __init amd_iommu_init_one(struct amd_iommu *iommu)
 {
-    if ( map_iommu_mmio_region(iommu) != 0 )
-        goto error_out;
-
-    get_iommu_features(iommu);
-
-    if ( iommu->features.raw )
-        iommuv2_enabled = 1;
-
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
 
@@ -1202,6 +1194,23 @@ static bool_t __init amd_sp5100_erratum2
     return 0;
 }
 
+static int __init amd_iommu_prepare_one(struct amd_iommu *iommu)
+{
+    int rc = alloc_ivrs_mappings(iommu->seg);
+
+    if ( !rc )
+        rc = map_iommu_mmio_region(iommu);
+    if ( rc )
+        return rc;
+
+    get_iommu_features(iommu);
+
+    if ( iommu->features.raw )
+        iommuv2_enabled = true;
+
+    return 0;
+}
+
 int __init amd_iommu_init(void)
 {
     struct amd_iommu *iommu;
@@ -1232,7 +1241,7 @@ int __init amd_iommu_init(void)
     radix_tree_init(&ivrs_maps);
     for_each_amd_iommu ( iommu )
     {
-        rc = alloc_ivrs_mappings(iommu->seg);
+        rc = amd_iommu_prepare_one(iommu);
         if ( rc )
             goto error_out;
     }
AMD/IOMMU: allow enabling with IRQ not yet set up

Early enabling (to enter x2APIC mode) requires deferring the IRQ setup.
Code to actually do that setup in the x2APIC case will get added
subsequently.
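
The deferral keys off whether an IRQ was allocated yet; a sketch of the
guard pattern used below (locking omitted):

    if ( iommu->msi.irq > 0 )    /* interrupt already set up? */
    {
        set_msi_affinity(irq_to_desc(iommu->msi.irq), NULL);
        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
    }
    /* else: log interrupts get enabled once the IRQ exists. */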

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Re-base.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -814,7 +814,6 @@ static void amd_iommu_erratum_746_workar
 static void enable_iommu(struct amd_iommu *iommu)
 {
     unsigned long flags;
-    struct irq_desc *desc;
 
     spin_lock_irqsave(&iommu->lock, flags);
 
@@ -834,19 +833,27 @@ static void enable_iommu(struct amd_iomm
     if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
-    desc = irq_to_desc(iommu->msi.irq);
-    spin_lock(&desc->lock);
-    set_msi_affinity(desc, NULL);
-    spin_unlock(&desc->lock);
+    if ( iommu->msi.irq > 0 )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        set_msi_affinity(desc, NULL);
+        spin_unlock(&desc->lock);
+    }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_ht_flags(iommu);
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
-    set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( iommu->features.flds.ppr_sup )
-        set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    if ( iommu->msi.irq > 0 )
+    {
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
 
     if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode

In order to be able to express all possible destinations, we need to make
use of the IOMMU's non-MSI-capability based interrupt mechanism (its
x2APIC interrupt control registers). The new IRQ controller structure can
re-use certain MSI functions, though.

For now general and PPR interrupts still share a single vector, IRQ, and
hence handler.
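
At its core the new mechanism composes the 64-bit x2APIC interrupt
control registers directly (sketch, using the union and MMIO offsets
added below; vector/dest as computed in set_x2apic_affinity()):

    union amd_iommu_x2apic_control ctrl = {
        .int_type = 0,           /* fixed delivery */
        .dest_mode = 0,          /* physical destination */
        .vector = vector,
        .dest_lo = dest,         /* low 24 bits of the destination ... */
        .dest_hi = dest >> 24,   /* ... plus its high 8 bits */
    };

    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);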

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Re-base.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -472,6 +472,44 @@ static hw_irq_controller iommu_maskable_
     .set_affinity = set_msi_affinity,
 };
 
+static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
+{
+    struct amd_iommu *iommu = desc->action->dev_id;
+    unsigned int dest = set_desc_affinity(desc, mask);
+    union amd_iommu_x2apic_control ctrl = {};
+    unsigned long flags;
+
+    if ( dest == BAD_APICID )
+        return;
+
+    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
+    iommu->msi.msg.dest32 = dest;
+
+    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
+                               MSI_ADDR_DESTMODE_MASK);
+    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
+                              MSI_DATA_DELIVERY_MODE_MASK);
+    ctrl.vector = desc->arch.vector;
+    ctrl.dest_lo = dest;
+    ctrl.dest_hi = dest >> 24;
+
+    spin_lock_irqsave(&iommu->lock, flags);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET);
+    spin_unlock_irqrestore(&iommu->lock, flags);
+}
+
+static hw_irq_controller iommu_x2apic_type = {
+    .typename     = "IOMMU-x2APIC",
+    .startup      = irq_startup_none,
+    .shutdown     = irq_shutdown_none,
+    .enable       = irq_enable_none,
+    .disable      = irq_disable_none,
+    .ack          = ack_nonmaskable_msi_irq,
+    .end          = end_nonmaskable_msi_irq,
+    .set_affinity = set_x2apic_affinity,
+};
+
 static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
 {
     u16 domain_id, device_id, flags;
@@ -726,8 +764,6 @@ static void iommu_interrupt_handler(int
 static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
-    hw_irq_controller *handler;
-    u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
     if ( irq <= 0 )
@@ -747,20 +783,43 @@ static bool_t __init set_iommu_interrupt
                         PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf));
         return 0;
     }
-    control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
-                              PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
-                              iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
-    iommu->msi.msi.nvec = 1;
-    if ( is_mask_bit_support(control) )
-    {
-        iommu->msi.msi_attrib.maskbit = 1;
-        iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
-                                                is_64bit_address(control));
-        handler = &iommu_maskable_msi_type;
+
+    if ( iommu->ctrl.int_cap_xt_en )
+    {
+        struct irq_desc *desc = irq_to_desc(irq);
+
+        iommu->msi.msi_attrib.pos = MSI_TYPE_IOMMU;
+        iommu->msi.msi_attrib.maskbit = 0;
+        iommu->msi.msi_attrib.is_64 = 1;
+
+        desc->msi_desc = &iommu->msi;
+        desc->handler = &iommu_x2apic_type;
+
+        ret = 0;
     }
     else
-        handler = &iommu_msi_type;
-    ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    {
+        hw_irq_controller *handler;
+        u16 control;
+
+        control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
+                                  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
+                                  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
+
+        iommu->msi.msi.nvec = 1;
+        if ( is_mask_bit_support(control) )
+        {
+            iommu->msi.msi_attrib.maskbit = 1;
+            iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
+                                                    is_64bit_address(control));
+            handler = &iommu_maskable_msi_type;
+        }
+        else
+            handler = &iommu_msi_type;
+
+        ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    }
+
     if ( !ret )
         ret = request_irq(irq, 0, iommu_interrupt_handler, "amd_iommu", iommu);
     if ( ret )
@@ -838,8 +897,19 @@ static void enable_iommu(struct amd_iomm
         struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
 
         spin_lock(&desc->lock);
-        set_msi_affinity(desc, NULL);
-        spin_unlock(&desc->lock);
+
+        if ( iommu->ctrl.int_cap_xt_en )
+        {
+            set_x2apic_affinity(desc, NULL);
+            spin_unlock(&desc->lock);
+        }
+        else
+        {
+            set_msi_affinity(desc, NULL);
+            spin_unlock(&desc->lock);
+
+            amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
+        }
     }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
@@ -879,7 +949,9 @@ static void disable_iommu(struct amd_iom
         return;
     }
 
-    amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+    if ( !iommu->ctrl.int_cap_xt_en )
+        amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -416,6 +416,25 @@ union amd_iommu_ext_features {
     } flds;
 };
 
+/* x2APIC Control Registers */
+#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
+#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
+#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
+
+union amd_iommu_x2apic_control {
+    uint64_t raw;
+    struct {
+        unsigned int :2;
+        unsigned int dest_mode:1;
+        unsigned int :5;
+        unsigned int dest_lo:24;
+        unsigned int vector:8;
+        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
+        unsigned int :15;
+        unsigned int dest_hi:8;
+    };
+};
+
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
 #define IOMMU_STATUS_EVENT_OVERFLOW_MASK	0x00000001
AMD/IOMMU: enable x2APIC mode when available

In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
switched into a suitable state.

The post-AP-bringup IRQ affinity adjustment is also done for the non-
x2APIC case, matching what VT-d does.
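
In call-flow terms the bring-up now looks roughly like this (sketch of
the hooks wired up below):

    if ( iov_supports_xt() )    /* every IO-APIC behind an IOMMU? */
        iov_enable_xt();        /* amd_iommu_init(true); GAEn/XTEn set */

    /* ... CPUs enter x2APIC mode, APs get brought up ... */

    amd_iommu_init_interrupt(); /* only now set up and bind IOMMU IRQs */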

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Set GAEn (and other control register bits) earlier. Also clear the
    bits enabled here in amd_iommu_init_cleanup(). Re-base. Pass NULL
    CPU mask to set_{x2apic,msi}_affinity().
v2: Drop cpu_has_cx16 check. Add comment.
---
TBD: Instead of the system_state check in iov_enable_xt() the function
     could also zap its own hook pointer, at which point it could also
     become __init. This would, however, require that either
     resume_x2apic() be bound to ignore iommu_enable_x2apic() errors
     forever, or that iommu_enable_x2apic() be slightly re-arranged to
     not return -EOPNOTSUPP when finding a NULL hook during resume.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -834,6 +834,30 @@ static bool_t __init set_iommu_interrupt
     return 1;
 }
 
+int iov_adjust_irq_affinities(void)
+{
+    const struct amd_iommu *iommu;
+
+    if ( !iommu_enabled )
+        return 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+        unsigned long flags;
+
+        spin_lock_irqsave(&desc->lock, flags);
+        if ( iommu->ctrl.int_cap_xt_en )
+            set_x2apic_affinity(desc, NULL);
+        else
+            set_msi_affinity(desc, NULL);
+        spin_unlock_irqrestore(&desc->lock, flags);
+    }
+
+    return 0;
+}
+__initcall(iov_adjust_irq_affinities);
+
 /*
  * Family15h Model 10h-1fh erratum 746 (IOMMU Logging May Stall Translations)
  * Workaround:
@@ -1047,7 +1071,7 @@ static void * __init allocate_ppr_log(st
                                 IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
 }
 
-static int __init amd_iommu_init_one(struct amd_iommu *iommu)
+static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
 {
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
@@ -1058,7 +1082,7 @@ static int __init amd_iommu_init_one(str
     if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
         goto error_out;
 
-    if ( !set_iommu_interrupt_handler(iommu) )
+    if ( intr && !set_iommu_interrupt_handler(iommu) )
         goto error_out;
 
     /* To make sure that device_table.buffer has been successfully allocated */
@@ -1087,8 +1111,16 @@ static void __init amd_iommu_init_cleanu
     list_for_each_entry_safe ( iommu, next, &amd_iommu_head, list )
     {
         list_del(&iommu->list);
+
+        iommu->ctrl.ga_en = 0;
+        iommu->ctrl.xt_en = 0;
+        iommu->ctrl.int_cap_xt_en = 0;
+
         if ( iommu->enabled )
             disable_iommu(iommu);
+        else if ( iommu->mmio_base )
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
         deallocate_ring_buffer(&iommu->cmd_buffer);
         deallocate_ring_buffer(&iommu->event_log);
@@ -1290,7 +1322,7 @@ static int __init amd_iommu_prepare_one(
     return 0;
 }
 
-int __init amd_iommu_init(void)
+int __init amd_iommu_prepare(bool xt)
 {
     struct amd_iommu *iommu;
     int rc = -ENODEV;
@@ -1305,9 +1337,14 @@ int __init amd_iommu_init(void)
     if ( unlikely(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_MSI) )
         goto error_out;
 
+    /* Have we been here before? */
+    if ( ivhd_type )
+        return 0;
+
     rc = amd_iommu_get_supported_ivhd_type();
     if ( rc < 0 )
         goto error_out;
+    BUG_ON(!rc);
     ivhd_type = rc;
 
     rc = amd_iommu_get_ivrs_dev_entries();
@@ -1323,9 +1360,37 @@ int __init amd_iommu_init(void)
         rc = amd_iommu_prepare_one(iommu);
         if ( rc )
             goto error_out;
+
+        rc = -ENODEV;
+        if ( xt && (!iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup) )
+            goto error_out;
+    }
+
+    for_each_amd_iommu ( iommu )
+    {
+        /* NB: There's no need to actually write these out right here. */
+        iommu->ctrl.ga_en |= xt;
+        iommu->ctrl.xt_en = xt;
+        iommu->ctrl.int_cap_xt_en = xt;
     }
 
     rc = amd_iommu_update_ivrs_mapping_acpi();
+
+ error_out:
+    if ( rc )
+    {
+        amd_iommu_init_cleanup();
+        ivhd_type = 0;
+    }
+
+    return rc;
+}
+
+int __init amd_iommu_init(bool xt)
+{
+    struct amd_iommu *iommu;
+    int rc = amd_iommu_prepare(xt);
+
     if ( rc )
         goto error_out;
 
@@ -1351,7 +1416,12 @@ int __init amd_iommu_init(void)
     /* per iommu initialization  */
     for_each_amd_iommu ( iommu )
     {
-        rc = amd_iommu_init_one(iommu);
+        /*
+         * Setting up of the IOMMU interrupts cannot occur yet at the (very
+         * early) time we get here when enabling x2APIC mode. Suppress it
+         * here, and do it explicitly in amd_iommu_init_interrupt().
+         */
+        rc = amd_iommu_init_one(iommu, !xt);
         if ( rc )
             goto error_out;
     }
@@ -1363,6 +1433,40 @@ error_out:
     return rc;
 }
 
+int __init amd_iommu_init_interrupt(void)
+{
+    struct amd_iommu *iommu;
+    int rc = 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc;
+
+        if ( !set_iommu_interrupt_handler(iommu) )
+        {
+            rc = -EIO;
+            break;
+        }
+
+        desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        ASSERT(iommu->ctrl.int_cap_xt_en);
+        set_x2apic_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
+
+    if ( rc )
+        amd_iommu_init_cleanup();
+
+    return rc;
+}
+
 static void invalidate_all_domain_pages(void)
 {
     struct domain *d;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -791,6 +791,35 @@ void *__init amd_iommu_alloc_intremap_ta
     return tb;
 }
 
+bool __init iov_supports_xt(void)
+{
+    unsigned int apic;
+
+    if ( !iommu_enable || !iommu_intremap )
+        return false;
+
+    if ( amd_iommu_prepare(true) )
+        return false;
+
+    for ( apic = 0; apic < nr_ioapics; apic++ )
+    {
+        unsigned int idx = ioapic_id_to_index(IO_APIC_ID(apic));
+
+        if ( idx == MAX_IO_APICS )
+            return false;
+
+        if ( !find_iommu_for_device(ioapic_sbdf[idx].seg,
+                                    ioapic_sbdf[idx].bdf) )
+        {
+            AMD_IOMMU_DEBUG("No IOMMU for IO-APIC %#x (ID %x)\n",
+                            apic, IO_APIC_ID(apic));
+            return false;
+        }
+    }
+
+    return true;
+}
+
 int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
 {
     spinlock_t *lock;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -170,7 +170,8 @@ static int __init iov_detect(void)
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
-    if ( amd_iommu_init() != 0 )
+    else if ( (init_done ? amd_iommu_init_interrupt()
+                         : amd_iommu_init(false)) != 0 )
     {
         printk("AMD-Vi: Error initialization\n");
         return -ENODEV;
@@ -183,6 +184,25 @@ static int __init iov_detect(void)
     return scan_pci_devices();
 }
 
+static int iov_enable_xt(void)
+{
+    int rc;
+
+    if ( system_state >= SYS_STATE_active )
+        return 0;
+
+    if ( (rc = amd_iommu_init(true)) != 0 )
+    {
+        printk("AMD-Vi: Error %d initializing for x2APIC mode\n", rc);
+        /* -ENXIO has special meaning to the caller - convert it. */
+        return rc != -ENXIO ? rc : -ENODATA;
+    }
+
+    init_done = true;
+
+    return 0;
+}
+
 int amd_iommu_alloc_root(struct domain_iommu *hd)
 {
     if ( unlikely(!hd->arch.root_table) )
@@ -559,11 +579,13 @@ static const struct iommu_ops __initcons
     .free_page_table = deallocate_page_table,
     .reassign_device = reassign_device,
     .get_device_group_id = amd_iommu_group_id,
+    .enable_x2apic = iov_enable_xt,
     .update_ire_from_apic = amd_iommu_ioapic_update_ire,
     .update_ire_from_msi = amd_iommu_msi_msg_update_ire,
     .read_apic_from_ire = amd_iommu_read_ioapic_from_ire,
     .read_msi_from_ire = amd_iommu_read_msi_from_ire,
     .setup_hpet_msi = amd_setup_hpet_msi,
+    .adjust_irq_affinities = iov_adjust_irq_affinities,
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .share_p2m = amd_iommu_share_p2m,
@@ -574,4 +596,5 @@ static const struct iommu_ops __initcons
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
     .ops = &_iommu_ops,
     .setup = iov_detect,
+    .supports_x2apic = iov_supports_xt,
 };
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -48,8 +48,11 @@ int amd_iommu_detect_acpi(void);
 void get_iommu_features(struct amd_iommu *iommu);
 
 /* amd-iommu-init functions */
-int amd_iommu_init(void);
+int amd_iommu_prepare(bool xt);
+int amd_iommu_init(bool xt);
+int amd_iommu_init_interrupt(void);
 int amd_iommu_update_ivrs_mapping_acpi(void);
+int iov_adjust_irq_affinities(void);
 
 /* mapping functions */
 int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -96,6 +99,7 @@ void amd_iommu_flush_all_caches(struct a
 struct amd_iommu *find_iommu_for_device(int seg, int bdf);
 
 /* interrupt remapping */
+bool iov_supports_xt(void);
 int amd_iommu_setup_ioapic_remapping(void);
 void *amd_iommu_alloc_intremap_table(
     const struct amd_iommu *, unsigned long **);
AMD/IOMMU: correct IRTE updating

Flushing didn't get done along the lines of what the specification says.
Mark entries to be updated as not remapped (which will result in
interrupt requests getting target aborted, but the interrupts should be
masked anyway at that point in time), issue the flush, and only then
write the new entry.

In update_intremap_entry_from_msi_msg() also fold the duplicate initial
lock determination and acquire into just a single instance.
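
Condensed, the per-entry update protocol then is (sketch; the real code
below re-checks RemapEn under the re-acquired lock, hence the loops):

    /* 1. Invalidate: further requests get target aborted. */
    entry.ptr32->basic.remap_en = false;
    /* 2. Flush the (possibly cached) IRTE. */
    amd_iommu_flush_intremap(iommu, req_id);
    /* 3. Only then write the new entry contents. */
    update_intremap_entry(iommu, entry, vector, delivery_mode,
                          dest_mode, dest);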

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Putting the flush invocations in loops isn't overly nice, but I
     don't think this can really be abused, since callers up the stack
     hold further locks. Nevertheless I'd like to ask for better
     suggestions.
---
v3: Remove stale parts of description. Re-base.
v2: Parts morphed into earlier patch.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -207,9 +207,7 @@ static void update_intremap_entry(const
             .vector = vector,
         };
 
-        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
-        /* Low half, in particular RemapEn, needs to be cleared first. */
-        barrier();
+        ASSERT(!entry.ptr128->full.remap_en);
         entry.ptr128->raw[1] =
             container_of(&full, union irte128, full)->raw[1];
         /* High half needs to be set before low one (containing RemapEn). */
@@ -288,6 +286,20 @@ static int update_intremap_entry_from_io
     }
 
     entry = get_intremap_entry(iommu, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = false;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     if ( fresh )
         /* nothing */;
     else if ( !lo_update )
@@ -317,13 +329,6 @@ static int update_intremap_entry_from_io
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled && !fresh )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     set_rte_index(rte, offset);
 
     return 0;
@@ -579,19 +584,27 @@ static int update_intremap_entry_from_ms
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
 
+    lock = get_intremap_lock(iommu->seg, req_id);
+    spin_lock_irqsave(lock, flags);
+
     if ( msg == NULL )
     {
-        lock = get_intremap_lock(iommu->seg, req_id);
-        spin_lock_irqsave(lock, flags);
         for ( i = 0; i < nr; ++i )
             free_intremap_entry(iommu, req_id, *remap_index + i);
         spin_unlock_irqrestore(lock, flags);
-        goto done;
-    }
 
-    lock = get_intremap_lock(iommu->seg, req_id);
+        if ( iommu->enabled )
+        {
+            spin_lock_irqsave(&iommu->lock, flags);
+            amd_iommu_flush_intremap(iommu, req_id);
+            if ( alias_id != req_id )
+                amd_iommu_flush_intremap(iommu, alias_id);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        return 0;
+    }
 
-    spin_lock_irqsave(lock, flags);
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
@@ -615,6 +628,22 @@ static int update_intremap_entry_from_ms
     }
 
     entry = get_intremap_entry(iommu, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = false;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        if ( alias_id != req_id )
+            amd_iommu_flush_intremap(iommu, alias_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
     spin_unlock_irqrestore(lock, flags);
 
@@ -634,16 +663,6 @@ static int update_intremap_entry_from_ms
                get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
     }
 
-done:
-    if ( iommu->enabled )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        if ( alias_id != req_id )
-            amd_iommu_flush_intremap(iommu, alias_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     return 0;
 }
 
AMD/IOMMU: process softirqs while dumping IRTs

When there are sufficiently many devices listed in the ACPI tables (no
matter whether they actually exist), output may take far longer than the
watchdog would like.
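
The fix follows the usual pattern for long-running key handlers
(sketch; dump_one() is a hypothetical stand-in for the per-device dump):

    for ( bdf = 0; bdf < ivrs_bdf_entries; bdf++ )
    {
        dump_one(bdf);
        process_pending_softirqs();    /* don't starve the watchdog */
    }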

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.
---
TBD: Seeing the volume of output I wonder whether we should further
     suppress logging headers of devices which have no active entry
     (i.e. emit the header only upon finding the first IRTE worth
logging). And while minor for the total volume of output, I'm
also unconvinced that logging both a "per device" header line and a
"shared" one makes sense, when only one of the two can actually be
followed by any contents.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -22,6 +22,7 @@
 #include <asm/hvm/svm/amd-iommu-proto.h>
 #include <asm/io_apic.h>
 #include <xen/keyhandler.h>
+#include <xen/softirq.h>
 
 struct irte_basic {
     bool remap_en:1;
@@ -917,6 +918,8 @@ static int dump_intremap_mapping(const s
     dump_intremap_table(iommu, ivrs_mapping->intremap_table);
     spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
 
+    process_pending_softirqs();
+
     return 0;
 }
 
[Xen-devel] [PATCH v3 01/14] AMD/IOMMU: free more memory when cleaning up after error
Posted by Jan Beulich 4 years, 9 months ago
The interrupt remapping in-use bitmaps were leaked in all cases. The
ring buffers and the mapping of the MMIO space were leaked for any IOMMU
that hadn't been enabled yet.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1070,13 +1070,12 @@ static void __init amd_iommu_init_cleanu
      {
          list_del(&iommu->list);
          if ( iommu->enabled )
-        {
              disable_iommu(iommu);
-            deallocate_ring_buffer(&iommu->cmd_buffer);
-            deallocate_ring_buffer(&iommu->event_log);
-            deallocate_ring_buffer(&iommu->ppr_log);
-            unmap_iommu_mmio_region(iommu);
-        }
+
+        deallocate_ring_buffer(&iommu->cmd_buffer);
+        deallocate_ring_buffer(&iommu->event_log);
+        deallocate_ring_buffer(&iommu->ppr_log);
+        unmap_iommu_mmio_region(iommu);
          xfree(iommu);
      }
  
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -610,6 +610,8 @@ int __init amd_iommu_free_intremap_table
  {
      void *tb = ivrs_mapping->intremap_table;
  
+    XFREE(ivrs_mapping->intremap_inuse);
+
      if ( tb )
      {
          __free_amd_iommu_tables(tb, INTREMAP_TABLE_ORDER);

Re: [Xen-devel] [PATCH v3 01/14] AMD/IOMMU: free more memory when cleaning up after error
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:35:08PM +0000, Jan Beulich wrote:
> The interrupt remapping in-use bitmaps were leaked in all cases. The
> ring buffers and the mapping of the MMIO space were leaked for any IOMMU
> that hadn't been enabled yet.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>

> ---
> v3: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -1070,13 +1070,12 @@ static void __init amd_iommu_init_cleanu
>       {
>           list_del(&iommu->list);
>           if ( iommu->enabled )
> -        {
>               disable_iommu(iommu);
> -            deallocate_ring_buffer(&iommu->cmd_buffer);
> -            deallocate_ring_buffer(&iommu->event_log);
> -            deallocate_ring_buffer(&iommu->ppr_log);
> -            unmap_iommu_mmio_region(iommu);
> -        }
> +
> +        deallocate_ring_buffer(&iommu->cmd_buffer);
> +        deallocate_ring_buffer(&iommu->event_log);
> +        deallocate_ring_buffer(&iommu->ppr_log);
> +        unmap_iommu_mmio_region(iommu);
>           xfree(iommu);
>       }
>   
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -610,6 +610,8 @@ int __init amd_iommu_free_intremap_table
>   {
>       void *tb = ivrs_mapping->intremap_table;
>   
> +    XFREE(ivrs_mapping->intremap_inuse);
> +
>       if ( tb )
>       {
>           __free_amd_iommu_tables(tb, INTREMAP_TABLE_ORDER);
> 

-- 
Brian Woods

Re: [Xen-devel] [PATCH v3 01/14] AMD/IOMMU: free more memory when cleaning up after error
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:35, Jan Beulich wrote:
> The interrupt remapping in-use bitmaps were leaked in all cases. The
> ring buffers and the mapping of the MMIO space were leaked for any IOMMU
> that hadn't been enabled yet.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH v3 02/14] AMD/IOMMU: use bit field for extended feature register
Posted by Jan Beulich 4 years, 9 months ago
This also takes care of several of the shift values wrongly having been
specified as hex rather than dec.

Take the opportunity and
- replace a readl() pair by a single readq(),
- add further fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Another attempt at deriving masks from bitfields, hopefully better
     liked by clang (mine was fine even with the v2 variant).
v2: Correct sats_sup position and name. Re-base over new earlier patch.
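
The FEAT() macro below derives a field's width from the bitfield itself:
decrementing the zero-initialized field of a compound literal wraps it to
its all-ones value, which exceeds 1 exactly for multi-bit fields. A
standalone illustration (hypothetical "ex" union, outside Xen):

    #include <stdio.h>

    union ex {
        unsigned int raw;
        struct {
            unsigned int single:1;
            unsigned int multi:3;
        } flds;
    };

    int main(void)
    {
        /* 0 - 1 wraps to the field's maximum: 1 resp. 7 here. */
        printf("%d\n", --((union ex){}).flds.single > 1);  /* prints 0 */
        printf("%d\n", --((union ex){}).flds.multi > 1);   /* prints 1 */
        return 0;
    }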

--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -60,49 +60,77 @@ static int __init get_iommu_capabilities
  
  void __init get_iommu_features(struct amd_iommu *iommu)
  {
-    u32 low, high;
-    int i = 0 ;
      const struct amd_iommu *first;
-    static const char *__initdata feature_str[] = {
-        "- Prefetch Pages Command",
-        "- Peripheral Page Service Request",
-        "- X2APIC Supported",
-        "- NX bit Supported",
-        "- Guest Translation",
-        "- Reserved bit [5]",
-        "- Invalidate All Command",
-        "- Guest APIC supported",
-        "- Hardware Error Registers",
-        "- Performance Counters",
-        NULL
-    };
-
      ASSERT( iommu->mmio_base );
  
      if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
      {
-        iommu->features = 0;
+        iommu->features.raw = 0;
          return;
      }
  
-    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
-    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
-
-    iommu->features = ((u64)high << 32) | low;
+    iommu->features.raw =
+        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
  
      /* Don't log the same set of features over and over. */
      first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
-    if ( iommu != first && iommu->features == first->features )
+    if ( iommu != first && iommu->features.raw == first->features.raw )
          return;
  
      printk("AMD-Vi: IOMMU Extended Features:\n");
  
-    while ( feature_str[i] )
+#define FEAT(fld, str) do {                                    \
+    if ( --((union amd_iommu_ext_features){}).flds.fld > 1 )   \
+        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
+    else if ( iommu->features.flds.fld )                       \
+        printk( "- " str "\n");                                \
+} while ( false )
+
+    FEAT(pref_sup,           "Prefetch Pages Command");
+    FEAT(ppr_sup,            "Peripheral Page Service Request");
+    FEAT(xt_sup,             "x2APIC");
+    FEAT(nx_sup,             "NX bit");
+    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
+    FEAT(ia_sup,             "Invalidate All Command");
+    FEAT(ga_sup,             "Guest APIC");
+    FEAT(he_sup,             "Hardware Error Registers");
+    FEAT(pc_sup,             "Performance Counters");
+    FEAT(hats,               "Host Address Translation Size");
+
+    if ( iommu->features.flds.gt_sup )
      {
-        if ( amd_iommu_has_feature(iommu, i) )
-            printk( " %s\n", feature_str[i]);
-        i++;
+        FEAT(gats,           "Guest Address Translation Size");
+        FEAT(glx_sup,        "Guest CR3 Root Table Level");
+        FEAT(pas_max,        "Maximum PASID");
      }
+
+    FEAT(smif_sup,           "SMI Filter Register");
+    FEAT(smif_rc,            "SMI Filter Register Count");
+    FEAT(gam_sup,            "Guest Virtual APIC Modes");
+    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
+    FEAT(dual_event_log_sup, "Dual Event Log");
+    FEAT(sats_sup,           "Secure ATS");
+    FEAT(us_sup,             "User / Supervisor Page Protection");
+    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
+    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
+    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
+    FEAT(marc_sup,           "Memory Access Routing and Control");
+    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
+    FEAT(perf_opt_sup ,      "Performance Optimization");
+    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
+    FEAT(gio_sup,            "Guest I/O Protection");
+    FEAT(ha_sup,             "Host Access");
+    FEAT(eph_sup,            "Enhanced PPR Handling");
+    FEAT(attr_fw_sup,        "Attribute Forward");
+    FEAT(hd_sup,             "Host Dirty");
+    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
+    FEAT(viommu_sup,         "Virtualized IOMMU");
+    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
+    FEAT(vm_table_size,      "VM Table Size");
+    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
+
+#undef FEAT
+#undef MASK
  }
  
  int __init amd_iommu_detect_one_acpi(
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -638,7 +638,7 @@ static uint64_t iommu_mmio_read64(struct
          val = reg_to_u64(iommu->reg_status);
          break;
      case IOMMU_EXT_FEATURE_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ext_feature);
+        val = iommu->reg_ext_feature.raw;
          break;
  
      default:
@@ -802,39 +802,26 @@ int guest_iommu_set_base(struct domain *
  /* Initialize mmio read only bits */
  static void guest_iommu_reg_init(struct guest_iommu *iommu)
  {
-    uint32_t lower, upper;
+    union amd_iommu_ext_features ef = {
+        /* Support prefetch */
+        .flds.pref_sup = 1,
+        /* Support PPR log */
+        .flds.ppr_sup = 1,
+        /* Support guest translation */
+        .flds.gt_sup = 1,
+        /* Support invalidate all command */
+        .flds.ia_sup = 1,
+        /* Host translation size has 6 levels */
+        .flds.hats = HOST_ADDRESS_SIZE_6_LEVEL,
+        /* Guest translation size has 6 levels */
+        .flds.gats = GUEST_ADDRESS_SIZE_6_LEVEL,
+        /* Single level gCR3 */
+        .flds.glx_sup = GUEST_CR3_1_LEVEL,
+        /* 9 bit PASID */
+        .flds.pas_max = PASMAX_9_bit,
+    };
  
-    lower = upper = 0;
-    /* Support prefetch */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PREFSUP_SHIFT);
-    /* Support PPR log */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PPRSUP_SHIFT);
-    /* Support guest translation */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_GTSUP_SHIFT);
-    /* Support invalidate all command */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_IASUP_SHIFT);
-
-    /* Host translation size has 6 levels */
-    set_field_in_reg_u32(HOST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_HATS_MASK,
-                         IOMMU_EXT_FEATURE_HATS_SHIFT,
-                         &lower);
-    /* Guest translation size has 6 levels */
-    set_field_in_reg_u32(GUEST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GATS_MASK,
-                         IOMMU_EXT_FEATURE_GATS_SHIFT,
-                         &lower);
-    /* Single level gCR3 */
-    set_field_in_reg_u32(GUEST_CR3_1_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GLXSUP_MASK,
-                         IOMMU_EXT_FEATURE_GLXSUP_SHIFT, &lower);
-    /* 9 bit PASID */
-    set_field_in_reg_u32(PASMAX_9_bit, upper,
-                         IOMMU_EXT_FEATURE_PASMAX_MASK,
-                         IOMMU_EXT_FEATURE_PASMAX_SHIFT, &upper);
-
-    iommu->reg_ext_feature.lo = lower;
-    iommu->reg_ext_feature.hi = upper;
+    iommu->reg_ext_feature = ef;
  }
  
  static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -883,7 +883,7 @@ static void enable_iommu(struct amd_iomm
      register_iommu_event_log_in_mmio_space(iommu);
      register_iommu_exclusion_range(iommu);
  
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
          register_iommu_ppr_log_in_mmio_space(iommu);
  
      desc = irq_to_desc(iommu->msi.irq);
@@ -897,15 +897,15 @@ static void enable_iommu(struct amd_iomm
      set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
      set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
  
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
          set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
  
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
          set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
  
      set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED);
  
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( iommu->features.flds.ia_sup )
          amd_iommu_flush_all_caches(iommu);
  
      iommu->enabled = 1;
@@ -928,10 +928,10 @@ static void disable_iommu(struct amd_iom
      set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
      set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
  
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
          set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
  
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
          set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
  
      set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
@@ -1027,7 +1027,7 @@ static int __init amd_iommu_init_one(str
  
      get_iommu_features(iommu);
  
-    if ( iommu->features )
+    if ( iommu->features.raw )
          iommuv2_enabled = 1;
  
      if ( allocate_cmd_buffer(iommu) == NULL )
@@ -1036,9 +1036,8 @@ static int __init amd_iommu_init_one(str
      if ( allocate_event_log(iommu) == NULL )
          goto error_out;
  
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
-        if ( allocate_ppr_log(iommu) == NULL )
-            goto error_out;
+    if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
+        goto error_out;
  
      if ( !set_iommu_interrupt_handler(iommu) )
          goto error_out;
@@ -1388,7 +1387,7 @@ void amd_iommu_resume(void)
      }
  
      /* flush all cache entries after iommu re-enabled */
-    if ( !amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( !iommu->features.flds.ia_sup )
      {
          invalidate_all_devices();
          invalidate_all_domain_pages();
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -83,7 +83,7 @@ struct amd_iommu {
      iommu_cap_t cap;
  
      u8 ht_flags;
-    u64 features;
+    union amd_iommu_ext_features features;
  
      void *mmio_base;
      unsigned long mmio_base_phys;
@@ -174,7 +174,7 @@ struct guest_iommu {
      /* MMIO regs */
      struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
      struct mmio_reg         reg_status;            /* MMIO offset 2020h */
-    struct mmio_reg         reg_ext_feature;       /* MMIO offset 0030h */
+    union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
  
      /* guest interrupt settings */
      struct guest_iommu_msi  msi;
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -346,26 +346,57 @@ struct amd_iommu_dte {
  #define IOMMU_EXCLUSION_LIMIT_HIGH_MASK		0xFFFFFFFF
  #define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT	0
  
-/* Extended Feature Register*/
+/* Extended Feature Register */
  #define IOMMU_EXT_FEATURE_MMIO_OFFSET                   0x30
-#define IOMMU_EXT_FEATURE_PREFSUP_SHIFT                 0x0
-#define IOMMU_EXT_FEATURE_PPRSUP_SHIFT                  0x1
-#define IOMMU_EXT_FEATURE_XTSUP_SHIFT                   0x2
-#define IOMMU_EXT_FEATURE_NXSUP_SHIFT                   0x3
-#define IOMMU_EXT_FEATURE_GTSUP_SHIFT                   0x4
-#define IOMMU_EXT_FEATURE_IASUP_SHIFT                   0x6
-#define IOMMU_EXT_FEATURE_GASUP_SHIFT                   0x7
-#define IOMMU_EXT_FEATURE_HESUP_SHIFT                   0x8
-#define IOMMU_EXT_FEATURE_PCSUP_SHIFT                   0x9
-#define IOMMU_EXT_FEATURE_HATS_SHIFT                    0x10
-#define IOMMU_EXT_FEATURE_HATS_MASK                     0x00000C00
-#define IOMMU_EXT_FEATURE_GATS_SHIFT                    0x12
-#define IOMMU_EXT_FEATURE_GATS_MASK                     0x00003000
-#define IOMMU_EXT_FEATURE_GLXSUP_SHIFT                  0x14
-#define IOMMU_EXT_FEATURE_GLXSUP_MASK                   0x0000C000
  
-#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
-#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
+union amd_iommu_ext_features {
+    uint64_t raw;
+    struct {
+        unsigned int pref_sup:1;
+        unsigned int ppr_sup:1;
+        unsigned int xt_sup:1;
+        unsigned int nx_sup:1;
+        unsigned int gt_sup:1;
+        unsigned int gappi_sup:1;
+        unsigned int ia_sup:1;
+        unsigned int ga_sup:1;
+        unsigned int he_sup:1;
+        unsigned int pc_sup:1;
+        unsigned int hats:2;
+        unsigned int gats:2;
+        unsigned int glx_sup:2;
+        unsigned int smif_sup:2;
+        unsigned int smif_rc:3;
+        unsigned int gam_sup:3;
+        unsigned int dual_ppr_log_sup:2;
+        unsigned int :2;
+        unsigned int dual_event_log_sup:2;
+        unsigned int :1;
+        unsigned int sats_sup:1;
+        unsigned int pas_max:5;
+        unsigned int us_sup:1;
+        unsigned int dev_tbl_seg_sup:2;
+        unsigned int ppr_early_of_sup:1;
+        unsigned int ppr_auto_rsp_sup:1;
+        unsigned int marc_sup:2;
+        unsigned int blk_stop_mrk_sup:1;
+        unsigned int perf_opt_sup:1;
+        unsigned int msi_cap_mmio_sup:1;
+        unsigned int :1;
+        unsigned int gio_sup:1;
+        unsigned int ha_sup:1;
+        unsigned int eph_sup:1;
+        unsigned int attr_fw_sup:1;
+        unsigned int hd_sup:1;
+        unsigned int :1;
+        unsigned int inv_iotlb_type_sup:1;
+        unsigned int viommu_sup:1;
+        unsigned int vm_guard_io_sup:1;
+        unsigned int vm_table_size:4;
+        unsigned int ga_update_dis_sup:1;
+        unsigned int :2;
+    } flds;
+};
  
  /* Status Register*/
  #define IOMMU_STATUS_MMIO_OFFSET		0x2020
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -219,13 +219,6 @@ static inline int iommu_has_cap(struct a
      return !!(iommu->cap.header & (1u << bit));
  }
  
-static inline int amd_iommu_has_feature(struct amd_iommu *iommu, uint32_t bit)
-{
-    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
-        return 0;
-    return !!(iommu->features & (1U << bit));
-}
-
  /* access tail or head pointer of ring buffer */
  static inline uint32_t iommu_get_rb_pointer(uint32_t reg)
  {

Re: [Xen-devel] [PATCH v3 02/14] AMD/IOMMU: use bit field for extended feature register
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:35, Jan Beulich wrote:
> This also takes care of several of the shift values wrongly having been
> specified as hex rather than dec.
>
> Take the opportunity and
> - replace a readl() pair by a single readq(),
> - add further fields.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

CI is happy this time around.

https://gitlab.com/xen-project/people/andyhhp/xen/pipelines/71942193

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

but as a warning, I'm still certain that FEAT() is fragile, and will
be liable to break on future compilers, seeing as that seems to be the
trend for diagnostics.

I'm also unsure whether it works correctly on signed fields.

~Andrew

Re: [Xen-devel] [PATCH v3 02/14] AMD/IOMMU: use bit field for extended feature register
Posted by Jan Beulich 4 years, 9 months ago
On 19.07.2019 18:23, Andrew Cooper wrote:
> On 16/07/2019 17:35, Jan Beulich wrote:
>> This also takes care of several of the shift values wrongly having been
>> specified as hex rather than dec.
>>
>> Take the opportunity and
>> - replace a readl() pair by a single readq(),
>> - add further fields.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> CI is happy this time around.
> 
> https://gitlab.com/xen-project/people/andyhhp/xen/pipelines/71942193

Hurray.

> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks.

> but as a warning, I'm still certain that FEAT() is fragile, and will
> be liable to break on future compilers, seeing as that seems to be the
> trend for diagnostics.

There's a certain risk, yes.

> I'm also unsure whether it works correctly on signed fields.

I'm sure it wouldn't, but I don't see any signed fields appearing
there.
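
To illustrate why it wouldn't (a schematic example, not code anywhere
in the series; note that with gcc a plain int bitfield is signed):
pre-decrementing the zero-initialized compound literal wraps an
unsigned field to its all-ones maximum, while a signed field merely
goes to -1, so the "> 1" multi-bit test could never fire:

    union demo {
        struct {
            unsigned int u:2;   /* --0 wraps to 3; 3 > 1 holds      */
            int          s:2;   /* --0 yields -1; -1 > 1 never does */
        } flds;
    };

    /* --((union demo){}).flds.u > 1  evaluates to 1 (multi-bit seen) */
    /* --((union demo){}).flds.s > 1  evaluates to 0 (probe misses)   */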

Jan
Re: [Xen-devel] [PATCH v3 02/14] AMD/IOMMU: use bit field for extended feature register
Posted by Jan Beulich 4 years, 9 months ago
On 16.07.2019 18:35, Jan Beulich wrote:
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -60,49 +60,77 @@ static int __init get_iommu_capabilities
>    
>    void __init get_iommu_features(struct amd_iommu *iommu)
>    {
> -    u32 low, high;
> -    int i = 0 ;
>        const struct amd_iommu *first;
> -    static const char *__initdata feature_str[] = {
> -        "- Prefetch Pages Command",
> -        "- Peripheral Page Service Request",
> -        "- X2APIC Supported",
> -        "- NX bit Supported",
> -        "- Guest Translation",
> -        "- Reserved bit [5]",
> -        "- Invalidate All Command",
> -        "- Guest APIC supported",
> -        "- Hardware Error Registers",
> -        "- Performance Counters",
> -        NULL
> -    };
> -
>        ASSERT( iommu->mmio_base );
>    
>        if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
>        {
> -        iommu->features = 0;
> +        iommu->features.raw = 0;
>            return;
>        }
>    
> -    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
> -    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
> -
> -    iommu->features = ((u64)high << 32) | low;
> +    iommu->features.raw =
> +        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
>    
>        /* Don't log the same set of features over and over. */
>        first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
> -    if ( iommu != first && iommu->features == first->features )
> +    if ( iommu != first && iommu->features.raw == first->features.raw )
>            return;
>    
>        printk("AMD-Vi: IOMMU Extended Features:\n");
>    
> -    while ( feature_str[i] )
> +#define FEAT(fld, str) do {                                    \
> +    if ( --((union amd_iommu_ext_features){}).flds.fld > 1 )   \
> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
> +    else if ( iommu->features.flds.fld )                       \
> +        printk( "- " str "\n");                                \
> +} while ( false )
> +
> +    FEAT(pref_sup,           "Prefetch Pages Command");
> +    FEAT(ppr_sup,            "Peripheral Page Service Request");
> +    FEAT(xt_sup,             "x2APIC");
> +    FEAT(nx_sup,             "NX bit");
> +    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
> +    FEAT(ia_sup,             "Invalidate All Command");
> +    FEAT(ga_sup,             "Guest APIC");
> +    FEAT(he_sup,             "Hardware Error Registers");
> +    FEAT(pc_sup,             "Performance Counters");
> +    FEAT(hats,               "Host Address Translation Size");
> +
> +    if ( iommu->features.flds.gt_sup )
>        {
> -        if ( amd_iommu_has_feature(iommu, i) )
> -            printk( " %s\n", feature_str[i]);
> -        i++;
> +        FEAT(gats,           "Guest Address Translation Size");
> +        FEAT(glx_sup,        "Guest CR3 Root Table Level");
> +        FEAT(pas_max,        "Maximum PASID");
>        }
> +
> +    FEAT(smif_sup,           "SMI Filter Register");
> +    FEAT(smif_rc,            "SMI Filter Register Count");
> +    FEAT(gam_sup,            "Guest Virtual APIC Modes");
> +    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
> +    FEAT(dual_event_log_sup, "Dual Event Log");
> +    FEAT(sats_sup,           "Secure ATS");
> +    FEAT(us_sup,             "User / Supervisor Page Protection");
> +    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
> +    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
> +    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
> +    FEAT(marc_sup,           "Memory Access Routing and Control");
> +    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
> +    FEAT(perf_opt_sup ,      "Performance Optimization");
> +    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
> +    FEAT(gio_sup,            "Guest I/O Protection");
> +    FEAT(ha_sup,             "Host Access");
> +    FEAT(eph_sup,            "Enhanced PPR Handling");
> +    FEAT(attr_fw_sup,        "Attribute Forward");
> +    FEAT(hd_sup,             "Host Dirty");
> +    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
> +    FEAT(viommu_sup,         "Virtualized IOMMU");
> +    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
> +    FEAT(vm_table_size,      "VM Table Size");
> +    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
> +
> +#undef FEAT
> +#undef MASK
>    }

Just realized that I had left in place here a no longer needed #undef.
Now dropped.

Jan
[Xen-devel] [PATCH v3 03/14] AMD/IOMMU: use bit field for control register
Posted by Jan Beulich 4 years, 9 months ago
Also introduce a field in struct amd_iommu caching the most recently
written control register. All writes should now happen exclusively from
that cached value, such that it is guaranteed to be up to date.

Take the opportunity and add further fields. Also convert a few boolean
function parameters to bool, such that use of !! can be avoided.

Because of there now being definitions beyond bit 31, writel() also gets
replaced by writeq() when updating hardware.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Switch boolean bitfields to bool.
v2: Add domain_id_pne field. Mention writel() -> writeq() change.
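
(Purely illustrative, not part of the patch: the pattern every control
register update follows from here on is to adjust the cached copy and
then write the full value back, with no read of the register needed.
The function name below is made up for the example.)

    static void set_some_bit(struct amd_iommu *iommu, bool enable)
    {
        /* Modify only the cached control register value ... */
        iommu->ctrl.iommu_en = enable;
        /* ... then write out the whole, now 64-bit, register. */
        writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
    }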

--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -317,7 +317,7 @@ static int do_invalidate_iotlb_pages(str
  
  static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
  {
-    bool_t com_wait_int_en, com_wait_int, i, s;
+    bool com_wait_int, i, s;
      struct guest_iommu *iommu;
      unsigned long gfn;
      p2m_type_t p2mt;
@@ -354,12 +354,10 @@ static int do_completion_wait(struct dom
          unmap_domain_page(vaddr);
      }
  
-    com_wait_int_en = iommu_get_bit(iommu->reg_ctrl.lo,
-                                    IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
      com_wait_int = iommu_get_bit(iommu->reg_status.lo,
                                   IOMMU_STATUS_COMP_WAIT_INT_SHIFT);
  
-    if ( com_wait_int_en && com_wait_int )
+    if ( iommu->reg_ctrl.com_wait_int_en && com_wait_int )
          guest_iommu_deliver_msi(d);
  
      return 0;
@@ -521,40 +519,17 @@ static void guest_iommu_process_command(
      return;
  }
  
-static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t newctrl)
+static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t val)
  {
-    bool_t cmd_en, event_en, iommu_en, ppr_en, ppr_log_en;
-    bool_t cmd_en_old, event_en_old, iommu_en_old;
-    bool_t cmd_run;
-
-    iommu_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-    iommu_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    cmd_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                               IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_run = iommu_get_bit(iommu->reg_status.lo,
-                            IOMMU_STATUS_CMD_BUFFER_RUN_SHIFT);
-    event_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    event_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-
-    ppr_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-    ppr_log_en = iommu_get_bit(newctrl,
-                               IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
+    union amd_iommu_control newctrl = { .raw = val };
  
-    if ( iommu_en )
+    if ( newctrl.iommu_en )
      {
          guest_iommu_enable(iommu);
          guest_iommu_enable_dev_table(iommu);
      }
  
-    if ( iommu_en && cmd_en )
+    if ( newctrl.iommu_en && newctrl.cmd_buf_en )
      {
          guest_iommu_enable_ring_buffer(iommu, &iommu->cmd_buffer,
                                         sizeof(cmd_entry_t));
@@ -562,7 +537,7 @@ static int guest_iommu_write_ctrl(struct
          tasklet_schedule(&iommu->cmd_buffer_tasklet);
      }
  
-    if ( iommu_en && event_en )
+    if ( newctrl.iommu_en && newctrl.event_log_en )
      {
          guest_iommu_enable_ring_buffer(iommu, &iommu->event_log,
                                         sizeof(event_entry_t));
@@ -570,7 +545,7 @@ static int guest_iommu_write_ctrl(struct
          guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_OVERFLOW_SHIFT);
      }
  
-    if ( iommu_en && ppr_en && ppr_log_en )
+    if ( newctrl.iommu_en && newctrl.ppr_en && newctrl.ppr_log_en )
      {
          guest_iommu_enable_ring_buffer(iommu, &iommu->ppr_log,
                                         sizeof(ppr_entry_t));
@@ -578,19 +553,21 @@ static int guest_iommu_write_ctrl(struct
          guest_iommu_clear_status(iommu, IOMMU_STATUS_PPR_LOG_OVERFLOW_SHIFT);
      }
  
-    if ( iommu_en && cmd_en_old && !cmd_en )
+    if ( newctrl.iommu_en && iommu->reg_ctrl.cmd_buf_en &&
+         !newctrl.cmd_buf_en )
      {
          /* Disable iommu command processing */
          tasklet_kill(&iommu->cmd_buffer_tasklet);
      }
  
-    if ( event_en_old && !event_en )
+    if ( iommu->reg_ctrl.event_log_en && !newctrl.event_log_en )
          guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_LOG_RUN_SHIFT);
  
-    if ( iommu_en_old && !iommu_en )
+    if ( iommu->reg_ctrl.iommu_en && !newctrl.iommu_en )
          guest_iommu_disable(iommu);
  
-    u64_to_reg(&iommu->reg_ctrl, newctrl);
+    iommu->reg_ctrl = newctrl;
+
      return 0;
  }
  
@@ -632,7 +609,7 @@ static uint64_t iommu_mmio_read64(struct
          val = reg_to_u64(iommu->ppr_log.reg_tail);
          break;
      case IOMMU_CONTROL_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ctrl);
+        val = iommu->reg_ctrl.raw;
          break;
      case IOMMU_STATUS_MMIO_OFFSET:
          val = reg_to_u64(iommu->reg_status);
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -41,7 +41,7 @@ LIST_HEAD_READ_MOSTLY(amd_iommu_head);
  struct table_struct device_table;
  bool_t iommuv2_enabled;
  
-static int iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
+static bool iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
  {
      return iommu->ht_flags & mask;
  }
@@ -69,31 +69,18 @@ static void __init unmap_iommu_mmio_regi
  
  static void set_iommu_ht_flags(struct amd_iommu *iommu)
  {
-    u32 entry;
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
      /* Setup HT flags */
      if ( iommu_has_cap(iommu, PCI_CAP_HT_TUNNEL_SHIFT) )
-        iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE) ?
-            iommu_set_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT) :
-            iommu_clear_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT);
+        iommu->ctrl.ht_tun_en = iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE);
+
+    iommu->ctrl.pass_pw     = iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW);
+    iommu->ctrl.res_pass_pw = iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW);
+    iommu->ctrl.isoc        = iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC);
  
      /* Force coherent */
-    iommu_set_bit(&entry, IOMMU_CONTROL_COHERENT_SHIFT);
+    iommu->ctrl.coherent = true;
  
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
  }
  
  static void register_iommu_dev_table_in_mmio_space(struct amd_iommu *iommu)
@@ -205,55 +192,37 @@ static void register_iommu_ppr_log_in_mm
  
  
  static void set_iommu_translation_control(struct amd_iommu *iommu,
-                                                 int enable)
+                                          bool enable)
  {
-    u32 entry;
+    iommu->ctrl.iommu_en = enable;
  
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
  }
  
  static void set_iommu_guest_translation_control(struct amd_iommu *iommu,
-                                                int enable)
+                                                bool enable)
  {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.gt_en = enable;
  
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
  
      if ( enable )
          AMD_IOMMU_DEBUG("Guest Translation Enabled.\n");
  }
  
  static void set_iommu_command_buffer_control(struct amd_iommu *iommu,
-                                                    int enable)
+                                             bool enable)
  {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
      if ( enable )
      {
          writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_HEAD_OFFSET);
          writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
      }
-    else
-        iommu_clear_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
  
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.cmd_buf_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
  }
  
  static void register_iommu_exclusion_range(struct amd_iommu *iommu)
@@ -295,57 +264,38 @@ static void register_iommu_exclusion_ran
  }
  
  static void set_iommu_event_log_control(struct amd_iommu *iommu,
-            int enable)
+                                        bool enable)
  {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
      if ( enable )
      {
          writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_HEAD_OFFSET);
          writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
      }
  
-    iommu_clear_bit(&entry, IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
+    iommu->ctrl.event_int_en = enable;
+    iommu->ctrl.event_log_en = enable;
+    iommu->ctrl.com_wait_int_en = false;
  
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
  }
  
  static void set_iommu_ppr_log_control(struct amd_iommu *iommu,
-                                      int enable)
+                                      bool enable)
  {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
      if ( enable )
      {
          writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_HEAD_OFFSET);
          writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
      }
  
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.ppr_en = enable;
+    iommu->ctrl.ppr_int_en = enable;
+    iommu->ctrl.ppr_log_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+
      if ( enable )
          AMD_IOMMU_DEBUG("PPR Log Enabled.\n");
  }
@@ -398,7 +348,7 @@ static int iommu_read_log(struct amd_iom
  /* reset event log or ppr log when overflow */
  static void iommu_reset_log(struct amd_iommu *iommu,
                              struct ring_buffer *log,
-                            void (*ctrl_func)(struct amd_iommu *iommu, int))
+                            void (*ctrl_func)(struct amd_iommu *iommu, bool))
  {
      u32 entry;
      int log_run, run_bit;
@@ -615,11 +565,11 @@ static void iommu_check_event_log(struct
          iommu_reset_log(iommu, &iommu->event_log, set_iommu_event_log_control);
      else
      {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_EVENT_LOG_INT_MASK) )
+        if ( !iommu->ctrl.event_int_en )
          {
-            entry |= IOMMU_CONTROL_EVENT_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.event_int_en = true;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
              /*
               * Re-schedule the tasklet to handle eventual log entries added
               * between reading the log above and re-enabling the interrupt.
@@ -704,11 +654,11 @@ static void iommu_check_ppr_log(struct a
          iommu_reset_log(iommu, &iommu->ppr_log, set_iommu_ppr_log_control);
      else
      {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_PPR_LOG_INT_MASK) )
+        if ( !iommu->ctrl.ppr_int_en )
          {
-            entry |= IOMMU_CONTROL_PPR_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.ppr_int_en = true;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
              /*
               * Re-schedule the tasklet to handle eventual log entries added
               * between reading the log above and re-enabling the interrupt.
@@ -754,7 +704,6 @@ static void do_amd_iommu_irq(unsigned lo
  static void iommu_interrupt_handler(int irq, void *dev_id,
                                      struct cpu_user_regs *regs)
  {
-    u32 entry;
      unsigned long flags;
      struct amd_iommu *iommu = dev_id;
  
@@ -764,10 +713,9 @@ static void iommu_interrupt_handler(int
       * Silence interrupts from both event and PPR by clearing the
       * enable logging bits in the control register
       */
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.event_int_en = false;
+    iommu->ctrl.ppr_int_en = false;
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
  
      spin_unlock_irqrestore(&iommu->lock, flags);
  
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -88,6 +88,8 @@ struct amd_iommu {
      void *mmio_base;
      unsigned long mmio_base_phys;
  
+    union amd_iommu_control ctrl;
+
      struct table_struct dev_table;
      struct ring_buffer cmd_buffer;
      struct ring_buffer event_log;
@@ -172,7 +174,7 @@ struct guest_iommu {
      uint64_t                mmio_base;             /* MMIO base address */
  
      /* MMIO regs */
-    struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
+    union amd_iommu_control reg_ctrl;              /* MMIO offset 0018h */
      struct mmio_reg         reg_status;            /* MMIO offset 2020h */
      union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
  
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -295,38 +295,56 @@ struct amd_iommu_dte {
  
  /* Control Register */
  #define IOMMU_CONTROL_MMIO_OFFSET			0x18
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_MASK		0x00000001
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT		0
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_MASK	0x00000002
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT	1
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_MASK		0x00000004
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT		2
-#define IOMMU_CONTROL_EVENT_LOG_INT_MASK		0x00000008
-#define IOMMU_CONTROL_EVENT_LOG_INT_SHIFT		3
-#define IOMMU_CONTROL_COMP_WAIT_INT_MASK		0x00000010
-#define IOMMU_CONTROL_COMP_WAIT_INT_SHIFT		4
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_MASK		0x000000E0
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_SHIFT	5
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_MASK		0x00000100
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT		8
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_MASK	0x00000200
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT	9
-#define IOMMU_CONTROL_COHERENT_MASK			0x00000400
-#define IOMMU_CONTROL_COHERENT_SHIFT			10
-#define IOMMU_CONTROL_ISOCHRONOUS_MASK			0x00000800
-#define IOMMU_CONTROL_ISOCHRONOUS_SHIFT			11
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_MASK	0x00001000
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT	12
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_MASK		0x00002000
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT		13
-#define IOMMU_CONTROL_PPR_LOG_INT_MASK			0x00004000
-#define IOMMU_CONTROL_PPR_LOG_INT_SHIFT			14
-#define IOMMU_CONTROL_PPR_ENABLE_MASK			0x00008000
-#define IOMMU_CONTROL_PPR_ENABLE_SHIFT			15
-#define IOMMU_CONTROL_GT_ENABLE_MASK			0x00010000
-#define IOMMU_CONTROL_GT_ENABLE_SHIFT			16
-#define IOMMU_CONTROL_RESTART_MASK			0x80000000
-#define IOMMU_CONTROL_RESTART_SHIFT			31
+
+union amd_iommu_control {
+    uint64_t raw;
+    struct {
+        bool iommu_en:1;
+        bool ht_tun_en:1;
+        bool event_log_en:1;
+        bool event_int_en:1;
+        bool com_wait_int_en:1;
+        unsigned int inv_timeout:3;
+        bool pass_pw:1;
+        bool res_pass_pw:1;
+        bool coherent:1;
+        bool isoc:1;
+        bool cmd_buf_en:1;
+        bool ppr_log_en:1;
+        bool ppr_int_en:1;
+        bool ppr_en:1;
+        bool gt_en:1;
+        bool ga_en:1;
+        unsigned int crw:4;
+        bool smif_en:1;
+        bool slf_wb_dis:1;
+        bool smif_log_en:1;
+        unsigned int gam_en:3;
+        bool ga_log_en:1;
+        bool ga_int_en:1;
+        unsigned int dual_ppr_log_en:2;
+        unsigned int dual_event_log_en:2;
+        unsigned int dev_tbl_seg_en:3;
+        unsigned int priv_abrt_en:2;
+        bool ppr_auto_rsp_en:1;
+        bool marc_en:1;
+        bool blk_stop_mrk_en:1;
+        bool ppr_auto_rsp_aon:1;
+        bool domain_id_pne:1;
+        unsigned int :1;
+        bool eph_en:1;
+        unsigned int had_update:2;
+        bool gd_update_dis:1;
+        unsigned int :1;
+        bool xt_en:1;
+        bool int_cap_xt_en:1;
+        bool vcmd_en:1;
+        bool viommu_en:1;
+        bool ga_update_dis:1;
+        bool gappi_en:1;
+        unsigned int :8;
+    };
+};
  
  /* Exclusion Register */
  #define IOMMU_EXCLUSION_BASE_LOW_OFFSET		0x20

Re: [Xen-devel] [PATCH v3 03/14] AMD/IOMMU: use bit field for control register
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:36:06PM +0000, Jan Beulich wrote:
> Also introduce a field in struct amd_iommu caching the most recently
> written control register. All writes should now happen exclusively from
> that cached value, such that it is guaranteed to be up to date.
> 
> Take the opportunity and add further fields. Also convert a few boolean
> function parameters to bool, such that use of !! can be avoided.
> 
> Because of there now being definitions beyond bit 31, writel() also gets
> replaced by writeq() when updating hardware.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Brian Woods <brian.woods@amd.com>

> [...]

-- 
Brian Woods

Re: [Xen-devel] [PATCH v3 03/14] AMD/IOMMU: use bit field for control register
Posted by Jan Beulich 4 years, 9 months ago
On 19.07.2019 20:23, Woods, Brian wrote:
> On Tue, Jul 16, 2019 at 04:36:06PM +0000, Jan Beulich wrote:
>> Also introduce a field in struct amd_iommu caching the most recently
>> written control register. All writes should now happen exclusively from
>> that cached value, such that it is guaranteed to be up to date.
>>
>> Take the opportunity and add further fields. Also convert a few boolean
>> function parameters to bool, such that use of !! can be avoided.
>>
>> Because of there now being definitions beyond bit 31, writel() also gets
>> replaced by writeq() when updating hardware.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Acked-by: Brian Woods <brian.woods@amd.com>

Thanks for this and the other acks. I notice though that you skipped
patches 2 and 13: Are there concerns there? Patch 8 still has a
discussion to settle, so I realize you probably wouldn't want to ack
that one yet.

Jan
[Xen-devel] [PATCH v3 04/14] AMD/IOMMU: use bit field for IRTE
Posted by Jan Beulich 4 years, 9 months ago
At the same time restrict its scope to just the single source file
actually using it, and abstract accesses by introducing a union of
pointers. (A union of the actual table entries is not used to make it
impossible to [wrongly, once the 128-bit form gets added] perform
pointer arithmetic / array accesses on derived types.)

Also move away from updating the entries piecemeal: Construct a full new
entry, and write it out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Switch boolean bitfields to bool.
v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
     "offset". Introduce union irte32.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -23,6 +23,28 @@
  #include <asm/io_apic.h>
  #include <xen/keyhandler.h>
  
+struct irte_basic {
+    bool remap_en:1;
+    bool sup_io_pf:1;
+    unsigned int int_type:3;
+    bool rq_eoi:1;
+    bool dm:1;
+    bool guest_mode:1; /* MBZ */
+    unsigned int dest:8;
+    unsigned int vector:8;
+    unsigned int :8;
+};
+
+union irte32 {
+    uint32_t raw[1];
+    struct irte_basic basic;
+};
+
+union irte_ptr {
+    void *ptr;
+    union irte32 *ptr32;
+};
+
  #define INTREMAP_TABLE_ORDER    1
  #define INTREMAP_LENGTH 0xB
  #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
@@ -101,47 +123,44 @@ static unsigned int alloc_intremap_entry
      return slot;
  }
  
-static u32 *get_intremap_entry(int seg, int bdf, int offset)
+static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
+                                         unsigned int index)
  {
-    u32 *table = get_ivrs_mappings(seg)[bdf].intremap_table;
+    union irte_ptr table = {
+        .ptr = get_ivrs_mappings(seg)[bdf].intremap_table
+    };
+
+    ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
  
-    ASSERT( (table != NULL) && (offset < INTREMAP_ENTRIES) );
+    table.ptr32 += index;
  
-    return table + offset;
+    return table;
  }
  
-static void free_intremap_entry(int seg, int bdf, int offset)
-{
-    u32 *entry = get_intremap_entry(seg, bdf, offset);
-
-    memset(entry, 0, sizeof(u32));
-    __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
-}
-
-static void update_intremap_entry(u32* entry, u8 vector, u8 int_type,
-    u8 dest_mode, u8 dest)
-{
-    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
-                            INT_REMAP_ENTRY_REMAPEN_MASK,
-                            INT_REMAP_ENTRY_REMAPEN_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_SUPIOPF_MASK,
-                            INT_REMAP_ENTRY_SUPIOPF_SHIFT, entry);
-    set_field_in_reg_u32(int_type, *entry,
-                            INT_REMAP_ENTRY_INTTYPE_MASK,
-                            INT_REMAP_ENTRY_INTTYPE_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_REQEOI_MASK,
-                            INT_REMAP_ENTRY_REQEOI_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest_mode, *entry,
-                            INT_REMAP_ENTRY_DM_MASK,
-                            INT_REMAP_ENTRY_DM_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest, *entry,
-                            INT_REMAP_ENTRY_DEST_MAST,
-                            INT_REMAP_ENTRY_DEST_SHIFT, entry);
-    set_field_in_reg_u32((u32)vector, *entry,
-                            INT_REMAP_ENTRY_VECTOR_MASK,
-                            INT_REMAP_ENTRY_VECTOR_SHIFT, entry);
+static void free_intremap_entry(unsigned int seg, unsigned int bdf,
+                                unsigned int index)
+{
+    union irte_ptr entry = get_intremap_entry(seg, bdf, index);
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+
+    __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
+}
+
+static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
+                                  unsigned int int_type,
+                                  unsigned int dest_mode, unsigned int dest)
+{
+    struct irte_basic basic = {
+        .remap_en = true,
+        .int_type = int_type,
+        .dm = dest_mode,
+        .dest = dest,
+        .vector = vector,
+    };
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) =
+        container_of(&basic, union irte32, basic)->raw[0];
  }
  
  static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -163,7 +182,7 @@ static int update_intremap_entry_from_io
      u16 *index)
  {
      unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
      u8 delivery_mode, dest, vector, dest_mode;
      int req_id;
      spinlock_t *lock;
@@ -201,12 +220,8 @@ static int update_intremap_entry_from_io
           * so need to recover vector and delivery mode from IRTE.
           */
          ASSERT(get_rte_index(rte) == offset);
-        vector = get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
-        delivery_mode = get_field_from_reg_u32(*entry,
-                                               INT_REMAP_ENTRY_INTTYPE_MASK,
-                                               INT_REMAP_ENTRY_INTTYPE_SHIFT);
+        vector = entry.ptr32->basic.vector;
+        delivery_mode = entry.ptr32->basic.int_type;
      }
      update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
  
@@ -228,7 +243,7 @@ int __init amd_iommu_setup_ioapic_remapp
  {
      struct IO_APIC_route_entry rte;
      unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
      int apic, pin;
      u8 delivery_mode, dest, vector, dest_mode;
      u16 seg, bdf, req_id;
@@ -407,16 +422,14 @@ unsigned int amd_iommu_read_ioapic_from_
          u16 bdf = ioapic_sbdf[idx].bdf;
          u16 seg = ioapic_sbdf[idx].seg;
          u16 req_id = get_intremap_requestor_id(seg, bdf);
-        const u32 *entry = get_intremap_entry(seg, req_id, offset);
+        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
  
          ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
          val &= ~(INTREMAP_ENTRIES - 1);
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_INTTYPE_MASK,
-                                      INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_VECTOR_MASK,
-                                      INT_REMAP_ENTRY_VECTOR_SHIFT);
+        val |= MASK_INSR(entry.ptr32->basic.int_type,
+                         IO_APIC_REDIR_DELIV_MODE_MASK);
+        val |= MASK_INSR(entry.ptr32->basic.vector,
+                         IO_APIC_REDIR_VECTOR_MASK);
      }
  
      return val;
@@ -427,7 +440,7 @@ static int update_intremap_entry_from_ms
      int *remap_index, const struct msi_msg *msg, u32 *data)
  {
      unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
      u16 req_id, alias_id;
      u8 delivery_mode, dest, vector, dest_mode;
      spinlock_t *lock;
@@ -581,7 +594,7 @@ void amd_iommu_read_msi_from_ire(
      const struct pci_dev *pdev = msi_desc->dev;
      u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
      u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
-    const u32 *entry;
+    union irte_ptr entry;
  
      if ( IS_ERR_OR_NULL(_find_iommu_for_device(seg, bdf)) )
          return;
@@ -597,12 +610,10 @@ void amd_iommu_read_msi_from_ire(
      }
  
      msg->data &= ~(INTREMAP_ENTRIES - 1);
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_INTTYPE_MASK,
-                                        INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
+    msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
+                           MSI_DATA_DELIVERY_MODE_MASK);
+    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                           MSI_DATA_VECTOR_MASK);
  }
  
  int __init amd_iommu_free_intremap_table(
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -469,22 +469,6 @@ struct amd_iommu_pte {
  #define IOMMU_CONTROL_DISABLED	0
  #define IOMMU_CONTROL_ENABLED	1
  
-/* interrupt remapping table */
-#define INT_REMAP_ENTRY_REMAPEN_MASK    0x00000001
-#define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
-#define INT_REMAP_ENTRY_SUPIOPF_MASK    0x00000002
-#define INT_REMAP_ENTRY_SUPIOPF_SHIFT   1
-#define INT_REMAP_ENTRY_INTTYPE_MASK    0x0000001C
-#define INT_REMAP_ENTRY_INTTYPE_SHIFT   2
-#define INT_REMAP_ENTRY_REQEOI_MASK     0x00000020
-#define INT_REMAP_ENTRY_REQEOI_SHIFT    5
-#define INT_REMAP_ENTRY_DM_MASK         0x00000040
-#define INT_REMAP_ENTRY_DM_SHIFT        6
-#define INT_REMAP_ENTRY_DEST_MAST       0x0000FF00
-#define INT_REMAP_ENTRY_DEST_SHIFT      8
-#define INT_REMAP_ENTRY_VECTOR_MASK     0x00FF0000
-#define INT_REMAP_ENTRY_VECTOR_SHIFT    16
-
  #define INV_IOMMU_ALL_PAGES_ADDRESS      ((1ULL << 63) - 1)
  
  #define IOMMU_RING_BUFFER_PTR_MASK                  0x0007FFF0

Re: [Xen-devel] [PATCH v3 04/14] AMD/IOMMU: use bit field for IRTE
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:36:34PM +0000, Jan Beulich wrote:
> At the same time restrict its scope to just the single source file
> actually using it, and abstract accesses by introducing a union of
> pointers. (A union of the actual table entries is not used to make it
> impossible to [wrongly, once the 128-bit form gets added] perform
> pointer arithmetic / array accesses on derived types.)
> 
> Also move away from updating the entries piecemeal: Construct a full new
> entry, and write it out.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>

-- 
Brian Woods

Re: [Xen-devel] [PATCH v3 04/14] AMD/IOMMU: use bit field for IRTE
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:36, Jan Beulich wrote:
> At the same time restrict its scope to just the single source file
> actually using it, and abstract accesses by introducing a union of
> pointers. (A union of the actual table entries is not used to make it
> impossible to [wrongly, once the 128-bit form gets added] perform
> pointer arithmetic / array accesses on derived types.)
>
> Also move away from updating the entries piecemeal: Construct a full new
> entry, and write it out.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I'm still not entirely convinced by the extra union and container_of(), but
the result looks correct.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Re: [Xen-devel] [PATCH v3 04/14] AMD/IOMMU: use bit field for IRTE
Posted by Jan Beulich 4 years, 9 months ago
On 19.07.2019 17:56, Andrew Cooper wrote:
> On 16/07/2019 17:36, Jan Beulich wrote:
>> At the same time restrict its scope to just the single source file
>> actually using it, and abstract accesses by introducing a union of
>> pointers. (A union of the actual table entries is not used to make it
>> impossible to [wrongly, once the 128-bit form gets added] perform
>> pointer arithmetic / array accesses on derived types.)
>>
>> Also move away from updating the entries piecemeal: Construct a full new
>> entry, and write it out.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I'm still not entirely convinced by the extra union and container_of(), but
> the result looks correct.

And I'm still open to going the other way, if you're convinced that
in update_intremap_entry() this

     struct irte_basic basic = {
         .flds = {
             .remap_en = true,
             .int_type = int_type,
             .dm = dest_mode,
             .dest = dest,
             .vector = vector,
         }
     };

(and similarly then for the 128-bit form, and of course .flds
inserted at other use sites) is overall better than the current
variant.

> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks, Jan

Re: [Xen-devel] [PATCH v3 04/14] AMD/IOMMU: use bit field for IRTE
Posted by Andrew Cooper 4 years, 9 months ago
On 19/07/2019 17:16, Jan Beulich wrote:
> On 19.07.2019 17:56, Andrew Cooper wrote:
>> On 16/07/2019 17:36, Jan Beulich wrote:
>>> At the same time restrict its scope to just the single source file
>>> actually using it, and abstract accesses by introducing a union of
>>> pointers. (A union of the actual table entries is not used to make it
>>> impossible to [wrongly, once the 128-bit form gets added] perform
>>> pointer arithmetic / array accesses on derived types.)
>>>
>>> Also move away from updating the entries piecemeal: Construct a full new
>>> entry, and write it out.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> I'm still not entirely convinced by the extra union and container_of(), but
>> the result looks correct.
> And I'm still open to going the other way, if you're convinced that
> in update_intremap_entry() this
>
>      struct irte_basic basic = {
>          .flds = {
>              .remap_en = true,
>              .int_type = int_type,
>              .dm = dest_mode,
>              .dest = dest,
>              .vector = vector,
>          }
>      };
>
> (and similarly then for the 128-bit form, and of course .flds
> inserted at other use sites) is overall better than the current
> variant.

I've just experimented with the attached delta and it does compile in
CentOS 6, which is the usual culprit for problems in this area.

I do think the result is easier-to-read code, which I am definitely in
favour of.

My ack still stands for all affected patches, but ideally with this kind
of change folded in appropriately.

~Andrew
Re: [Xen-devel] [PATCH v3 04/14] AMD/IOMMU: use bit field for IRTE
Posted by Jan Beulich 4 years, 9 months ago
On 19.07.2019 20:44, Andrew Cooper wrote:
> On 19/07/2019 17:16, Jan Beulich wrote:
>> On 19.07.2019 17:56, Andrew Cooper wrote:
>>> On 16/07/2019 17:36, Jan Beulich wrote:
>>>> At the same time restrict its scope to just the single source file
>>>> actually using it, and abstract accesses by introducing a union of
>>>> pointers. (A union of the actual table entries is not used to make it
>>>> impossible to [wrongly, once the 128-bit form gets added] perform
>>>> pointer arithmetic / array accesses on derived types.)
>>>>
>>>> Also move away from updating the entries piecemeal: Construct a full new
>>>> entry, and write it out.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> I'm still not entirely convinced by the extra union and container_of(), but
>>> the result looks correct.
>> And I'm still open to going the other way, if you're convinced that
>> in update_intremap_entry() this
>>
>>       struct irte_basic basic = {
>>           .flds = {
>>               .remap_en = true,
>>               .int_type = int_type,
>>               .dm = dest_mode,
>>               .dest = dest,
>>               .vector = vector,
>>           }
>>       };
>>
>> (and similarly then for the 128-bit form, and of course .flds
>> inserted at other use sites) is overall better than the current
>> variant.
> 
> I've just experimented with the attached delta and it does compile in
> CentOS 6, which is the usual culprit for problems in this area.

Yeah, with the "flds" in place things ought to (and do) build fine for
me too (it was, after all, the question whether inserting that
intermediate field would be more or less ugly than the container_of()
"solution"). I've therefore mostly switched to what you've suggested.
But before re-posting we should really settle on the barrier kind to
use for the 128-bit IRTE writes.
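
For reference, the ordering requirement in question amounts to the
following sketch ("irte" stands for a newly constructed 128-bit entry;
which kind of barrier is the right one is exactly what's still open):

    ACCESS_ONCE(entry.ptr128->raw[0]) = 0;           /* clear RemapEn */
    barrier();               /* compiler must not reorder the stores */
    ACCESS_ONCE(entry.ptr128->raw[1]) = irte.raw[1];
    barrier();
    ACCESS_ONCE(entry.ptr128->raw[0]) = irte.raw[0]; /* set RemapEn last */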

Jan
[Xen-devel] [PATCH v3 05/14] AMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback
Posted by Jan Beulich 4 years, 9 months ago
Both users will want to know IOMMU properties (specifically the IRTE
size) subsequently. Leverage this to avoid pointless calls to the
callback when IVRS mapping table entries are unpopulated. To avoid
leaking interrupt remapping tables (bogusly) allocated for IOMMUs
themselves, this requires suppressing their allocation in the first
place, taking a step further what commit 757122c0cf ('AMD/IOMMU: don't
"add" IOMMUs') had done.

Additionally suppress the call for alias entries, as, again, neither
user cares about these anyway. In fact this eliminates a fair bit of
redundancy from the dump output.
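
A callback under the new interface would then look along these lines
(sketch only; the handler here is made up):

    /* Only populated, non-alias entries reach the handler, which is
     * also handed the owning IOMMU. */
    static int example_handler(const struct amd_iommu *iommu,
                               struct ivrs_mappings *mapping)
    {
        /* ... use iommu->seg, mapping->intremap_table, etc. ... */
        return 0;
    }

    rc = iterate_ivrs_entries(example_handler);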

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.
---
TBD: Along the lines of avoiding the IRT allocation for the IOMMUs, is
      there a way to recognize the many CPU-provided devices, many of
      which can't generate interrupts anyway, and avoid allocations for
      them as well? It's 32k per device, after all. Another option might
      be on-demand allocation of the tables, but quite possibly we'd get
      into trouble with error handling there.

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -65,7 +65,11 @@ static void __init add_ivrs_mapping_entr
      /* override flags for range of devices */
      ivrs_mappings[bdf].device_flags = flags;
  
-    if (ivrs_mappings[alias_id].intremap_table == NULL )
+    /* Don't map an IOMMU by itself. */
+    if ( iommu->bdf == bdf )
+        return;
+
+    if ( !ivrs_mappings[alias_id].intremap_table )
      {
           /* allocate per-device interrupt remapping table */
           if ( amd_iommu_perdev_intremap )
@@ -81,8 +85,9 @@ static void __init add_ivrs_mapping_entr
               ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
           }
      }
-    /* Assign IOMMU hardware, but don't map an IOMMU by itself. */
-    ivrs_mappings[bdf].iommu = iommu->bdf != bdf ? iommu : NULL;
+
+    /* Assign IOMMU hardware. */
+    ivrs_mappings[bdf].iommu = iommu;
  }
  
  static struct amd_iommu * __init find_iommu_from_bdf_cap(
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1069,7 +1069,8 @@ int iterate_ivrs_mappings(int (*handler)
      return rc;
  }
  
-int iterate_ivrs_entries(int (*handler)(u16 seg, struct ivrs_mappings *))
+int iterate_ivrs_entries(int (*handler)(const struct amd_iommu *,
+                                        struct ivrs_mappings *))
  {
      u16 seg = 0;
      int rc = 0;
@@ -1082,7 +1083,12 @@ int iterate_ivrs_entries(int (*handler)(
              break;
          seg = IVRS_MAPPINGS_SEG(map);
          for ( bdf = 0; !rc && bdf < ivrs_bdf_entries; ++bdf )
-            rc = handler(seg, map + bdf);
+        {
+            const struct amd_iommu *iommu = map[bdf].iommu;
+
+            if ( iommu && map[bdf].dte_requestor_id == bdf )
+                rc = handler(iommu, &map[bdf]);
+        }
      } while ( !rc && ++seg );
  
      return rc;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -617,7 +617,7 @@ void amd_iommu_read_msi_from_ire(
  }
  
  int __init amd_iommu_free_intremap_table(
-    u16 seg, struct ivrs_mappings *ivrs_mapping)
+    const struct amd_iommu *iommu, struct ivrs_mappings *ivrs_mapping)
  {
      void *tb = ivrs_mapping->intremap_table;
  
@@ -693,14 +693,15 @@ static void dump_intremap_table(const u3
      }
  }
  
-static int dump_intremap_mapping(u16 seg, struct ivrs_mappings *ivrs_mapping)
+static int dump_intremap_mapping(const struct amd_iommu *iommu,
+                                 struct ivrs_mappings *ivrs_mapping)
  {
      unsigned long flags;
  
      if ( !ivrs_mapping )
          return 0;
  
-    printk("  %04x:%02x:%02x:%u:\n", seg,
+    printk("  %04x:%02x:%02x:%u:\n", iommu->seg,
             PCI_BUS(ivrs_mapping->dte_requestor_id),
             PCI_SLOT(ivrs_mapping->dte_requestor_id),
             PCI_FUNC(ivrs_mapping->dte_requestor_id));
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -129,7 +129,8 @@ extern u8 ivhd_type;
  
  struct ivrs_mappings *get_ivrs_mappings(u16 seg);
  int iterate_ivrs_mappings(int (*)(u16 seg, struct ivrs_mappings *));
-int iterate_ivrs_entries(int (*)(u16 seg, struct ivrs_mappings *));
+int iterate_ivrs_entries(int (*)(const struct amd_iommu *,
+                                 struct ivrs_mappings *));
  
  /* iommu tables in guest space */
  struct mmio_reg {
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -98,7 +98,8 @@ struct amd_iommu *find_iommu_for_device(
  /* interrupt remapping */
  int amd_iommu_setup_ioapic_remapping(void);
  void *amd_iommu_alloc_intremap_table(unsigned long **);
-int amd_iommu_free_intremap_table(u16 seg, struct ivrs_mappings *);
+int amd_iommu_free_intremap_table(
+    const struct amd_iommu *, struct ivrs_mappings *);
  void amd_iommu_ioapic_update_ire(
      unsigned int apic, unsigned int reg, unsigned int value);
  unsigned int amd_iommu_read_ioapic_from_ire(

Re: [Xen-devel] [PATCH v3 05/14] AMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:37:04PM +0000, Jan Beulich wrote:
> Both users will want to know IOMMU properties (specifically the IRTE
> size) subsequently. Leverage this to avoid pointless calls to the
> callback when IVRS mapping table entries are unpopulated. To avoid
> leaking interrupt remapping tables (bogusly) allocated for IOMMUs
> themselves, this requires suppressing their allocation in the first
> place, taking a step further what commit 757122c0cf ('AMD/IOMMU: don't
> "add" IOMMUs') had done.
> 
> Additionally suppress the call for alias entries, as, again, neither
> user cares about these anyway. In fact this eliminates a fair bit of
> redundancy from the dump output.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>


-- 
Brian Woods

Re: [Xen-devel] [PATCH v3 05/14] AMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:37, Jan Beulich wrote:
> Both users will want to know IOMMU properties (specifically the IRTE
> size) subsequently. Leverage this to avoid pointless calls to the
> callback when IVRS mapping table entries are unpopulated. To avoid
> leaking interrupt remapping tables (bogusly) allocated for IOMMUs
> themselves, this requires suppressing their allocation in the first
> place, taking a step further what commit 757122c0cf ('AMD/IOMMU: don't
> "add" IOMMUs') had done.
>
> Additionally suppress the call for alias entries, as, again, neither
> user cares about these anyway. In fact this eliminates a fair bit of
> redundancy from the dump output.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH v3 06/14] AMD/IOMMU: pass IOMMU to amd_iommu_alloc_intremap_table()
Posted by Jan Beulich 4 years, 9 months ago
The function will want to know IOMMU properties (specifically the IRTE
size) subsequently.

Correct indentation of one of the call sites at this occasion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.

--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -74,12 +74,14 @@ static void __init add_ivrs_mapping_entr
           /* allocate per-device interrupt remapping table */
           if ( amd_iommu_perdev_intremap )
               ivrs_mappings[alias_id].intremap_table =
-                amd_iommu_alloc_intremap_table(
-                    &ivrs_mappings[alias_id].intremap_inuse);
+                 amd_iommu_alloc_intremap_table(
+                     iommu,
+                     &ivrs_mappings[alias_id].intremap_inuse);
           else
           {
               if ( shared_intremap_table == NULL  )
                   shared_intremap_table = amd_iommu_alloc_intremap_table(
+                     iommu,
                       &shared_intremap_inuse);
               ivrs_mappings[alias_id].intremap_table = shared_intremap_table;
               ivrs_mappings[alias_id].intremap_inuse = shared_intremap_inuse;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -632,7 +632,8 @@ int __init amd_iommu_free_intremap_table
      return 0;
  }
  
-void* __init amd_iommu_alloc_intremap_table(unsigned long **inuse_map)
+void *__init amd_iommu_alloc_intremap_table(
+    const struct amd_iommu *iommu, unsigned long **inuse_map)
  {
      void *tb;
      tb = __alloc_amd_iommu_tables(INTREMAP_TABLE_ORDER);
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -97,7 +97,8 @@ struct amd_iommu *find_iommu_for_device(
  
  /* interrupt remapping */
  int amd_iommu_setup_ioapic_remapping(void);
-void *amd_iommu_alloc_intremap_table(unsigned long **);
+void *amd_iommu_alloc_intremap_table(
+    const struct amd_iommu *, unsigned long **);
  int amd_iommu_free_intremap_table(
      const struct amd_iommu *, struct ivrs_mappings *);
  void amd_iommu_ioapic_update_ire(

Re: [Xen-devel] [PATCH v3 06/14] AMD/IOMMU: pass IOMMU to amd_iommu_alloc_intremap_table()
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:37:26PM +0000, Jan Beulich wrote:
> The function will want to know IOMMU properties (specifically the IRTE
> size) subsequently.
> 
> Correct indentation of one of the call sites at this occasion.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>


-- 
Brian Woods

Re: [Xen-devel] [PATCH v3 06/14] AMD/IOMMU: pass IOMMU to amd_iommu_alloc_intremap_table()
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:37, Jan Beulich wrote:
> The function will want to know IOMMU properties (specifically the IRTE
> size) subsequently.
>
> Correct indentation of one of the call sites at this occasion.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH v3 07/14] AMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()
Posted by Jan Beulich 4 years, 9 months ago
The functions will want to know IOMMU properties (specifically the IRTE
size) subsequently.

Rather than introducing a second error path bogusly returning -E... from
amd_iommu_read_ioapic_from_ire(), also change the existing one to follow
VT-d in returning the raw (untranslated) IO-APIC RTE.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -123,11 +123,11 @@ static unsigned int alloc_intremap_entry
      return slot;
  }
  
-static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
-                                         unsigned int index)
+static union irte_ptr get_intremap_entry(const struct amd_iommu *iommu,
+                                         unsigned int bdf, unsigned int index)
  {
      union irte_ptr table = {
-        .ptr = get_ivrs_mappings(seg)[bdf].intremap_table
+        .ptr = get_ivrs_mappings(iommu->seg)[bdf].intremap_table
      };
  
      ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
@@ -137,18 +137,19 @@ static union irte_ptr get_intremap_entry
      return table;
  }
  
-static void free_intremap_entry(unsigned int seg, unsigned int bdf,
-                                unsigned int index)
+static void free_intremap_entry(const struct amd_iommu *iommu,
+                                unsigned int bdf, unsigned int index)
  {
-    union irte_ptr entry = get_intremap_entry(seg, bdf, index);
+    union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
  
      ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
  
-    __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
+    __clear_bit(index, get_ivrs_mappings(iommu->seg)[bdf].intremap_inuse);
  }
  
-static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
-                                  unsigned int int_type,
+static void update_intremap_entry(const struct amd_iommu *iommu,
+                                  union irte_ptr entry,
+                                  unsigned int vector, unsigned int int_type,
                                    unsigned int dest_mode, unsigned int dest)
  {
      struct irte_basic basic = {
@@ -212,7 +213,7 @@ static int update_intremap_entry_from_io
          lo_update = 1;
      }
  
-    entry = get_intremap_entry(iommu->seg, req_id, offset);
+    entry = get_intremap_entry(iommu, req_id, offset);
      if ( !lo_update )
      {
          /*
@@ -223,7 +224,7 @@ static int update_intremap_entry_from_io
          vector = entry.ptr32->basic.vector;
          delivery_mode = entry.ptr32->basic.int_type;
      }
-    update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+    update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
  
      spin_unlock_irqrestore(lock, flags);
  
@@ -288,8 +289,8 @@ int __init amd_iommu_setup_ioapic_remapp
              spin_lock_irqsave(lock, flags);
              offset = alloc_intremap_entry(seg, req_id, 1);
              BUG_ON(offset >= INTREMAP_ENTRIES);
-            entry = get_intremap_entry(iommu->seg, req_id, offset);
-            update_intremap_entry(entry, vector,
+            entry = get_intremap_entry(iommu, req_id, offset);
+            update_intremap_entry(iommu, entry, vector,
                                    delivery_mode, dest_mode, dest);
              spin_unlock_irqrestore(lock, flags);
  
@@ -413,7 +414,7 @@ unsigned int amd_iommu_read_ioapic_from_
  
      idx = ioapic_id_to_index(IO_APIC_ID(apic));
      if ( idx == MAX_IO_APICS )
-        return -EINVAL;
+        return val;
  
      offset = ioapic_sbdf[idx].pin_2_idx[pin];
  
@@ -422,9 +423,13 @@ unsigned int amd_iommu_read_ioapic_from_
          u16 bdf = ioapic_sbdf[idx].bdf;
          u16 seg = ioapic_sbdf[idx].seg;
          u16 req_id = get_intremap_requestor_id(seg, bdf);
-        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
+        const struct amd_iommu *iommu = find_iommu_for_device(seg, bdf);
+        union irte_ptr entry;
  
+        if ( !iommu )
+            return val;
          ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
+        entry = get_intremap_entry(iommu, req_id, offset);
          val &= ~(INTREMAP_ENTRIES - 1);
          val |= MASK_INSR(entry.ptr32->basic.int_type,
                           IO_APIC_REDIR_DELIV_MODE_MASK);
@@ -454,7 +459,7 @@ static int update_intremap_entry_from_ms
          lock = get_intremap_lock(iommu->seg, req_id);
          spin_lock_irqsave(lock, flags);
          for ( i = 0; i < nr; ++i )
-            free_intremap_entry(iommu->seg, req_id, *remap_index + i);
+            free_intremap_entry(iommu, req_id, *remap_index + i);
          spin_unlock_irqrestore(lock, flags);
          goto done;
      }
@@ -479,8 +484,8 @@ static int update_intremap_entry_from_ms
          *remap_index = offset;
      }
  
-    entry = get_intremap_entry(iommu->seg, req_id, offset);
-    update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
+    entry = get_intremap_entry(iommu, req_id, offset);
+    update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
      spin_unlock_irqrestore(lock, flags);
  
      *data = (msg->data & ~(INTREMAP_ENTRIES - 1)) | offset;
@@ -594,12 +599,13 @@ void amd_iommu_read_msi_from_ire(
      const struct pci_dev *pdev = msi_desc->dev;
      u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
      u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
+    const struct amd_iommu *iommu = _find_iommu_for_device(seg, bdf);
      union irte_ptr entry;
  
-    if ( IS_ERR_OR_NULL(_find_iommu_for_device(seg, bdf)) )
+    if ( IS_ERR_OR_NULL(iommu) )
          return;
  
-    entry = get_intremap_entry(seg, get_dma_requestor_id(seg, bdf), offset);
+    entry = get_intremap_entry(iommu, get_dma_requestor_id(seg, bdf), offset);
  
      if ( msi_desc->msi_attrib.type == PCI_CAP_ID_MSI )
      {

Re: [Xen-devel] [PATCH v3 07/14] AMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:37:51PM +0000, Jan Beulich wrote:
> The functions will want to know IOMMU properties (specifically the IRTE
> size) subsequently.
> 
> Rather than introducing a second error path bogusly returning -E... from
> amd_iommu_read_ioapic_from_ire(), also change the existing one to follow
> VT-d in returning the raw (untranslated) IO-APIC RTE.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>


-- 
Brian Woods

Re: [Xen-devel] [PATCH v3 07/14] AMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:37, Jan Beulich wrote:
> The functions will want to know IOMMU properties (specifically the IRTE
> size) subsequently.
>
> Rather than introducing a second error path bogusly returning -E... from
> amd_iommu_read_ioapic_from_ire(), also change the existing one to follow
> VT-d in returning the raw (untranslated) IO-APIC RTE.

I'm not convinced that this is any less bogus.  The caller still can't
figure out if an error occurred.

Still, consistency with VT-d is less bad overall.

>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 9 months ago
This is in preparation of actually enabling x2APIC mode, which requires
this wider IRTE format to be used.

A specific remark regarding the first hunk changing
amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
tables when creating new one"). Other code introduced by that change has
meanwhile disappeared or further changed, and I wonder if - rather than
adding an x2apic_enabled check to the conditional - the bypass couldn't
be deleted altogether. For now the goal is to affect the non-x2APIC
paths as little as possible.

Take the liberty and use the new "fresh" flag to suppress an unneeded
flush in update_intremap_entry_from_ioapic().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Avoid unrelated type changes in update_intremap_entry_from_ioapic().
     Drop irte_mode enum and variable. Convert INTREMAP_TABLE_ORDER into
     a static helper. Comment barrier() uses. Switch boolean bitfields to
     bool.
v2: Add cast in get_full_dest(). Re-base over changes earlier in the
     series. Don't use cmpxchg16b. Use barrier() instead of wmb().
---
Note that AMD's doc says Lowest Priority ("Arbitrated" by their naming)
mode is unavailable in x2APIC mode, but they've confirmed this to be a
mistake on their part.
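
To see why the wider format is a hard requirement: an x2APIC destination
ID is 32 bits, while the legacy 32-bit IRTE has only an 8-bit destination
field; the 128-bit format splits the full ID across dest_lo/dest_hi. A
minimal sketch (set_full_dest() is a hypothetical counterpart to the
patch's get_full_dest(), with the struct reduced to the two relevant
fields):

    struct irte_full_dest {
        unsigned int dest_lo:24;
        unsigned int dest_hi:8;
    };

    static inline void set_full_dest(struct irte_full_dest *irte,
                                     unsigned int dest)
    {
        irte->dest_lo = dest;        /* low 24 bits; bitfield truncates */
        irte->dest_hi = dest >> 24;  /* high 8 bits */
    }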

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -40,12 +40,38 @@ union irte32 {
      struct irte_basic basic;
  };
  
+struct irte_full {
+    bool remap_en:1;
+    bool sup_io_pf:1;
+    unsigned int int_type:3;
+    bool rq_eoi:1;
+    bool dm:1;
+    bool guest_mode:1; /* MBZ */
+    unsigned int dest_lo:24;
+    unsigned int :32;
+    unsigned int vector:8;
+    unsigned int :24;
+    unsigned int :24;
+    unsigned int dest_hi:8;
+};
+
+union irte128 {
+    uint64_t raw[2];
+    struct irte_full full;
+};
+
  union irte_ptr {
      void *ptr;
      union irte32 *ptr32;
+    union irte128 *ptr128;
  };
  
-#define INTREMAP_TABLE_ORDER    1
+union irte_cptr {
+    const void *ptr;
+    const union irte32 *ptr32;
+    const union irte128 *ptr128;
+} __transparent__;
+
  #define INTREMAP_LENGTH 0xB
  #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
  
@@ -58,6 +84,13 @@ unsigned int nr_ioapic_sbdf;
  
  static void dump_intremap_tables(unsigned char key);
  
+static unsigned int __init intremap_table_order(const struct amd_iommu *iommu)
+{
+    return iommu->ctrl.ga_en
+           ? get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte128))
+           : get_order_from_bytes(INTREMAP_ENTRIES * sizeof(union irte32));
+}
+
  unsigned int ioapic_id_to_index(unsigned int apic_id)
  {
      unsigned int idx;
@@ -132,7 +165,10 @@ static union irte_ptr get_intremap_entry
  
      ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
  
-    table.ptr32 += index;
+    if ( iommu->ctrl.ga_en )
+        table.ptr128 += index;
+    else
+        table.ptr32 += index;
  
      return table;
  }
@@ -142,7 +178,15 @@ static void free_intremap_entry(const st
  {
      union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
  
-    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+    if ( iommu->ctrl.ga_en )
+    {
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        /* Low half (containing RemapEn) needs to be cleared first. */
+        barrier();
+        entry.ptr128->raw[1] = 0;
+    }
+    else
+        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
  
      __clear_bit(index, get_ivrs_mappings(iommu->seg)[bdf].intremap_inuse);
  }
@@ -152,16 +196,40 @@ static void update_intremap_entry(const
                                    unsigned int vector, unsigned int int_type,
                                    unsigned int dest_mode, unsigned int dest)
  {
-    struct irte_basic basic = {
-        .remap_en = true,
-        .int_type = int_type,
-        .dm = dest_mode,
-        .dest = dest,
-        .vector = vector,
-    };
+    if ( iommu->ctrl.ga_en )
+    {
+        struct irte_full full = {
+            .remap_en = true,
+            .int_type = int_type,
+            .dm = dest_mode,
+            .dest_lo = dest,
+            .dest_hi = dest >> 24,
+            .vector = vector,
+        };
+
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        /* Low half, in particular RemapEn, needs to be cleared first. */
+        barrier();
+        entry.ptr128->raw[1] =
+            container_of(&full, union irte128, full)->raw[1];
+        /* High half needs to be set before low one (containing RemapEn). */
+        barrier();
+        ACCESS_ONCE(entry.ptr128->raw[0]) =
+            container_of(&full, union irte128, full)->raw[0];
+    }
+    else
+    {
+        struct irte_basic basic = {
+            .remap_en = true,
+            .int_type = int_type,
+            .dm = dest_mode,
+            .dest = dest,
+            .vector = vector,
+        };
  
-    ACCESS_ONCE(entry.ptr32->raw[0]) =
-        container_of(&basic, union irte32, basic)->raw[0];
+        ACCESS_ONCE(entry.ptr32->raw[0]) =
+            container_of(&basic, union irte32, basic)->raw[0];
+    }
  }
  
  static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -175,6 +243,11 @@ static inline void set_rte_index(struct
      rte->delivery_mode = offset >> 8;
  }
  
+static inline unsigned int get_full_dest(const union irte128 *entry)
+{
+    return entry->full.dest_lo | ((unsigned int)entry->full.dest_hi << 24);
+}
+
  static int update_intremap_entry_from_ioapic(
      int bdf,
      struct amd_iommu *iommu,
@@ -184,10 +257,11 @@ static int update_intremap_entry_from_io
  {
      unsigned long flags;
      union irte_ptr entry;
-    u8 delivery_mode, dest, vector, dest_mode;
+    uint8_t delivery_mode, vector, dest_mode;
      int req_id;
      spinlock_t *lock;
-    unsigned int offset;
+    unsigned int dest, offset;
+    bool fresh = false;
  
      req_id = get_intremap_requestor_id(iommu->seg, bdf);
      lock = get_intremap_lock(iommu->seg, req_id);
@@ -195,7 +269,7 @@ static int update_intremap_entry_from_io
      delivery_mode = rte->delivery_mode;
      vector = rte->vector;
      dest_mode = rte->dest_mode;
-    dest = rte->dest.logical.logical_dest;
+    dest = x2apic_enabled ? rte->dest.dest32 : rte->dest.logical.logical_dest;
  
      spin_lock_irqsave(lock, flags);
  
@@ -210,25 +284,40 @@ static int update_intremap_entry_from_io
              return -ENOSPC;
          }
          *index = offset;
-        lo_update = 1;
+        fresh = true;
      }
  
      entry = get_intremap_entry(iommu, req_id, offset);
-    if ( !lo_update )
+    if ( fresh )
+        /* nothing */;
+    else if ( !lo_update )
      {
          /*
           * Low half of incoming RTE is already in remapped format,
           * so need to recover vector and delivery mode from IRTE.
           */
          ASSERT(get_rte_index(rte) == offset);
-        vector = entry.ptr32->basic.vector;
+        if ( iommu->ctrl.ga_en )
+            vector = entry.ptr128->full.vector;
+        else
+            vector = entry.ptr32->basic.vector;
+        /* The IntType fields match for both formats. */
          delivery_mode = entry.ptr32->basic.int_type;
      }
+    else if ( x2apic_enabled )
+    {
+        /*
+         * High half of incoming RTE was read from the I/O APIC and hence may
+         * not hold the full destination, so need to recover full destination
+         * from IRTE.
+         */
+        dest = get_full_dest(entry.ptr128);
+    }
      update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
  
      spin_unlock_irqrestore(lock, flags);
  
-    if ( iommu->enabled )
+    if ( iommu->enabled && !fresh )
      {
          spin_lock_irqsave(&iommu->lock, flags);
          amd_iommu_flush_intremap(iommu, req_id);
@@ -286,6 +375,18 @@ int __init amd_iommu_setup_ioapic_remapp
              dest_mode = rte.dest_mode;
              dest = rte.dest.logical.logical_dest;
  
+            if ( iommu->ctrl.xt_en )
+            {
+                /*
+                 * In x2APIC mode we have no way of discovering the high 24
+                 * bits of the destination of an already enabled interrupt.
+                 * We come here earlier than for xAPIC mode, so no interrupts
+                 * should have been set up before.
+                 */
+                AMD_IOMMU_DEBUG("Unmasked IO-APIC#%u entry %u in x2APIC mode\n",
+                                IO_APIC_ID(apic), pin);
+            }
+
              spin_lock_irqsave(lock, flags);
              offset = alloc_intremap_entry(seg, req_id, 1);
              BUG_ON(offset >= INTREMAP_ENTRIES);
@@ -320,7 +421,8 @@ void amd_iommu_ioapic_update_ire(
      struct IO_APIC_route_entry new_rte = { 0 };
      unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
      unsigned int pin = (reg - 0x10) / 2;
-    int saved_mask, seg, bdf, rc;
+    int seg, bdf, rc;
+    bool saved_mask, fresh = false;
      struct amd_iommu *iommu;
      unsigned int idx;
  
@@ -362,12 +464,22 @@ void amd_iommu_ioapic_update_ire(
          *(((u32 *)&new_rte) + 1) = value;
      }
  
-    if ( new_rte.mask &&
-         ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
+    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
      {
          ASSERT(saved_mask);
-        __io_apic_write(apic, reg, value);
-        return;
+
+        /*
+         * There's nowhere except the IRTE to store a full 32-bit destination,
+         * so we may not bypass entry allocation and updating of the low RTE
+         * half in the (usual) case of the high RTE half getting written first.
+         */
+        if ( new_rte.mask && !x2apic_enabled )
+        {
+            __io_apic_write(apic, reg, value);
+            return;
+        }
+
+        fresh = true;
      }
  
      /* mask the interrupt while we change the intremap table */
@@ -396,8 +508,12 @@ void amd_iommu_ioapic_update_ire(
      if ( reg == rte_lo )
          return;
  
-    /* unmask the interrupt after we have updated the intremap table */
-    if ( !saved_mask )
+    /*
+     * Unmask the interrupt after we have updated the intremap table. Also
+     * write the low half if a fresh entry was allocated for a high half
+     * update in x2APIC mode.
+     */
+    if ( !saved_mask || (x2apic_enabled && fresh) )
      {
          old_rte.mask = saved_mask;
          __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
@@ -411,31 +527,40 @@ unsigned int amd_iommu_read_ioapic_from_
      unsigned int offset;
      unsigned int val = __io_apic_read(apic, reg);
      unsigned int pin = (reg - 0x10) / 2;
+    uint16_t seg, bdf, req_id;
+    const struct amd_iommu *iommu;
+    union irte_ptr entry;
  
      idx = ioapic_id_to_index(IO_APIC_ID(apic));
      if ( idx == MAX_IO_APICS )
          return val;
  
      offset = ioapic_sbdf[idx].pin_2_idx[pin];
+    if ( offset >= INTREMAP_ENTRIES )
+        return val;
  
-    if ( !(reg & 1) && offset < INTREMAP_ENTRIES )
-    {
-        u16 bdf = ioapic_sbdf[idx].bdf;
-        u16 seg = ioapic_sbdf[idx].seg;
-        u16 req_id = get_intremap_requestor_id(seg, bdf);
-        const struct amd_iommu *iommu = find_iommu_for_device(seg, bdf);
-        union irte_ptr entry;
+    seg = ioapic_sbdf[idx].seg;
+    bdf = ioapic_sbdf[idx].bdf;
+    iommu = find_iommu_for_device(seg, bdf);
+    if ( !iommu )
+        return val;
+    req_id = get_intremap_requestor_id(seg, bdf);
+    entry = get_intremap_entry(iommu, req_id, offset);
  
-        if ( !iommu )
-            return val;
+    if ( !(reg & 1) )
+    {
          ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
-        entry = get_intremap_entry(iommu, req_id, offset);
          val &= ~(INTREMAP_ENTRIES - 1);
+        /* The IntType fields match for both formats. */
          val |= MASK_INSR(entry.ptr32->basic.int_type,
                           IO_APIC_REDIR_DELIV_MODE_MASK);
-        val |= MASK_INSR(entry.ptr32->basic.vector,
+        val |= MASK_INSR(iommu->ctrl.ga_en
+                         ? entry.ptr128->full.vector
+                         : entry.ptr32->basic.vector,
                           IO_APIC_REDIR_VECTOR_MASK);
      }
+    else if ( x2apic_enabled )
+        val = get_full_dest(entry.ptr128);
  
      return val;
  }
@@ -447,9 +572,9 @@ static int update_intremap_entry_from_ms
      unsigned long flags;
      union irte_ptr entry;
      u16 req_id, alias_id;
-    u8 delivery_mode, dest, vector, dest_mode;
+    uint8_t delivery_mode, vector, dest_mode;
      spinlock_t *lock;
-    unsigned int offset, i;
+    unsigned int dest, offset, i;
  
      req_id = get_dma_requestor_id(iommu->seg, bdf);
      alias_id = get_intremap_requestor_id(iommu->seg, bdf);
@@ -470,7 +595,12 @@ static int update_intremap_entry_from_ms
      dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
      delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
      vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
-    dest = (msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff;
+
+    if ( x2apic_enabled )
+        dest = msg->dest32;
+    else
+        dest = MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK);
+
      offset = *remap_index;
      if ( offset >= INTREMAP_ENTRIES )
      {
@@ -616,10 +746,21 @@ void amd_iommu_read_msi_from_ire(
      }
  
      msg->data &= ~(INTREMAP_ENTRIES - 1);
+    /* The IntType fields match for both formats. */
      msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
                             MSI_DATA_DELIVERY_MODE_MASK);
-    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
-                           MSI_DATA_VECTOR_MASK);
+    if ( iommu->ctrl.ga_en )
+    {
+        msg->data |= MASK_INSR(entry.ptr128->full.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = get_full_dest(entry.ptr128);
+    }
+    else
+    {
+        msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = entry.ptr32->basic.dest;
+    }
  }
  
  int __init amd_iommu_free_intremap_table(
@@ -631,7 +772,7 @@ int __init amd_iommu_free_intremap_table
  
      if ( tb )
      {
-        __free_amd_iommu_tables(tb, INTREMAP_TABLE_ORDER);
+        __free_amd_iommu_tables(tb, intremap_table_order(iommu));
          ivrs_mapping->intremap_table = NULL;
      }
  
@@ -641,10 +782,10 @@ int __init amd_iommu_free_intremap_table
  void *__init amd_iommu_alloc_intremap_table(
      const struct amd_iommu *iommu, unsigned long **inuse_map)
  {
-    void *tb;
-    tb = __alloc_amd_iommu_tables(INTREMAP_TABLE_ORDER);
+    void *tb = __alloc_amd_iommu_tables(intremap_table_order(iommu));
+
      BUG_ON(tb == NULL);
-    memset(tb, 0, PAGE_SIZE * (1UL << INTREMAP_TABLE_ORDER));
+    memset(tb, 0, PAGE_SIZE << intremap_table_order(iommu));
      *inuse_map = xzalloc_array(unsigned long, BITS_TO_LONGS(INTREMAP_ENTRIES));
      BUG_ON(*inuse_map == NULL);
      return tb;
@@ -685,18 +826,29 @@ int __init amd_setup_hpet_msi(struct msi
      return rc;
  }
  
-static void dump_intremap_table(const u32 *table)
+static void dump_intremap_table(const struct amd_iommu *iommu,
+                                union irte_cptr tbl)
  {
-    u32 count;
+    unsigned int count;
  
-    if ( !table )
+    if ( !tbl.ptr )
          return;
  
      for ( count = 0; count < INTREMAP_ENTRIES; count++ )
      {
-        if ( !table[count] )
-            continue;
-        printk("    IRTE[%03x] %08x\n", count, table[count]);
+        if ( iommu->ctrl.ga_en )
+        {
+            if ( !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1] )
+                continue;
+            printk("    IRTE[%03x] %016lx_%016lx\n",
+                   count, tbl.ptr128[count].raw[1], tbl.ptr128[count].raw[0]);
+        }
+        else
+        {
+            if ( !tbl.ptr32[count].raw[0] )
+                continue;
+            printk("    IRTE[%03x] %08x\n", count, tbl.ptr32[count].raw[0]);
+        }
      }
  }
  
@@ -714,7 +866,7 @@ static int dump_intremap_mapping(const s
             PCI_FUNC(ivrs_mapping->dte_requestor_id));
  
      spin_lock_irqsave(&(ivrs_mapping->intremap_lock), flags);
-    dump_intremap_table(ivrs_mapping->intremap_table);
+    dump_intremap_table(iommu, ivrs_mapping->intremap_table);
      spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
  
      return 0;
@@ -731,6 +883,8 @@ static void dump_intremap_tables(unsigne
      printk("--- Dumping Shared IOMMU Interrupt Remapping Table ---\n");
  
      spin_lock_irqsave(&shared_intremap_lock, flags);
-    dump_intremap_table(shared_intremap_table);
+    dump_intremap_table(list_first_entry(&amd_iommu_head, struct amd_iommu,
+                                         list),
+                        shared_intremap_table);
      spin_unlock_irqrestore(&shared_intremap_lock, flags);
  }

Re: [Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:38, Jan Beulich wrote:
> This is in preparation of actually enabling x2APIC mode, which requires
> this wider IRTE format to be used.
>
> A specific remark regarding the first hunk changing
> amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
> i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
> tables when creating new one"). Other code introduced by that change has
> meanwhile disappeared or further changed, and I wonder if - rather than
> adding an x2apic_enabled check to the conditional - the bypass couldn't
> be deleted altogether. For now the goal is to affect the non-x2APIC
> paths as little as possible.

There are plenty of mistakes with XSA-36.  Reading the XSA back, the
MITIGATION section gets the sense of the iommu=amd-iommu-perdev-intremap
boolean the wrong way around.  Oh well...

SP5100 erratum 28 only requires that the IDE and SATA devices share
tables, not that every device on the whole system shares tables.

With the proposed work to perform IOMMU assignment by group rather than
individually, this will naturally fall out as a quirk requiring the two
devices to be grouped, at which point we can drop all remnants of global
remapping tables.

In this case, I'm not sure it is worth caring about shared-table mode on
x2apic-capable systems.  0 people will be using that mode.  That said,
if it's easier to wait until the IOMMU changes to make this adjustment,
then fine.

> @@ -142,7 +178,15 @@ static void free_intremap_entry(const st
>   {
>       union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>   
> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
> +    if ( iommu->ctrl.ga_en )
> +    {
> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
> +        /* Low half (containing RemapEn) needs to be cleared first. */
> +        barrier();

While this will function on x86, I still consider this buggy.  From a
conceptual point of view, barrier() is not the correct construction to
use, whereas smp_wmb() is.

As this is the only remaining issue, with it fixed in each location,
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Re: [Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 9 months ago
On 19.07.2019 19:27, Andrew Cooper wrote:
> On 16/07/2019 17:38, Jan Beulich wrote:
>> This is in preparation of actually enabling x2APIC mode, which requires
>> this wider IRTE format to be used.
>>
>> A specific remark regarding the first hunk changing
>> amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
>> i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
>> tables when creating new one"). Other code introduced by that change has
>> meanwhile disappeared or further changed, and I wonder if - rather than
>> adding an x2apic_enabled check to the conditional - the bypass couldn't
>> be deleted altogether. For now the goal is to affect the non-x2APIC
>> paths as little as possible.
> 
> There are plenty of mistakes with XSA-36.  Reading the XSA back, the
> MITIGATION section gets the sense of the iommu=amd-iommu-perdev-intremap
> boolean the wrong way around.  Oh well...
> 
> SP5100 erratum 28 only requires that the IDE and SATA devices share
> tables, not that every device on the whole system shares tables.
> 
> With the proposed work to perform IOMMU assignment by group rather than
> individually, this will naturally fall out as a quirk requiring the two
> devices to be grouped, at which point we can drop all remnants of global
> remapping tables.

Yes, and I'll be happy to see them go away.

> In this case, I'm not sure it is worth caring about shared-table mode on
> x2apic-capable systems.  0 people will be using that mode.  That said,
> if it's easier to wait until the IOMMU changes to make this adjustment,
> then fine.

It certainly is, especially with backporting of this series in mind.

>> @@ -142,7 +178,15 @@ static void free_intremap_entry(const st
>>    {
>>        union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>>    
>> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>> +    if ( iommu->ctrl.ga_en )
>> +    {
>> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>> +        /* Low half (containing RemapEn) needs to be cleared first. */
>> +        barrier();
> 
> While this will function on x86, I still consider this buggy.  From a
> conceptual point of view, barrier() is not the correct construction to
> use, whereas smp_wmb() is.

I think it's the 3rd time now that I respond saying that barrier() is
as good or as bad as smp_wmb(), just for different reasons. While I
agree with you that barrier() is correct on x86 only, I'm yet to hear
back from you on my argument that smp_wmb() is incorrect when
considering its UP semantics (which we don't currently implement, but
for which Linux, as the origin of the construct, can well serve as a
reference). And I think we both don't really want wmb() here.

> As this is the only remaining issue, with it fixed in each location,
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

I'm not going to apply this for now, until we've managed to come to an
agreement on the item above.

Jan
Re: [Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Andrew Cooper 4 years, 8 months ago
On 22/07/2019 09:34, Jan Beulich wrote:
> On 19.07.2019 19:27, Andrew Cooper wrote:
>> On 16/07/2019 17:38, Jan Beulich wrote:
>>> @@ -142,7 +178,15 @@ static void free_intremap_entry(const st
>>>    {
>>>        union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>>>    
>>> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>>> +    if ( iommu->ctrl.ga_en )
>>> +    {
>>> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>>> +        /* Low half (containing RemapEn) needs to be cleared first. */
>>> +        barrier();
>> While this will function on x86, I still consider this buggy.  From a
>> conceptual point of view, barrier() is not the correct construction to
>> use, whereas smp_wmb() is.
> I think it's the 3rd time now that I respond saying that barrier() is
> as good or as bad as smp_wmb(), just for different reasons.

barrier() and smp_wmb() are different constructs, with specific,
*different* meanings.  From a programmers point of view, they should be
considered black boxes of functionality.

barrier() is for forcing the compiler to not reorder things.

smp_wmb() is about the external visibility of writes, as observed by a
different entity on a coherent fabric.

The fact that they alias on x86 is an implementation detail of x86 cache
coherency - it does not mean they can legitimately be interchanged in code.

This piece of code is a 2-way communication between the CPU core and the
IOMMU, over a coherent cache.  The IOMMU logically has an smp_rmb() in
its mirror functionality (although that is likely not how the property
is expressed).
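
To make the distinction concrete, this is how the two constructs
typically expand for x86 in Xen-/Linux-style code bases (a sketch from
memory, not a quote of either tree):

    /* Compiler fence: emits no instruction, merely prevents the
     * compiler from reordering memory accesses across it. */
    #define barrier()   asm volatile ( "" ::: "memory" )

    /* On x86 (TSO) stores are not reordered against other stores, so
     * the SMP write barrier needs no fencing instruction either -- the
     * two differ in meaning, not (on x86) in expansion. */
    #define smp_wmb()   barrier()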

> While I
> agree with you that barrier() is correct on x86 only, I'm yet to hear
> back from you on my argument that smp_wmb() is incorrect when
> considering its UP semantics (which we don't currently implement, but
> for which Linux, as the origin of the construct, can well serve as a
> reference).

UP vs SMP doesn't affect which is the correct construct to use.

>  And I think we both don't really want wmb() here.

No, because wmb() is definitely not the right thing to use.

~Andrew

Re: [Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 8 months ago
On 22.07.2019 15:36, Andrew Cooper wrote:
> On 22/07/2019 09:34, Jan Beulich wrote:
>> On 19.07.2019 19:27, Andrew Cooper wrote:
>>> On 16/07/2019 17:38, Jan Beulich wrote:
>>>> @@ -142,7 +178,15 @@ static void free_intremap_entry(const st
>>>>     {
>>>>         union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>>>>     
>>>> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>>>> +    if ( iommu->ctrl.ga_en )
>>>> +    {
>>>> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>>>> +        /* Low half (containing RemapEn) needs to be cleared first. */
>>>> +        barrier();
>>> While this will function on x86, I still consider this buggy.  From a
>>> conceptual point of view, barrier() is not the correct construction to
>>> use, whereas smp_wmb() is.
>> I think it's the 3rd time now that I respond saying that barrier() is
>> as good or as bad as smp_wmb(), just for different reasons.
> 
> barrier() and smp_wmb() are different constructs, with specific,
> *different* meanings.  From a programmers point of view, they should be
> considered black boxes of functionality.
> 
> barrier() is for forcing the compiler to not reorder things.
> 
> smp_wmb() is about the external visibility of writes, as observed by a
> different entity on a coherent fabric.

I'm afraid I disagree here: The "smp" in its name means "CPU", not
"entity" in your sentence. Which is why ...

> The fact that they alias on x86 is an implementation detail of x86 cache
> coherency - it does not mean they can legitimately be interchanged in code.
> 
> This piece of code is a 2-way communication between the CPU core and the
> IOMMU, over a coherent cache.  The IOMMU logically has an smp_rmb() in
> its mirror functionality (although that is likely not how the property
> is expressed).
> 
>> While I
>> agree with you that barrier() is correct on x86 only, I'm yet to hear
>> back from you on my argument that smp_wmb() is incorrect when
>> considering its UP semantics (which we don't currently implement, but
>> for which Linux, as the origin of the construct, can well serve as a
>> reference).
> 
> UP vs SMP doesn't affect which is the correct construct to use.

... I disagree with this part too. Even nowadays Linux still has

#ifdef CONFIG_SMP
[...]
#else	/* !CONFIG_SMP */

#ifndef smp_mb
#define smp_mb()	barrier()
#endif

#ifndef smp_rmb
#define smp_rmb()	barrier()
#endif

#ifndef smp_wmb
#define smp_wmb()	barrier()
#endif

in asm-generic/barrier.h, i.e. independent of architecture. Yet the
SMP config setting is concerned about CPUs only, not "entities".

Jan
Re: [Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Andrew Cooper 4 years, 8 months ago
On 22/07/2019 16:01, Jan Beulich wrote:
> On 22.07.2019 15:36, Andrew Cooper wrote:
>> On 22/07/2019 09:34, Jan Beulich wrote:
>>> On 19.07.2019 19:27, Andrew Cooper wrote:
>>>> On 16/07/2019 17:38, Jan Beulich wrote:
>>>>> @@ -142,7 +178,15 @@ static void free_intremap_entry(const st
>>>>>     {
>>>>>         union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>>>>>     
>>>>> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>>>>> +    if ( iommu->ctrl.ga_en )
>>>>> +    {
>>>>> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>>>>> +        /* Low half (containing RemapEn) needs to be cleared first. */
>>>>> +        barrier();
>>>> While this will function on x86, I still consider this buggy.  From a
>>>> conceptual point of view, barrier() is not the correct construction to
>>>> use, whereas smp_wmb() is.
>>> I think it's the 3rd time now that I respond saying that barrier() is
>>> as good or as bad as smp_wmb(), just for different reasons.
>> barrier() and smp_wmb() are different constructs, with specific,
>> *different* meanings.  From a programmers point of view, they should be
>> considered black boxes of functionality.
>>
>> barrier() is for forcing the compiler to not reorder things.
>>
>> smp_wmb() is about the external visibility of writes, as observed by a
>> different entity on a coherent fabric.
> I'm afraid I disagree here: The "smp" in its name means "CPU", not
> "entity" in your sentence.

Citation definitely needed.

The term SMP means Symmetric MultiProcessing, but no computer these days
matches any of the traditional definitions.  You can thank the fact we
are one of the fastest evolving industries in the world, and that the
term you're using is more than 20 years old.

In particular, it predates cache-coherent uncore devices. 
Cache-coherent devices by definition have the same ordering properties
and constraints as cpus, because they are part of one shared (or dare I
say, symmetric), cache-coherent domain.

How would your argument change if the IOMMU was a real CPU running real
x86 code?  Its interface to the rest of the system would be identical,
and in that case, it would obviously need an smp_{r,w}mb() pair for
correctness reasons.  This is why smp_wmb() is the only appropriate
construct to use.

~Andrew

Re: [Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 8 months ago
On 22.07.2019 17:43, Andrew Cooper wrote:
> On 22/07/2019 16:01, Jan Beulich wrote:
>> On 22.07.2019 15:36, Andrew Cooper wrote:
>>> On 22/07/2019 09:34, Jan Beulich wrote:
>>>> On 19.07.2019 19:27, Andrew Cooper wrote:
>>>>> On 16/07/2019 17:38, Jan Beulich wrote:
>>>>>> @@ -142,7 +178,15 @@ static void free_intremap_entry(const st
>>>>>>      {
>>>>>>          union irte_ptr entry = get_intremap_entry(iommu, bdf, index);
>>>>>>      
>>>>>> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>>>>>> +    if ( iommu->ctrl.ga_en )
>>>>>> +    {
>>>>>> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>>>>>> +        /* Low half (containing RemapEn) needs to be cleared first. */
>>>>>> +        barrier();
>>>>> While this will function on x86, I still consider this buggy.  From a
>>>>> conceptual point of view, barrier() is not the correct construction to
>>>>> use, whereas smp_wmb() is.
>>>> I think it's the 3rd time now that I respond saying that barrier() is
>>>> as good or as bad as smp_wmb(), just for different reasons.
>>> barrier() and smp_wmb() are different constructs, with specific,
>>> *different* meanings.  From a programmers point of view, they should be
>>> considered black boxes of functionality.
>>>
>>> barrier() is for forcing the compiler to not reorder things.
>>>
>>> smp_wmb() is about the external visibility of writes, as observed by a
>>> different entity on a coherent fabric.
>> I'm afraid I disagree here: The "smp" in its name means "CPU", not
>> "entity" in your sentence.
> 
> Citation definitely needed.

Which I did provide in the earlier reply: If what you say was
intended to be that way, the !CONFIG_SMP definitions in Linux were
wrong, and ...

> The term SMP means Symmetric MultiProcessing, but no computer these days
> matches any of the traditional definitions.  You can thank the fact we
> are one of the fastest evolving industries in the world, and that the
> term you're using is more than 20 years old.

... would have been for a long time.

> In particular, it predates cache-coherent uncore devices.
> Cache-coherent devices by definition have the same ordering properties
> and constraints as cpus, because they are part of one shared (or dare I
> say, symmetric), cache-coherent domain.
> 
> How would your argument change if the IOMMU was a real CPU running real
> x86 code?  Its interface to the rest of the system would be identical,
> and in that case, it would obviously need an smp_{r,w}mb() pair for
> correctness reasons.  This is why smp_wmb() is the only appropriate
> construct to use.

It wouldn't change at all. What matters (as per above) is the
understanding the OS has, i.e. what is being surfaced to it as a CPU.

Jan
Re: [Xen-devel] [PATCH v3 08/14] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 8 months ago
On 23.07.2019 10:13, Jan Beulich wrote:
> On 22.07.2019 17:43, Andrew Cooper wrote:
>> How would your argument change if the IOMMU was a real CPU running real
>> x86 code?  Its interface to the rest of the system would be identical,
>> and in that case, it would obviously need an smp_{r,w}mb() pair for
>> correctness reasons.  This is why smp_wmb() is the only appropriate
>> construct to use.
> 
> It wouldn't change at all. What matters (as per above) is the
> understanding the OS has, i.e. what is being surfaced to it as a CPU.

Oh, btw - I got curious whether we could use the Linux sources for
arbitration. What I found, though, is that they don't use any barrier
at all - see modify_irte_ga().
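
For reference, the flow there is roughly the following (a from-memory
paraphrase with illustrative names, not a verbatim quote of the Linux
sources):

    #include <stdint.h>

    struct irte_ga { uint64_t lo, hi; };

    /* Paraphrased: the valid bit is dropped via the low half, then both
     * halves are written back, with no explicit barrier in between. */
    static void modify_irte_ga_sketch(struct irte_ga *entry,
                                      const struct irte_ga *val)
    {
        entry->lo = 0;        /* clears the valid/RemapEn bit */
        entry->hi = val->hi;
        entry->lo = val->lo;  /* re-sets valid */
    }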

Jan
[Xen-devel] [PATCH v3 09/14] AMD/IOMMU: split amd_iommu_init_one()
Posted by Jan Beulich 4 years, 9 months ago
Mapping the MMIO space and obtaining feature information needs to happen
slightly earlier, such that for x2APIC support we can set XTEn prior to
calling amd_iommu_update_ivrs_mapping_acpi() and
amd_iommu_setup_ioapic_remapping().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -970,14 +970,6 @@ static void * __init allocate_ppr_log(st
  
  static int __init amd_iommu_init_one(struct amd_iommu *iommu)
  {
-    if ( map_iommu_mmio_region(iommu) != 0 )
-        goto error_out;
-
-    get_iommu_features(iommu);
-
-    if ( iommu->features.raw )
-        iommuv2_enabled = 1;
-
      if ( allocate_cmd_buffer(iommu) == NULL )
          goto error_out;
  
@@ -1202,6 +1194,23 @@ static bool_t __init amd_sp5100_erratum2
      return 0;
  }
  
+static int __init amd_iommu_prepare_one(struct amd_iommu *iommu)
+{
+    int rc = alloc_ivrs_mappings(iommu->seg);
+
+    if ( !rc )
+        rc = map_iommu_mmio_region(iommu);
+    if ( rc )
+        return rc;
+
+    get_iommu_features(iommu);
+
+    if ( iommu->features.raw )
+        iommuv2_enabled = true;
+
+    return 0;
+}
+
  int __init amd_iommu_init(void)
  {
      struct amd_iommu *iommu;
@@ -1232,7 +1241,7 @@ int __init amd_iommu_init(void)
      radix_tree_init(&ivrs_maps);
      for_each_amd_iommu ( iommu )
      {
-        rc = alloc_ivrs_mappings(iommu->seg);
+        rc = amd_iommu_prepare_one(iommu);
          if ( rc )
              goto error_out;
      }

Re: [Xen-devel] [PATCH v3 09/14] AMD/IOMMU: split amd_iommu_init_one()
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:39:10PM +0000, Jan Beulich wrote:
> Mapping the MMIO space and obtaining feature information needs to happen
> slightly earlier, such that for x2APIC support we can set XTEn prior to
> calling amd_iommu_update_ivrs_mapping_acpi() and
> amd_iommu_setup_ioapic_remapping().
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Brian Woods <brian.woods@amd.com>

> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -970,14 +970,6 @@ static void * __init allocate_ppr_log(st
>   
>   static int __init amd_iommu_init_one(struct amd_iommu *iommu)
>   {
> -    if ( map_iommu_mmio_region(iommu) != 0 )
> -        goto error_out;
> -
> -    get_iommu_features(iommu);
> -
> -    if ( iommu->features.raw )
> -        iommuv2_enabled = 1;
> -
>       if ( allocate_cmd_buffer(iommu) == NULL )
>           goto error_out;
>   
> @@ -1202,6 +1194,23 @@ static bool_t __init amd_sp5100_erratum2
>       return 0;
>   }
>   
> +static int __init amd_iommu_prepare_one(struct amd_iommu *iommu)
> +{
> +    int rc = alloc_ivrs_mappings(iommu->seg);
> +
> +    if ( !rc )
> +        rc = map_iommu_mmio_region(iommu);
> +    if ( rc )
> +        return rc;
> +
> +    get_iommu_features(iommu);
> +
> +    if ( iommu->features.raw )
> +        iommuv2_enabled = true;
> +
> +    return 0;
> +}
> +
>   int __init amd_iommu_init(void)
>   {
>       struct amd_iommu *iommu;
> @@ -1232,7 +1241,7 @@ int __init amd_iommu_init(void)
>       radix_tree_init(&ivrs_maps);
>       for_each_amd_iommu ( iommu )
>       {
> -        rc = alloc_ivrs_mappings(iommu->seg);
> +        rc = amd_iommu_prepare_one(iommu);
>           if ( rc )
>               goto error_out;
>       }
> 

-- 
Brian Woods

[Xen-devel] [PATCH v3 10/14] AMD/IOMMU: allow enabling with IRQ not yet set up
Posted by Jan Beulich 4 years, 9 months ago
Early enabling (to enter x2APIC mode) requires deferring of the IRQ
setup. Code to actually do that setup in the x2APIC case will get added
subsequently.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Re-base.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -814,7 +814,6 @@ static void amd_iommu_erratum_746_workar
  static void enable_iommu(struct amd_iommu *iommu)
  {
      unsigned long flags;
-    struct irq_desc *desc;
  
      spin_lock_irqsave(&iommu->lock, flags);
  
@@ -834,19 +833,27 @@ static void enable_iommu(struct amd_iomm
      if ( iommu->features.flds.ppr_sup )
          register_iommu_ppr_log_in_mmio_space(iommu);
  
-    desc = irq_to_desc(iommu->msi.irq);
-    spin_lock(&desc->lock);
-    set_msi_affinity(desc, NULL);
-    spin_unlock(&desc->lock);
+    if ( iommu->msi.irq > 0 )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        set_msi_affinity(desc, NULL);
+        spin_unlock(&desc->lock);
+    }
  
      amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
  
      set_iommu_ht_flags(iommu);
      set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
-    set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
  
-    if ( iommu->features.flds.ppr_sup )
-        set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    if ( iommu->msi.irq > 0 )
+    {
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
  
      if ( iommu->features.flds.gt_sup )
          set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);

Re: [Xen-devel] [PATCH v3 10/14] AMD/IOMMU: allow enabling with IRQ not yet set up
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:39:34PM +0000, Jan Beulich wrote:
> Early enabling (to enter x2APIC mode) requires deferring of the IRQ
> setup. Code to actually do that setup in the x2APIC case will get added
> subsequently.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Brian Woods <brian.woods@amd.com>

> ---
> v3: Re-base.
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -814,7 +814,6 @@ static void amd_iommu_erratum_746_workar
>   static void enable_iommu(struct amd_iommu *iommu)
>   {
>       unsigned long flags;
> -    struct irq_desc *desc;
>   
>       spin_lock_irqsave(&iommu->lock, flags);
>   
> @@ -834,19 +833,27 @@ static void enable_iommu(struct amd_iomm
>       if ( iommu->features.flds.ppr_sup )
>           register_iommu_ppr_log_in_mmio_space(iommu);
>   
> -    desc = irq_to_desc(iommu->msi.irq);
> -    spin_lock(&desc->lock);
> -    set_msi_affinity(desc, NULL);
> -    spin_unlock(&desc->lock);
> +    if ( iommu->msi.irq > 0 )
> +    {
> +        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
> +
> +        spin_lock(&desc->lock);
> +        set_msi_affinity(desc, NULL);
> +        spin_unlock(&desc->lock);
> +    }
>   
>       amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
>   
>       set_iommu_ht_flags(iommu);
>       set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
> -    set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
>   
> -    if ( iommu->features.flds.ppr_sup )
> -        set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
> +    if ( iommu->msi.irq > 0 )
> +    {
> +        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
> +
> +        if ( iommu->features.flds.ppr_sup )
> +            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
> +    }
>   
>       if ( iommu->features.flds.gt_sup )
>           set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
> 

-- 
Brian Woods

[Xen-devel] [PATCH v3 11/14] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Posted by Jan Beulich 4 years, 9 months ago
In order to be able to express all possible destinations we need to make
use of this non-MSI-capability based mechanism. The new IRQ controller
structure can re-use certain MSI functions, though.

For now general and PPR interrupts still share a single vector, IRQ, and
hence handler.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v3: Re-base.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -472,6 +472,44 @@ static hw_irq_controller iommu_maskable_
      .set_affinity = set_msi_affinity,
  };
  
+static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
+{
+    struct amd_iommu *iommu = desc->action->dev_id;
+    unsigned int dest = set_desc_affinity(desc, mask);
+    union amd_iommu_x2apic_control ctrl = {};
+    unsigned long flags;
+
+    if ( dest == BAD_APICID )
+        return;
+
+    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
+    iommu->msi.msg.dest32 = dest;
+
+    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
+                               MSI_ADDR_DESTMODE_MASK);
+    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
+                              MSI_DATA_DELIVERY_MODE_MASK);
+    ctrl.vector = desc->arch.vector;
+    ctrl.dest_lo = dest;
+    ctrl.dest_hi = dest >> 24;
+
+    spin_lock_irqsave(&iommu->lock, flags);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET);
+    spin_unlock_irqrestore(&iommu->lock, flags);
+}
+
+static hw_irq_controller iommu_x2apic_type = {
+    .typename     = "IOMMU-x2APIC",
+    .startup      = irq_startup_none,
+    .shutdown     = irq_shutdown_none,
+    .enable       = irq_enable_none,
+    .disable      = irq_disable_none,
+    .ack          = ack_nonmaskable_msi_irq,
+    .end          = end_nonmaskable_msi_irq,
+    .set_affinity = set_x2apic_affinity,
+};
+
  static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
  {
      u16 domain_id, device_id, flags;
@@ -726,8 +764,6 @@ static void iommu_interrupt_handler(int
  static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
  {
      int irq, ret;
-    hw_irq_controller *handler;
-    u16 control;
  
      irq = create_irq(NUMA_NO_NODE);
      if ( irq <= 0 )
@@ -747,20 +783,43 @@ static bool_t __init set_iommu_interrupt
                          PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf));
          return 0;
      }
-    control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
-                              PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
-                              iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
-    iommu->msi.msi.nvec = 1;
-    if ( is_mask_bit_support(control) )
-    {
-        iommu->msi.msi_attrib.maskbit = 1;
-        iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
-                                                is_64bit_address(control));
-        handler = &iommu_maskable_msi_type;
+
+    if ( iommu->ctrl.int_cap_xt_en )
+    {
+        struct irq_desc *desc = irq_to_desc(irq);
+
+        iommu->msi.msi_attrib.pos = MSI_TYPE_IOMMU;
+        iommu->msi.msi_attrib.maskbit = 0;
+        iommu->msi.msi_attrib.is_64 = 1;
+
+        desc->msi_desc = &iommu->msi;
+        desc->handler = &iommu_x2apic_type;
+
+        ret = 0;
      }
      else
-        handler = &iommu_msi_type;
-    ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    {
+        hw_irq_controller *handler;
+        u16 control;
+
+        control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
+                                  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
+                                  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
+
+        iommu->msi.msi.nvec = 1;
+        if ( is_mask_bit_support(control) )
+        {
+            iommu->msi.msi_attrib.maskbit = 1;
+            iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
+                                                    is_64bit_address(control));
+            handler = &iommu_maskable_msi_type;
+        }
+        else
+            handler = &iommu_msi_type;
+
+        ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    }
+
      if ( !ret )
          ret = request_irq(irq, 0, iommu_interrupt_handler, "amd_iommu", iommu);
      if ( ret )
@@ -838,8 +897,19 @@ static void enable_iommu(struct amd_iomm
          struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
  
          spin_lock(&desc->lock);
-        set_msi_affinity(desc, NULL);
-        spin_unlock(&desc->lock);
+
+        if ( iommu->ctrl.int_cap_xt_en )
+        {
+            set_x2apic_affinity(desc, NULL);
+            spin_unlock(&desc->lock);
+        }
+        else
+        {
+            set_msi_affinity(desc, NULL);
+            spin_unlock(&desc->lock);
+
+            amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
+        }
      }
  
      amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
@@ -879,7 +949,9 @@ static void disable_iommu(struct amd_iom
          return;
      }
  
-    amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+    if ( !iommu->ctrl.int_cap_xt_en )
+        amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+
      set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
      set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
  
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -416,6 +416,25 @@ union amd_iommu_ext_features {
      } flds;
  };
  
+/* x2APIC Control Registers */
+#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
+#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
+#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
+
+union amd_iommu_x2apic_control {
+    uint64_t raw;
+    struct {
+        unsigned int :2;
+        unsigned int dest_mode:1;
+        unsigned int :5;
+        unsigned int dest_lo:24;
+        unsigned int vector:8;
+        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
+        unsigned int :15;
+        unsigned int dest_hi:8;
+    };
+};
+
  /* Status Register*/
  #define IOMMU_STATUS_MMIO_OFFSET		0x2020
  #define IOMMU_STATUS_EVENT_OVERFLOW_MASK	0x00000001

Re: [Xen-devel] [PATCH v3 11/14] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:39, Jan Beulich wrote:
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -416,6 +416,25 @@ union amd_iommu_ext_features {
>       } flds;
>   };
>   
> +/* x2APIC Control Registers */
> +#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
> +#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
> +#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
> +
> +union amd_iommu_x2apic_control {
> +    uint64_t raw;
> +    struct {
> +        unsigned int :2;
> +        unsigned int dest_mode:1;
> +        unsigned int :5;
> +        unsigned int dest_lo:24;
> +        unsigned int vector:8;
> +        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
> +        unsigned int :15;
> +        unsigned int dest_hi:8;

Bool bitfields like you've done elsewhere in v3?

My pre-existing R-by stands, but ideally with this changed.

~Andrew

Re: [Xen-devel] [PATCH v3 11/14] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Posted by Jan Beulich 4 years, 9 months ago
On 19.07.2019 19:31, Andrew Cooper wrote:
> On 16/07/2019 17:39, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> @@ -416,6 +416,25 @@ union amd_iommu_ext_features {
>>        } flds;
>>    };
>>    
>> +/* x2APIC Control Registers */
>> +#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
>> +#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
>> +#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
>> +
>> +union amd_iommu_x2apic_control {
>> +    uint64_t raw;
>> +    struct {
>> +        unsigned int :2;
>> +        unsigned int dest_mode:1;
>> +        unsigned int :5;
>> +        unsigned int dest_lo:24;
>> +        unsigned int vector:8;
>> +        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
>> +        unsigned int :15;
>> +        unsigned int dest_hi:8;
> 
> Bool bitfields like you've done elsewhere in v3?

I'd been considering this, but decided against because of ...

+static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
+{
+    struct amd_iommu *iommu = desc->action->dev_id;
+    unsigned int dest = set_desc_affinity(desc, mask);
+    union amd_iommu_x2apic_control ctrl = {};
+    unsigned long flags;
+
+    if ( dest == BAD_APICID )
+        return;
+
+    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
+    iommu->msi.msg.dest32 = dest;
+
+    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
+                               MSI_ADDR_DESTMODE_MASK);
+    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
+                              MSI_DATA_DELIVERY_MODE_MASK);

... this: We really mean a value copy here, not an "is zero" or
"is non-zero" one. I also think that both fields are not suitably
named for being boolean. In the recent re-work of struct
IO_APIC_route_entry (ca9310b24e) similar fields were similarly
left as "unsigned int". MSI's struct msg_data also falls into the
same category. I think if we wanted to switch to bool here, we
should do so everywhere at the same time (along with suitably
renaming fields).
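
For illustration, MASK_EXTR() yields the field's numeric value rather
than a truth value (Xen's definition, reproduced here from memory):

    #define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))

    /* E.g. for a mask of 0x700 (a 3-bit field at bits 8..10):
     * MASK_EXTR(0x300, 0x700) == 3 -- a value copy, which a bool
     * target would collapse to an "is non-zero" test. */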

Jan
Re: [Xen-devel] [PATCH v3 11/14] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Posted by Andrew Cooper 4 years, 8 months ago
On 22/07/2019 09:43, Jan Beulich wrote:
> On 19.07.2019 19:31, Andrew Cooper wrote:
>> On 16/07/2019 17:39, Jan Beulich wrote:
>>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>>> @@ -416,6 +416,25 @@ union amd_iommu_ext_features {
>>>        } flds;
>>>    };
>>>    
>>> +/* x2APIC Control Registers */
>>> +#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
>>> +#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
>>> +#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
>>> +
>>> +union amd_iommu_x2apic_control {
>>> +    uint64_t raw;
>>> +    struct {
>>> +        unsigned int :2;
>>> +        unsigned int dest_mode:1;
>>> +        unsigned int :5;
>>> +        unsigned int dest_lo:24;
>>> +        unsigned int vector:8;
>>> +        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
>>> +        unsigned int :15;
>>> +        unsigned int dest_hi:8;
>> Bool bitfields like you've done elsewhere in v3?
> I'd been considering this, but decided against because of ...
>
> +static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
> +{
> +    struct amd_iommu *iommu = desc->action->dev_id;
> +    unsigned int dest = set_desc_affinity(desc, mask);
> +    union amd_iommu_x2apic_control ctrl = {};
> +    unsigned long flags;
> +
> +    if ( dest == BAD_APICID )
> +        return;
> +
> +    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
> +    iommu->msi.msg.dest32 = dest;
> +
> +    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
> +                               MSI_ADDR_DESTMODE_MASK);
> +    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
> +                              MSI_DATA_DELIVERY_MODE_MASK);
>
> ... this: We really mean a value copy here, not an "is zero" or
> "is non-zero" one. I also think that both fields are not suitably
> named for being boolean. In the recent re-work of struct
> IO_APIC_route_entry (ca9310b24e) similar fields were similarly
> left as "unsigned int". MSI's struct msg_data also falls into the
> same category. I think if we wanted to switch to bool here, we
> should do so everywhere at the same time (along with suitably
> renaming fields).

Architecturally, both of these are single-bit fields, no?

~Andrew

Re: [Xen-devel] [PATCH v3 11/14] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Posted by Jan Beulich 4 years, 8 months ago
On 22.07.2019 15:45, Andrew Cooper wrote:
> On 22/07/2019 09:43, Jan Beulich wrote:
>> On 19.07.2019 19:31, Andrew Cooper wrote:
>>> On 16/07/2019 17:39, Jan Beulich wrote:
>>>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>>>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>>>> @@ -416,6 +416,25 @@ union amd_iommu_ext_features {
>>>>         } flds;
>>>>     };
>>>>     
>>>> +/* x2APIC Control Registers */
>>>> +#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
>>>> +#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
>>>> +#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
>>>> +
>>>> +union amd_iommu_x2apic_control {
>>>> +    uint64_t raw;
>>>> +    struct {
>>>> +        unsigned int :2;
>>>> +        unsigned int dest_mode:1;
>>>> +        unsigned int :5;
>>>> +        unsigned int dest_lo:24;
>>>> +        unsigned int vector:8;
>>>> +        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
>>>> +        unsigned int :15;
>>>> +        unsigned int dest_hi:8;
>>> Bool bitfields like you've done elsewhere in v3?
>> I'd been considering this, but decided against because of ...
>>
>> +static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
>> +{
>> +    struct amd_iommu *iommu = desc->action->dev_id;
>> +    unsigned int dest = set_desc_affinity(desc, mask);
>> +    union amd_iommu_x2apic_control ctrl = {};
>> +    unsigned long flags;
>> +
>> +    if ( dest == BAD_APICID )
>> +        return;
>> +
>> +    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
>> +    iommu->msi.msg.dest32 = dest;
>> +
>> +    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
>> +                               MSI_ADDR_DESTMODE_MASK);
>> +    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
>> +                              MSI_DATA_DELIVERY_MODE_MASK);
>>
>> ... this: We really mean a value copy here, not an "is zero" or
>> "is non-zero" one. I also think that both fields are not suitably
>> named for being boolean. In the recent re-work of struct
>> IO_APIC_route_entry (ca9310b24e) similar fields similarly were
>> left as "unsigned int". MSI's struct msg_data also falls into the
>> same category. I think if we wanted to switch to bool here, we
>> should do so everywhere at the same time (along with suitably
>> renaming fields).
> 
> Architecturally, both of these are single-bit fields, no?

Sure, but with the names we have there's no obvious indication
whether physical/logical are respectively true or false. Same
(or worse) for fixed/lowest priority, which in the LAPIC even
has further accompanying values (i.e. couldn't possibly be bool
there at all).

Jan
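
[Editorial aside: a minimal standalone sketch of the distinction above. With a
1-bit unsigned field the MASK_EXTR() result is stored as the architectural
encoding itself; a bool field would only record zero/non-zero and, as noted,
would then want renaming. All names here are hypothetical, not the patch's.]

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical register layout, for illustration only. */
    union reg {
        uint64_t raw;
        struct {
            unsigned int dest_mode:1; /* 0 = physical, 1 = logical: an encoding */
        };
    };

    int main(void)
    {
        union reg r = { .raw = 0 };

        r.dest_mode = 1; /* value copy of the extracted encoding */
        printf("raw = %#" PRIx64 "\n", r.raw); /* prints raw = 0x1 */
        return 0;
    }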
Re: [Xen-devel] [PATCH v3 11/14] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:39:58PM +0000, Jan Beulich wrote:
> In order to be able to express all possible destinations we need to make
> use of this non-MSI-capability based mechanism. The new IRQ controller
> structure can re-use certain MSI functions, though.
> 
> For now general and PPR interrupts still share a single vector, IRQ, and
> hence handler.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Brian Woods <brian.woods@amd.com>

> ---
> v3: Re-base.
> 
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -472,6 +472,44 @@ static hw_irq_controller iommu_maskable_
>       .set_affinity = set_msi_affinity,
>   };
>   
> +static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
> +{
> +    struct amd_iommu *iommu = desc->action->dev_id;
> +    unsigned int dest = set_desc_affinity(desc, mask);
> +    union amd_iommu_x2apic_control ctrl = {};
> +    unsigned long flags;
> +
> +    if ( dest == BAD_APICID )
> +        return;
> +
> +    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
> +    iommu->msi.msg.dest32 = dest;
> +
> +    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
> +                               MSI_ADDR_DESTMODE_MASK);
> +    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
> +                              MSI_DATA_DELIVERY_MODE_MASK);
> +    ctrl.vector = desc->arch.vector;
> +    ctrl.dest_lo = dest;
> +    ctrl.dest_hi = dest >> 24;
> +
> +    spin_lock_irqsave(&iommu->lock, flags);
> +    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);
> +    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET);
> +    spin_unlock_irqrestore(&iommu->lock, flags);
> +}
> +
> +static hw_irq_controller iommu_x2apic_type = {
> +    .typename     = "IOMMU-x2APIC",
> +    .startup      = irq_startup_none,
> +    .shutdown     = irq_shutdown_none,
> +    .enable       = irq_enable_none,
> +    .disable      = irq_disable_none,
> +    .ack          = ack_nonmaskable_msi_irq,
> +    .end          = end_nonmaskable_msi_irq,
> +    .set_affinity = set_x2apic_affinity,
> +};
> +
>   static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>   {
>       u16 domain_id, device_id, flags;
> @@ -726,8 +764,6 @@ static void iommu_interrupt_handler(int
>   static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
>   {
>       int irq, ret;
> -    hw_irq_controller *handler;
> -    u16 control;
>   
>       irq = create_irq(NUMA_NO_NODE);
>       if ( irq <= 0 )
> @@ -747,20 +783,43 @@ static bool_t __init set_iommu_interrupt
>                           PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf));
>           return 0;
>       }
> -    control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
> -                              PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
> -                              iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
> -    iommu->msi.msi.nvec = 1;
> -    if ( is_mask_bit_support(control) )
> -    {
> -        iommu->msi.msi_attrib.maskbit = 1;
> -        iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
> -                                                is_64bit_address(control));
> -        handler = &iommu_maskable_msi_type;
> +
> +    if ( iommu->ctrl.int_cap_xt_en )
> +    {
> +        struct irq_desc *desc = irq_to_desc(irq);
> +
> +        iommu->msi.msi_attrib.pos = MSI_TYPE_IOMMU;
> +        iommu->msi.msi_attrib.maskbit = 0;
> +        iommu->msi.msi_attrib.is_64 = 1;
> +
> +        desc->msi_desc = &iommu->msi;
> +        desc->handler = &iommu_x2apic_type;
> +
> +        ret = 0;
>       }
>       else
> -        handler = &iommu_msi_type;
> -    ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
> +    {
> +        hw_irq_controller *handler;
> +        u16 control;
> +
> +        control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
> +                                  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
> +                                  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
> +
> +        iommu->msi.msi.nvec = 1;
> +        if ( is_mask_bit_support(control) )
> +        {
> +            iommu->msi.msi_attrib.maskbit = 1;
> +            iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
> +                                                    is_64bit_address(control));
> +            handler = &iommu_maskable_msi_type;
> +        }
> +        else
> +            handler = &iommu_msi_type;
> +
> +        ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
> +    }
> +
>       if ( !ret )
>           ret = request_irq(irq, 0, iommu_interrupt_handler, "amd_iommu", iommu);
>       if ( ret )
> @@ -838,8 +897,19 @@ static void enable_iommu(struct amd_iomm
>           struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
>   
>           spin_lock(&desc->lock);
> -        set_msi_affinity(desc, NULL);
> -        spin_unlock(&desc->lock);
> +
> +        if ( iommu->ctrl.int_cap_xt_en )
> +        {
> +            set_x2apic_affinity(desc, NULL);
> +            spin_unlock(&desc->lock);
> +        }
> +        else
> +        {
> +            set_msi_affinity(desc, NULL);
> +            spin_unlock(&desc->lock);
> +
> +            amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
> +        }
>       }
>   
>       amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
> @@ -879,7 +949,9 @@ static void disable_iommu(struct amd_iom
>           return;
>       }
>   
> -    amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
> +    if ( !iommu->ctrl.int_cap_xt_en )
> +        amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
> +
>       set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
>       set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
>   
> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -416,6 +416,25 @@ union amd_iommu_ext_features {
>       } flds;
>   };
>   
> +/* x2APIC Control Registers */
> +#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
> +#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
> +#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
> +
> +union amd_iommu_x2apic_control {
> +    uint64_t raw;
> +    struct {
> +        unsigned int :2;
> +        unsigned int dest_mode:1;
> +        unsigned int :5;
> +        unsigned int dest_lo:24;
> +        unsigned int vector:8;
> +        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
> +        unsigned int :15;
> +        unsigned int dest_hi:8;
> +    };
> +};
> +
>   /* Status Register*/
>   #define IOMMU_STATUS_MMIO_OFFSET		0x2020
>   #define IOMMU_STATUS_EVENT_OVERFLOW_MASK	0x00000001
> 

-- 
Brian Woods

[Xen-devel] [PATCH v3 12/14] AMD/IOMMU: enable x2APIC mode when available
Posted by Jan Beulich 4 years, 9 months ago
In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
switched into suitable state.

The post-AP-bringup IRQ affinity adjustment is done also for the non-
x2APIC case, matching what VT-d does.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: Set GAEn (and other control register bits) earlier. Also clear the
     bits enabled here in amd_iommu_init_cleanup(). Re-base. Pass NULL
     CPU mask to set_{x2apic,msi}_affinity().
v2: Drop cpu_has_cx16 check. Add comment.
---
TBD: Instead of the system_state check in iov_enable_xt() the function
      could also zap its own hook pointer, at which point it could also
      become __init. This would, however, require that either
      resume_x2apic() be bound to ignore iommu_enable_x2apic() errors
      forever, or that iommu_enable_x2apic() be slightly re-arranged to
      not return -EOPNOTSUPP when finding a NULL hook during resume.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -834,6 +834,30 @@ static bool_t __init set_iommu_interrupt
      return 1;
  }
  
+int iov_adjust_irq_affinities(void)
+{
+    const struct amd_iommu *iommu;
+
+    if ( !iommu_enabled )
+        return 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+        unsigned long flags;
+
+        spin_lock_irqsave(&desc->lock, flags);
+        if ( iommu->ctrl.int_cap_xt_en )
+            set_x2apic_affinity(desc, NULL);
+        else
+            set_msi_affinity(desc, NULL);
+        spin_unlock_irqrestore(&desc->lock, flags);
+    }
+
+    return 0;
+}
+__initcall(iov_adjust_irq_affinities);
+
  /*
   * Family15h Model 10h-1fh erratum 746 (IOMMU Logging May Stall Translations)
   * Workaround:
@@ -1047,7 +1071,7 @@ static void * __init allocate_ppr_log(st
                                  IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
  }
  
-static int __init amd_iommu_init_one(struct amd_iommu *iommu)
+static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
  {
      if ( allocate_cmd_buffer(iommu) == NULL )
          goto error_out;
@@ -1058,7 +1082,7 @@ static int __init amd_iommu_init_one(str
      if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
          goto error_out;
  
-    if ( !set_iommu_interrupt_handler(iommu) )
+    if ( intr && !set_iommu_interrupt_handler(iommu) )
          goto error_out;
  
      /* To make sure that device_table.buffer has been successfully allocated */
@@ -1087,8 +1111,16 @@ static void __init amd_iommu_init_cleanu
      list_for_each_entry_safe ( iommu, next, &amd_iommu_head, list )
      {
          list_del(&iommu->list);
+
+        iommu->ctrl.ga_en = 0;
+        iommu->ctrl.xt_en = 0;
+        iommu->ctrl.int_cap_xt_en = 0;
+
          if ( iommu->enabled )
              disable_iommu(iommu);
+        else if ( iommu->mmio_base )
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
  
          deallocate_ring_buffer(&iommu->cmd_buffer);
          deallocate_ring_buffer(&iommu->event_log);
@@ -1290,7 +1322,7 @@ static int __init amd_iommu_prepare_one(
      return 0;
  }
  
-int __init amd_iommu_init(void)
+int __init amd_iommu_prepare(bool xt)
  {
      struct amd_iommu *iommu;
      int rc = -ENODEV;
@@ -1305,9 +1337,14 @@ int __init amd_iommu_init(void)
      if ( unlikely(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_MSI) )
          goto error_out;
  
+    /* Have we been here before? */
+    if ( ivhd_type )
+        return 0;
+
      rc = amd_iommu_get_supported_ivhd_type();
      if ( rc < 0 )
          goto error_out;
+    BUG_ON(!rc);
      ivhd_type = rc;
  
      rc = amd_iommu_get_ivrs_dev_entries();
@@ -1323,9 +1360,37 @@ int __init amd_iommu_init(void)
          rc = amd_iommu_prepare_one(iommu);
          if ( rc )
              goto error_out;
+
+        rc = -ENODEV;
+        if ( xt && (!iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup) )
+            goto error_out;
+    }
+
+    for_each_amd_iommu ( iommu )
+    {
+        /* NB: There's no need to actually write these out right here. */
+        iommu->ctrl.ga_en |= xt;
+        iommu->ctrl.xt_en = xt;
+        iommu->ctrl.int_cap_xt_en = xt;
      }
  
      rc = amd_iommu_update_ivrs_mapping_acpi();
+
+ error_out:
+    if ( rc )
+    {
+        amd_iommu_init_cleanup();
+        ivhd_type = 0;
+    }
+
+    return rc;
+}
+
+int __init amd_iommu_init(bool xt)
+{
+    struct amd_iommu *iommu;
+    int rc = amd_iommu_prepare(xt);
+
      if ( rc )
          goto error_out;
  
@@ -1351,7 +1416,12 @@ int __init amd_iommu_init(void)
      /* per iommu initialization  */
      for_each_amd_iommu ( iommu )
      {
-        rc = amd_iommu_init_one(iommu);
+        /*
+         * Setting up of the IOMMU interrupts cannot occur yet at the (very
+         * early) time we get here when enabling x2APIC mode. Suppress it
+         * here, and do it explicitly in amd_iommu_init_interrupt().
+         */
+        rc = amd_iommu_init_one(iommu, !xt);
          if ( rc )
              goto error_out;
      }
@@ -1363,6 +1433,40 @@ error_out:
      return rc;
  }
  
+int __init amd_iommu_init_interrupt(void)
+{
+    struct amd_iommu *iommu;
+    int rc = 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc;
+
+        if ( !set_iommu_interrupt_handler(iommu) )
+        {
+            rc = -EIO;
+            break;
+        }
+
+        desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        ASSERT(iommu->ctrl.int_cap_xt_en);
+        set_x2apic_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
+
+    if ( rc )
+        amd_iommu_init_cleanup();
+
+    return rc;
+}
+
  static void invalidate_all_domain_pages(void)
  {
      struct domain *d;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -791,6 +791,35 @@ void *__init amd_iommu_alloc_intremap_ta
      return tb;
  }
  
+bool __init iov_supports_xt(void)
+{
+    unsigned int apic;
+
+    if ( !iommu_enable || !iommu_intremap )
+        return false;
+
+    if ( amd_iommu_prepare(true) )
+        return false;
+
+    for ( apic = 0; apic < nr_ioapics; apic++ )
+    {
+        unsigned int idx = ioapic_id_to_index(IO_APIC_ID(apic));
+
+        if ( idx == MAX_IO_APICS )
+            return false;
+
+        if ( !find_iommu_for_device(ioapic_sbdf[idx].seg,
+                                    ioapic_sbdf[idx].bdf) )
+        {
+            AMD_IOMMU_DEBUG("No IOMMU for IO-APIC %#x (ID %x)\n",
+                            apic, IO_APIC_ID(apic));
+            return false;
+        }
+    }
+
+    return true;
+}
+
  int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
  {
      spinlock_t *lock;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -170,7 +170,8 @@ static int __init iov_detect(void)
      if ( !iommu_enable && !iommu_intremap )
          return 0;
  
-    if ( amd_iommu_init() != 0 )
+    else if ( (init_done ? amd_iommu_init_interrupt()
+                         : amd_iommu_init(false)) != 0 )
      {
          printk("AMD-Vi: Error initialization\n");
          return -ENODEV;
@@ -183,6 +184,25 @@ static int __init iov_detect(void)
      return scan_pci_devices();
  }
  
+static int iov_enable_xt(void)
+{
+    int rc;
+
+    if ( system_state >= SYS_STATE_active )
+        return 0;
+
+    if ( (rc = amd_iommu_init(true)) != 0 )
+    {
+        printk("AMD-Vi: Error %d initializing for x2APIC mode\n", rc);
+        /* -ENXIO has special meaning to the caller - convert it. */
+        return rc != -ENXIO ? rc : -ENODATA;
+    }
+
+    init_done = true;
+
+    return 0;
+}
+
  int amd_iommu_alloc_root(struct domain_iommu *hd)
  {
      if ( unlikely(!hd->arch.root_table) )
@@ -559,11 +579,13 @@ static const struct iommu_ops __initcons
      .free_page_table = deallocate_page_table,
      .reassign_device = reassign_device,
      .get_device_group_id = amd_iommu_group_id,
+    .enable_x2apic = iov_enable_xt,
      .update_ire_from_apic = amd_iommu_ioapic_update_ire,
      .update_ire_from_msi = amd_iommu_msi_msg_update_ire,
      .read_apic_from_ire = amd_iommu_read_ioapic_from_ire,
      .read_msi_from_ire = amd_iommu_read_msi_from_ire,
      .setup_hpet_msi = amd_setup_hpet_msi,
+    .adjust_irq_affinities = iov_adjust_irq_affinities,
      .suspend = amd_iommu_suspend,
      .resume = amd_iommu_resume,
      .share_p2m = amd_iommu_share_p2m,
@@ -574,4 +596,5 @@ static const struct iommu_ops __initcons
  static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
      .ops = &_iommu_ops,
      .setup = iov_detect,
+    .supports_x2apic = iov_supports_xt,
  };
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -48,8 +48,11 @@ int amd_iommu_detect_acpi(void);
  void get_iommu_features(struct amd_iommu *iommu);
  
  /* amd-iommu-init functions */
-int amd_iommu_init(void);
+int amd_iommu_prepare(bool xt);
+int amd_iommu_init(bool xt);
+int amd_iommu_init_interrupt(void);
  int amd_iommu_update_ivrs_mapping_acpi(void);
+int iov_adjust_irq_affinities(void);
  
  /* mapping functions */
  int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -96,6 +99,7 @@ void amd_iommu_flush_all_caches(struct a
  struct amd_iommu *find_iommu_for_device(int seg, int bdf);
  
  /* interrupt remapping */
+bool iov_supports_xt(void);
  int amd_iommu_setup_ioapic_remapping(void);
  void *amd_iommu_alloc_intremap_table(
      const struct amd_iommu *, unsigned long **);
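
[Editorial aside: the resulting boot flow, reconstructed from the hunks above
as a sketch; hook and function names are the patch's own.]

    /*
     * x2APIC path (run very early, before APs are brought up):
     *   iommu_init_ops.supports_x2apic -> iov_supports_xt()
     *                                       -> amd_iommu_prepare(true)
     *   iommu_ops.enable_x2apic        -> iov_enable_xt()
     *                                       -> amd_iommu_init(true), with IRQ
     *                                          setup suppressed
     *                                       -> init_done = true
     *
     * Later, from iov_detect():
     *   init_done ? amd_iommu_init_interrupt() : amd_iommu_init(false)
     */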

Re: [Xen-devel] [PATCH v3 12/14] AMD/IOMMU: enable x2APIC mode when available
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:40:33PM +0000, Jan Beulich wrote:
> In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
> switched into suitable state.
> 
> The post-AP-bringup IRQ affinity adjustment is done also for the non-
> x2APIC case, matching what VT-d does.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>

-- 
Brian Woods

Re: [Xen-devel] [PATCH v3 12/14] AMD/IOMMU: enable x2APIC mode when available
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:40, Jan Beulich wrote:
> In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
> switched into suitable state.
>
> The post-AP-bringup IRQ affinity adjustment is done also for the non-
> x2APIC case, matching what VT-d does.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH RFC v3 13/14] AMD/IOMMU: correct IRTE updating
Posted by Jan Beulich 4 years, 9 months ago
Flushing didn't get done along the lines of what the specification says.
Mark entries to be updated as not remapped (which will result in
interrupt requests to get target aborted, but the interrupts should be
masked anyway at that point in time), issue the flush, and only then
write the new entry.

In update_intremap_entry_from_msi_msg() also fold the duplicate initial
lock determination and acquire into just a single instance.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Putting the flush invocations in loops isn't overly nice, but I
      don't think this can really be abused, since callers up the stack
      hold further locks. Nevertheless I'd like to ask for better
      suggestions.
---
v3: Remove stale parts of description. Re-base.
v2: Parts morphed into earlier patch.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -207,9 +207,7 @@ static void update_intremap_entry(const
              .vector = vector,
          };
  
-        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
-        /* Low half, in particular RemapEn, needs to be cleared first. */
-        barrier();
+        ASSERT(!entry.ptr128->full.remap_en);
          entry.ptr128->raw[1] =
              container_of(&full, union irte128, full)->raw[1];
          /* High half needs to be set before low one (containing RemapEn). */
@@ -288,6 +286,20 @@ static int update_intremap_entry_from_io
      }
  
      entry = get_intremap_entry(iommu, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = false;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
      if ( fresh )
          /* nothing */;
      else if ( !lo_update )
@@ -317,13 +329,6 @@ static int update_intremap_entry_from_io
  
      spin_unlock_irqrestore(lock, flags);
  
-    if ( iommu->enabled && !fresh )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
      set_rte_index(rte, offset);
  
      return 0;
@@ -579,19 +584,27 @@ static int update_intremap_entry_from_ms
      req_id = get_dma_requestor_id(iommu->seg, bdf);
      alias_id = get_intremap_requestor_id(iommu->seg, bdf);
  
+    lock = get_intremap_lock(iommu->seg, req_id);
+    spin_lock_irqsave(lock, flags);
+
      if ( msg == NULL )
      {
-        lock = get_intremap_lock(iommu->seg, req_id);
-        spin_lock_irqsave(lock, flags);
          for ( i = 0; i < nr; ++i )
              free_intremap_entry(iommu, req_id, *remap_index + i);
          spin_unlock_irqrestore(lock, flags);
-        goto done;
-    }
  
-    lock = get_intremap_lock(iommu->seg, req_id);
+        if ( iommu->enabled )
+        {
+            spin_lock_irqsave(&iommu->lock, flags);
+            amd_iommu_flush_intremap(iommu, req_id);
+            if ( alias_id != req_id )
+                amd_iommu_flush_intremap(iommu, alias_id);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        return 0;
+    }
  
-    spin_lock_irqsave(lock, flags);
      dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
      delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
      vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
@@ -615,6 +628,22 @@ static int update_intremap_entry_from_ms
      }
  
      entry = get_intremap_entry(iommu, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = false;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        if ( alias_id != req_id )
+            amd_iommu_flush_intremap(iommu, alias_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
      update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
      spin_unlock_irqrestore(lock, flags);
  
@@ -634,16 +663,6 @@ static int update_intremap_entry_from_ms
                 get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
      }
  
-done:
-    if ( iommu->enabled )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        if ( alias_id != req_id )
-            amd_iommu_flush_intremap(iommu, alias_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
      return 0;
  }
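
[Editorial aside: the sequence the patch establishes, condensed to a sketch
(locking and the alias-id flush elided).]

    /*
     * 1. Clear RemapEn in the live IRTE -- interrupt requests through it
     *    now get target aborted (they are expected to be masked anyway).
     * 2. amd_iommu_flush_intremap(iommu, req_id) -- invalidate any copy
     *    the IOMMU may have cached.
     * 3. Write the new entry: high half first, low half (holding RemapEn)
     *    last.
     */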
  

Re: [Xen-devel] [PATCH RFC v3 13/14] AMD/IOMMU: correct IRTE updating
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:40, Jan Beulich wrote:
> Flushing didn't get done along the lines of what the specification says.
> Mark entries to be updated as not remapped (which will result in
> interrupt requests to get target aborted, but the interrupts should be
> masked anyway at that point in time), issue the flush, and only then
> write the new entry.
>
> In update_intremap_entry_from_msi_msg() also fold the duplicate initial
> lock determination and acquire into just a single instance.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: Putting the flush invocations in loops isn't overly nice, but I
>       don't think this can really be abused, since callers up the stack
>       hold further locks. Nevertheless I'd like to ask for better
>       suggestions.

Looking again, and at v2, I think this is a consequence of our insane
!x2apic interrupt set up, where we wrap an already-established system
with interrupt remapping.

Longer term, when we undo that, we should have far more clear code
structure.  Therefore, I think it is fine for now.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH v3 14/14] AMD/IOMMU: process softirqs while dumping IRTs
Posted by Jan Beulich 4 years, 9 months ago
When there are sufficiently many devices listed in the ACPI tables (no
matter if they actually exist), output may take way longer than the
watchdog would like.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v3: New.
---
TBD: Seeing the volume of output I wonder whether we should further
      suppress logging headers of devices which have no active entry
      (i.e. emit the header only upon finding the first IRTE worth
      logging). And while minor for the total volume of output I'm
      also unconvinced logging both a "per device" header line and a
      "shared" one makes sense, when only one of the two can actually
      be followed by actual contents.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -22,6 +22,7 @@
  #include <asm/hvm/svm/amd-iommu-proto.h>
  #include <asm/io_apic.h>
  #include <xen/keyhandler.h>
+#include <xen/softirq.h>
  
  struct irte_basic {
      bool remap_en:1;
@@ -917,6 +918,8 @@ static int dump_intremap_mapping(const s
      dump_intremap_table(iommu, ivrs_mapping->intremap_table);
      spin_unlock_irqrestore(&(ivrs_mapping->intremap_lock), flags);
  
+    process_pending_softirqs();
+
      return 0;
  }
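
[Editorial aside: the general shape of the fix, as a sketch; the per-entry
helper name is hypothetical.]

    /* Long dumping loops yield between iterations, so the timekeeping
     * and watchdog softirqs get to run: */
    for ( bdf = 0; bdf < nr_entries; ++bdf )
    {
        dump_one_mapping(bdf);       /* hypothetical per-entry dump */
        process_pending_softirqs();
    }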
  

Re: [Xen-devel] [PATCH v3 14/14] AMD/IOMMU: process softirqs while dumping IRTs
Posted by Andrew Cooper 4 years, 9 months ago
On 16/07/2019 17:41, Jan Beulich wrote:
> When there are sufficiently many devices listed in the ACPI tables (no
> matter if they actually exist), output may take way longer than the
> watchdog would like.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3: New.
> ---
> TBD: Seeing the volume of output I wonder whether we should further
>       suppress logging headers of devices which have no active entry
>       (i.e. emit the header only upon finding the first IRTE worth
>       logging). And while minor for the total volume of output I'm
>       also unconvinced logging both a "per device" header line and a
>       "shared" one makes sense, when only one of the two can actually
>       be followed by actual contents.

I don't have a system I can access at the moment, so can't judge how bad
it is right now.  However, I would advocate the removal of irrelevant
information.

Either way, this is debugging so Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

As an observation, I wonder whether continually sprinkling
process_pending_softirqs() is the best thing to do for keyhandlers. 
We've got a number of others which incur the wrath of the watchdog (grant
table in particular), which in practice means they are typically broken
when actually used to debug production systems.

As these are for debugging only, might it be a better idea to stop the
watchdog while keyhandlers are running?  The only useful thing we
actually manage here is to stop the watchdog killing us.

~Andrew

Re: [Xen-devel] [PATCH v3 14/14] AMD/IOMMU: process softirqs while dumping IRTs
Posted by Jan Beulich 4 years, 9 months ago
On 19.07.2019 19:55, Andrew Cooper wrote:
> On 16/07/2019 17:41, Jan Beulich wrote:
>> When there are sufficiently many devices listed in the ACPI tables (no
>> matter if they actually exist), output may take way longer than the
>> watchdog would like.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v3: New.
>> ---
>> TBD: Seeing the volume of output I wonder whether we should further
>>        suppress logging headers of devices which have no active entry
>>        (i.e. emit the header only upon finding the first IRTE worth
>>        logging). And while minor for the total volume of output I'm
>>        also unconvinced logging both a "per device" header line and a
>>        "shared" one makes sense, when only one of the two can actually
>>        be followed by actual contents.
> 
> I don't have a system I can access at the moment, so can't judge how bad
> it is right now.  However, I would advocate the removal of irrelevant
> information.

I'll try to get to putting together another patch to this effect.

> Either way, this is debugging so Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks, also for all the other review of this series!

> As an observation, I wonder whether continually sprinkling
> process_pending_softirqs() is the best thing to do for keyhandlers.
> We've got a number of others which incur the wrath of the watchdog (grant
> table in particular), which in practice means they are typically broken
> when actually used to debug production systems.
> 
> As these are for debugging only, might it be a better idea to stop the
> watchdog while keyhandlers are running?  The only useful thing we
> actually manage here is to stop the watchdog killing us.

Hmm, I would agree going this route if the watchdog could be disabled
on a per-CPU basis, but right now watchdog_disable() is a system wide
action.

Jan
Re: [Xen-devel] [PATCH v3 14/14] AMD/IOMMU: process softirqs while dumping IRTs
Posted by Andrew Cooper 4 years, 9 months ago
On 22/07/2019 09:49, Jan Beulich wrote:
> On 19.07.2019 19:55, Andrew Cooper wrote:
>> On 16/07/2019 17:41, Jan Beulich wrote:
>> As an observation, I wonder whether continually sprinkling
>> process_pending_softirqs() is the best thing to do for keyhandlers.
>> We've got a number of others which incur the wrath of the watchdog (grant
>> table in particular), which in practice means they are typically broken
>> when actually used to debug production systems.
>>
>> As these are for debugging only, might it be a better idea to stop the
>> watchdog while keyhandlers are running?  The only useful thing we
>> actually manage here is to stop the watchdog killing us.
> Hmm, I would agree going this route if the watchdog could be disabled
> on a per-CPU basis, but right now watchdog_disable() is a system wide
> action.

It needs to be disabled system-wide.  Disabling only the local CPU will
still cause a watchdog timeout on other CPUs which are waiting on the
current CPU to complete some action.

Most keyhandlers run with interrupts enabled so we will be fine WRT TLB
flushes, etc, but things like vcpu_pause() will block until softirqs are
processed again, and we need to prevent those CPUs from taking a timeout.

For other CPUs which really are having problems, the timeout will still
trip 5 seconds after the keyhandler completes, and we'll still get a
backtrace out of it.

~Andrew
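
[Editorial aside: a sketch of the alternative being discussed -- a
hypothetical wrapper, not actual Xen code. watchdog_disable() and
watchdog_enable() act system-wide, which per the above is what CPUs
blocked on the dumping CPU would need anyway.]

    static void run_keyhandler_guarded(void (*fn)(unsigned char key),
                                       unsigned char key)
    {
        watchdog_disable();  /* system-wide: also covers CPUs waiting on us */
        fn(key);
        watchdog_enable();   /* a genuinely wedged CPU still trips ~5s later */
    }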

Re: [Xen-devel] [PATCH v3 14/14] AMD/IOMMU: process softirqs while dumping IRTs
Posted by Woods, Brian 4 years, 9 months ago
On Tue, Jul 16, 2019 at 04:41:21PM +0000, Jan Beulich wrote:
> When there are sufficiently many devices listed in the ACPI tables (no
> matter if they actually exist), output may take way longer than the
> watchdog would like.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>


-- 
Brian Woods
