[Xen-devel] [PATCH v2 00/10] x86: AMD x2APIC support

Jan Beulich posted 10 patches 4 years, 10 months ago
There is a newer version of this series
Posted by Jan Beulich 4 years, 10 months ago
Despite the title this is actually all AMD IOMMU side work; all x86
side adjustments have already been carried out.

1: AMD/IOMMU: restrict feature logging
2: AMD/IOMMU: use bit field for extended feature register
3: AMD/IOMMU: use bit field for control register
4: AMD/IOMMU: use bit field for IRTE
5: AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
6: AMD/IOMMU: split amd_iommu_init_one()
7: AMD/IOMMU: allow enabling with IRQ not yet set up
8: AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
9: AMD/IOMMU: enable x2APIC mode when available
10: AMD/IOMMU: correct IRTE updating

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging
Posted by Jan Beulich 4 years, 10 months ago
The common case is all IOMMUs having the same features. Log them only
for the first IOMMU, or for any that have a differing feature set.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.

--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -62,6 +62,7 @@ void __init get_iommu_features(struct am
 {
     u32 low, high;
     int i = 0 ;
+    const struct amd_iommu *first;
     static const char *__initdata feature_str[] = {
         "- Prefetch Pages Command", 
         "- Peripheral Page Service Request", 
@@ -89,6 +90,11 @@ void __init get_iommu_features(struct am
 
     iommu->features = ((u64)high << 32) | low;
 
+    /* Don't log the same set of features over and over. */
+    first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
+    if ( iommu != first && iommu->features == first->features )
+        return;
+
     printk("AMD-Vi: IOMMU Extended Features:\n");
 
     while ( feature_str[i] )




_______________________________________________
Re: [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging
Posted by Andrew Cooper 4 years, 9 months ago
On 27/06/2019 16:19, Jan Beulich wrote:
> The common case is all IOMMUs having the same features. Log them only
> for the first IOMMU, or for any that have a differing feature set.
>
> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Re: [Xen-devel] [PATCH v2 01/10] AMD/IOMMU: restrict feature logging
Posted by Woods, Brian 4 years, 9 months ago
On Thu, Jun 27, 2019 at 09:19:06AM -0600, Jan Beulich wrote:
> The common case is all IOMMUs having the same features. Log them only
> for the first IOMMU, or for any that have a differing feature set.
> 
> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Brian Woods <brian.woods@amd.com>

> ---
> v2: New.
> 
> --- a/xen/drivers/passthrough/amd/iommu_detect.c
> +++ b/xen/drivers/passthrough/amd/iommu_detect.c
> @@ -62,6 +62,7 @@ void __init get_iommu_features(struct am
>  {
>      u32 low, high;
>      int i = 0 ;
> +    const struct amd_iommu *first;
>      static const char *__initdata feature_str[] = {
>          "- Prefetch Pages Command", 
>          "- Peripheral Page Service Request", 
> @@ -89,6 +90,11 @@ void __init get_iommu_features(struct am
>  
>      iommu->features = ((u64)high << 32) | low;
>  
> +    /* Don't log the same set of features over and over. */
> +    first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
> +    if ( iommu != first && iommu->features == first->features )
> +        return;
> +
>      printk("AMD-Vi: IOMMU Extended Features:\n");
>  
>      while ( feature_str[i] )
> 
> 
> 

-- 
Brian Woods

_______________________________________________
[Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register
Posted by Jan Beulich 4 years, 10 months ago
This also takes care of several of the shift values wrongly having been
specified as hex rather than dec.

Take the opportunity and add further fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Correct sats_sup position and name. Re-base over new earlier patch.

--- a/xen/drivers/passthrough/amd/iommu_detect.c
+++ b/xen/drivers/passthrough/amd/iommu_detect.c
@@ -60,49 +60,78 @@ static int __init get_iommu_capabilities
 
 void __init get_iommu_features(struct amd_iommu *iommu)
 {
-    u32 low, high;
-    int i = 0 ;
     const struct amd_iommu *first;
-    static const char *__initdata feature_str[] = {
-        "- Prefetch Pages Command", 
-        "- Peripheral Page Service Request", 
-        "- X2APIC Supported", 
-        "- NX bit Supported", 
-        "- Guest Translation", 
-        "- Reserved bit [5]",
-        "- Invalidate All Command", 
-        "- Guest APIC supported", 
-        "- Hardware Error Registers", 
-        "- Performance Counters", 
-        NULL
-    };
-
     ASSERT( iommu->mmio_base );
 
     if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
     {
-        iommu->features = 0;
+        iommu->features.raw = 0;
         return;
     }
 
-    low = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
-    high = readl(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET + 4);
-
-    iommu->features = ((u64)high << 32) | low;
+    iommu->features.raw =
+        readq(iommu->mmio_base + IOMMU_EXT_FEATURE_MMIO_OFFSET);
 
     /* Don't log the same set of features over and over. */
     first = list_first_entry(&amd_iommu_head, struct amd_iommu, list);
-    if ( iommu != first && iommu->features == first->features )
+    if ( iommu != first && iommu->features.raw == first->features.raw )
         return;
 
     printk("AMD-Vi: IOMMU Extended Features:\n");
 
-    while ( feature_str[i] )
+#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
+#define FEAT(fld, str) do { \
+    if ( MASK(fld) & (MASK(fld) - 1) ) \
+        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
+    else if ( iommu->features.raw & MASK(fld) ) \
+        printk( "- " str "\n"); \
+} while ( false )
+
+    FEAT(pref_sup,           "Prefetch Pages Command");
+    FEAT(ppr_sup,            "Peripheral Page Service Request");
+    FEAT(xt_sup,             "x2APIC");
+    FEAT(nx_sup,             "NX bit");
+    FEAT(gappi_sup,          "Guest APIC Physical Processor Interrupt");
+    FEAT(ia_sup,             "Invalidate All Command");
+    FEAT(ga_sup,             "Guest APIC");
+    FEAT(he_sup,             "Hardware Error Registers");
+    FEAT(pc_sup,             "Performance Counters");
+    FEAT(hats,               "Host Address Translation Size");
+
+    if ( iommu->features.flds.gt_sup )
     {
-        if ( amd_iommu_has_feature(iommu, i) )
-            printk( " %s\n", feature_str[i]);
-        i++;
+        FEAT(gats,           "Guest Address Translation Size");
+        FEAT(glx_sup,        "Guest CR3 Root Table Level");
+        FEAT(pas_max,        "Maximum PASID");
     }
+
+    FEAT(smif_sup,           "SMI Filter Register");
+    FEAT(smif_rc,            "SMI Filter Register Count");
+    FEAT(gam_sup,            "Guest Virtual APIC Modes");
+    FEAT(dual_ppr_log_sup,   "Dual PPR Log");
+    FEAT(dual_event_log_sup, "Dual Event Log");
+    FEAT(sats_sup,           "Secure ATS");
+    FEAT(us_sup,             "User / Supervisor Page Protection");
+    FEAT(dev_tbl_seg_sup,    "Device Table Segmentation");
+    FEAT(ppr_early_of_sup,   "PPR Log Overflow Early Warning");
+    FEAT(ppr_auto_rsp_sup,   "PPR Automatic Response");
+    FEAT(marc_sup,           "Memory Access Routing and Control");
+    FEAT(blk_stop_mrk_sup,   "Block StopMark Message");
+    FEAT(perf_opt_sup ,      "Performance Optimization");
+    FEAT(msi_cap_mmio_sup,   "MSI Capability MMIO Access");
+    FEAT(gio_sup,            "Guest I/O Protection");
+    FEAT(ha_sup,             "Host Access");
+    FEAT(eph_sup,            "Enhanced PPR Handling");
+    FEAT(attr_fw_sup,        "Attribute Forward");
+    FEAT(hd_sup,             "Host Dirty");
+    FEAT(inv_iotlb_type_sup, "Invalidate IOTLB Type");
+    FEAT(viommu_sup,         "Virtualized IOMMU");
+    FEAT(vm_guard_io_sup,    "VMGuard I/O Support");
+    FEAT(vm_table_size,      "VM Table Size");
+    FEAT(ga_update_dis_sup,  "Guest Access Bit Update Disable");
+
+#undef FEAT
+#undef MASK
 }
 
 int __init amd_iommu_detect_one_acpi(
--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -638,7 +638,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->reg_status);
         break;
     case IOMMU_EXT_FEATURE_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ext_feature);
+        val = iommu->reg_ext_feature.raw;
         break;
 
     default:
@@ -802,39 +802,26 @@ int guest_iommu_set_base(struct domain *
 /* Initialize mmio read only bits */
 static void guest_iommu_reg_init(struct guest_iommu *iommu)
 {
-    uint32_t lower, upper;
+    union amd_iommu_ext_features ef = {
+        /* Support prefetch */
+        .flds.pref_sup = 1,
+        /* Support PPR log */
+        .flds.ppr_sup = 1,
+        /* Support guest translation */
+        .flds.gt_sup = 1,
+        /* Support invalidate all command */
+        .flds.ia_sup = 1,
+        /* Host translation size has 6 levels */
+        .flds.hats = HOST_ADDRESS_SIZE_6_LEVEL,
+        /* Guest translation size has 6 levels */
+        .flds.gats = GUEST_ADDRESS_SIZE_6_LEVEL,
+        /* Single level gCR3 */
+        .flds.glx_sup = GUEST_CR3_1_LEVEL,
+        /* 9 bit PASID */
+        .flds.pas_max = PASMAX_9_bit,
+    };
 
-    lower = upper = 0;
-    /* Support prefetch */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PREFSUP_SHIFT);
-    /* Support PPR log */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_PPRSUP_SHIFT);
-    /* Support guest translation */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_GTSUP_SHIFT);
-    /* Support invalidate all command */
-    iommu_set_bit(&lower,IOMMU_EXT_FEATURE_IASUP_SHIFT);
-
-    /* Host translation size has 6 levels */
-    set_field_in_reg_u32(HOST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_HATS_MASK,
-                         IOMMU_EXT_FEATURE_HATS_SHIFT,
-                         &lower);
-    /* Guest translation size has 6 levels */
-    set_field_in_reg_u32(GUEST_ADDRESS_SIZE_6_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GATS_MASK,
-                         IOMMU_EXT_FEATURE_GATS_SHIFT,
-                         &lower);
-    /* Single level gCR3 */
-    set_field_in_reg_u32(GUEST_CR3_1_LEVEL, lower,
-                         IOMMU_EXT_FEATURE_GLXSUP_MASK,
-                         IOMMU_EXT_FEATURE_GLXSUP_SHIFT, &lower);
-    /* 9 bit PASID */
-    set_field_in_reg_u32(PASMAX_9_bit, upper,
-                         IOMMU_EXT_FEATURE_PASMAX_MASK,
-                         IOMMU_EXT_FEATURE_PASMAX_SHIFT, &upper);
-
-    iommu->reg_ext_feature.lo = lower;
-    iommu->reg_ext_feature.hi = upper;
+    iommu->reg_ext_feature = ef;
 }
 
 static int guest_iommu_mmio_range(struct vcpu *v, unsigned long addr)
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -883,7 +883,7 @@ static void enable_iommu(struct amd_iomm
     register_iommu_event_log_in_mmio_space(iommu);
     register_iommu_exclusion_range(iommu);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
     desc = irq_to_desc(iommu->msi.irq);
@@ -897,15 +897,15 @@ static void enable_iommu(struct amd_iomm
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( iommu->features.flds.ia_sup )
         amd_iommu_flush_all_caches(iommu);
 
     iommu->enabled = 1;
@@ -928,10 +928,10 @@ static void disable_iommu(struct amd_iom
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
+    if ( iommu->features.flds.ppr_sup )
         set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_GTSUP_SHIFT) )
+    if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_DISABLED);
 
     set_iommu_translation_control(iommu, IOMMU_CONTROL_DISABLED);
@@ -1027,7 +1027,7 @@ static int __init amd_iommu_init_one(str
 
     get_iommu_features(iommu);
 
-    if ( iommu->features )
+    if ( iommu->features.raw )
         iommuv2_enabled = 1;
 
     if ( allocate_cmd_buffer(iommu) == NULL )
@@ -1036,9 +1036,8 @@ static int __init amd_iommu_init_one(str
     if ( allocate_event_log(iommu) == NULL )
         goto error_out;
 
-    if ( amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_PPRSUP_SHIFT) )
-        if ( allocate_ppr_log(iommu) == NULL )
-            goto error_out;
+    if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
+        goto error_out;
 
     if ( !set_iommu_interrupt_handler(iommu) )
         goto error_out;
@@ -1389,7 +1388,7 @@ void amd_iommu_resume(void)
     }
 
     /* flush all cache entries after iommu re-enabled */
-    if ( !amd_iommu_has_feature(iommu, IOMMU_EXT_FEATURE_IASUP_SHIFT) )
+    if ( !iommu->features.flds.ia_sup )
     {
         invalidate_all_devices();
         invalidate_all_domain_pages();
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -83,7 +83,7 @@ struct amd_iommu {
     iommu_cap_t cap;
 
     u8 ht_flags;
-    u64 features;
+    union amd_iommu_ext_features features;
 
     void *mmio_base;
     unsigned long mmio_base_phys;
@@ -174,7 +174,7 @@ struct guest_iommu {
     /* MMIO regs */
     struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
-    struct mmio_reg         reg_ext_feature;       /* MMIO offset 0030h */
+    union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
     /* guest interrupt settings */
     struct guest_iommu_msi  msi;
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -346,26 +346,57 @@ struct amd_iommu_dte {
 #define IOMMU_EXCLUSION_LIMIT_HIGH_MASK		0xFFFFFFFF
 #define IOMMU_EXCLUSION_LIMIT_HIGH_SHIFT	0
 
-/* Extended Feature Register*/
+/* Extended Feature Register */
 #define IOMMU_EXT_FEATURE_MMIO_OFFSET                   0x30
-#define IOMMU_EXT_FEATURE_PREFSUP_SHIFT                 0x0
-#define IOMMU_EXT_FEATURE_PPRSUP_SHIFT                  0x1
-#define IOMMU_EXT_FEATURE_XTSUP_SHIFT                   0x2
-#define IOMMU_EXT_FEATURE_NXSUP_SHIFT                   0x3
-#define IOMMU_EXT_FEATURE_GTSUP_SHIFT                   0x4
-#define IOMMU_EXT_FEATURE_IASUP_SHIFT                   0x6
-#define IOMMU_EXT_FEATURE_GASUP_SHIFT                   0x7
-#define IOMMU_EXT_FEATURE_HESUP_SHIFT                   0x8
-#define IOMMU_EXT_FEATURE_PCSUP_SHIFT                   0x9
-#define IOMMU_EXT_FEATURE_HATS_SHIFT                    0x10
-#define IOMMU_EXT_FEATURE_HATS_MASK                     0x00000C00
-#define IOMMU_EXT_FEATURE_GATS_SHIFT                    0x12
-#define IOMMU_EXT_FEATURE_GATS_MASK                     0x00003000
-#define IOMMU_EXT_FEATURE_GLXSUP_SHIFT                  0x14
-#define IOMMU_EXT_FEATURE_GLXSUP_MASK                   0x0000C000
 
-#define IOMMU_EXT_FEATURE_PASMAX_SHIFT                  0x0
-#define IOMMU_EXT_FEATURE_PASMAX_MASK                   0x0000001F
+union amd_iommu_ext_features {
+    uint64_t raw;
+    struct {
+        unsigned int pref_sup:1;
+        unsigned int ppr_sup:1;
+        unsigned int xt_sup:1;
+        unsigned int nx_sup:1;
+        unsigned int gt_sup:1;
+        unsigned int gappi_sup:1;
+        unsigned int ia_sup:1;
+        unsigned int ga_sup:1;
+        unsigned int he_sup:1;
+        unsigned int pc_sup:1;
+        unsigned int hats:2;
+        unsigned int gats:2;
+        unsigned int glx_sup:2;
+        unsigned int smif_sup:2;
+        unsigned int smif_rc:3;
+        unsigned int gam_sup:3;
+        unsigned int dual_ppr_log_sup:2;
+        unsigned int :2;
+        unsigned int dual_event_log_sup:2;
+        unsigned int :1;
+        unsigned int sats_sup:1;
+        unsigned int pas_max:5;
+        unsigned int us_sup:1;
+        unsigned int dev_tbl_seg_sup:2;
+        unsigned int ppr_early_of_sup:1;
+        unsigned int ppr_auto_rsp_sup:1;
+        unsigned int marc_sup:2;
+        unsigned int blk_stop_mrk_sup:1;
+        unsigned int perf_opt_sup:1;
+        unsigned int msi_cap_mmio_sup:1;
+        unsigned int :1;
+        unsigned int gio_sup:1;
+        unsigned int ha_sup:1;
+        unsigned int eph_sup:1;
+        unsigned int attr_fw_sup:1;
+        unsigned int hd_sup:1;
+        unsigned int :1;
+        unsigned int inv_iotlb_type_sup:1;
+        unsigned int viommu_sup:1;
+        unsigned int vm_guard_io_sup:1;
+        unsigned int vm_table_size:4;
+        unsigned int ga_update_dis_sup:1;
+        unsigned int :2;
+    } flds;
+};
 
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -219,13 +219,6 @@ static inline int iommu_has_cap(struct a
     return !!(iommu->cap.header & (1u << bit));
 }
 
-static inline int amd_iommu_has_feature(struct amd_iommu *iommu, uint32_t bit)
-{
-    if ( !iommu_has_cap(iommu, PCI_CAP_EFRSUP_SHIFT) )
-        return 0;
-    return !!(iommu->features & (1U << bit));
-}
-
 /* access tail or head pointer of ring buffer */
 static inline uint32_t iommu_get_rb_pointer(uint32_t reg)
 {




_______________________________________________
Re: [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register
Posted by Andrew Cooper 4 years, 9 months ago
On 27/06/2019 16:19, Jan Beulich wrote:
>      printk("AMD-Vi: IOMMU Extended Features:\n");
>  
> -    while ( feature_str[i] )
> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
> +#define FEAT(fld, str) do { \
> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
> +    else if ( iommu->features.raw & MASK(fld) ) \
> +        printk( "- " str "\n"); \
> +} while ( false )

Sadly, Clang dislikes this construct.

https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/243795095 
(Click on the "Complete Raw" button)

iommu_detect.c:90:5: error: implicit truncation from 'int' to bitfield changes value from -1 to 1 [-Werror,-Wbitfield-constant-conversion]
    FEAT(pref_sup,           "Prefetch Pages Command");
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
iommu_detect.c:84:10: note: expanded from macro 'FEAT'
    if ( MASK(fld) & (MASK(fld) - 1) ) \
         ^~~~~~~~~
iommu_detect.c:82:64: note: expanded from macro 'MASK'
#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
                                                               ^~


which is a shame.  Furthermore, switching to ~(0u) won't work either,
because that will then get a truncation warning.

Clever as this trick is, this is write-once code and isn't going to
change moving forward.  I'd do away with the compile-time cleverness and
have simple FEAT() and MASK() macros, and use the correct one below.

> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
> +union amd_iommu_ext_features {
> +    uint64_t raw;
> +    struct {
> +        unsigned int pref_sup:1;
> +        unsigned int ppr_sup:1;
> +        unsigned int xt_sup:1;
> +        unsigned int nx_sup:1;
> +        unsigned int gt_sup:1;
> +        unsigned int gappi_sup:1;
> +        unsigned int ia_sup:1;
> +        unsigned int ga_sup:1;
> +        unsigned int he_sup:1;
> +        unsigned int pc_sup:1;
> +        unsigned int hats:2;
> +        unsigned int gats:2;
> +        unsigned int glx_sup:2;
> +        unsigned int smif_sup:2;
> +        unsigned int smif_rc:3;
> +        unsigned int gam_sup:3;
> +        unsigned int dual_ppr_log_sup:2;
> +        unsigned int :2;
> +        unsigned int dual_event_log_sup:2;
> +        unsigned int :1;
> +        unsigned int sats_sup:1;
> +        unsigned int pas_max:5;
> +        unsigned int us_sup:1;
> +        unsigned int dev_tbl_seg_sup:2;
> +        unsigned int ppr_early_of_sup:1;
> +        unsigned int ppr_auto_rsp_sup:1;
> +        unsigned int marc_sup:2;
> +        unsigned int blk_stop_mrk_sup:1;
> +        unsigned int perf_opt_sup:1;
> +        unsigned int msi_cap_mmio_sup:1;
> +        unsigned int :1;
> +        unsigned int gio_sup:1;
> +        unsigned int ha_sup:1;
> +        unsigned int eph_sup:1;
> +        unsigned int attr_fw_sup:1;
> +        unsigned int hd_sup:1;
> +        unsigned int :1;
> +        unsigned int inv_iotlb_type_sup:1;
> +        unsigned int viommu_sup:1;
> +        unsigned int vm_guard_io_sup:1;
> +        unsigned int vm_table_size:4;
> +        unsigned int ga_update_dis_sup:1;
> +        unsigned int :2;
> +    } flds;

Why the .flds name?  What is wrong with this becoming anonymous?

~Andrew
_______________________________________________
Re: [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register
Posted by Jan Beulich 4 years, 9 months ago
On 02.07.2019 14:09, Andrew Cooper wrote:
> On 27/06/2019 16:19, Jan Beulich wrote:
>>       printk("AMD-Vi: IOMMU Extended Features:\n");
>>   
>> -    while ( feature_str[i] )
>> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
>> +#define FEAT(fld, str) do { \
>> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
>> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
>> +    else if ( iommu->features.raw & MASK(fld) ) \
>> +        printk( "- " str "\n"); \
>> +} while ( false )
> 
> Sadly, Clang dislikes this construct.
> 
> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/243795095
> (Click on the "Complete Raw" button)

Is it possible that this has expired in the meantime? I can't seem to
be able to access it. But then, with what you write below, I probably
also have enough information.

> iommu_detect.c:90:5: error: implicit truncation from 'int' to bitfield changes value from -1 to 1 [-Werror,-Wbitfield-constant-conversion]
>      FEAT(pref_sup,           "Prefetch Pages Command");
>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> iommu_detect.c:84:10: note: expanded from macro 'FEAT'
>      if ( MASK(fld) & (MASK(fld) - 1) ) \
>           ^~~~~~~~~
> iommu_detect.c:82:64: note: expanded from macro 'MASK'
> #define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
>                                                                 ^~
> 
> 
> which is a shame.  Furthermore, switching to ~(0u) won't work either,
> because that will then get a truncation warning.
> 
> Clever as this trick is, this is write-once code and isn't going to
> change moving forward.  I'd do away with the compile-time cleverness and
> have simple FEAT() and MASK() macros, and use the correct one below.

If only I knew what you mean with "simple FEAT() and MASK() macros".
I can't think of variants not requiring to also introduce literal
numbers to use as constants. I'll (not just) therefore try to modify
the original approach such that hopefully there won't be any overflow
detected anymore. I can't see any such issue with the clang I use
anyway.

Jan
_______________________________________________
Re: [Xen-devel] [PATCH v2 02/10] AMD/IOMMU: use bit field for extended feature register
Posted by Jan Beulich 4 years, 9 months ago
On 02.07.2019 14:09, Andrew Cooper wrote:
> On 27/06/2019 16:19, Jan Beulich wrote:
>>       printk("AMD-Vi: IOMMU Extended Features:\n");
>>   
>> -    while ( feature_str[i] )
>> +#define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
>> +#define FEAT(fld, str) do { \
>> +    if ( MASK(fld) & (MASK(fld) - 1) ) \
>> +        printk( "- " str ": %#x\n", iommu->features.flds.fld); \
>> +    else if ( iommu->features.raw & MASK(fld) ) \
>> +        printk( "- " str "\n"); \
>> +} while ( false )
> 
> Sadly, Clang dislikes this construct.
> 
> https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/243795095
> (Click on the "Complete Raw" button)
> 
> iommu_detect.c:90:5: error: implicit truncation from 'int' to bitfield changes value from -1 to 1 [-Werror,-Wbitfield-constant-conversion]
>      FEAT(pref_sup,           "Prefetch Pages Command");
>      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> iommu_detect.c:84:10: note: expanded from macro 'FEAT'
>      if ( MASK(fld) & (MASK(fld) - 1) ) \
>           ^~~~~~~~~
> iommu_detect.c:82:64: note: expanded from macro 'MASK'
> #define MASK(fld) ((union amd_iommu_ext_features){ .flds.fld = ~0 }).raw
>                                                                 ^~
> 
> 
> which is a shame.  Furthermore, switching to ~(0u) won't work either,
> because that will then get a truncation warning.
> 
> Clever as this trick is, this is write-once code and isn't going to
> change moving forward.  I'd do away with the compile-time cleverness and
> have simple FEAT() and MASK() macros, and use the correct one below.

I don't immediately see what you would mean by "simple FEAT() and MASK()
macros", but perhaps I'll figure when I actually make this change. What
I'm concerned about when changing away from the chosen model is that
there'll likely be a need to explicitly know whether a field is just a
boolean or holds an actual (wider) value. I.e. that's what is not "write
once" about this code, since future additions equally become more
fragile.

I was actually hoping to use this "mask from bitfield" approach
elsewhere, so this is yet another case where I wonder whether us wanting
to be able to build with clang is actually becoming an increasing
hindrance.

I'll see if I can come up with something else, still matching the
original idea. Clearly clang can't be consistent with its value
truncation warnings, or else Xen wouldn't build with it at all.

>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
>> @@ -346,26 +346,57 @@ struct amd_iommu_dte {
>> +union amd_iommu_ext_features {
>> +    uint64_t raw;
>> +    struct {
>> +        unsigned int pref_sup:1;
>> +        unsigned int ppr_sup:1;
>> +        unsigned int xt_sup:1;
>> +        unsigned int nx_sup:1;
>> +        unsigned int gt_sup:1;
>> +        unsigned int gappi_sup:1;
>> +        unsigned int ia_sup:1;
>> +        unsigned int ga_sup:1;
>> +        unsigned int he_sup:1;
>> +        unsigned int pc_sup:1;
>> +        unsigned int hats:2;
>> +        unsigned int gats:2;
>> +        unsigned int glx_sup:2;
>> +        unsigned int smif_sup:2;
>> +        unsigned int smif_rc:3;
>> +        unsigned int gam_sup:3;
>> +        unsigned int dual_ppr_log_sup:2;
>> +        unsigned int :2;
>> +        unsigned int dual_event_log_sup:2;
>> +        unsigned int :1;
>> +        unsigned int sats_sup:1;
>> +        unsigned int pas_max:5;
>> +        unsigned int us_sup:1;
>> +        unsigned int dev_tbl_seg_sup:2;
>> +        unsigned int ppr_early_of_sup:1;
>> +        unsigned int ppr_auto_rsp_sup:1;
>> +        unsigned int marc_sup:2;
>> +        unsigned int blk_stop_mrk_sup:1;
>> +        unsigned int perf_opt_sup:1;
>> +        unsigned int msi_cap_mmio_sup:1;
>> +        unsigned int :1;
>> +        unsigned int gio_sup:1;
>> +        unsigned int ha_sup:1;
>> +        unsigned int eph_sup:1;
>> +        unsigned int attr_fw_sup:1;
>> +        unsigned int hd_sup:1;
>> +        unsigned int :1;
>> +        unsigned int inv_iotlb_type_sup:1;
>> +        unsigned int viommu_sup:1;
>> +        unsigned int vm_guard_io_sup:1;
>> +        unsigned int vm_table_size:4;
>> +        unsigned int ga_update_dis_sup:1;
>> +        unsigned int :2;
>> +    } flds;
> 
> Why the .flds name?  What is wrong with this becoming anonymous?

The initializer in guest_iommu_reg_init() (with old gcc).

Jan
_______________________________________________
[Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register
Posted by Jan Beulich 4 years, 10 months ago
Also introduce a field in struct amd_iommu caching the most recently
written control register. All writes should now happen exclusively from
that cached value, such that it is guaranteed to be up to date.

Take the opportunity and add further fields. Also convert a few boolean
function parameters to bool, such that use of !! can be avoided.

Because of there now being definitions beyond bit 31, writel() also gets
replaced by writeq() when updating hardware.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Add domain_id_pne field. Mention writel() -> writeq() change.

--- a/xen/drivers/passthrough/amd/iommu_guest.c
+++ b/xen/drivers/passthrough/amd/iommu_guest.c
@@ -317,7 +317,7 @@ static int do_invalidate_iotlb_pages(str
 
 static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
 {
-    bool_t com_wait_int_en, com_wait_int, i, s;
+    bool com_wait_int, i, s;
     struct guest_iommu *iommu;
     unsigned long gfn;
     p2m_type_t p2mt;
@@ -354,12 +354,10 @@ static int do_completion_wait(struct dom
         unmap_domain_page(vaddr);
     }
 
-    com_wait_int_en = iommu_get_bit(iommu->reg_ctrl.lo,
-                                    IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
     com_wait_int = iommu_get_bit(iommu->reg_status.lo,
                                  IOMMU_STATUS_COMP_WAIT_INT_SHIFT);
 
-    if ( com_wait_int_en && com_wait_int )
+    if ( iommu->reg_ctrl.com_wait_int_en && com_wait_int )
         guest_iommu_deliver_msi(d);
 
     return 0;
@@ -521,40 +519,17 @@ static void guest_iommu_process_command(
     return;
 }
 
-static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t newctrl)
+static int guest_iommu_write_ctrl(struct guest_iommu *iommu, uint64_t val)
 {
-    bool_t cmd_en, event_en, iommu_en, ppr_en, ppr_log_en;
-    bool_t cmd_en_old, event_en_old, iommu_en_old;
-    bool_t cmd_run;
-
-    iommu_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-    iommu_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    cmd_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                               IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
-    cmd_run = iommu_get_bit(iommu->reg_status.lo,
-                            IOMMU_STATUS_CMD_BUFFER_RUN_SHIFT);
-    event_en = iommu_get_bit(newctrl,
-                             IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    event_en_old = iommu_get_bit(iommu->reg_ctrl.lo,
-                                 IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-
-    ppr_en = iommu_get_bit(newctrl,
-                           IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-    ppr_log_en = iommu_get_bit(newctrl,
-                               IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
+    union amd_iommu_control newctrl = { .raw = val };
 
-    if ( iommu_en )
+    if ( newctrl.iommu_en )
     {
         guest_iommu_enable(iommu);
         guest_iommu_enable_dev_table(iommu);
     }
 
-    if ( iommu_en && cmd_en )
+    if ( newctrl.iommu_en && newctrl.cmd_buf_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->cmd_buffer,
                                        sizeof(cmd_entry_t));
@@ -562,7 +537,7 @@ static int guest_iommu_write_ctrl(struct
         tasklet_schedule(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( iommu_en && event_en )
+    if ( newctrl.iommu_en && newctrl.event_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->event_log,
                                        sizeof(event_entry_t));
@@ -570,7 +545,7 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && ppr_en && ppr_log_en )
+    if ( newctrl.iommu_en && newctrl.ppr_en && newctrl.ppr_log_en )
     {
         guest_iommu_enable_ring_buffer(iommu, &iommu->ppr_log,
                                        sizeof(ppr_entry_t));
@@ -578,19 +553,21 @@ static int guest_iommu_write_ctrl(struct
         guest_iommu_clear_status(iommu, IOMMU_STATUS_PPR_LOG_OVERFLOW_SHIFT);
     }
 
-    if ( iommu_en && cmd_en_old && !cmd_en )
+    if ( newctrl.iommu_en && iommu->reg_ctrl.cmd_buf_en &&
+         !newctrl.cmd_buf_en )
     {
         /* Disable iommu command processing */
         tasklet_kill(&iommu->cmd_buffer_tasklet);
     }
 
-    if ( event_en_old && !event_en )
+    if ( iommu->reg_ctrl.event_log_en && !newctrl.event_log_en )
         guest_iommu_clear_status(iommu, IOMMU_STATUS_EVENT_LOG_RUN_SHIFT);
 
-    if ( iommu_en_old && !iommu_en )
+    if ( iommu->reg_ctrl.iommu_en && !newctrl.iommu_en )
         guest_iommu_disable(iommu);
 
-    u64_to_reg(&iommu->reg_ctrl, newctrl);
+    iommu->reg_ctrl = newctrl;
+
     return 0;
 }
 
@@ -632,7 +609,7 @@ static uint64_t iommu_mmio_read64(struct
         val = reg_to_u64(iommu->ppr_log.reg_tail);
         break;
     case IOMMU_CONTROL_MMIO_OFFSET:
-        val = reg_to_u64(iommu->reg_ctrl);
+        val = iommu->reg_ctrl.raw;
         break;
     case IOMMU_STATUS_MMIO_OFFSET:
         val = reg_to_u64(iommu->reg_status);
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -41,7 +41,7 @@ LIST_HEAD_READ_MOSTLY(amd_iommu_head);
 struct table_struct device_table;
 bool_t iommuv2_enabled;
 
-static int iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
+static bool iommu_has_ht_flag(struct amd_iommu *iommu, u8 mask)
 {
     return iommu->ht_flags & mask;
 }
@@ -69,31 +69,18 @@ static void __init unmap_iommu_mmio_regi
 
 static void set_iommu_ht_flags(struct amd_iommu *iommu)
 {
-    u32 entry;
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
     /* Setup HT flags */
     if ( iommu_has_cap(iommu, PCI_CAP_HT_TUNNEL_SHIFT) )
-        iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE) ?
-            iommu_set_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT) :
-            iommu_clear_bit(&entry, IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_ISOCHRONOUS_SHIFT);
-
-    iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW) ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT):
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT);
+        iommu->ctrl.ht_tun_en = iommu_has_ht_flag(iommu, ACPI_IVHD_TT_ENABLE);
+
+    iommu->ctrl.pass_pw     = iommu_has_ht_flag(iommu, ACPI_IVHD_PASS_PW);
+    iommu->ctrl.res_pass_pw = iommu_has_ht_flag(iommu, ACPI_IVHD_RES_PASS_PW);
+    iommu->ctrl.isoc        = iommu_has_ht_flag(iommu, ACPI_IVHD_ISOC);
 
     /* Force coherent */
-    iommu_set_bit(&entry, IOMMU_CONTROL_COHERENT_SHIFT);
+    iommu->ctrl.coherent = 1;
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_dev_table_in_mmio_space(struct amd_iommu *iommu)
@@ -205,55 +192,37 @@ static void register_iommu_ppr_log_in_mm
 
 
 static void set_iommu_translation_control(struct amd_iommu *iommu,
-                                                 int enable)
+                                          bool enable)
 {
-    u32 entry;
+    iommu->ctrl.iommu_en = enable;
 
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_guest_translation_control(struct amd_iommu *iommu,
-                                                int enable)
+                                                bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.gt_en = enable;
 
-    enable ?
-        iommu_set_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT) :
-        iommu_clear_bit(&entry, IOMMU_CONTROL_GT_ENABLE_SHIFT);
-
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     if ( enable )
         AMD_IOMMU_DEBUG("Guest Translation Enabled.\n");
 }
 
 static void set_iommu_command_buffer_control(struct amd_iommu *iommu,
-                                                    int enable)
+                                             bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_CMD_BUFFER_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
     }
-    else
-        iommu_clear_bit(&entry, IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT);
 
-    writel(entry, iommu->mmio_base+IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.cmd_buf_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void register_iommu_exclusion_range(struct amd_iommu *iommu)
@@ -295,57 +264,38 @@ static void register_iommu_exclusion_ran
 }
 
 static void set_iommu_event_log_control(struct amd_iommu *iommu,
-            int enable)
+                                        bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_EVENT_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT);
     }
 
-    iommu_clear_bit(&entry, IOMMU_CONTROL_COMP_WAIT_INT_SHIFT);
+    iommu->ctrl.event_int_en = enable;
+    iommu->ctrl.event_log_en = enable;
+    iommu->ctrl.com_wait_int_en = 0;
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 }
 
 static void set_iommu_ppr_log_control(struct amd_iommu *iommu,
-                                      int enable)
+                                      bool enable)
 {
-    u32 entry;
-
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-
-    /*reset head and tail pointer manually before enablement */
+    /* Reset head and tail pointer manually before enablement */
     if ( enable )
     {
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_HEAD_OFFSET);
         writeq(0, iommu->mmio_base + IOMMU_PPR_LOG_TAIL_OFFSET);
-
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_set_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
-    }
-    else
-    {
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_ENABLE_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-        iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT);
     }
 
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.ppr_en = enable;
+    iommu->ctrl.ppr_int_en = enable;
+    iommu->ctrl.ppr_log_en = enable;
+
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+
     if ( enable )
         AMD_IOMMU_DEBUG("PPR Log Enabled.\n");
 }
@@ -398,7 +348,7 @@ static int iommu_read_log(struct amd_iom
 /* reset event log or ppr log when overflow */
 static void iommu_reset_log(struct amd_iommu *iommu,
                             struct ring_buffer *log,
-                            void (*ctrl_func)(struct amd_iommu *iommu, int))
+                            void (*ctrl_func)(struct amd_iommu *iommu, bool))
 {
     u32 entry;
     int log_run, run_bit;
@@ -615,11 +565,11 @@ static void iommu_check_event_log(struct
         iommu_reset_log(iommu, &iommu->event_log, set_iommu_event_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_EVENT_LOG_INT_MASK) )
+        if ( !iommu->ctrl.event_int_en )
         {
-            entry |= IOMMU_CONTROL_EVENT_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.event_int_en = 1;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -704,11 +654,11 @@ static void iommu_check_ppr_log(struct a
         iommu_reset_log(iommu, &iommu->ppr_log, set_iommu_ppr_log_control);
     else
     {
-        entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-        if ( !(entry & IOMMU_CONTROL_PPR_LOG_INT_MASK) )
+        if ( !iommu->ctrl.ppr_int_en )
         {
-            entry |= IOMMU_CONTROL_PPR_LOG_INT_MASK;
-            writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+            iommu->ctrl.ppr_int_en = 1;
+            writeq(iommu->ctrl.raw,
+                   iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
             /*
              * Re-schedule the tasklet to handle eventual log entries added
              * between reading the log above and re-enabling the interrupt.
@@ -754,7 +704,6 @@ static void do_amd_iommu_irq(unsigned lo
 static void iommu_interrupt_handler(int irq, void *dev_id,
                                     struct cpu_user_regs *regs)
 {
-    u32 entry;
     unsigned long flags;
     struct amd_iommu *iommu = dev_id;
 
@@ -764,10 +713,9 @@ static void iommu_interrupt_handler(int
      * Silence interrupts from both event and PPR by clearing the
      * enable logging bits in the control register
      */
-    entry = readl(iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_EVENT_LOG_INT_SHIFT);
-    iommu_clear_bit(&entry, IOMMU_CONTROL_PPR_LOG_INT_SHIFT);
-    writel(entry, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
+    iommu->ctrl.event_int_en = 0;
+    iommu->ctrl.ppr_int_en = 0;
+    writeq(iommu->ctrl.raw, iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
--- a/xen/include/asm-x86/amd-iommu.h
+++ b/xen/include/asm-x86/amd-iommu.h
@@ -88,6 +88,8 @@ struct amd_iommu {
     void *mmio_base;
     unsigned long mmio_base_phys;
 
+    union amd_iommu_control ctrl;
+
     struct table_struct dev_table;
     struct ring_buffer cmd_buffer;
     struct ring_buffer event_log;
@@ -172,7 +174,7 @@ struct guest_iommu {
     uint64_t                mmio_base;             /* MMIO base address */
 
     /* MMIO regs */
-    struct mmio_reg         reg_ctrl;              /* MMIO offset 0018h */
+    union amd_iommu_control reg_ctrl;              /* MMIO offset 0018h */
     struct mmio_reg         reg_status;            /* MMIO offset 2020h */
     union amd_iommu_ext_features reg_ext_feature;  /* MMIO offset 0030h */
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -295,38 +295,56 @@ struct amd_iommu_dte {
 
 /* Control Register */
 #define IOMMU_CONTROL_MMIO_OFFSET			0x18
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_MASK		0x00000001
-#define IOMMU_CONTROL_TRANSLATION_ENABLE_SHIFT		0
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_MASK	0x00000002
-#define IOMMU_CONTROL_HT_TUNNEL_TRANSLATION_SHIFT	1
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_MASK		0x00000004
-#define IOMMU_CONTROL_EVENT_LOG_ENABLE_SHIFT		2
-#define IOMMU_CONTROL_EVENT_LOG_INT_MASK		0x00000008
-#define IOMMU_CONTROL_EVENT_LOG_INT_SHIFT		3
-#define IOMMU_CONTROL_COMP_WAIT_INT_MASK		0x00000010
-#define IOMMU_CONTROL_COMP_WAIT_INT_SHIFT		4
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_MASK		0x000000E0
-#define IOMMU_CONTROL_INVALIDATION_TIMEOUT_SHIFT	5
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_MASK		0x00000100
-#define IOMMU_CONTROL_PASS_POSTED_WRITE_SHIFT		8
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_MASK	0x00000200
-#define IOMMU_CONTROL_RESP_PASS_POSTED_WRITE_SHIFT	9
-#define IOMMU_CONTROL_COHERENT_MASK			0x00000400
-#define IOMMU_CONTROL_COHERENT_SHIFT			10
-#define IOMMU_CONTROL_ISOCHRONOUS_MASK			0x00000800
-#define IOMMU_CONTROL_ISOCHRONOUS_SHIFT			11
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_MASK	0x00001000
-#define IOMMU_CONTROL_COMMAND_BUFFER_ENABLE_SHIFT	12
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_MASK		0x00002000
-#define IOMMU_CONTROL_PPR_LOG_ENABLE_SHIFT		13
-#define IOMMU_CONTROL_PPR_LOG_INT_MASK			0x00004000
-#define IOMMU_CONTROL_PPR_LOG_INT_SHIFT			14
-#define IOMMU_CONTROL_PPR_ENABLE_MASK			0x00008000
-#define IOMMU_CONTROL_PPR_ENABLE_SHIFT			15
-#define IOMMU_CONTROL_GT_ENABLE_MASK			0x00010000
-#define IOMMU_CONTROL_GT_ENABLE_SHIFT			16
-#define IOMMU_CONTROL_RESTART_MASK			0x80000000
-#define IOMMU_CONTROL_RESTART_SHIFT			31
+
+union amd_iommu_control {
+    uint64_t raw;
+    struct {
+        unsigned int iommu_en:1;
+        unsigned int ht_tun_en:1;
+        unsigned int event_log_en:1;
+        unsigned int event_int_en:1;
+        unsigned int com_wait_int_en:1;
+        unsigned int inv_timeout:3;
+        unsigned int pass_pw:1;
+        unsigned int res_pass_pw:1;
+        unsigned int coherent:1;
+        unsigned int isoc:1;
+        unsigned int cmd_buf_en:1;
+        unsigned int ppr_log_en:1;
+        unsigned int ppr_int_en:1;
+        unsigned int ppr_en:1;
+        unsigned int gt_en:1;
+        unsigned int ga_en:1;
+        unsigned int crw:4;
+        unsigned int smif_en:1;
+        unsigned int slf_wb_dis:1;
+        unsigned int smif_log_en:1;
+        unsigned int gam_en:3;
+        unsigned int ga_log_en:1;
+        unsigned int ga_int_en:1;
+        unsigned int dual_ppr_log_en:2;
+        unsigned int dual_event_log_en:2;
+        unsigned int dev_tbl_seg_en:3;
+        unsigned int priv_abrt_en:2;
+        unsigned int ppr_auto_rsp_en:1;
+        unsigned int marc_en:1;
+        unsigned int blk_stop_mrk_en:1;
+        unsigned int ppr_auto_rsp_aon:1;
+        unsigned int domain_id_pne:1;
+        unsigned int :1;
+        unsigned int eph_en:1;
+        unsigned int had_update:2;
+        unsigned int gd_update_dis:1;
+        unsigned int :1;
+        unsigned int xt_en:1;
+        unsigned int int_cap_xt_en:1;
+        unsigned int vcmd_en:1;
+        unsigned int viommu_en:1;
+        unsigned int ga_update_dis:1;
+        unsigned int gappi_en:1;
+        unsigned int :8;
+    };
+};
 
 /* Exclusion Register */
 #define IOMMU_EXCLUSION_BASE_LOW_OFFSET		0x20




Re: [Xen-devel] [PATCH v2 03/10] AMD/IOMMU: use bit field for control register
Posted by Andrew Cooper 4 years, 9 months ago
On 27/06/2019 16:20, Jan Beulich wrote:
> Also introduce a field in struct amd_iommu caching the most recently
> written control register. All writes should now happen exclusively from
> that cached value, such that it is guaranteed to be up to date.
>
> Take the opportunity to add further fields. Also convert a few boolean
> function parameters to bool, so that use of !! can be avoided.
>
> Because there are now definitions beyond bit 31, writel() is also
> replaced by writeq() when updating hardware.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Add domain_id_pne field. Mention writel() -> writeq() change.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>, subject to the
resolution of similarities with the previous patch.

I'm still concerned that not using bool bitfields is a recipe for
subtle mistakes.

~Andrew
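
For illustration only (not from the thread): a minimal sketch of the
kind of subtle mistake being referred to, assuming a hypothetical flag
FLAG_PASS_PW in bit 1 and hypothetical helper names. Assigning a masked
value to a 1-bit unsigned int field keeps only bit 0, whereas a bool
field (or a bool-returning helper, as iommu_has_ht_flag() becomes in
this patch) normalizes any non-zero value to 1.

```c
#include <stdbool.h>

#define FLAG_PASS_PW 0x02u   /* hypothetical flag living in bit 1 */

struct flags_uint {
    unsigned int pass_pw:1;  /* stores only the low bit of what is assigned */
};

struct flags_bool {
    bool pass_pw:1;          /* assignment converts to bool first */
};

/* Returns the masked value unchanged: 0 or 0x02, not 0 or 1. */
static unsigned int has_flag_raw(unsigned int flags, unsigned int mask)
{
    return flags & mask;
}

/* Returning bool normalizes any non-zero value to 1. */
static bool has_flag_bool(unsigned int flags, unsigned int mask)
{
    return flags & mask;
}
```

With flags = 0x02, storing has_flag_raw() into flags_uint.pass_pw yields
0, while storing it into flags_bool.pass_pw, or storing has_flag_bool()
into either field, yields 1.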

[Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE
Posted by Jan Beulich 4 years, 10 months ago
At the same time restrict its scope to just the single source file
actually using it, and abstract accesses by introducing a union of
pointers. (A union of the actual table entries is not used to make it
impossible to [wrongly, once the 128-bit form gets added] perform
pointer arithmetic / array accesses on derived types.)

Also move away from updating the entries piecemeal: Construct a full new
entry, and write it out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
    "offset". Introduce union irte32.
---
It would have been nice to use write_atomic() or ACCESS_ONCE() for the
actual writes, but both cast the value to a scalar one, which doesn't
suit us here (and I also didn't want to make the compound type a union
with a raw member just for this).
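
For illustration only (not part of the patch): a minimal sketch of why a
union of pointers keeps indexing honest once two entry sizes exist. The
entry unions are reduced to their raw members, and enum irte_mode with
MODE_32 / MODE_128 as well as get_entry() are hypothetical names used
only here.

```c
#include <stdint.h>

/* Entry types reduced to their raw storage for this sketch. */
union irte32  { uint32_t raw[1]; };
union irte128 { uint64_t raw[2]; };

/* One pointer value, viewed through whichever entry type is in use. */
union irte_ptr {
    void *ptr;
    union irte32 *ptr32;
    union irte128 *ptr128;
};

enum irte_mode { MODE_32, MODE_128 };   /* hypothetical names */

/*
 * Return a pointer to entry 'index' of 'table'. The caller has to pick
 * ptr32 or ptr128 explicitly, so the stride always matches the format.
 */
static union irte_ptr get_entry(void *table, enum irte_mode mode,
                                unsigned int index)
{
    union irte_ptr entry = { .ptr = table };

    if ( mode == MODE_128 )
        entry.ptr128 += index;   /* advances by 16 bytes per entry */
    else
        entry.ptr32 += index;    /* advances by 4 bytes per entry */

    return entry;
}
```

With a union of the entries themselves, a plain table[index] would
silently use the size of the larger member for both formats.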

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -23,6 +23,28 @@
 #include <asm/io_apic.h>
 #include <xen/keyhandler.h>
 
+struct irte_basic {
+    unsigned int remap_en:1;
+    unsigned int sup_io_pf:1;
+    unsigned int int_type:3;
+    unsigned int rq_eoi:1;
+    unsigned int dm:1;
+    unsigned int guest_mode:1; /* MBZ */
+    unsigned int dest:8;
+    unsigned int vector:8;
+    unsigned int :8;
+};
+
+union irte32 {
+    uint32_t raw[1];
+    struct irte_basic basic;
+};
+
+union irte_ptr {
+    void *ptr;
+    union irte32 *ptr32;
+};
+
 #define INTREMAP_TABLE_ORDER    1
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
@@ -101,47 +123,46 @@ static unsigned int alloc_intremap_entry
     return slot;
 }
 
-static u32 *get_intremap_entry(int seg, int bdf, int offset)
+static union irte_ptr get_intremap_entry(unsigned int seg, unsigned int bdf,
+                                         unsigned int index)
 {
-    u32 *table = get_ivrs_mappings(seg)[bdf].intremap_table;
+    union irte_ptr table = {
+        .ptr = get_ivrs_mappings(seg)[bdf].intremap_table
+    };
+
+    ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
 
-    ASSERT( (table != NULL) && (offset < INTREMAP_ENTRIES) );
+    table.ptr32 += index;
 
-    return table + offset;
+    return table;
 }
 
-static void free_intremap_entry(int seg, int bdf, int offset)
-{
-    u32 *entry = get_intremap_entry(seg, bdf, offset);
-
-    memset(entry, 0, sizeof(u32));
-    __clear_bit(offset, get_ivrs_mappings(seg)[bdf].intremap_inuse);
-}
-
-static void update_intremap_entry(u32* entry, u8 vector, u8 int_type,
-    u8 dest_mode, u8 dest)
-{
-    set_field_in_reg_u32(IOMMU_CONTROL_ENABLED, 0,
-                            INT_REMAP_ENTRY_REMAPEN_MASK,
-                            INT_REMAP_ENTRY_REMAPEN_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_SUPIOPF_MASK,
-                            INT_REMAP_ENTRY_SUPIOPF_SHIFT, entry);
-    set_field_in_reg_u32(int_type, *entry,
-                            INT_REMAP_ENTRY_INTTYPE_MASK,
-                            INT_REMAP_ENTRY_INTTYPE_SHIFT, entry);
-    set_field_in_reg_u32(IOMMU_CONTROL_DISABLED, *entry,
-                            INT_REMAP_ENTRY_REQEOI_MASK,
-                            INT_REMAP_ENTRY_REQEOI_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest_mode, *entry,
-                            INT_REMAP_ENTRY_DM_MASK,
-                            INT_REMAP_ENTRY_DM_SHIFT, entry);
-    set_field_in_reg_u32((u32)dest, *entry,
-                            INT_REMAP_ENTRY_DEST_MAST,
-                            INT_REMAP_ENTRY_DEST_SHIFT, entry);
-    set_field_in_reg_u32((u32)vector, *entry,
-                            INT_REMAP_ENTRY_VECTOR_MASK,
-                            INT_REMAP_ENTRY_VECTOR_SHIFT, entry);
+static void free_intremap_entry(unsigned int seg, unsigned int bdf,
+                                unsigned int index)
+{
+    union irte_ptr entry = get_intremap_entry(seg, bdf, index);
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+
+    __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
+}
+
+static void update_intremap_entry(union irte_ptr entry, unsigned int vector,
+                                  unsigned int int_type,
+                                  unsigned int dest_mode, unsigned int dest)
+{
+    struct irte_basic basic = {
+        .remap_en = 1,
+        .sup_io_pf = 0,
+        .int_type = int_type,
+        .rq_eoi = 0,
+        .dm = dest_mode,
+        .dest = dest,
+        .vector = vector,
+    };
+
+    ACCESS_ONCE(entry.ptr32->raw[0]) =
+        container_of(&basic, union irte32, basic)->raw[0];
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -163,7 +184,7 @@ static int update_intremap_entry_from_io
     u16 *index)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u8 delivery_mode, dest, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
@@ -201,12 +222,8 @@ static int update_intremap_entry_from_io
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
-        delivery_mode = get_field_from_reg_u32(*entry,
-                                               INT_REMAP_ENTRY_INTTYPE_MASK,
-                                               INT_REMAP_ENTRY_INTTYPE_SHIFT);
+        vector = entry.ptr32->basic.vector;
+        delivery_mode = entry.ptr32->basic.int_type;
     }
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
 
@@ -228,7 +245,7 @@ int __init amd_iommu_setup_ioapic_remapp
 {
     struct IO_APIC_route_entry rte;
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     int apic, pin;
     u8 delivery_mode, dest, vector, dest_mode;
     u16 seg, bdf, req_id;
@@ -407,16 +424,14 @@ unsigned int amd_iommu_read_ioapic_from_
         u16 bdf = ioapic_sbdf[idx].bdf;
         u16 seg = ioapic_sbdf[idx].seg;
         u16 req_id = get_intremap_requestor_id(seg, bdf);
-        const u32 *entry = get_intremap_entry(seg, req_id, offset);
+        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
 
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_INTTYPE_MASK,
-                                      INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-        val |= get_field_from_reg_u32(*entry,
-                                      INT_REMAP_ENTRY_VECTOR_MASK,
-                                      INT_REMAP_ENTRY_VECTOR_SHIFT);
+        val |= MASK_INSR(entry.ptr32->basic.int_type,
+                         IO_APIC_REDIR_DELIV_MODE_MASK);
+        val |= MASK_INSR(entry.ptr32->basic.vector,
+                         IO_APIC_REDIR_VECTOR_MASK);
     }
 
     return val;
@@ -427,7 +442,7 @@ static int update_intremap_entry_from_ms
     int *remap_index, const struct msi_msg *msg, u32 *data)
 {
     unsigned long flags;
-    u32* entry;
+    union irte_ptr entry;
     u16 req_id, alias_id;
     u8 delivery_mode, dest, vector, dest_mode;
     spinlock_t *lock;
@@ -581,7 +596,7 @@ void amd_iommu_read_msi_from_ire(
     const struct pci_dev *pdev = msi_desc->dev;
     u16 bdf = pdev ? PCI_BDF2(pdev->bus, pdev->devfn) : hpet_sbdf.bdf;
     u16 seg = pdev ? pdev->seg : hpet_sbdf.seg;
-    const u32 *entry;
+    union irte_ptr entry;
 
     if ( IS_ERR_OR_NULL(_find_iommu_for_device(seg, bdf)) )
         return;
@@ -597,12 +612,10 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_INTTYPE_MASK,
-                                        INT_REMAP_ENTRY_INTTYPE_SHIFT) << 8;
-    msg->data |= get_field_from_reg_u32(*entry,
-                                        INT_REMAP_ENTRY_VECTOR_MASK,
-                                        INT_REMAP_ENTRY_VECTOR_SHIFT);
+    msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
+                           MSI_DATA_DELIVERY_MODE_MASK);
+    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                           MSI_DATA_VECTOR_MASK);
 }
 
 int __init amd_iommu_free_intremap_table(
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -469,22 +469,6 @@ struct amd_iommu_pte {
 #define IOMMU_CONTROL_DISABLED	0
 #define IOMMU_CONTROL_ENABLED	1
 
-/* interrupt remapping table */
-#define INT_REMAP_ENTRY_REMAPEN_MASK    0x00000001
-#define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
-#define INT_REMAP_ENTRY_SUPIOPF_MASK    0x00000002
-#define INT_REMAP_ENTRY_SUPIOPF_SHIFT   1
-#define INT_REMAP_ENTRY_INTTYPE_MASK    0x0000001C
-#define INT_REMAP_ENTRY_INTTYPE_SHIFT   2
-#define INT_REMAP_ENTRY_REQEOI_MASK     0x00000020
-#define INT_REMAP_ENTRY_REQEOI_SHIFT    5
-#define INT_REMAP_ENTRY_DM_MASK         0x00000040
-#define INT_REMAP_ENTRY_DM_SHIFT        6
-#define INT_REMAP_ENTRY_DEST_MAST       0x0000FF00
-#define INT_REMAP_ENTRY_DEST_SHIFT      8
-#define INT_REMAP_ENTRY_VECTOR_MASK     0x00FF0000
-#define INT_REMAP_ENTRY_VECTOR_SHIFT    16
-
 #define INV_IOMMU_ALL_PAGES_ADDRESS      ((1ULL << 63) - 1)
 
 #define IOMMU_RING_BUFFER_PTR_MASK                  0x0007FFF0




Re: [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE
Posted by Andrew Cooper 4 years, 9 months ago
On 27/06/2019 16:20, Jan Beulich wrote:
> At the same time restrict its scope to just the single source file
> actually using it, and abstract accesses by introducing a union of
> pointers. (A union of the actual table entries is not used to make it
> impossible to [wrongly, once the 128-bit form gets added] perform
> pointer arithmetic / array accesses on derived types.)
>
> Also move away from updating the entries piecemeal: Construct a full new
> entry, and write it out.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
>     "offset". Introduce union irte32.
> ---
> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
> actual writes, but both cast the value to a scalar one, which doesn't
> suit us here (and I also didn't want to make the compound type a union
> with a raw member just for this).

This comment is stale.  However, I'm still confused as to what the
problem with putting a raw in union irte_basic is.

In particular, the container_of() usage is complicated to follow, and I
don't see it as being necessary.

~Andrew

Re: [Xen-devel] [PATCH v2 04/10] AMD/IOMMU: use bit field for IRTE
Posted by Jan Beulich 4 years, 9 months ago
On 02.07.2019 14:33, Andrew Cooper wrote:
> On 27/06/2019 16:20, Jan Beulich wrote:
>> At the same time restrict its scope to just the single source file
>> actually using it, and abstract accesses by introducing a union of
>> pointers. (A union of the actual table entries is not used to make it
>> impossible to [wrongly, once the 128-bit form gets added] perform
>> pointer arithmetic / array accesses on derived types.)
>>
>> Also move away from updating the entries piecemeal: Construct a full new
>> entry, and write it out.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: name {get,free}_intremap_entry()'s last parameter "index" instead of
>>      "offset". Introduce union irte32.
>> ---
>> It would have been nice to use write_atomic() or ACCESS_ONCE() for the
>> actual writes, but both cast the value to a scalar one, which doesn't
>> suit us here (and I also didn't want to make the compound type a union
>> with a raw member just for this).
> 
> This comment is stale.  However, I'm still confused as to what the
> problem with putting a raw in union irte_basic is.

That'll again require an intermediate "flds" (or however we choose to
name it) union field name for the bitfield structure, or else once
again initializers won't work with old gcc.

> In particular, the container_of() usage is complicated to follow, and I
> don't see it as being necessary.

Well, I can drop it if we're happy about the extra intermediate field
name (personally I'm not, but I'd accept it if it's considered less bad
than the container_of() approach).

Jan
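
For illustration only (not from the thread): a minimal sketch placing
the two options under discussion side by side. Variant 1 mirrors the
posted patch; variant 2 is a hypothetical rendering of the alternative,
with "flds" as the intermediate field name mentioned above. The
container_of() macro is a simplified stand-in for Xen's, and the helper
names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-field layout copied from the patch. */
struct irte_basic {
    unsigned int remap_en:1;
    unsigned int sup_io_pf:1;
    unsigned int int_type:3;
    unsigned int rq_eoi:1;
    unsigned int dm:1;
    unsigned int guest_mode:1; /* MBZ */
    unsigned int dest:8;
    unsigned int vector:8;
    unsigned int :8;
};

/* Variant 1, as posted: the union wraps the stand-alone struct. */
union irte32 {
    uint32_t raw[1];
    struct irte_basic basic;
};

/* Simplified stand-in for Xen's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static uint32_t build_with_container_of(unsigned int vector,
                                        unsigned int dest)
{
    struct irte_basic basic = {
        .remap_en = 1,
        .dest = dest,
        .vector = vector,
    };

    /* View the local struct as the union in order to reach raw[0]. */
    return container_of(&basic, union irte32, basic)->raw[0];
}

/* Variant 2, hypothetical: raw member plus a named bit-field struct. */
union irte_basic_alt {
    uint32_t raw;
    struct {
        unsigned int remap_en:1;
        unsigned int sup_io_pf:1;
        unsigned int int_type:3;
        unsigned int rq_eoi:1;
        unsigned int dm:1;
        unsigned int guest_mode:1; /* MBZ */
        unsigned int dest:8;
        unsigned int vector:8;
        unsigned int :8;
    } flds;
};

static uint32_t build_with_flds(unsigned int vector, unsigned int dest)
{
    /* The named member lets a designated initializer reach the fields. */
    union irte_basic_alt irte = {
        .flds = { .remap_en = 1, .dest = dest, .vector = vector },
    };

    return irte.raw;
}
```

Both helpers yield the same 32-bit value for the named fields; the
difference is whether readers have to follow a container_of() or spell
an extra ".flds" at every field access.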
[Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 10 months ago
This is in preparation for actually enabling x2APIC mode, which requires
this wider IRTE format to be used.
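
As a rough illustration of why the wider format matters: an x2APIC destination is 32 bits wide, and the 128-bit IRTE in the diff below stores it as a 24-bit low part and an 8-bit high part. The sketch here uses hypothetical names (`dest_sketch`, `split_dest`, `full_dest`) and keeps only the two destination fields. It also shows why a cast is wanted when recombining: an 8-bit bit field promotes to `int`, so shifting it left by 24 without the cast could overflow a signed value.

```c
#include <stdint.h>

/* Only the two destination fields of the wide entry, for illustration. */
struct dest_sketch {
    unsigned int dest_lo:24;
    unsigned int dest_hi:8;
};

/* Split a 32-bit x2APIC destination into the two stored parts. */
static struct dest_sketch split_dest(uint32_t dest)
{
    struct dest_sketch d = {
        .dest_lo = dest & 0xffffff,
        .dest_hi = dest >> 24,
    };

    return d;
}

/* Recombine; the cast keeps the shift in unsigned arithmetic. */
static uint32_t full_dest(const struct dest_sketch *d)
{
    return d->dest_lo | ((uint32_t)d->dest_hi << 24);
}
```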

A specific remark regarding the first hunk changing
amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
tables when creating new one"). Other code introduced by that change has
meanwhile disappeared or further changed, and I wonder if - rather than
adding an x2apic_enabled check to the conditional - the bypass couldn't
be deleted altogether. For now the goal is to affect the non-x2APIC
paths as little as possible.

Take the liberty of using the new "fresh" flag to suppress an unneeded
flush in update_intremap_entry_from_ioapic().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Add cast in get_full_dest(). Re-base over changes earlier in the
    series. Don't use cmpxchg16b. Use barrier() instead of wmb().
---
Note that AMD's doc says Lowest Priority ("Arbitrated" by their naming)
mode is unavailable in x2APIC mode, but they've confirmed this to be a
mistake on their part.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -40,12 +40,45 @@ union irte32 {
     struct irte_basic basic;
 };
 
+struct irte_full {
+    unsigned int remap_en:1;
+    unsigned int sup_io_pf:1;
+    unsigned int int_type:3;
+    unsigned int rq_eoi:1;
+    unsigned int dm:1;
+    unsigned int guest_mode:1; /* MBZ */
+    unsigned int dest_lo:24;
+    unsigned int :32;
+    unsigned int vector:8;
+    unsigned int :24;
+    unsigned int :24;
+    unsigned int dest_hi:8;
+};
+
+union irte128 {
+    uint64_t raw[2];
+    struct irte_full full;
+};
+
+static enum {
+    irte32,
+    irte128,
+    irteUNK,
+} irte_mode __read_mostly = irteUNK;
+
 union irte_ptr {
     void *ptr;
     union irte32 *ptr32;
+    union irte128 *ptr128;
 };
 
-#define INTREMAP_TABLE_ORDER    1
+union irte_cptr {
+    const void *ptr;
+    const union irte32 *ptr32;
+    const union irte128 *ptr128;
+} __transparent__;
+
+#define INTREMAP_TABLE_ORDER (irte_mode == irte32 ? 1 : 3)
 #define INTREMAP_LENGTH 0xB
 #define INTREMAP_ENTRIES (1 << INTREMAP_LENGTH)
 
@@ -132,7 +165,19 @@ static union irte_ptr get_intremap_entry
 
     ASSERT(table.ptr && (index < INTREMAP_ENTRIES));
 
-    table.ptr32 += index;
+    switch ( irte_mode )
+    {
+    case irte32:
+        table.ptr32 += index;
+        break;
+
+    case irte128:
+        table.ptr128 += index;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
 
     return table;
 }
@@ -142,7 +187,21 @@ static void free_intremap_entry(unsigned
 {
     union irte_ptr entry = get_intremap_entry(seg, bdf, index);
 
-    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+    switch ( irte_mode )
+    {
+    case irte32:
+        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
+        break;
+
+    case irte128:
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        barrier();
+        entry.ptr128->raw[1] = 0;
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+    }
 
     __clear_bit(index, get_ivrs_mappings(seg)[bdf].intremap_inuse);
 }
@@ -160,9 +219,37 @@ static void update_intremap_entry(union
         .dest = dest,
         .vector = vector,
     };
+    struct irte_full full = {
+        .remap_en = 1,
+        .sup_io_pf = 0,
+        .int_type = int_type,
+        .rq_eoi = 0,
+        .dm = dest_mode,
+        .dest_lo = dest,
+        .dest_hi = dest >> 24,
+        .vector = vector,
+    };
+
+    switch ( irte_mode )
+    {
+    case irte32:
+        ACCESS_ONCE(entry.ptr32->raw[0]) =
+            container_of(&basic, union irte32, basic)->raw[0];
+        break;
+
+    case irte128:
+        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
+        barrier();
+        entry.ptr128->raw[1] =
+            container_of(&full, union irte128, full)->raw[1];
+        barrier();
+        ACCESS_ONCE(entry.ptr128->raw[0]) =
+            container_of(&full, union irte128, full)->raw[0];
+        break;
 
-    ACCESS_ONCE(entry.ptr32->raw[0]) =
-        container_of(&basic, union irte32, basic)->raw[0];
+    default:
+        ASSERT_UNREACHABLE();
+    }
 }
 
 static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
@@ -176,6 +263,11 @@ static inline void set_rte_index(struct
     rte->delivery_mode = offset >> 8;
 }
 
+static inline unsigned int get_full_dest(const union irte128 *entry)
+{
+    return entry->full.dest_lo | ((unsigned int)entry->full.dest_hi << 24);
+}
+
 static int update_intremap_entry_from_ioapic(
     int bdf,
     struct amd_iommu *iommu,
@@ -185,10 +277,11 @@ static int update_intremap_entry_from_io
 {
     unsigned long flags;
     union irte_ptr entry;
-    u8 delivery_mode, dest, vector, dest_mode;
+    unsigned int delivery_mode, dest, vector, dest_mode;
     int req_id;
     spinlock_t *lock;
     unsigned int offset;
+    bool fresh = false;
 
     req_id = get_intremap_requestor_id(iommu->seg, bdf);
     lock = get_intremap_lock(iommu->seg, req_id);
@@ -196,7 +289,7 @@ static int update_intremap_entry_from_io
     delivery_mode = rte->delivery_mode;
     vector = rte->vector;
     dest_mode = rte->dest_mode;
-    dest = rte->dest.logical.logical_dest;
+    dest = x2apic_enabled ? rte->dest.dest32 : rte->dest.logical.logical_dest;
 
     spin_lock_irqsave(lock, flags);
 
@@ -211,25 +304,40 @@ static int update_intremap_entry_from_io
             return -ENOSPC;
         }
         *index = offset;
-        lo_update = 1;
+        fresh = true;
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
-    if ( !lo_update )
+    if ( fresh )
+        /* nothing */;
+    else if ( !lo_update )
     {
         /*
          * Low half of incoming RTE is already in remapped format,
          * so need to recover vector and delivery mode from IRTE.
          */
         ASSERT(get_rte_index(rte) == offset);
-        vector = entry.ptr32->basic.vector;
+        if ( irte_mode == irte32 )
+            vector = entry.ptr32->basic.vector;
+        else
+            vector = entry.ptr128->full.vector;
+        /* The IntType fields match for both formats. */
         delivery_mode = entry.ptr32->basic.int_type;
     }
+    else if ( x2apic_enabled )
+    {
+        /*
+         * High half of incoming RTE was read from the I/O APIC and hence may
+         * not hold the full destination, so need to recover full destination
+         * from IRTE.
+         */
+        dest = get_full_dest(entry.ptr128);
+    }
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled )
+    if ( iommu->enabled && !fresh )
     {
         spin_lock_irqsave(&iommu->lock, flags);
         amd_iommu_flush_intremap(iommu, req_id);
@@ -253,6 +361,19 @@ int __init amd_iommu_setup_ioapic_remapp
     spinlock_t *lock;
     unsigned int offset;
 
+    for_each_amd_iommu ( iommu )
+    {
+        if ( irte_mode != irteUNK )
+        {
+            if ( iommu->ctrl.ga_en == (irte_mode == irte32) )
+                return -ENXIO;
+        }
+        else if ( iommu->ctrl.ga_en )
+            irte_mode = irte128;
+        else
+            irte_mode = irte32;
+    }
+
     /* Read ioapic entries and update interrupt remapping table accordingly */
     for ( apic = 0; apic < nr_ioapics; apic++ )
     {
@@ -287,6 +408,18 @@ int __init amd_iommu_setup_ioapic_remapp
             dest_mode = rte.dest_mode;
             dest = rte.dest.logical.logical_dest;
 
+            if ( iommu->ctrl.xt_en )
+            {
+                /*
+                 * In x2APIC mode we have no way of discovering the high 24
+                 * bits of the destination of an already enabled interrupt.
+                 * We come here earlier than for xAPIC mode, so no interrupts
+                 * should have been set up before.
+                 */
+                AMD_IOMMU_DEBUG("Unmasked IO-APIC#%u entry %u in x2APIC mode\n",
+                                IO_APIC_ID(apic), pin);
+            }
+
             spin_lock_irqsave(lock, flags);
             offset = alloc_intremap_entry(seg, req_id, 1);
             BUG_ON(offset >= INTREMAP_ENTRIES);
@@ -321,7 +454,8 @@ void amd_iommu_ioapic_update_ire(
     struct IO_APIC_route_entry new_rte = { 0 };
     unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
     unsigned int pin = (reg - 0x10) / 2;
-    int saved_mask, seg, bdf, rc;
+    int seg, bdf, rc;
+    bool saved_mask, fresh = false;
     struct amd_iommu *iommu;
     unsigned int idx;
 
@@ -363,12 +497,22 @@ void amd_iommu_ioapic_update_ire(
         *(((u32 *)&new_rte) + 1) = value;
     }
 
-    if ( new_rte.mask &&
-         ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
+    if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_ENTRIES )
     {
         ASSERT(saved_mask);
-        __io_apic_write(apic, reg, value);
-        return;
+
+        /*
+         * There's nowhere except the IRTE to store a full 32-bit destination,
+         * so we may not bypass entry allocation and updating of the low RTE
+         * half in the (usual) case of the high RTE half getting written first.
+         */
+        if ( new_rte.mask && !x2apic_enabled )
+        {
+            __io_apic_write(apic, reg, value);
+            return;
+        }
+
+        fresh = true;
     }
 
     /* mask the interrupt while we change the intremap table */
@@ -397,8 +541,12 @@ void amd_iommu_ioapic_update_ire(
     if ( reg == rte_lo )
         return;
 
-    /* unmask the interrupt after we have updated the intremap table */
-    if ( !saved_mask )
+    /*
+     * Unmask the interrupt after we have updated the intremap table. Also
+     * write the low half if a fresh entry was allocated for a high half
+     * update in x2APIC mode.
+     */
+    if ( !saved_mask || (x2apic_enabled && fresh) )
     {
         old_rte.mask = saved_mask;
         __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
@@ -412,27 +560,36 @@ unsigned int amd_iommu_read_ioapic_from_
     unsigned int offset;
     unsigned int val = __io_apic_read(apic, reg);
     unsigned int pin = (reg - 0x10) / 2;
+    uint16_t seg, req_id;
+    union irte_ptr entry;
 
     idx = ioapic_id_to_index(IO_APIC_ID(apic));
     if ( idx == MAX_IO_APICS )
         return -EINVAL;
 
     offset = ioapic_sbdf[idx].pin_2_idx[pin];
+    if ( offset >= INTREMAP_ENTRIES )
+        return val;
 
-    if ( !(reg & 1) && offset < INTREMAP_ENTRIES )
+    seg = ioapic_sbdf[idx].seg;
+    req_id = get_intremap_requestor_id(seg, ioapic_sbdf[idx].bdf);
+    entry = get_intremap_entry(seg, req_id, offset);
+
+    if ( !(reg & 1) )
     {
-        u16 bdf = ioapic_sbdf[idx].bdf;
-        u16 seg = ioapic_sbdf[idx].seg;
-        u16 req_id = get_intremap_requestor_id(seg, bdf);
-        union irte_ptr entry = get_intremap_entry(seg, req_id, offset);
 
         ASSERT(offset == (val & (INTREMAP_ENTRIES - 1)));
         val &= ~(INTREMAP_ENTRIES - 1);
+        /* The IntType fields match for both formats. */
         val |= MASK_INSR(entry.ptr32->basic.int_type,
                          IO_APIC_REDIR_DELIV_MODE_MASK);
-        val |= MASK_INSR(entry.ptr32->basic.vector,
+        val |= MASK_INSR(irte_mode == irte32
+                         ? entry.ptr32->basic.vector
+                         : entry.ptr128->full.vector,
                          IO_APIC_REDIR_VECTOR_MASK);
     }
+    else if ( x2apic_enabled )
+        val = get_full_dest(entry.ptr128);
 
     return val;
 }
@@ -444,9 +601,9 @@ static int update_intremap_entry_from_ms
     unsigned long flags;
     union irte_ptr entry;
     u16 req_id, alias_id;
-    u8 delivery_mode, dest, vector, dest_mode;
+    uint8_t delivery_mode, vector, dest_mode;
     spinlock_t *lock;
-    unsigned int offset, i;
+    unsigned int dest, offset, i;
 
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
@@ -467,7 +624,12 @@ static int update_intremap_entry_from_ms
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
-    dest = (msg->address_lo >> MSI_ADDR_DEST_ID_SHIFT) & 0xff;
+
+    if ( x2apic_enabled )
+        dest = msg->dest32;
+    else
+        dest = MASK_EXTR(msg->address_lo, MSI_ADDR_DEST_ID_MASK);
+
     offset = *remap_index;
     if ( offset >= INTREMAP_ENTRIES )
     {
@@ -612,10 +774,21 @@ void amd_iommu_read_msi_from_ire(
     }
 
     msg->data &= ~(INTREMAP_ENTRIES - 1);
+    /* The IntType fields match for both formats. */
     msg->data |= MASK_INSR(entry.ptr32->basic.int_type,
                            MSI_DATA_DELIVERY_MODE_MASK);
-    msg->data |= MASK_INSR(entry.ptr32->basic.vector,
-                           MSI_DATA_VECTOR_MASK);
+    if ( irte_mode == irte32 )
+    {
+        msg->data |= MASK_INSR(entry.ptr32->basic.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = entry.ptr32->basic.dest;
+    }
+    else
+    {
+        msg->data |= MASK_INSR(entry.ptr128->full.vector,
+                               MSI_DATA_VECTOR_MASK);
+        msg->dest32 = get_full_dest(entry.ptr128);
+    }
 }
 
 int __init amd_iommu_free_intremap_table(
@@ -678,18 +851,28 @@ int __init amd_setup_hpet_msi(struct msi
     return rc;
 }
 
-static void dump_intremap_table(const u32 *table)
+static void dump_intremap_table(union irte_cptr tbl)
 {
-    u32 count;
+    unsigned int count;
 
-    if ( !table )
+    if ( !tbl.ptr || irte_mode == irteUNK )
         return;
 
     for ( count = 0; count < INTREMAP_ENTRIES; count++ )
     {
-        if ( !table[count] )
-            continue;
-        printk("    IRTE[%03x] %08x\n", count, table[count]);
+        if ( irte_mode == irte32 )
+        {
+            if ( !tbl.ptr32[count].raw[0] )
+                continue;
+            printk("    IRTE[%03x] %08x\n", count, tbl.ptr32[count].raw[0]);
+        }
+        else
+        {
+            if ( !tbl.ptr128[count].raw[0] && !tbl.ptr128[count].raw[1] )
+                continue;
+            printk("    IRTE[%03x] %016lx_%016lx\n",
+                   count, tbl.ptr128[count].raw[1], tbl.ptr128[count].raw[0]);
+        }
     }
 }
 




Re: [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Andrew Cooper 4 years, 9 months ago
On 27/06/2019 16:21, Jan Beulich wrote:
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -40,12 +40,45 @@ union irte32 {
>
> -#define INTREMAP_TABLE_ORDER    1
> +union irte_cptr {
> +    const void *ptr;
> +    const union irte32 *ptr32;
> +    const union irte128 *ptr128;
> +} __transparent__;
> +
> +#define INTREMAP_TABLE_ORDER (irte_mode == irte32 ? 1 : 3)

This is problematic for irte_mode == irteUNK.  As this "constant" is
used in exactly two places, I'd suggest a tiny static function along the
same lines as {get,update}_intremap_entry(), which can sensibly prevent
code looking for a size before irte_mode is set up.
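
A minimal sketch of what such a helper could look like, under stated assumptions: the enum and function names are hypothetical, a 4 KiB page size is assumed, and the entry count mirrors INTREMAP_ENTRIES (1 << 0xB) from the patch. The order values follow from the entry sizes: 2048 entries of 4 bytes need 2 pages (order 1), while 2048 entries of 16 bytes need 8 pages (order 3).

```c
#include <assert.h>
#include <stddef.h>

enum irte_mode_sketch { MODE_IRTE32, MODE_IRTE128, MODE_UNKNOWN };

#define SKETCH_PAGE_SIZE 4096u       /* assumption: 4 KiB pages */
#define SKETCH_ENTRIES   (1u << 0xB) /* mirrors INTREMAP_ENTRIES */

/* Page order of the table, or -1 while the mode is still unknown. */
static int intremap_table_order_sketch(enum irte_mode_sketch mode)
{
    switch ( mode )
    {
    case MODE_IRTE32:  return 1; /* 2048 *  4 bytes =  8 KiB = 2 pages */
    case MODE_IRTE128: return 3; /* 2048 * 16 bytes = 32 KiB = 8 pages */
    default:           return -1;
    }
}

/* Table size in bytes; refuses to answer before the mode is set up. */
static size_t intremap_table_bytes_sketch(enum irte_mode_sketch mode)
{
    int order = intremap_table_order_sketch(mode);

    assert(order >= 0);
    return (size_t)SKETCH_PAGE_SIZE << order;
}
```

Unlike the conditional expression in the macro, a function of this shape cannot silently fall through to the 128-bit size when the mode has not been established yet.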

> @@ -142,7 +187,21 @@ static void free_intremap_entry(unsigned
>  {
>      union irte_ptr entry = get_intremap_entry(seg, bdf, index);
>  
> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
> +    switch ( irte_mode )
> +    {
> +    case irte32:
> +        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
> +        break;
> +
> +    case irte128:
> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
> +        barrier();

smp_wmb().

Using barrier here isn't technically correct, because what matters is
the external visibility of the write.

It functions correctly on x86 because smp_wmb() is barrier(), but this
code doesn't work correctly on e.g. ARM.

I'd go further and leave an explanation.

smp_wmb(); /* Ensure the clear of .remap_en is visible to the IOMMU
first. */

> @@ -444,9 +601,9 @@ static int update_intremap_entry_from_ms
>      unsigned long flags;
>      union irte_ptr entry;
>      u16 req_id, alias_id;
> -    u8 delivery_mode, dest, vector, dest_mode;
> +    uint8_t delivery_mode, vector, dest_mode;

For the ioapic version, you used unsigned int, rather than uint8_t.  I'd
expect them to at least be consistent.

~Andrew

Re: [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 9 months ago
On 02.07.2019 16:41, Andrew Cooper wrote:
> On 27/06/2019 16:21, Jan Beulich wrote:
>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>> @@ -40,12 +40,45 @@ union irte32 {
>>
>> -#define INTREMAP_TABLE_ORDER    1
>> +union irte_cptr {
>> +    const void *ptr;
>> +    const union irte32 *ptr32;
>> +    const union irte128 *ptr128;
>> +} __transparent__;
>> +
>> +#define INTREMAP_TABLE_ORDER (irte_mode == irte32 ? 1 : 3)
> 
> This is problematic for irte_mode == irteUNK.  As this "constant" is
> used in exactly two places, I'd suggest a tiny static function along the
> same lines as {get,update}_intremap_entry(), which can sensibly prevent
> code looking for a size before irte_mode is set up.

This was indeed a problem, and requires quite a bit of further rework:
Things only worked (almost) correctly because for irteUNK we'd also set
up a table fitting 128-bit entries. The issue is that
amd_iommu_update_ivrs_mapping_acpi() gets called (in the original code
immediately) ahead of amd_iommu_setup_ioapic_remapping(), yet so far it
is the latter which establishes irte_mode.

I'm now trying to figure out whether / how far it would be feasible to go
by per-IOMMU settings rather than the global mode indicator. But that
in turn requires setting GAEn earlier. Another option (or maybe an
additional requirement) is to hand through the "xt" flag to further
functions.

Jan
Re: [Xen-devel] [PATCH v2 05/10] AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Posted by Jan Beulich 4 years, 9 months ago
On 02.07.2019 16:41, Andrew Cooper wrote:
> On 27/06/2019 16:21, Jan Beulich wrote:
>> @@ -142,7 +187,21 @@ static void free_intremap_entry(unsigned
>>   {
>>       union irte_ptr entry = get_intremap_entry(seg, bdf, index);
>>   
>> -    ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>> +    switch ( irte_mode )
>> +    {
>> +    case irte32:
>> +        ACCESS_ONCE(entry.ptr32->raw[0]) = 0;
>> +        break;
>> +
>> +    case irte128:
>> +        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>> +        barrier();
> 
> smp_wmb().
> 
> Using barrier here isn't technically correct, because what matters is
> the external visibility of the write.
> 
> It functions correctly on x86 because smp_wmb() is barrier(), but this
> code doesn't work correctly on e.g. ARM.

Well, I did reply to a similar earlier comment of yours, and I
had hoped to get a reply from you in turn before actually sending
out v2. As said there, smp_wmb() isn't correct either, yet you
also don't want wmb() here. Even if we don't patch them ourselves,
we should still follow the abstract Linux model and _assume_
smp_*mb() convert to no-op when running on a UP system. The
barrier, however, is needed even in that case.

What I'm okay to do is accompany the barrier() (or, if you insist,
smp_wmb()) use with a comment clarifying that this is fine for x86,
but would need changing if the code was included in builds for
other architectures.
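
To make the ordering under discussion concrete, here is a small sketch of the three-step update of a 128-bit entry as two 64-bit halves. The function name and the barrier macro are hypothetical, and the macro uses GCC/Clang inline-asm syntax; it constrains only the compiler and emits no instruction. That matches the x86 situation described above, where stores are not reordered against other stores; on another architecture a real write barrier would be needed instead.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax); no fence instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Update a two-word entry so that a concurrent reader never sees an
 * enabled entry with a mismatched high half. Bit 0 of the low word is
 * taken to be the enable bit (RemapEn in the patch).
 */
static void update_irte128_sketch(volatile uint64_t raw[2],
                                  uint64_t lo, uint64_t hi)
{
    raw[0] = 0;         /* step 1: disable the entry */
    compiler_barrier();
    raw[1] = hi;        /* step 2: high half may change while disabled */
    compiler_barrier();
    raw[0] = lo;        /* step 3: enable last, with the new low half */
}
```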

>> @@ -444,9 +601,9 @@ static int update_intremap_entry_from_ms
>>       unsigned long flags;
>>       union irte_ptr entry;
>>       u16 req_id, alias_id;
>> -    u8 delivery_mode, dest, vector, dest_mode;
>> +    uint8_t delivery_mode, vector, dest_mode;
> 
> For the ioapic version, you used unsigned int, rather than uint8_t.  I'd
> expect them to at least be consistent.

The type change on the I/O-APIC side is because "dest" is among
the variables there. But looking at both changes again, I guess
I'll rather use the approach here also in the I/O-APIC function,
moving "dest" down together with "offset".

Jan
[Xen-devel] [PATCH v2 06/10] AMD/IOMMU: split amd_iommu_init_one()
Posted by Jan Beulich 4 years, 10 months ago
Mapping the MMIO space and obtaining feature information need to happen
slightly earlier, such that for x2APIC support we can set XTEn prior to
calling amd_iommu_update_ivrs_mapping_acpi() and
amd_iommu_setup_ioapic_remapping().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -970,14 +970,6 @@ static void * __init allocate_ppr_log(st
 
 static int __init amd_iommu_init_one(struct amd_iommu *iommu)
 {
-    if ( map_iommu_mmio_region(iommu) != 0 )
-        goto error_out;
-
-    get_iommu_features(iommu);
-
-    if ( iommu->features.raw )
-        iommuv2_enabled = 1;
-
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
 
@@ -1197,6 +1189,23 @@ static bool_t __init amd_sp5100_erratum2
     return 0;
 }
 
+static int __init amd_iommu_prepare_one(struct amd_iommu *iommu)
+{
+    int rc = alloc_ivrs_mappings(iommu->seg);
+
+    if ( !rc )
+        rc = map_iommu_mmio_region(iommu);
+    if ( rc )
+        return rc;
+
+    get_iommu_features(iommu);
+
+    if ( iommu->features.raw )
+        iommuv2_enabled = true;
+
+    return 0;
+}
+
 int __init amd_iommu_init(void)
 {
     struct amd_iommu *iommu;
@@ -1227,7 +1236,7 @@ int __init amd_iommu_init(void)
     radix_tree_init(&ivrs_maps);
     for_each_amd_iommu ( iommu )
     {
-        rc = alloc_ivrs_mappings(iommu->seg);
+        rc = amd_iommu_prepare_one(iommu);
         if ( rc )
             goto error_out;
     }





[Xen-devel] [PATCH v2 07/10] AMD/IOMMU: allow enabling with IRQ not yet set up
Posted by Jan Beulich 4 years, 10 months ago
Early enabling (to enter x2APIC mode) requires deferring of the IRQ
setup. Code to actually do that setup in the x2APIC case will get added
subsequently.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -814,7 +814,6 @@ static void amd_iommu_erratum_746_workar
 static void enable_iommu(struct amd_iommu *iommu)
 {
     unsigned long flags;
-    struct irq_desc *desc;
 
     spin_lock_irqsave(&iommu->lock, flags);
 
@@ -834,19 +833,27 @@ static void enable_iommu(struct amd_iomm
     if ( iommu->features.flds.ppr_sup )
         register_iommu_ppr_log_in_mmio_space(iommu);
 
-    desc = irq_to_desc(iommu->msi.irq);
-    spin_lock(&desc->lock);
-    set_msi_affinity(desc, &cpu_online_map);
-    spin_unlock(&desc->lock);
+    if ( iommu->msi.irq > 0 )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        set_msi_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+    }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_ht_flags(iommu);
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
-    set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
 
-    if ( iommu->features.flds.ppr_sup )
-        set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    if ( iommu->msi.irq > 0 )
+    {
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
 
     if ( iommu->features.flds.gt_sup )
         set_iommu_guest_translation_control(iommu, IOMMU_CONTROL_ENABLED);





[Xen-devel] [PATCH v2 08/10] AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Posted by Jan Beulich 4 years, 10 months ago
In order to be able to express all possible destinations we need to make
use of this non-MSI-capability based mechanism. The new IRQ controller
structure can re-use certain MSI functions, though.

For now general and PPR interrupts still share a single vector, IRQ, and
hence handler.
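
For orientation, the following sketch shows how a raw value for the x2APIC interrupt control registers could be composed with explicit shifts. The bit positions are read off the union amd_iommu_x2apic_control added by this patch, assuming bit fields are allocated from the least significant bit upwards as on x86; the function name is hypothetical and this is not the patch's code, which fills the bit-field view instead.

```c
#include <stdint.h>

/*
 * Assumed layout: bit 2 dest_mode, bits 8-31 dest_lo, bits 32-39 vector,
 * bit 40 int_type, bits 56-63 dest_hi; everything else reserved.
 */
static uint64_t xt_int_ctrl_raw_sketch(uint32_t dest, uint8_t vector,
                                       unsigned int dest_mode,
                                       unsigned int int_type)
{
    return ((uint64_t)(dest_mode & 1) << 2) |
           ((uint64_t)(dest & 0xffffff) << 8) |
           ((uint64_t)vector << 32) |
           ((uint64_t)(int_type & 1) << 40) |
           ((uint64_t)(dest >> 24) << 56);
}
```

Because the full 32-bit destination fits in this register, any x2APIC ID can be targeted, which the 8-bit destination of a conventional MSI address cannot express.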

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -472,6 +472,44 @@ static hw_irq_controller iommu_maskable_
     .set_affinity = set_msi_affinity,
 };
 
+static void set_x2apic_affinity(struct irq_desc *desc, const cpumask_t *mask)
+{
+    struct amd_iommu *iommu = desc->action->dev_id;
+    unsigned int dest = set_desc_affinity(desc, mask);
+    union amd_iommu_x2apic_control ctrl = {};
+    unsigned long flags;
+
+    if ( dest == BAD_APICID )
+        return;
+
+    msi_compose_msg(desc->arch.vector, NULL, &iommu->msi.msg);
+    iommu->msi.msg.dest32 = dest;
+
+    ctrl.dest_mode = MASK_EXTR(iommu->msi.msg.address_lo,
+                               MSI_ADDR_DESTMODE_MASK);
+    ctrl.int_type = MASK_EXTR(iommu->msi.msg.data,
+                              MSI_DATA_DELIVERY_MODE_MASK);
+    ctrl.vector = desc->arch.vector;
+    ctrl.dest_lo = dest;
+    ctrl.dest_hi = dest >> 24;
+
+    spin_lock_irqsave(&iommu->lock, flags);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_INT_CTRL_MMIO_OFFSET);
+    writeq(ctrl.raw, iommu->mmio_base + IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET);
+    spin_unlock_irqrestore(&iommu->lock, flags);
+}
+
+static hw_irq_controller iommu_x2apic_type = {
+    .typename     = "IOMMU-x2APIC",
+    .startup      = irq_startup_none,
+    .shutdown     = irq_shutdown_none,
+    .enable       = irq_enable_none,
+    .disable      = irq_disable_none,
+    .ack          = ack_nonmaskable_msi_irq,
+    .end          = end_nonmaskable_msi_irq,
+    .set_affinity = set_x2apic_affinity,
+};
+
 static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
 {
     u16 domain_id, device_id, flags;
@@ -726,8 +764,6 @@ static void iommu_interrupt_handler(int
 static bool_t __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
-    hw_irq_controller *handler;
-    u16 control;
 
     irq = create_irq(NUMA_NO_NODE);
     if ( irq <= 0 )
@@ -747,20 +783,43 @@ static bool_t __init set_iommu_interrupt
                         PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf));
         return 0;
     }
-    control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
-                              PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
-                              iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
-    iommu->msi.msi.nvec = 1;
-    if ( is_mask_bit_support(control) )
-    {
-        iommu->msi.msi_attrib.maskbit = 1;
-        iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
-                                                is_64bit_address(control));
-        handler = &iommu_maskable_msi_type;
+
+    if ( iommu->ctrl.int_cap_xt_en )
+    {
+        struct irq_desc *desc = irq_to_desc(irq);
+
+        iommu->msi.msi_attrib.pos = MSI_TYPE_IOMMU;
+        iommu->msi.msi_attrib.maskbit = 0;
+        iommu->msi.msi_attrib.is_64 = 1;
+
+        desc->msi_desc = &iommu->msi;
+        desc->handler = &iommu_x2apic_type;
+
+        ret = 0;
     }
     else
-        handler = &iommu_msi_type;
-    ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    {
+        hw_irq_controller *handler;
+        u16 control;
+
+        control = pci_conf_read16(iommu->seg, PCI_BUS(iommu->bdf),
+                                  PCI_SLOT(iommu->bdf), PCI_FUNC(iommu->bdf),
+                                  iommu->msi.msi_attrib.pos + PCI_MSI_FLAGS);
+
+        iommu->msi.msi.nvec = 1;
+        if ( is_mask_bit_support(control) )
+        {
+            iommu->msi.msi_attrib.maskbit = 1;
+            iommu->msi.msi.mpos = msi_mask_bits_reg(iommu->msi.msi_attrib.pos,
+                                                    is_64bit_address(control));
+            handler = &iommu_maskable_msi_type;
+        }
+        else
+            handler = &iommu_msi_type;
+
+        ret = __setup_msi_irq(irq_to_desc(irq), &iommu->msi, handler);
+    }
+
     if ( !ret )
         ret = request_irq(irq, 0, iommu_interrupt_handler, "amd_iommu", iommu);
     if ( ret )
@@ -838,8 +897,19 @@ static void enable_iommu(struct amd_iomm
         struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
 
         spin_lock(&desc->lock);
-        set_msi_affinity(desc, &cpu_online_map);
-        spin_unlock(&desc->lock);
+
+        if ( iommu->ctrl.int_cap_xt_en )
+        {
+            set_x2apic_affinity(desc, &cpu_online_map);
+            spin_unlock(&desc->lock);
+        }
+        else
+        {
+            set_msi_affinity(desc, &cpu_online_map);
+            spin_unlock(&desc->lock);
+
+            amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
+        }
     }
 
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
@@ -879,7 +949,9 @@ static void disable_iommu(struct amd_iom
         return;
     }
 
-    amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+    if ( !iommu->ctrl.int_cap_xt_en )
+        amd_iommu_msi_enable(iommu, IOMMU_CONTROL_DISABLED);
+
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_DISABLED);
     set_iommu_event_log_control(iommu, IOMMU_CONTROL_DISABLED);
 
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -416,6 +416,25 @@ union amd_iommu_ext_features {
     } flds;
 };
 
+/* x2APIC Control Registers */
+#define IOMMU_XT_INT_CTRL_MMIO_OFFSET		0x0170
+#define IOMMU_XT_PPR_INT_CTRL_MMIO_OFFSET	0x0178
+#define IOMMU_XT_GA_INT_CTRL_MMIO_OFFSET	0x0180
+
+union amd_iommu_x2apic_control {
+    uint64_t raw;
+    struct {
+        unsigned int :2;
+        unsigned int dest_mode:1;
+        unsigned int :5;
+        unsigned int dest_lo:24;
+        unsigned int vector:8;
+        unsigned int int_type:1; /* DM in IOMMU spec 3.04 */
+        unsigned int :15;
+        unsigned int dest_hi:8;
+    };
+};
+
 /* Status Register*/
 #define IOMMU_STATUS_MMIO_OFFSET		0x2020
 #define IOMMU_STATUS_EVENT_OVERFLOW_MASK	0x00000001




[Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available
Posted by Jan Beulich 4 years, 10 months ago
In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
switched into a suitable state.

The post-AP-bringup IRQ affinity adjustment is also done for the
non-x2APIC case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Drop cpu_has_cx16 check. Add comment.
---
TBD: Instead of the system_state check in iov_enable_xt() the function
     could also zap its own hook pointer, at which point it could also
     become __init. This would, however, require that either
     resume_x2apic() be bound to ignore iommu_enable_x2apic() errors
     forever, or that iommu_enable_x2apic() be slightly re-arranged to
     not return -EOPNOTSUPP when finding a NULL hook during resume.

--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -834,6 +834,30 @@ static bool_t __init set_iommu_interrupt
     return 1;
 }
 
+int iov_adjust_irq_affinities(void)
+{
+    const struct amd_iommu *iommu;
+
+    if ( !iommu_enabled )
+        return 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc = irq_to_desc(iommu->msi.irq);
+        unsigned long flags;
+
+        spin_lock_irqsave(&desc->lock, flags);
+        if ( iommu->ctrl.int_cap_xt_en )
+            set_x2apic_affinity(desc, &cpu_online_map);
+        else
+            set_msi_affinity(desc, &cpu_online_map);
+        spin_unlock_irqrestore(&desc->lock, flags);
+    }
+
+    return 0;
+}
+__initcall(iov_adjust_irq_affinities);
+
 /*
  * Family15h Model 10h-1fh erratum 746 (IOMMU Logging May Stall Translations)
  * Workaround:
@@ -1047,7 +1071,7 @@ static void * __init allocate_ppr_log(st
                                 IOMMU_PPR_LOG_DEFAULT_ENTRIES, "PPR Log");
 }
 
-static int __init amd_iommu_init_one(struct amd_iommu *iommu)
+static int __init amd_iommu_init_one(struct amd_iommu *iommu, bool intr)
 {
     if ( allocate_cmd_buffer(iommu) == NULL )
         goto error_out;
@@ -1058,7 +1082,7 @@ static int __init amd_iommu_init_one(str
     if ( iommu->features.flds.ppr_sup && !allocate_ppr_log(iommu) )
         goto error_out;
 
-    if ( !set_iommu_interrupt_handler(iommu) )
+    if ( intr && !set_iommu_interrupt_handler(iommu) )
         goto error_out;
 
     /* To make sure that device_table.buffer has been successfully allocated */
@@ -1285,7 +1309,7 @@ static int __init amd_iommu_prepare_one(
     return 0;
 }
 
-int __init amd_iommu_init(void)
+int __init amd_iommu_prepare(void)
 {
     struct amd_iommu *iommu;
     int rc = -ENODEV;
@@ -1300,9 +1324,14 @@ int __init amd_iommu_init(void)
     if ( unlikely(acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_MSI) )
         goto error_out;
 
+    /* Have we been here before? */
+    if ( ivhd_type )
+        return 0;
+
     rc = amd_iommu_get_supported_ivhd_type();
     if ( rc < 0 )
         goto error_out;
+    BUG_ON(!rc);
     ivhd_type = rc;
 
     rc = amd_iommu_get_ivrs_dev_entries();
@@ -1321,9 +1350,33 @@ int __init amd_iommu_init(void)
     }
 
     rc = amd_iommu_update_ivrs_mapping_acpi();
+
+ error_out:
+    if ( rc )
+    {
+        amd_iommu_init_cleanup();
+        ivhd_type = 0;
+    }
+
+    return rc;
+}
+
+int __init amd_iommu_init(bool xt)
+{
+    struct amd_iommu *iommu;
+    int rc = amd_iommu_prepare();
+
     if ( rc )
         goto error_out;
 
+    for_each_amd_iommu ( iommu )
+    {
+        /* NB: There's no need to actually write these out right here. */
+        iommu->ctrl.ga_en |= xt;
+        iommu->ctrl.xt_en = xt;
+        iommu->ctrl.int_cap_xt_en = xt;
+    }
+
     /* initialize io-apic interrupt remapping entries */
     if ( iommu_intremap )
         rc = amd_iommu_setup_ioapic_remapping();
@@ -1346,7 +1399,12 @@ int __init amd_iommu_init(void)
     /* per iommu initialization  */
     for_each_amd_iommu ( iommu )
     {
-        rc = amd_iommu_init_one(iommu);
+        /*
+         * Setting up of the IOMMU interrupts cannot occur yet at the (very
+         * early) time we get here when enabling x2APIC mode. Suppress it
+         * here, and do it explicitly in amd_iommu_init_interrupt().
+         */
+        rc = amd_iommu_init_one(iommu, !xt);
         if ( rc )
             goto error_out;
     }
@@ -1358,6 +1416,40 @@ error_out:
     return rc;
 }
 
+int __init amd_iommu_init_interrupt(void)
+{
+    struct amd_iommu *iommu;
+    int rc = 0;
+
+    for_each_amd_iommu ( iommu )
+    {
+        struct irq_desc *desc;
+
+        if ( !set_iommu_interrupt_handler(iommu) )
+        {
+            rc = -EIO;
+            break;
+        }
+
+        desc = irq_to_desc(iommu->msi.irq);
+
+        spin_lock(&desc->lock);
+        ASSERT(iommu->ctrl.int_cap_xt_en);
+        set_x2apic_affinity(desc, &cpu_online_map);
+        spin_unlock(&desc->lock);
+
+        set_iommu_event_log_control(iommu, IOMMU_CONTROL_ENABLED);
+
+        if ( iommu->features.flds.ppr_sup )
+            set_iommu_ppr_log_control(iommu, IOMMU_CONTROL_ENABLED);
+    }
+
+    if ( rc )
+        amd_iommu_init_cleanup();
+
+    return rc;
+}
+
 static void invalidate_all_domain_pages(void)
 {
     struct domain *d;
--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -816,6 +816,40 @@ void* __init amd_iommu_alloc_intremap_ta
     return tb;
 }
 
+bool __init iov_supports_xt(void)
+{
+    unsigned int apic;
+    struct amd_iommu *iommu;
+
+    if ( !iommu_enable || !iommu_intremap )
+        return false;
+
+    if ( amd_iommu_prepare() )
+        return false;
+
+    for_each_amd_iommu ( iommu )
+        if ( !iommu->features.flds.ga_sup || !iommu->features.flds.xt_sup )
+            return false;
+
+    for ( apic = 0; apic < nr_ioapics; apic++ )
+    {
+        unsigned int idx = ioapic_id_to_index(IO_APIC_ID(apic));
+
+        if ( idx == MAX_IO_APICS )
+            return false;
+
+        if ( !find_iommu_for_device(ioapic_sbdf[idx].seg,
+                                    ioapic_sbdf[idx].bdf) )
+        {
+            AMD_IOMMU_DEBUG("No IOMMU for IO-APIC %#x (ID %x)\n",
+                            apic, IO_APIC_ID(apic));
+            return false;
+        }
+    }
+
+    return true;
+}
+
 int __init amd_setup_hpet_msi(struct msi_desc *msi_desc)
 {
     spinlock_t *lock;
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -170,7 +170,8 @@ static int __init iov_detect(void)
     if ( !iommu_enable && !iommu_intremap )
         return 0;
 
-    if ( amd_iommu_init() != 0 )
+    else if ( (init_done ? amd_iommu_init_interrupt()
+                         : amd_iommu_init(false)) != 0 )
     {
         printk("AMD-Vi: Error initialization\n");
         return -ENODEV;
@@ -183,6 +184,25 @@ static int __init iov_detect(void)
     return scan_pci_devices();
 }
 
+static int iov_enable_xt(void)
+{
+    int rc;
+
+    if ( system_state >= SYS_STATE_active )
+        return 0;
+
+    if ( (rc = amd_iommu_init(true)) != 0 )
+    {
+        printk("AMD-Vi: Error %d initializing for x2APIC mode\n", rc);
+        /* -ENXIO has special meaning to the caller - convert it. */
+        return rc != -ENXIO ? rc : -ENODATA;
+    }
+
+    init_done = true;
+
+    return 0;
+}
+
 int amd_iommu_alloc_root(struct domain_iommu *hd)
 {
     if ( unlikely(!hd->arch.root_table) )
@@ -559,11 +579,13 @@ static const struct iommu_ops __initcons
     .free_page_table = deallocate_page_table,
     .reassign_device = reassign_device,
     .get_device_group_id = amd_iommu_group_id,
+    .enable_x2apic = iov_enable_xt,
     .update_ire_from_apic = amd_iommu_ioapic_update_ire,
     .update_ire_from_msi = amd_iommu_msi_msg_update_ire,
     .read_apic_from_ire = amd_iommu_read_ioapic_from_ire,
     .read_msi_from_ire = amd_iommu_read_msi_from_ire,
     .setup_hpet_msi = amd_setup_hpet_msi,
+    .adjust_irq_affinities = iov_adjust_irq_affinities,
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .share_p2m = amd_iommu_share_p2m,
@@ -574,4 +596,5 @@ static const struct iommu_ops __initcons
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
     .ops = &_iommu_ops,
     .setup = iov_detect,
+    .supports_x2apic = iov_supports_xt,
 };
--- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
+++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
@@ -48,8 +48,11 @@ int amd_iommu_detect_acpi(void);
 void get_iommu_features(struct amd_iommu *iommu);
 
 /* amd-iommu-init functions */
-int amd_iommu_init(void);
+int amd_iommu_prepare(void);
+int amd_iommu_init(bool xt);
+int amd_iommu_init_interrupt(void);
 int amd_iommu_update_ivrs_mapping_acpi(void);
+int iov_adjust_irq_affinities(void);
 
 /* mapping functions */
 int __must_check amd_iommu_map_page(struct domain *d, dfn_t dfn,
@@ -96,6 +99,7 @@ void amd_iommu_flush_all_caches(struct a
 struct amd_iommu *find_iommu_for_device(int seg, int bdf);
 
 /* interrupt remapping */
+bool iov_supports_xt(void);
 int amd_iommu_setup_ioapic_remapping(void);
 void *amd_iommu_alloc_intremap_table(unsigned long **);
 int amd_iommu_free_intremap_table(u16 seg, struct ivrs_mappings *);
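
The two-phase flow this patch sets up can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the Xen code: every name carrying a _sim suffix is hypothetical, the counters exist only so the behaviour can be observed, and 0x11 is an arbitrary non-zero placeholder. It shows that preparation runs only once however many entry points call it, and that interrupt setup is deferred to the later detect step when x2APIC mode was enabled early.

```c
#include <assert.h>
#include <stdbool.h>

static int ivhd_type_sim;          /* non-zero once preparation completed */
static bool init_done_sim;         /* set by the early x2APIC enable path */
static unsigned int prepare_runs;  /* how often real preparation work ran */
static unsigned int intr_setups;   /* how often interrupts were set up    */

static int prepare_sim(void)
{
    if ( ivhd_type_sim )           /* have we been here before? */
        return 0;

    prepare_runs++;
    ivhd_type_sim = 0x11;          /* arbitrary non-zero placeholder */

    return 0;
}

static int init_sim(bool xt)
{
    int rc = prepare_sim();

    if ( rc )
        return rc;

    if ( !xt )                     /* too early for interrupts in xt mode */
        intr_setups++;

    return 0;
}

static int init_interrupt_sim(void)
{
    intr_setups++;
    return 0;
}

static bool supports_xt_sim(void)
{
    return prepare_sim() == 0;
}

static int enable_xt_sim(void)
{
    int rc = init_sim(true);

    if ( !rc )
        init_done_sim = true;

    return rc;
}

static int detect_sim(void)
{
    return init_done_sim ? init_interrupt_sim() : init_sim(false);
}
```

Calling supports_xt_sim(), enable_xt_sim() and detect_sim() in that order performs the preparation once and the interrupt setup once, the latter only in the final step.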




Re: [Xen-devel] [PATCH v2 09/10] AMD/IOMMU: enable x2APIC mode when available
Posted by Andrew Cooper 4 years, 9 months ago
On 27/06/2019 16:23, Jan Beulich wrote:
> In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
> switched into suitable state.
>
> The post-AP-bringup IRQ affinity adjustment is done also for the non-
> x2APIC case.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

[Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating
Posted by Jan Beulich 4 years, 10 months ago
While for 32-bit IRTEs I think we can safely continue to assume that the
writes will translate to a single MOV, the use of CMPXCHG16B is more
heavy-handed than necessary for the 128-bit form, and the flushing
didn't get done along the lines of what the specification says. Mark
entries to be updated as not remapped (which will result in interrupt
requests getting target aborted, but the interrupts should be masked
anyway at that point in time), issue the flush, and only then write the
new entry. In the 128-bit IRTE case set RemapEn separately last, so that
the ordering of the writes of the two 64-bit halves won't matter.

In update_intremap_entry_from_msi_msg() also fold the duplicate initial
lock determination and acquire into just a single instance.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
RFC: Putting the flush invocations in loops isn't overly nice, but I
     don't think this can really be abused, since callers up the stack
     hold further locks. Nevertheless I'd like to ask for better
     suggestions.
---
v2: Parts morphed into earlier patch.

--- a/xen/drivers/passthrough/amd/iommu_intr.c
+++ b/xen/drivers/passthrough/amd/iommu_intr.c
@@ -238,8 +238,7 @@ static void update_intremap_entry(union
         break;
 
     case irte128:
-        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
-        barrier();
+        ASSERT(!entry.ptr128->full.remap_en);
         entry.ptr128->raw[1] =
             container_of(&full, union irte128, full)->raw[1];
         barrier();
@@ -308,6 +307,20 @@ static int update_intremap_entry_from_io
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = 0;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     if ( fresh )
         /* nothing */;
     else if ( !lo_update )
@@ -337,13 +350,6 @@ static int update_intremap_entry_from_io
 
     spin_unlock_irqrestore(lock, flags);
 
-    if ( iommu->enabled && !fresh )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     set_rte_index(rte, offset);
 
     return 0;
@@ -608,19 +614,27 @@ static int update_intremap_entry_from_ms
     req_id = get_dma_requestor_id(iommu->seg, bdf);
     alias_id = get_intremap_requestor_id(iommu->seg, bdf);
 
+    lock = get_intremap_lock(iommu->seg, req_id);
+    spin_lock_irqsave(lock, flags);
+
     if ( msg == NULL )
     {
-        lock = get_intremap_lock(iommu->seg, req_id);
-        spin_lock_irqsave(lock, flags);
         for ( i = 0; i < nr; ++i )
             free_intremap_entry(iommu->seg, req_id, *remap_index + i);
         spin_unlock_irqrestore(lock, flags);
-        goto done;
-    }
 
-    lock = get_intremap_lock(iommu->seg, req_id);
+        if ( iommu->enabled )
+        {
+            spin_lock_irqsave(&iommu->lock, flags);
+            amd_iommu_flush_intremap(iommu, req_id);
+            if ( alias_id != req_id )
+                amd_iommu_flush_intremap(iommu, alias_id);
+            spin_unlock_irqrestore(&iommu->lock, flags);
+        }
+
+        return 0;
+    }
 
-    spin_lock_irqsave(lock, flags);
     dest_mode = (msg->address_lo >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
     delivery_mode = (msg->data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x1;
     vector = (msg->data >> MSI_DATA_VECTOR_SHIFT) & MSI_DATA_VECTOR_MASK;
@@ -644,6 +658,22 @@ static int update_intremap_entry_from_ms
     }
 
     entry = get_intremap_entry(iommu->seg, req_id, offset);
+
+    /* The RemapEn fields match for all formats. */
+    while ( iommu->enabled && entry.ptr32->basic.remap_en )
+    {
+        entry.ptr32->basic.remap_en = 0;
+        spin_unlock(lock);
+
+        spin_lock(&iommu->lock);
+        amd_iommu_flush_intremap(iommu, req_id);
+        if ( alias_id != req_id )
+            amd_iommu_flush_intremap(iommu, alias_id);
+        spin_unlock(&iommu->lock);
+
+        spin_lock(lock);
+    }
+
     update_intremap_entry(entry, vector, delivery_mode, dest_mode, dest);
     spin_unlock_irqrestore(lock, flags);
 
@@ -663,16 +693,6 @@ static int update_intremap_entry_from_ms
                get_ivrs_mappings(iommu->seg)[alias_id].intremap_table);
     }
 
-done:
-    if ( iommu->enabled )
-    {
-        spin_lock_irqsave(&iommu->lock, flags);
-        amd_iommu_flush_intremap(iommu, req_id);
-        if ( alias_id != req_id )
-            amd_iommu_flush_intremap(iommu, alias_id);
-        spin_unlock_irqrestore(&iommu->lock, flags);
-    }
-
     return 0;
 }
 




Re: [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating
Posted by Andrew Cooper 4 years, 9 months ago
On 27/06/2019 16:23, Jan Beulich wrote:
> While for 32-bit IRTEs I think we can safely continue to assume that the
> writes will translate to a single MOV, the use of CMPXCHG16B is more

The CMPXCHG16B here is stale.

> heavy handed than necessary for the 128-bit form, and the flushing
> didn't get done along the lines of what the specification says. Mark
> entries to be updated as not remapped (which will result in interrupt
> requests to get target aborted, but the interrupts should be masked
> anyway at that point in time), issue the flush, and only then write the
> new entry. In the 128-bit IRTE case set RemapEn separately last, so that
> the ordering of the writes of the two 64-bit halves won't matter.
>
> In update_intremap_entry_from_msi_msg() also fold the duplicate initial
> lock determination and acquire into just a single instance.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> RFC: Putting the flush invocations in loops isn't overly nice, but I
>      don't think this can really be abused, since callers up the stack
>      hold further locks. Nevertheless I'd like to ask for better
>      suggestions.
> ---
> v2: Parts morphed into earlier patch.
>
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -238,8 +238,7 @@ static void update_intremap_entry(union
>          break;
>  
>      case irte128:
> -        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
> -        barrier();
> +        ASSERT(!entry.ptr128->full.remap_en);
>          entry.ptr128->raw[1] =
>              container_of(&full, union irte128, full)->raw[1];
>          barrier();
> @@ -308,6 +307,20 @@ static int update_intremap_entry_from_io
>      }
>  
>      entry = get_intremap_entry(iommu->seg, req_id, offset);
> +
> +    /* The RemapEn fields match for all formats. */
> +    while ( iommu->enabled && entry.ptr32->basic.remap_en )

Why while?  (and by this, what I mean is that this definitely needs a
comment, because the code looks like it ought to be an if.)

~Andrew

Re: [Xen-devel] [PATCH RFC v2 10/10] AMD/IOMMU: correct IRTE updating
Posted by Jan Beulich 4 years, 9 months ago
On 02.07.2019 17:08, Andrew Cooper wrote:
> On 27/06/2019 16:23, Jan Beulich wrote:
>> While for 32-bit IRTEs I think we can safely continue to assume that the
>> writes will translate to a single MOV, the use of CMPXCHG16B is more
> 
> The CMPXCHG16B here is stale.

Indeed, as is the 32-bit IRTE part of the sentence (now that I
use ACCESS_ONCE() already before this patch).

>> heavy handed than necessary for the 128-bit form, and the flushing
>> didn't get done along the lines of what the specification says. Mark
>> entries to be updated as not remapped (which will result in interrupt
>> requests to get target aborted, but the interrupts should be masked
>> anyway at that point in time), issue the flush, and only then write the
>> new entry. In the 128-bit IRTE case set RemapEn separately last, so that
>> the ordering of the writes of the two 64-bit halves won't matter.

This last sentence is stale too, and hence I've now removed it.

>> --- a/xen/drivers/passthrough/amd/iommu_intr.c
>> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
>> @@ -238,8 +238,7 @@ static void update_intremap_entry(union
>>           break;
>>   
>>       case irte128:
>> -        ACCESS_ONCE(entry.ptr128->raw[0]) = 0;
>> -        barrier();
>> +        ASSERT(!entry.ptr128->full.remap_en);
>>           entry.ptr128->raw[1] =
>>               container_of(&full, union irte128, full)->raw[1];
>>           barrier();
>> @@ -308,6 +307,20 @@ static int update_intremap_entry_from_io
>>       }
>>   
>>       entry = get_intremap_entry(iommu->seg, req_id, offset);
>> +
>> +    /* The RemapEn fields match for all formats. */
>> +    while ( iommu->enabled && entry.ptr32->basic.remap_en )
> 
> Why while?  (and by this, what I mean is that this definitely needs a
> comment, because the code looks like it ought to be an if.)

Well - see the RFC remark after the description. I'd be happy to
change to if(), but only on solid grounds. Without clear
guarantees that no races between IRTE updates can occur, we need
to continue flushing as long as we find RemapEn to have got set
again after a flush. Note how the necessary lock guarding against
such is getting dropped and re-acquired in the loop bodies.

Jan
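
To illustrate the point about while() versus if(), here is a small single-threaded model of the loop. It is a sketch under stated assumptions rather than the Xen code: struct sim and both functions are hypothetical, the two locks are modelled as plain flags, and a racing updater is simulated by letting the flush step set RemapEn again a fixed number of times while the table lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an IRTE and the two locks involved. */
struct sim {
    bool remap_en;         /* models the IRTE's RemapEn bit           */
    bool table_locked;     /* models the intremap table lock          */
    bool iommu_locked;     /* models iommu->lock                      */
    unsigned int racers;   /* simulated racing re-enables still to go */
    unsigned int flushes;  /* number of flushes issued so far         */
};

static void sim_flush(struct sim *s)
{
    /* The flush runs under the IOMMU lock, with the table lock dropped. */
    assert(s->iommu_locked && !s->table_locked);
    s->flushes++;

    /* Simulate another updater setting RemapEn again in that window. */
    if ( s->racers )
    {
        s->racers--;
        s->remap_en = true;
    }
}

static void sim_quiesce(struct sim *s)
{
    assert(s->table_locked);

    /* Re-check after every flush: the entry may have been re-enabled. */
    while ( s->remap_en )
    {
        s->remap_en = false;
        s->table_locked = false;

        s->iommu_locked = true;
        sim_flush(s);
        s->iommu_locked = false;

        s->table_locked = true;
    }
}
```

With two simulated races the loop issues three flushes before it finds RemapEn clear. A plain if() would stop after the first flush and go on to write the new entry while RemapEn is set again.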