[REPOST PATCH v3 0/5] Support x2APIC mode with TCG accelerator

Bui Quang Minh posted 5 patches 1 year ago
Failed in applying to current master (apply log)
There is a newer version of this series
hw/i386/acpi-build.c                 |  28 +-
hw/i386/amd_iommu.c                  |  21 +-
hw/i386/amd_iommu.h                  |  16 +-
hw/i386/intel_iommu.c                |  11 -
hw/i386/x86.c                        |   8 +-
hw/intc/apic.c                       | 395 +++++++++++++++++++++------
hw/intc/apic_common.c                |  16 +-
hw/intc/trace-events                 |   4 +-
include/hw/i386/apic.h               |   6 +-
include/hw/i386/apic_internal.h      |   7 +-
target/i386/cpu-sysemu.c             |  18 +-
target/i386/cpu.c                    |   5 +-
target/i386/cpu.h                    |   9 +
target/i386/tcg/sysemu/misc_helper.c |  31 +++
14 files changed, 436 insertions(+), 139 deletions(-)
[REPOST PATCH v3 0/5] Support x2APIC mode with TCG accelerator
Posted by Bui Quang Minh 1 year ago
[Reposting due to broken threading in previous post]

Hi everyone,

This series implements x2APIC mode in userspace local APIC and the
RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
using either Intel or AMD iommu.

Testing the emulated userspace APIC with kvm-unit-tests, disable test
device with this patch

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)
 
        if ((str = getenv("TEST_DEVICE")))
                no_test_device = !atol(str);
+       no_test_device = true;
 
        if ((str = getenv("MEMLIMIT")))
                fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
./run_tests.sh -v -g apic 

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2
-cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split FAIL
apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp
1 -cpu qemu64 -machine kernel_irqchip=split PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu
qemu64,+x2apic,+tsc-deadline FAIL x2apic (54 tests, 8 unexpected failures,
1 skipped) TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp
2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off FAIL xapic (43 tests,
6 unexpected failures, 2 skipped)

  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0
  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0 
  FAIL: apicbase: relocate apic

These errors are because we don't disable MMIO region when switching to
x2APIC and don't support relocate MMIO region yet. This is a problem
because, MMIO region is the same for all CPUs, in order to support these we
need to figure out how to allocate and manage different MMIO regions for
each CPUs. This can be an improvement in the future.

  FAIL: nmi-after-sti
  FAIL: multiple nmi

These errors are in the way we handle CPU_INTERRUPT_NMI in core TCG.

  FAIL: TMCCT should stay at zero

This error is related to APIC timer which should be addressed in separate
patch.

Version 3 changes,
- Patch 2:
  + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
  + Make physical destination mode IPI which has destination id 0xffffffff
  a broadcast to xAPIC CPUs
  + Make cluster address 0xf in cluster model of xAPIC logical destination
  mode a broadcast to all clusters
  + Create new extended_log_dest to store APIC_LDR information in x2APIC
  instead of extending log_dest for backward compatibility in vmstate

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC suuport
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
  i386/tcg: implement x2APIC registers MSR access
  apic: add support for x2APIC mode
  apic, i386/tcg: add x2apic transitions
  intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  amd_iommu: report x2APIC support to the operating system

 hw/i386/acpi-build.c                 |  28 +-
 hw/i386/amd_iommu.c                  |  21 +-
 hw/i386/amd_iommu.h                  |  16 +-
 hw/i386/intel_iommu.c                |  11 -
 hw/i386/x86.c                        |   8 +-
 hw/intc/apic.c                       | 395 +++++++++++++++++++++------
 hw/intc/apic_common.c                |  16 +-
 hw/intc/trace-events                 |   4 +-
 include/hw/i386/apic.h               |   6 +-
 include/hw/i386/apic_internal.h      |   7 +-
 target/i386/cpu-sysemu.c             |  18 +-
 target/i386/cpu.c                    |   5 +-
 target/i386/cpu.h                    |   9 +
 target/i386/tcg/sysemu/misc_helper.c |  31 +++
 14 files changed, 436 insertions(+), 139 deletions(-)

-- 
2.25.1
Re: [REPOST PATCH v3 0/5] Support x2APIC mode with TCG accelerator
Posted by Michael S. Tsirkin 1 year ago
On Tue, Apr 11, 2023 at 09:24:35PM +0700, Bui Quang Minh wrote:
> [Reposting due to broken threading in previous post]
> 
> Hi everyone,
> 
> This series implements x2APIC mode in userspace local APIC and the
> RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
> and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
> series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
> using either Intel or AMD iommu.
> 
> Testing the emulated userspace APIC with kvm-unit-tests, disable test
> device with this patch


Series:

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

any acks from tcg maintainers?

> diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
> index 1734afb..f56fe1c 100644
> --- a/lib/x86/fwcfg.c
> +++ b/lib/x86/fwcfg.c
> @@ -27,6 +27,7 @@ static void read_cfg_override(void)
>  
>         if ((str = getenv("TEST_DEVICE")))
>                 no_test_device = !atol(str);
> +       no_test_device = true;
>  
>         if ((str = getenv("MEMLIMIT")))
>                 fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;
> 
> ~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
> ./run_tests.sh -v -g apic 
> 
> TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2
> -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split FAIL
> apic-split (54 tests, 8 unexpected failures, 1 skipped)
> TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp
> 1 -cpu qemu64 -machine kernel_irqchip=split PASS ioapic-split (19 tests)
> TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu
> qemu64,+x2apic,+tsc-deadline FAIL x2apic (54 tests, 8 unexpected failures,
> 1 skipped) TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp
> 2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off FAIL xapic (43 tests,
> 6 unexpected failures, 2 skipped)
> 
>   FAIL: apic_disable: *0xfee00030: 50014
>   FAIL: apic_disable: *0xfee00080: f0
>   FAIL: apic_disable: *0xfee00030: 50014
>   FAIL: apic_disable: *0xfee00080: f0 
>   FAIL: apicbase: relocate apic
> 
> These errors are because we don't disable MMIO region when switching to
> x2APIC and don't support relocate MMIO region yet. This is a problem
> because, MMIO region is the same for all CPUs, in order to support these we
> need to figure out how to allocate and manage different MMIO regions for
> each CPUs. This can be an improvement in the future.
> 
>   FAIL: nmi-after-sti
>   FAIL: multiple nmi
> 
> These errors are in the way we handle CPU_INTERRUPT_NMI in core TCG.
> 
>   FAIL: TMCCT should stay at zero
> 
> This error is related to APIC timer which should be addressed in separate
> patch.
> 
> Version 3 changes,
> - Patch 2:
>   + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
>   + Make physical destination mode IPI which has destination id 0xffffffff
>   a broadcast to xAPIC CPUs
>   + Make cluster address 0xf in cluster model of xAPIC logical destination
>   mode a broadcast to all clusters
>   + Create new extended_log_dest to store APIC_LDR information in x2APIC
>   instead of extending log_dest for backward compatibility in vmstate
> 
> Version 2 changes,
> - Add support for APIC ID larger than 255
> - Adjust AMD iommu for x2APIC suuport
> - Reorganize and split patch 1,2 into patch 1,2,3 in version 2
> 
> Thanks,
> Quang Minh.
> 
> Bui Quang Minh (5):
>   i386/tcg: implement x2APIC registers MSR access
>   apic: add support for x2APIC mode
>   apic, i386/tcg: add x2apic transitions
>   intel_iommu: allow Extended Interrupt Mode when using userspace APIC
>   amd_iommu: report x2APIC support to the operating system
> 
>  hw/i386/acpi-build.c                 |  28 +-
>  hw/i386/amd_iommu.c                  |  21 +-
>  hw/i386/amd_iommu.h                  |  16 +-
>  hw/i386/intel_iommu.c                |  11 -
>  hw/i386/x86.c                        |   8 +-
>  hw/intc/apic.c                       | 395 +++++++++++++++++++++------
>  hw/intc/apic_common.c                |  16 +-
>  hw/intc/trace-events                 |   4 +-
>  include/hw/i386/apic.h               |   6 +-
>  include/hw/i386/apic_internal.h      |   7 +-
>  target/i386/cpu-sysemu.c             |  18 +-
>  target/i386/cpu.c                    |   5 +-
>  target/i386/cpu.h                    |   9 +
>  target/i386/tcg/sysemu/misc_helper.c |  31 +++
>  14 files changed, 436 insertions(+), 139 deletions(-)
> 
> -- 
> 2.25.1
Re: [REPOST PATCH v3 0/5] Support x2APIC mode with TCG accelerator
Posted by Bui Quang Minh 1 year ago
On 4/21/23 14:57, Michael S. Tsirkin wrote:
> On Tue, Apr 11, 2023 at 09:24:35PM +0700, Bui Quang Minh wrote:
>> [Reposting due to broken threading in previous post]
>>
>> Hi everyone,
>>
>> This series implements x2APIC mode in userspace local APIC and the
>> RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
>> and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
>> series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
>> using either Intel or AMD iommu.
>>
>> Testing the emulated userspace APIC with kvm-unit-tests, disable test
>> device with this patch
> 
> 
> Series:
> 
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> 
> any acks from tcg maintainers?

Thank you for your review. There is no ack from tcg maintainers yet.
Quang Minh.
Re: [REPOST PATCH v3 0/5] Support x2APIC mode with TCG accelerator
Posted by Bui Quang Minh 11 months, 3 weeks ago
On 4/11/23 21:24, Bui Quang Minh wrote:
> [Reposting due to broken threading in previous post]
> 
> Hi everyone,
> 
> This series implements x2APIC mode in userspace local APIC and the
> RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
> and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
> series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
> using either Intel or AMD iommu.
> 
> Testing the emulated userspace APIC with kvm-unit-tests, disable test
> device with this patch
> 
> diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
> index 1734afb..f56fe1c 100644
> --- a/lib/x86/fwcfg.c
> +++ b/lib/x86/fwcfg.c
> @@ -27,6 +27,7 @@ static void read_cfg_override(void)
>   
>          if ((str = getenv("TEST_DEVICE")))
>                  no_test_device = !atol(str);
> +       no_test_device = true;
>   
>          if ((str = getenv("MEMLIMIT")))
>                  fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;
> 
> ~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
> ./run_tests.sh -v -g apic
> 
> TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2
> -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split FAIL
> apic-split (54 tests, 8 unexpected failures, 1 skipped)
> TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp
> 1 -cpu qemu64 -machine kernel_irqchip=split PASS ioapic-split (19 tests)
> TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu
> qemu64,+x2apic,+tsc-deadline FAIL x2apic (54 tests, 8 unexpected failures,
> 1 skipped) TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp
> 2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off FAIL xapic (43 tests,
> 6 unexpected failures, 2 skipped)
> 
>    FAIL: apic_disable: *0xfee00030: 50014
>    FAIL: apic_disable: *0xfee00080: f0
>    FAIL: apic_disable: *0xfee00030: 50014
>    FAIL: apic_disable: *0xfee00080: f0
>    FAIL: apicbase: relocate apic
> 
> These errors are because we don't disable MMIO region when switching to
> x2APIC and don't support relocate MMIO region yet. This is a problem
> because, MMIO region is the same for all CPUs, in order to support these we
> need to figure out how to allocate and manage different MMIO regions for
> each CPUs. This can be an improvement in the future.
> 
>    FAIL: nmi-after-sti
>    FAIL: multiple nmi
> 
> These errors are in the way we handle CPU_INTERRUPT_NMI in core TCG.
> 
>    FAIL: TMCCT should stay at zero
> 
> This error is related to APIC timer which should be addressed in separate
> patch.
> 
> Version 3 changes,
> - Patch 2:
>    + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
>    + Make physical destination mode IPI which has destination id 0xffffffff
>    a broadcast to xAPIC CPUs
>    + Make cluster address 0xf in cluster model of xAPIC logical destination
>    mode a broadcast to all clusters
>    + Create new extended_log_dest to store APIC_LDR information in x2APIC
>    instead of extending log_dest for backward compatibility in vmstate
> 
> Version 2 changes,
> - Add support for APIC ID larger than 255
> - Adjust AMD iommu for x2APIC suuport
> - Reorganize and split patch 1,2 into patch 1,2,3 in version 2
> 
> Thanks,
> Quang Minh.
> 
> Bui Quang Minh (5):
>    i386/tcg: implement x2APIC registers MSR access
>    apic: add support for x2APIC mode
>    apic, i386/tcg: add x2apic transitions
>    intel_iommu: allow Extended Interrupt Mode when using userspace APIC
>    amd_iommu: report x2APIC support to the operating system
> 
>   hw/i386/acpi-build.c                 |  28 +-
>   hw/i386/amd_iommu.c                  |  21 +-
>   hw/i386/amd_iommu.h                  |  16 +-
>   hw/i386/intel_iommu.c                |  11 -
>   hw/i386/x86.c                        |   8 +-
>   hw/intc/apic.c                       | 395 +++++++++++++++++++++------
>   hw/intc/apic_common.c                |  16 +-
>   hw/intc/trace-events                 |   4 +-
>   include/hw/i386/apic.h               |   6 +-
>   include/hw/i386/apic_internal.h      |   7 +-
>   target/i386/cpu-sysemu.c             |  18 +-
>   target/i386/cpu.c                    |   5 +-
>   target/i386/cpu.h                    |   9 +
>   target/i386/tcg/sysemu/misc_helper.c |  31 +++
>   14 files changed, 436 insertions(+), 139 deletions(-)

Hello everyone, I just want to politely ping here. Could you spend some 
time to review the series?

Thank you very much,
Quang Minh.
[REPOST PATCH v3 1/5] i386/tcg: implement x2APIC registers MSR access
Posted by Bui Quang Minh 1 year ago
This commit refactors apic_mem_read/write to support both MMIO access in
xAPIC and MSR access in x2APIC.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/intc/apic.c                       | 79 ++++++++++++++++++----------
 hw/intc/trace-events                 |  4 +-
 include/hw/i386/apic.h               |  3 ++
 target/i386/cpu.h                    |  3 ++
 target/i386/tcg/sysemu/misc_helper.c | 27 ++++++++++
 5 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 20b5a94073..61b494b20a 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -288,6 +288,13 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+bool is_x2apic_mode(DeviceState *dev)
+{
+    APICCommonState *s = APIC(dev);
+
+    return s->apicbase & MSR_IA32_APICBASE_EXTD;
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
     s->apicbase = (val & 0xfffff000) |
@@ -636,16 +643,11 @@ static void apic_timer(void *opaque)
     apic_timer_update(s, s->next_time);
 }
 
-static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+uint64_t apic_register_read(int index)
 {
     DeviceState *dev;
     APICCommonState *s;
-    uint32_t val;
-    int index;
-
-    if (size < 4) {
-        return 0;
-    }
+    uint64_t val;
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -653,7 +655,6 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
     }
     s = APIC(dev);
 
-    index = (addr >> 4) & 0xff;
     switch(index) {
     case 0x02: /* id */
         val = s->id << 24;
@@ -720,7 +721,23 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
         val = 0;
         break;
     }
-    trace_apic_mem_readl(addr, val);
+
+    trace_apic_register_read(index, val);
+    return val;
+}
+
+static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint32_t val;
+    int index;
+
+    if (size < 4) {
+        return 0;
+    }
+
+    index = (addr >> 4) & 0xff;
+    val = (uint32_t)apic_register_read(index);
+
     return val;
 }
 
@@ -737,27 +754,10 @@ static void apic_send_msi(MSIMessage *msi)
     apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
 
-static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
-                           unsigned size)
+void apic_register_write(int index, uint64_t val)
 {
     DeviceState *dev;
     APICCommonState *s;
-    int index = (addr >> 4) & 0xff;
-
-    if (size < 4) {
-        return;
-    }
-
-    if (addr > 0xfff || !index) {
-        /* MSI and MMIO APIC are at the same memory location,
-         * but actually not on the global bus: MSI is on PCI bus
-         * APIC is connected directly to the CPU.
-         * Mapping them on the global bus happens to work because
-         * MSI registers are reserved in APIC MMIO and vice versa. */
-        MSIMessage msi = { .address = addr, .data = val };
-        apic_send_msi(&msi);
-        return;
-    }
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -765,7 +765,7 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
     s = APIC(dev);
 
-    trace_apic_mem_writel(addr, val);
+    trace_apic_register_write(index, val);
 
     switch(index) {
     case 0x02:
@@ -843,6 +843,29 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
 }
 
+static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
+                           unsigned size)
+{
+    int index = (addr >> 4) & 0xff;
+
+    if (size < 4) {
+        return;
+    }
+
+    if (addr > 0xfff || !index) {
+        /* MSI and MMIO APIC are at the same memory location,
+         * but actually not on the global bus: MSI is on PCI bus
+         * APIC is connected directly to the CPU.
+         * Mapping them on the global bus happens to work because
+         * MSI registers are reserved in APIC MMIO and vice versa. */
+        MSIMessage msi = { .address = addr, .data = val };
+        apic_send_msi(&msi);
+        return;
+    }
+
+    apic_register_write(index, val);
+}
+
 static void apic_pre_save(APICCommonState *s)
 {
     apic_sync_vapic(s, SYNC_FROM_VAPIC);
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 50cadfb996..9d4e7a67be 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -14,8 +14,8 @@ cpu_get_apic_base(uint64_t val) "0x%016"PRIx64
 # apic.c
 apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d"
 apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d"
-apic_mem_readl(uint64_t addr, uint32_t val)  "0x%"PRIx64" = 0x%08x"
-apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x"
+apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
+apic_register_write(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
 
 # ioapic.c
 ioapic_set_remote_irr(int n) "set remote irr for pin %d"
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index bdc15a7a73..2cebeb4faf 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -18,6 +18,9 @@ void apic_sipi(DeviceState *s);
 void apic_poll_irq(DeviceState *d);
 void apic_designate_bsp(DeviceState *d, bool bsp);
 int apic_get_highest_priority_irr(DeviceState *dev);
+uint64_t apic_register_read(int index);
+void apic_register_write(int index, uint64_t val);
+bool is_x2apic_mode(DeviceState *d);
 
 /* pc.c */
 DeviceState *cpu_get_current_apic(void);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d243e290d3..02165a5ad2 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -545,6 +545,9 @@ typedef enum X86Seg {
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
 #define MSR_IA32_VMX_VMFUNC             0x00000491
 
+#define MSR_APIC_START                  0x00000800
+#define MSR_APIC_END                    0x000008ff
+
 #define XSTATE_FP_BIT                   0
 #define XSTATE_SSE_BIT                  1
 #define XSTATE_YMM_BIT                  2
diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index e1528b7f80..1fce2076a3 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -25,6 +25,7 @@
 #include "exec/address-spaces.h"
 #include "exec/exec-all.h"
 #include "tcg/helper-tcg.h"
+#include "hw/i386/apic.h"
 
 void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
 {
@@ -289,6 +290,19 @@ void helper_wrmsr(CPUX86State *env)
         env->msr_bndcfgs = val;
         cpu_sync_bndcs_hflags(env);
         break;
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            goto error;
+        }
+
+        qemu_mutex_lock_iothread();
+        apic_register_write(index, val);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
@@ -455,6 +469,19 @@ void helper_rdmsr(CPUX86State *env)
         val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
         break;
     }
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            raise_exception_ra(env, EXCP0D_GPF, GETPC());
+        }
+
+        qemu_mutex_lock_iothread();
+        val = apic_register_read(index);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
-- 
2.25.1
[REPOST PATCH v3 2/5] apic: add support for x2APIC mode
Posted by Bui Quang Minh 1 year ago
This commit extends the APIC ID to 32-bit long and remove the 255 max APIC
ID limit in userspace APIC. The array that manages local APICs is now
dynamically allocated based on the max APIC ID of created x86 machine.
Also, new x2APIC IPI destination determination scheme, self IPI and x2APIC
mode register access are supported.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
Version 3 changes:
- Allow APIC ID > 255 only when x2APIC feature is supported on CPU
- Make physical destination mode IPI which has destination id 0xffffffff
a broadcast to xAPIC CPUs
- Make cluster address 0xf in cluster model of xAPIC logical destination
mode a broadcast to all clusters
- Create new extended_log_dest to store APIC_LDR information in x2APIC
instead of extending log_dest for backward compatibility in vmstate

 hw/i386/x86.c                   |   8 +-
 hw/intc/apic.c                  | 266 ++++++++++++++++++++++++--------
 hw/intc/apic_common.c           |   9 ++
 include/hw/i386/apic.h          |   3 +-
 include/hw/i386/apic_internal.h |   7 +-
 target/i386/cpu-sysemu.c        |   8 +-
 6 files changed, 231 insertions(+), 70 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123..8b70f0a6ea 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
      * Can we support APIC ID 255 or higher?
      *
      * Under Xen: yes.
-     * With userspace emulated lapic: no
+     * With userspace emulated lapic: checked later in apic_common_set_id.
      * With KVM's in-kernel lapic: only if X2APIC API is enabled.
      */
     if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
-        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
+        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
         error_report("current -smp configuration requires kernel "
                      "irqchip and X2APIC API support.");
         exit(EXIT_FAILURE);
@@ -146,6 +146,10 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
         kvm_set_max_apic_id(x86ms->apic_id_limit);
     }
 
+    if (!kvm_irqchip_in_kernel()) {
+        apic_set_max_apic_id(x86ms->apic_id_limit);
+    }
+
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < ms->smp.cpus; i++) {
         x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 61b494b20a..0f2f29e679 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -31,15 +31,15 @@
 #include "hw/i386/apic-msidef.h"
 #include "qapi/error.h"
 #include "qom/object.h"
-
-#define MAX_APICS 255
-#define MAX_APIC_WORDS 8
+#include "tcg/helper-tcg.h"
 
 #define SYNC_FROM_VAPIC                 0x1
 #define SYNC_TO_VAPIC                   0x2
 #define SYNC_ISR_IRR_TO_VAPIC           0x4
 
-static APICCommonState *local_apics[MAX_APICS + 1];
+static APICCommonState **local_apics;
+static uint32_t max_apics;
+static uint32_t max_apic_words;
 
 #define TYPE_APIC "apic"
 /*This is reusing the APICCommonState typedef from APIC_COMMON */
@@ -49,7 +49,19 @@ DECLARE_INSTANCE_CHECKER(APICCommonState, APIC,
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
 static void apic_update_irq(APICCommonState *s);
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-                                      uint8_t dest, uint8_t dest_mode);
+                                      uint32_t dest, uint8_t dest_mode);
+
+void apic_set_max_apic_id(uint32_t max_apic_id)
+{
+    int word_size = 32;
+
+    /* round up the max apic id to next multiple of words */
+    max_apics = (max_apic_id + word_size - 1) & ~(word_size - 1);
+
+    local_apics = g_malloc0(sizeof(*local_apics) * max_apics);
+    max_apic_words = max_apics >> 5;
+}
+
 
 /* Find first bit starting from msb */
 static int apic_fls_bit(uint32_t value)
@@ -199,7 +211,7 @@ static void apic_external_nmi(APICCommonState *s)
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
     int __i, __j;\
-    for(__i = 0; __i < MAX_APIC_WORDS; __i++) {\
+    for(__i = 0; __i < max_apic_words; __i++) {\
         uint32_t __mask = deliver_bitmask[__i];\
         if (__mask) {\
             for(__j = 0; __j < 32; __j++) {\
@@ -226,7 +238,7 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
             {
                 int i, d;
                 d = -1;
-                for(i = 0; i < MAX_APIC_WORDS; i++) {
+                for(i = 0; i < max_apic_words; i++) {
                     if (deliver_bitmask[i]) {
                         d = i * 32 + apic_ffs_bit(deliver_bitmask[i]);
                         break;
@@ -276,16 +288,18 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
                  apic_set_irq(apic_iter, vector_num, trigger_mode) );
 }
 
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
-                      uint8_t vector_num, uint8_t trigger_mode)
+static void apic_deliver_irq(uint32_t dest, uint8_t dest_mode,
+                             uint8_t delivery_mode, uint8_t vector_num,
+                             uint8_t trigger_mode)
 {
-    uint32_t deliver_bitmask[MAX_APIC_WORDS];
+    uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t));
 
     trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num,
                            trigger_mode);
 
     apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
+    g_free(deliver_bitmask);
 }
 
 bool is_x2apic_mode(DeviceState *dev)
@@ -442,57 +456,121 @@ static void apic_eoi(APICCommonState *s)
     apic_update_irq(s);
 }
 
-static int apic_find_dest(uint8_t dest)
+static bool apic_match_dest(APICCommonState *apic, uint32_t dest)
 {
-    APICCommonState *apic = local_apics[dest];
+    if (is_x2apic_mode(&apic->parent_obj)) {
+        return apic->initial_apic_id == dest;
+    } else {
+        return apic->id == (uint8_t)dest;
+    }
+}
+
+static void apic_find_dest(uint32_t *deliver_bitmask, uint32_t dest)
+{
+    APICCommonState *apic = NULL;
     int i;
 
-    if (apic && apic->id == dest)
-        return dest;  /* shortcut in case apic->id == local_apics[dest]->id */
-
-    for (i = 0; i < MAX_APICS; i++) {
+    for (i = 0; i < max_apics; i++) {
         apic = local_apics[i];
-        if (apic && apic->id == dest)
-            return i;
-        if (!apic)
-            break;
+        if (apic && apic_match_dest(apic, dest)) {
+            apic_set_bit(deliver_bitmask, i);
+        }
     }
+}
 
-    return -1;
+/*
+ * Deliver interrupt to x2APIC CPUs if it is x2APIC broadcast.
+ * Otherwise, deliver interrupt to xAPIC CPUs if it is xAPIC
+ * broadcast.
+ */
+static void apic_get_broadcast_bitmask(uint32_t *deliver_bitmask,
+                                       bool is_x2apic_broadcast)
+{
+    int i;
+    APICCommonState *apic_iter;
+
+    for (i = 0; i < max_apics; i++) {
+        apic_iter = local_apics[i];
+        if (apic_iter) {
+            bool apic_in_x2apic = is_x2apic_mode(&apic_iter->parent_obj);
+
+            if (is_x2apic_broadcast && apic_in_x2apic) {
+                apic_set_bit(deliver_bitmask, i);
+            } else if (!is_x2apic_broadcast && !apic_in_x2apic) {
+                apic_set_bit(deliver_bitmask, i);
+            }
+        }
+    }
 }
 
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-                                      uint8_t dest, uint8_t dest_mode)
+                                      uint32_t dest, uint8_t dest_mode)
 {
     APICCommonState *apic_iter;
     int i;
 
-    if (dest_mode == 0) {
+    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
+
+    /*
+     * x2APIC broadcast is delivered to all x2APIC CPUs regardless of
+     * destination mode. In case the destination mode is physical, it is
+     * broadcasted to all xAPIC CPUs too. Otherwise, if the destination
+     * mode is logical, we need to continue checking if xAPIC CPUs accepts
+     * the interrupt.
+     */
+    if (dest == 0xffffffff) {
+        if (dest_mode == APIC_DESTMODE_PHYSICAL) {
+            memset(deliver_bitmask, 0xff, max_apic_words * sizeof(uint32_t));
+            return;
+        } else {
+            apic_get_broadcast_bitmask(deliver_bitmask, true);
+        }
+    }
+
+    if (dest_mode == APIC_DESTMODE_PHYSICAL) {
+        apic_find_dest(deliver_bitmask, dest);
+        /* Any APIC in xAPIC mode will interpret 0xFF as broadcast */
         if (dest == 0xff) {
-            memset(deliver_bitmask, 0xff, MAX_APIC_WORDS * sizeof(uint32_t));
-        } else {
-            int idx = apic_find_dest(dest);
-            memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
-            if (idx >= 0)
-                apic_set_bit(deliver_bitmask, idx);
+            apic_get_broadcast_bitmask(deliver_bitmask, false);
         }
     } else {
-        /* XXX: cluster mode */
-        memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
-        for(i = 0; i < MAX_APICS; i++) {
+        /* XXX: logical mode */
+        for(i = 0; i < max_apics; i++) {
             apic_iter = local_apics[i];
             if (apic_iter) {
-                if (apic_iter->dest_mode == 0xf) {
-                    if (dest & apic_iter->log_dest)
-                        apic_set_bit(deliver_bitmask, i);
-                } else if (apic_iter->dest_mode == 0x0) {
-                    if ((dest & 0xf0) == (apic_iter->log_dest & 0xf0) &&
-                        (dest & apic_iter->log_dest & 0x0f)) {
+                /* x2APIC logical mode */
+                if (apic_iter->apicbase & MSR_IA32_APICBASE_EXTD) {
+                    if ((dest & 0xffff0000) == (apic_iter->extended_log_dest & 0xffff0000) &&
+                        (dest & apic_iter->extended_log_dest & 0xffff)) {
                         apic_set_bit(deliver_bitmask, i);
                     }
+                } else {
+                    dest = (uint8_t)dest;
+                    if (apic_iter->dest_mode == APIC_DESTMODE_LOGICAL_FLAT) {
+                        if (dest & apic_iter->log_dest) {
+                            apic_set_bit(deliver_bitmask, i);
+                        }
+                    } else if (apic_iter->dest_mode == APIC_DESTMODE_LOGICAL_CLUSTER) {
+                        /*
+                         * In cluster model of xAPIC logical mode IPI, 4 higher
+                         * bits are used as cluster address, 4 lower bits are
+                         * the bitmask for local APICs in the cluster. The IPI
+                         * is delivered to an APIC if the cluster address
+                         * matches and the APIC's address bit in the cluster is
+                         * set in bitmask of destination ID in IPI.
+                         *
+                         * The cluster address ranges from 0 - 14, the cluster
+                         * address 15 (0xf) is the broadcast address to all
+                         * clusters.
+                         */
+                        if ((dest & 0xf0) == 0xf0 ||
+                            (dest & 0xf0) == (apic_iter->log_dest & 0xf0)) {
+                            if (dest & apic_iter->log_dest & 0x0f) {
+                                apic_set_bit(deliver_bitmask, i);
+                            }
+                        }
+                    }
                 }
-            } else {
-                break;
             }
         }
     }
@@ -516,29 +594,36 @@ void apic_sipi(DeviceState *dev)
     s->wait_for_sipi = 0;
 }
 
-static void apic_deliver(DeviceState *dev, uint8_t dest, uint8_t dest_mode,
+static void apic_deliver(DeviceState *dev, uint32_t dest, uint8_t dest_mode,
                          uint8_t delivery_mode, uint8_t vector_num,
-                         uint8_t trigger_mode)
+                         uint8_t trigger_mode, uint8_t dest_shorthand)
 {
     APICCommonState *s = APIC(dev);
-    uint32_t deliver_bitmask[MAX_APIC_WORDS];
-    int dest_shorthand = (s->icr[0] >> 18) & 3;
     APICCommonState *apic_iter;
+    uint32_t deliver_bitmask_size = max_apic_words * sizeof(uint32_t);
+    uint32_t *deliver_bitmask = g_malloc(deliver_bitmask_size);
+    uint32_t current_apic_id;
+
+    if (is_x2apic_mode(dev)) {
+        current_apic_id = s->initial_apic_id;
+    } else {
+        current_apic_id = s->id;
+    }
 
     switch (dest_shorthand) {
     case 0:
         apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
         break;
     case 1:
-        memset(deliver_bitmask, 0x00, sizeof(deliver_bitmask));
-        apic_set_bit(deliver_bitmask, s->id);
+        memset(deliver_bitmask, 0x00, deliver_bitmask_size);
+        apic_set_bit(deliver_bitmask, current_apic_id);
         break;
     case 2:
-        memset(deliver_bitmask, 0xff, sizeof(deliver_bitmask));
+        memset(deliver_bitmask, 0xff, deliver_bitmask_size);
         break;
     case 3:
-        memset(deliver_bitmask, 0xff, sizeof(deliver_bitmask));
-        apic_reset_bit(deliver_bitmask, s->id);
+        memset(deliver_bitmask, 0xff, deliver_bitmask_size);
+        apic_reset_bit(deliver_bitmask, current_apic_id);
         break;
     }
 
@@ -562,6 +647,7 @@ static void apic_deliver(DeviceState *dev, uint8_t dest, uint8_t dest_mode,
     }
 
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
+    g_free(deliver_bitmask);
 }
 
 static bool apic_check_pic(APICCommonState *s)
@@ -657,7 +743,11 @@ uint64_t apic_register_read(int index)
 
     switch(index) {
     case 0x02: /* id */
-        val = s->id << 24;
+        if (is_x2apic_mode(dev)) {
+            val = s->initial_apic_id;
+        } else {
+            val = s->id << 24;
+        }
         break;
     case 0x03: /* version */
         val = s->version | ((APIC_LVT_NB - 1) << 16);
@@ -680,9 +770,17 @@ uint64_t apic_register_read(int index)
         val = 0;
         break;
     case 0x0d:
-        val = s->log_dest << 24;
+        if (is_x2apic_mode(dev)) {
+            val = s->extended_log_dest;
+        } else {
+            val = s->log_dest << 24;
+        }
         break;
     case 0x0e:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         val = (s->dest_mode << 28) | 0xfffffff;
         break;
     case 0x0f:
@@ -745,7 +843,12 @@ static void apic_send_msi(MSIMessage *msi)
 {
     uint64_t addr = msi->address;
     uint32_t data = msi->data;
-    uint8_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    uint32_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    /*
+     * The higher 3 bytes of destination id is stored in higher word of
+     * msi address. See x86_iommu_irq_to_msi_message()
+     */
+    dest = dest | (addr >> 32);
     uint8_t vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
     uint8_t dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
     uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
@@ -769,6 +872,10 @@ void apic_register_write(int index, uint64_t val)
 
     switch(index) {
     case 0x02:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->id = (val >> 24);
         break;
     case 0x03:
@@ -788,9 +895,17 @@ void apic_register_write(int index, uint64_t val)
         apic_eoi(s);
         break;
     case 0x0d:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->log_dest = val >> 24;
         break;
     case 0x0e:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->dest_mode = val >> 28;
         break;
     case 0x0f:
@@ -802,13 +917,27 @@ void apic_register_write(int index, uint64_t val)
     case 0x20 ... 0x27:
     case 0x28:
         break;
-    case 0x30:
+    case 0x30: {
+        uint32_t dest;
+
         s->icr[0] = val;
-        apic_deliver(dev, (s->icr[1] >> 24) & 0xff, (s->icr[0] >> 11) & 1,
+        if (is_x2apic_mode(dev)) {
+            s->icr[1] = val >> 32;
+            dest = s->icr[1];
+        } else {
+            dest = (s->icr[1] >> 24) & 0xff;
+        }
+
+        apic_deliver(dev, dest, (s->icr[0] >> 11) & 1,
                      (s->icr[0] >> 8) & 7, (s->icr[0] & 0xff),
-                     (s->icr[0] >> 15) & 1);
+                     (s->icr[0] >> 15) & 1, (s->icr[0] >> 18) & 3);
         break;
+    }
     case 0x31:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->icr[1] = val;
         break;
     case 0x32 ... 0x37:
@@ -837,6 +966,23 @@ void apic_register_write(int index, uint64_t val)
             s->count_shift = (v + 1) & 7;
         }
         break;
+    case 0x3f: {
+        int vector = val & 0xff;
+
+        if (!is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
+        /*
+         * Self IPI is identical to IPI with
+         * - Destination shorthand: 1 (Self)
+         * - Trigger mode: 0 (Edge)
+         * - Delivery mode: 0 (Fixed)
+         */
+        apic_deliver(dev, 0, 0, APIC_DM_FIXED, vector, 0, 1);
+
+        break;
+    }
     default:
         s->esr |= APIC_ESR_ILLEGAL_ADDRESS;
         break;
@@ -894,12 +1040,6 @@ static void apic_realize(DeviceState *dev, Error **errp)
 {
     APICCommonState *s = APIC(dev);
 
-    if (s->id >= MAX_APICS) {
-        error_setg(errp, "%s initialization failed. APIC ID %d is invalid",
-                   object_get_typename(OBJECT(dev)), s->id);
-        return;
-    }
-
     if (kvm_enabled()) {
         warn_report("Userspace local APIC is deprecated for KVM.");
         warn_report("Do not use kernel-irqchip except for the -M isapc machine type.");
@@ -909,7 +1049,7 @@ static void apic_realize(DeviceState *dev, Error **errp)
                           APIC_SPACE_SIZE);
 
     s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s);
-    local_apics[s->id] = s;
+    local_apics[s->initial_apic_id] = s;
 
     msi_nonbroken = true;
 }
@@ -919,7 +1059,7 @@ static void apic_unrealize(DeviceState *dev)
     APICCommonState *s = APIC(dev);
 
     timer_free(s->timer);
-    local_apics[s->id] = NULL;
+    local_apics[s->initial_apic_id] = NULL;
 }
 
 static void apic_class_init(ObjectClass *klass, void *data)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 4a34f03047..d95914066e 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -284,6 +284,10 @@ static void apic_common_realize(DeviceState *dev, Error **errp)
     }
     vmstate_register_with_alias_id(NULL, instance_id, &vmstate_apic_common,
                                    s, -1, 0, NULL);
+
+    /* APIC LDR in x2APIC mode */
+    s->extended_log_dest = ((s->initial_apic_id & 0xffff0) << 16) |
+                            (1 << (s->initial_apic_id & 0xf));
 }
 
 static void apic_common_unrealize(DeviceState *dev)
@@ -424,6 +428,11 @@ static void apic_common_set_id(Object *obj, Visitor *v, const char *name,
         return;
     }
 
+    if (value >= 255 && !cpu_has_x2apic_feature(&s->cpu->env)) {
+        error_setg(errp, "APIC ID %d requires x2APIC feature in CPU", value);
+        return;
+    }
+
     s->initial_apic_id = value;
     s->id = (uint8_t)value;
 }
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index 2cebeb4faf..12aad09f4c 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -3,8 +3,7 @@
 
 
 /* apic.c */
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
-                      uint8_t vector_num, uint8_t trigger_mode);
+void apic_set_max_apic_id(uint32_t max_apic_id);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
 void apic_deliver_nmi(DeviceState *d);
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 5f2ba24bfc..e796e6cae3 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -46,8 +46,10 @@
 #define APIC_DM_EXTINT                  7
 
 /* APIC destination mode */
-#define APIC_DESTMODE_FLAT              0xf
-#define APIC_DESTMODE_CLUSTER           1
+#define APIC_DESTMODE_PHYSICAL          0
+#define APIC_DESTMODE_LOGICAL           1
+#define APIC_DESTMODE_LOGICAL_FLAT      0xf
+#define APIC_DESTMODE_LOGICAL_CLUSTER   0
 
 #define APIC_TRIGGER_EDGE               0
 #define APIC_TRIGGER_LEVEL              1
@@ -187,6 +189,7 @@ struct APICCommonState {
     DeviceState *vapic;
     hwaddr vapic_paddr; /* note: persistence via kvmvapic */
     bool legacy_instance_id;
+    uint32_t extended_log_dest;
 };
 
 typedef struct VAPICState {
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index 28115edf44..a9ff10c517 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -281,11 +281,17 @@ void x86_cpu_apic_create(X86CPU *cpu, Error **errp)
                               OBJECT(cpu->apic_state));
     object_unref(OBJECT(cpu->apic_state));
 
-    qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);
     /* TODO: convert to link<> */
     apic = APIC_COMMON(cpu->apic_state);
     apic->cpu = cpu;
     apic->apicbase = APIC_DEFAULT_ADDRESS | MSR_IA32_APICBASE_ENABLE;
+
+    /*
+     * apic_common_set_id needs to check if the CPU has x2APIC
+     * feature in case APIC ID >= 255, so we need to set apic->cpu
+     * before setting APIC ID
+     */
+    qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);
 }
 
 void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)
-- 
2.25.1
[REPOST PATCH v3 3/5] apic, i386/tcg: add x2apic transitions
Posted by Bui Quang Minh 1 year ago
This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/intc/apic.c                       | 50 ++++++++++++++++++++++++++++
 hw/intc/apic_common.c                |  7 ++--
 target/i386/cpu-sysemu.c             | 10 ++++++
 target/i386/cpu.c                    |  5 +--
 target/i386/cpu.h                    |  6 ++++
 target/i386/tcg/sysemu/misc_helper.c |  4 +++
 6 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 0f2f29e679..716fe865d7 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -309,8 +309,41 @@ bool is_x2apic_mode(DeviceState *dev)
     return s->apicbase & MSR_IA32_APICBASE_EXTD;
 }
 
+static void apic_set_base_check(APICCommonState *s, uint64_t val)
+{
+    /* Enable x2apic when x2apic is not supported by CPU */
+    if (!cpu_has_x2apic_feature(&s->cpu->env) &&
+        val & MSR_IA32_APICBASE_EXTD)
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /*
+     * Transition into invalid state
+     * (s->apicbase & MSR_IA32_APICBASE_ENABLE == 0) &&
+     * (s->apicbase & MSR_IA32_APICBASE_EXTD) == 1
+     */
+    if (!(val & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /* Invalid transition from disabled mode to x2APIC */
+    if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        !(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /* Invalid transition from x2APIC to xAPIC */
+    if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        (s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_ENABLE) &&
+        !(val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
+    apic_set_base_check(s, val);
+
     s->apicbase = (val & 0xfffff000) |
         (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
     /* if disabled, cannot be enabled again */
@@ -319,6 +352,23 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
         cpu_clear_apic_feature(&s->cpu->env);
         s->spurious_vec &= ~APIC_SV_ENABLE;
     }
+
+    /* Transition from disabled mode to xAPIC */
+    if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_ENABLE)) {
+        s->apicbase |= MSR_IA32_APICBASE_ENABLE;
+        cpu_set_apic_feature(&s->cpu->env);
+    }
+
+    /* Transition from xAPIC to x2APIC */
+    if (cpu_has_x2apic_feature(&s->cpu->env) &&
+        !(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_EXTD)) {
+        s->apicbase |= MSR_IA32_APICBASE_EXTD;
+
+        s->log_dest = ((s->initial_apic_id & 0xffff0) << 16) |
+                      (1 << (s->initial_apic_id & 0xf));
+    }
 }
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index d95914066e..396f828be8 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -43,11 +43,8 @@ void cpu_set_apic_base(DeviceState *dev, uint64_t val)
     if (dev) {
         APICCommonState *s = APIC_COMMON(dev);
         APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
-        /* switching to x2APIC, reset possibly modified xAPIC ID */
-        if (!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
-            (val & MSR_IA32_APICBASE_EXTD)) {
-            s->id = s->initial_apic_id;
-        }
+        /* Reset possibly modified xAPIC ID */
+        s->id = s->initial_apic_id;
         info->set_base(s, val);
     }
 }
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index a9ff10c517..f6bbe33372 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -235,6 +235,16 @@ void cpu_clear_apic_feature(CPUX86State *env)
     env->features[FEAT_1_EDX] &= ~CPUID_APIC;
 }
 
+void cpu_set_apic_feature(CPUX86State *env)
+{
+    env->features[FEAT_1_EDX] |= CPUID_APIC;
+}
+
+bool cpu_has_x2apic_feature(CPUX86State *env)
+{
+    return env->features[FEAT_1_ECX] & CPUID_EXT_X2APIC;
+}
+
 bool cpu_is_bsp(X86CPU *cpu)
 {
     return cpu_get_apic_base(cpu->apic_state) & MSR_IA32_APICBASE_BSP;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6576287e5b..6847b2ae02 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -627,12 +627,13 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
           CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */   \
           CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \
           CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C | \
-          CPUID_EXT_FMA)
+          CPUID_EXT_FMA | CPUID_EXT_X2APIC)
           /* missing:
           CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, CPUID_EXT_SMX,
           CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID,
           CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, CPUID_EXT_DCA,
-          CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER */
+          CPUID_EXT_TSC_DEADLINE_TIMER
+          */
 
 #ifdef TARGET_X86_64
 #define TCG_EXT2_X86_64_FEATURES (CPUID_EXT2_SYSCALL | CPUID_EXT2_LM)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 02165a5ad2..53cd8f01dd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -379,6 +379,10 @@ typedef enum X86Seg {
 #define MSR_IA32_APICBASE_ENABLE        (1<<11)
 #define MSR_IA32_APICBASE_EXTD          (1 << 10)
 #define MSR_IA32_APICBASE_BASE          (0xfffffU<<12)
+#define MSR_IA32_APICBASE_RESERVED \
+        (~(uint64_t)(MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE \
+                     | MSR_IA32_APICBASE_EXTD | MSR_IA32_APICBASE_BASE))
+
 #define MSR_IA32_FEATURE_CONTROL        0x0000003a
 #define MSR_TSC_ADJUST                  0x0000003b
 #define MSR_IA32_SPEC_CTRL              0x48
@@ -2158,8 +2162,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                    uint32_t *eax, uint32_t *ebx,
                    uint32_t *ecx, uint32_t *edx);
 void cpu_clear_apic_feature(CPUX86State *env);
+void cpu_set_apic_feature(CPUX86State *env);
 void host_cpuid(uint32_t function, uint32_t count,
                 uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
+bool cpu_has_x2apic_feature(CPUX86State *env);
 
 /* helper.c */
 void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index 1fce2076a3..91a58d4d97 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -159,6 +159,10 @@ void helper_wrmsr(CPUX86State *env)
         env->sysenter_eip = val;
         break;
     case MSR_IA32_APICBASE:
+        if (val & MSR_IA32_APICBASE_RESERVED) {
+            goto error;
+        }
+
         cpu_set_apic_base(env_archcpu(env)->apic_state, val);
         break;
     case MSR_EFER:
-- 
2.25.1
[REPOST PATCH v3 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC
Posted by Bui Quang Minh 1 year ago
As userspace APIC now supports x2APIC, intel interrupt remapping
hardware can be set to EIM mode when userspace local APIC is used.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/intel_iommu.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a62896759c..fd7c16b852 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4045,17 +4045,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
                       && x86_iommu_ir_supported(x86_iommu) ?
                                               ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
-    if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
-        if (!kvm_irqchip_is_split()) {
-            error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
-            return false;
-        }
-        if (!kvm_enable_x2apic()) {
-            error_setg(errp, "eim=on requires support on the KVM side"
-                             "(X2APIC_API, first shipped in v4.7)");
-            return false;
-        }
-    }
 
     /* Currently only address widths supported are 39 and 48 bits */
     if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-- 
2.25.1
Re: [REPOST PATCH v3 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC
Posted by Michael S. Tsirkin 11 months, 2 weeks ago
On Tue, Apr 11, 2023 at 09:24:39PM +0700, Bui Quang Minh wrote:
> As userspace APIC now supports x2APIC, intel interrupt remapping
> hardware can be set to EIM mode when userspace local APIC is used.
> 
> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> ---
>  hw/i386/intel_iommu.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index a62896759c..fd7c16b852 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -4045,17 +4045,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>                        && x86_iommu_ir_supported(x86_iommu) ?
>                                                ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
>      }
> -    if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
> -        if (!kvm_irqchip_is_split()) {
> -            error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
> -            return false;
> -        }
> -        if (!kvm_enable_x2apic()) {
> -            error_setg(errp, "eim=on requires support on the KVM side"
> -                             "(X2APIC_API, first shipped in v4.7)");
> -            return false;
> -        }
> -    }
>  
>      /* Currently only address widths supported are 39 and 48 bits */
>      if ((s->aw_bits != VTD_HOST_AW_39BIT) &&


Paolo I think if you ack the kvm bits, I can merge.

> -- 
> 2.25.1
> 
>
[REPOST PATCH v3 5/5] amd_iommu: report x2APIC support to the operating system
Posted by Bui Quang Minh 1 year ago
This commit adds XTSup configuration to let user choose to whether enable
this feature or not. When XTSup is enabled, additional bytes in IRTE with
enabled guest virtual VAPIC are used to support 32-bit destination id.

Additionally, this commit changes to use IVHD type 0x11 in ACPI table for
feature report to operating system. This is because Linux does not use
XTSup in IOMMU Feature Reporting field of IVHD type 0x10 but only use XTSup
bit in EFR Register Image of IVHD 0x11 to indicate x2APIC support (see
init_iommu_one in linux/drivers/iommu/amd/init.c)

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/acpi-build.c | 28 ++++++++++++++--------------
 hw/i386/amd_iommu.c  | 21 +++++++++++++++++++--
 hw/i386/amd_iommu.h  | 16 +++++++++++-----
 3 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ec857a117e..72d6bb2892 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2339,7 +2339,7 @@ static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
                 const char *oem_table_id)
 {
-    int ivhd_table_len = 24;
+    int ivhd_table_len = 40;
     AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
     GArray *ivhd_blob = g_array_new(false, true, 1);
     AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
@@ -2349,18 +2349,19 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
     /* IVinfo - IO virtualization information common to all
      * IOMMU units in a system
      */
-    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
+    build_append_int_noprefix(table_data,
+                             (1UL << 0) | /* EFRSup */
+                             (40UL << 8), /* PASize */
+                             4);
     /* reserved */
     build_append_int_noprefix(table_data, 0, 8);
 
-    /* IVHD definition - type 10h */
-    build_append_int_noprefix(table_data, 0x10, 1);
+    /* IVHD definition - type 11h */
+    build_append_int_noprefix(table_data, 0x11, 1);
     /* virtualization flags */
     build_append_int_noprefix(table_data,
                              (1UL << 0) | /* HtTunEn      */
-                             (1UL << 4) | /* iotblSup     */
-                             (1UL << 6) | /* PrefSup      */
-                             (1UL << 7),  /* PPRSup       */
+                             (1UL << 4),  /* iotblSup     */
                              1);
 
     /*
@@ -2404,13 +2405,12 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
     build_append_int_noprefix(table_data, 0, 2);
     /* IOMMU info */
     build_append_int_noprefix(table_data, 0, 2);
-    /* IOMMU Feature Reporting */
-    build_append_int_noprefix(table_data,
-                             (48UL << 30) | /* HATS   */
-                             (48UL << 28) | /* GATS   */
-                             (1UL << 2)   | /* GTSup  */
-                             (1UL << 6),    /* GASup  */
-                             4);
+    /* IOMMU Attributes */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* EFR Register Image */
+    build_append_int_noprefix(table_data, s->efr_reg, 8);
+    /* EFR Register Image 2 */
+    build_append_int_noprefix(table_data, 0, 8);
 
     /* IVHD entries as found above */
     g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index bcd016f5c5..5dfa93d945 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -31,6 +31,7 @@
 #include "hw/i386/apic_internal.h"
 #include "trace.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/qdev-properties.h"
 
 /* used AMD-Vi MMIO registers */
 const char *amdvi_mmio_low[] = {
@@ -1155,7 +1156,12 @@ static int amdvi_int_remap_ga(AMDVIState *iommu,
     irq->vector = irte.hi.fields.vector;
     irq->dest_mode = irte.lo.fields_remap.dm;
     irq->redir_hint = irte.lo.fields_remap.rq_eoi;
-    irq->dest = irte.lo.fields_remap.destination;
+    if (!iommu->xtsup) {
+        irq->dest = irte.lo.fields_remap.destination & 0xff;
+    } else {
+        irq->dest = irte.lo.fields_remap.destination |
+                    (irte.hi.fields.destination_hi << 24);
+    }
 
     return 0;
 }
@@ -1503,10 +1509,15 @@ static void amdvi_init(AMDVIState *s)
     s->enabled = false;
     s->ats_enabled = false;
     s->cmdbuf_enabled = false;
+    s->efr_reg = AMDVI_DEFAULT_EXT_FEATURES;
+
+    if (s->xtsup) {
+        s->efr_reg |= AMDVI_FEATURE_XT;
+    }
 
     /* reset MMIO */
     memset(s->mmior, 0, AMDVI_MMIO_SIZE);
-    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, AMDVI_EXT_FEATURES,
+    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, s->efr_reg,
             0xffffffffffffffef, 0);
     amdvi_set_quad(s, AMDVI_MMIO_STATUS, 0, 0x98, 0x67);
 
@@ -1586,6 +1597,11 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
     amdvi_init(s);
 }
 
+static Property amdvi_properties[] = {
+    DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static const VMStateDescription vmstate_amdvi_sysbus = {
     .name = "amd-iommu",
     .unmigratable = 1
@@ -1612,6 +1628,7 @@ static void amdvi_sysbus_class_init(ObjectClass *klass, void *data)
     dc->user_creatable = true;
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     dc->desc = "AMD IOMMU (AMD-Vi) DMA Remapping device";
+    device_class_set_props(dc, amdvi_properties);
 }
 
 static const TypeInfo amdvi_sysbus = {
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 79d38a3e41..96df7b0400 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -154,6 +154,7 @@
 
 #define AMDVI_FEATURE_PREFETCH            (1ULL << 0) /* page prefetch       */
 #define AMDVI_FEATURE_PPR                 (1ULL << 1) /* PPR Support         */
+#define AMDVI_FEATURE_XT                  (1ULL << 2) /* x2APIC Support      */
 #define AMDVI_FEATURE_GT                  (1ULL << 4) /* Guest Translation   */
 #define AMDVI_FEATURE_IA                  (1ULL << 6) /* inval all support   */
 #define AMDVI_FEATURE_GA                  (1ULL << 7) /* guest VAPIC support */
@@ -173,8 +174,9 @@
 #define AMDVI_IOTLB_MAX_SIZE 1024
 #define AMDVI_DEVID_SHIFT    36
 
-/* extended feature support */
-#define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
+/* default extended feature */
+#define AMDVI_DEFAULT_EXT_FEATURES \
+        (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
         AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
         AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
 
@@ -278,8 +280,8 @@ union irte_ga_lo {
                 dm:1,
                 /* ------ */
                 guest_mode:1,
-                destination:8,
-                rsvd_1:48;
+                destination:24,
+                rsvd_1:32;
   } fields_remap;
 };
 
@@ -287,7 +289,8 @@ union irte_ga_hi {
   uint64_t val;
   struct {
       uint64_t  vector:8,
-                rsvd_2:56;
+                rsvd_2:48,
+                destination_hi:8;
   } fields;
 };
 
@@ -367,6 +370,9 @@ struct AMDVIState {
 
     /* Interrupt remapping */
     bool ga_enabled;
+    bool xtsup;
+
+    uint64_t efr_reg;            /* extended feature register */
 };
 
 #endif
-- 
2.25.1
Re: [REPOST PATCH v3 5/5] amd_iommu: report x2APIC support to the operating system
Posted by Michael S. Tsirkin 11 months, 3 weeks ago
On Tue, Apr 11, 2023 at 09:24:40PM +0700, Bui Quang Minh wrote:
> This commit adds XTSup configuration to let user choose to whether enable
> this feature or not. When XTSup is enabled, additional bytes in IRTE with
> enabled guest virtual VAPIC are used to support 32-bit destination id.
> 
> Additionally, this commit changes to use IVHD type 0x11 in ACPI table for
> feature report to operating system. This is because Linux does not use
> XTSup in IOMMU Feature Reporting field of IVHD type 0x10 but only use XTSup
> bit in EFR Register Image of IVHD 0x11 to indicate x2APIC support (see
> init_iommu_one in linux/drivers/iommu/amd/init.c)
> 
> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>

I'm concerned that switching to type 11 will break some older guests.
It would be better if we could export both type 10 and type 11
ivhd. A question however would be how does this interact
with older guests. For example:
https://lists.linuxfoundation.org/pipermail/iommu/2016-January/015310.html
it looks like linux before 2016 only expected one ivhd entry?

Some research and testing here would be benefitial.
Similarly for windows guests.

Thanks!



> ---
>  hw/i386/acpi-build.c | 28 ++++++++++++++--------------
>  hw/i386/amd_iommu.c  | 21 +++++++++++++++++++--
>  hw/i386/amd_iommu.h  | 16 +++++++++++-----
>  3 files changed, 44 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index ec857a117e..72d6bb2892 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2339,7 +2339,7 @@ static void
>  build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>                  const char *oem_table_id)
>  {
> -    int ivhd_table_len = 24;
> +    int ivhd_table_len = 40;
>      AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
>      GArray *ivhd_blob = g_array_new(false, true, 1);
>      AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
> @@ -2349,18 +2349,19 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>      /* IVinfo - IO virtualization information common to all
>       * IOMMU units in a system
>       */
> -    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
> +    build_append_int_noprefix(table_data,
> +                             (1UL << 0) | /* EFRSup */
> +                             (40UL << 8), /* PASize */
> +                             4);
>      /* reserved */
>      build_append_int_noprefix(table_data, 0, 8);
>  
> -    /* IVHD definition - type 10h */
> -    build_append_int_noprefix(table_data, 0x10, 1);
> +    /* IVHD definition - type 11h */
> +    build_append_int_noprefix(table_data, 0x11, 1);
>      /* virtualization flags */
>      build_append_int_noprefix(table_data,
>                               (1UL << 0) | /* HtTunEn      */
> -                             (1UL << 4) | /* iotblSup     */

btw this should have been iotlbsup?

> -                             (1UL << 6) | /* PrefSup      */
> -                             (1UL << 7),  /* PPRSup       */
> +                             (1UL << 4),  /* iotblSup     */
>                               1);
>  
>      /*

hmm why are you removing these other flags?

> @@ -2404,13 +2405,12 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>      build_append_int_noprefix(table_data, 0, 2);
>      /* IOMMU info */
>      build_append_int_noprefix(table_data, 0, 2);
> -    /* IOMMU Feature Reporting */
> -    build_append_int_noprefix(table_data,
> -                             (48UL << 30) | /* HATS   */
> -                             (48UL << 28) | /* GATS   */
> -                             (1UL << 2)   | /* GTSup  */
> -                             (1UL << 6),    /* GASup  */
> -                             4);
> +    /* IOMMU Attributes */
> +    build_append_int_noprefix(table_data, 0, 4);
> +    /* EFR Register Image */
> +    build_append_int_noprefix(table_data, s->efr_reg, 8);
> +    /* EFR Register Image 2 */
> +    build_append_int_noprefix(table_data, 0, 8);


here too. what's going on?

>  
>      /* IVHD entries as found above */
>      g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index bcd016f5c5..5dfa93d945 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -31,6 +31,7 @@
>  #include "hw/i386/apic_internal.h"
>  #include "trace.h"
>  #include "hw/i386/apic-msidef.h"
> +#include "hw/qdev-properties.h"
>  
>  /* used AMD-Vi MMIO registers */
>  const char *amdvi_mmio_low[] = {
> @@ -1155,7 +1156,12 @@ static int amdvi_int_remap_ga(AMDVIState *iommu,
>      irq->vector = irte.hi.fields.vector;
>      irq->dest_mode = irte.lo.fields_remap.dm;
>      irq->redir_hint = irte.lo.fields_remap.rq_eoi;
> -    irq->dest = irte.lo.fields_remap.destination;
> +    if (!iommu->xtsup) {
> +        irq->dest = irte.lo.fields_remap.destination & 0xff;
> +    } else {
> +        irq->dest = irte.lo.fields_remap.destination |
> +                    (irte.hi.fields.destination_hi << 24);
> +    }

Weird way to put it. Why not if (iommu->xtsup) ... ?

>  
>      return 0;
>  }
> @@ -1503,10 +1509,15 @@ static void amdvi_init(AMDVIState *s)
>      s->enabled = false;
>      s->ats_enabled = false;
>      s->cmdbuf_enabled = false;
> +    s->efr_reg = AMDVI_DEFAULT_EXT_FEATURES;
> +
> +    if (s->xtsup) {
> +        s->efr_reg |= AMDVI_FEATURE_XT;
> +    }
>  
>      /* reset MMIO */
>      memset(s->mmior, 0, AMDVI_MMIO_SIZE);
> -    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, AMDVI_EXT_FEATURES,
> +    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, s->efr_reg,
>              0xffffffffffffffef, 0);
>      amdvi_set_quad(s, AMDVI_MMIO_STATUS, 0, 0x98, 0x67);
>  
> @@ -1586,6 +1597,11 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
>      amdvi_init(s);
>  }
>  
> +static Property amdvi_properties[] = {
> +    DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static const VMStateDescription vmstate_amdvi_sysbus = {
>      .name = "amd-iommu",
>      .unmigratable = 1
> @@ -1612,6 +1628,7 @@ static void amdvi_sysbus_class_init(ObjectClass *klass, void *data)
>      dc->user_creatable = true;
>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>      dc->desc = "AMD IOMMU (AMD-Vi) DMA Remapping device";
> +    device_class_set_props(dc, amdvi_properties);
>  }
>  
>  static const TypeInfo amdvi_sysbus = {
> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> index 79d38a3e41..96df7b0400 100644
> --- a/hw/i386/amd_iommu.h
> +++ b/hw/i386/amd_iommu.h
> @@ -154,6 +154,7 @@
>  
>  #define AMDVI_FEATURE_PREFETCH            (1ULL << 0) /* page prefetch       */
>  #define AMDVI_FEATURE_PPR                 (1ULL << 1) /* PPR Support         */
> +#define AMDVI_FEATURE_XT                  (1ULL << 2) /* x2APIC Support      */
>  #define AMDVI_FEATURE_GT                  (1ULL << 4) /* Guest Translation   */
>  #define AMDVI_FEATURE_IA                  (1ULL << 6) /* inval all support   */
>  #define AMDVI_FEATURE_GA                  (1ULL << 7) /* guest VAPIC support */
> @@ -173,8 +174,9 @@
>  #define AMDVI_IOTLB_MAX_SIZE 1024
>  #define AMDVI_DEVID_SHIFT    36
>  
> -/* extended feature support */
> -#define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
> +/* default extended feature */
> +#define AMDVI_DEFAULT_EXT_FEATURES \
> +        (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
>          AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
>          AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
>  
> @@ -278,8 +280,8 @@ union irte_ga_lo {
>                  dm:1,
>                  /* ------ */
>                  guest_mode:1,
> -                destination:8,
> -                rsvd_1:48;
> +                destination:24,
> +                rsvd_1:32;
>    } fields_remap;
>  };
>  
> @@ -287,7 +289,8 @@ union irte_ga_hi {
>    uint64_t val;
>    struct {
>        uint64_t  vector:8,
> -                rsvd_2:56;
> +                rsvd_2:48,
> +                destination_hi:8;
>    } fields;
>  };
>  
> @@ -367,6 +370,9 @@ struct AMDVIState {
>  
>      /* Interrupt remapping */
>      bool ga_enabled;
> +    bool xtsup;
> +
> +    uint64_t efr_reg;            /* extended feature register */
>  };
>  
>  #endif
> -- 
> 2.25.1
Re: [REPOST PATCH v3 5/5] amd_iommu: report x2APIC support to the operating system
Posted by Bui Quang Minh 11 months, 3 weeks ago
On 5/12/23 21:39, Michael S. Tsirkin wrote:
> On Tue, Apr 11, 2023 at 09:24:40PM +0700, Bui Quang Minh wrote:
>> This commit adds XTSup configuration to let user choose to whether enable
>> this feature or not. When XTSup is enabled, additional bytes in IRTE with
>> enabled guest virtual VAPIC are used to support 32-bit destination id.
>>
>> Additionally, this commit changes to use IVHD type 0x11 in ACPI table for
>> feature report to operating system. This is because Linux does not use
>> XTSup in IOMMU Feature Reporting field of IVHD type 0x10 but only use XTSup
>> bit in EFR Register Image of IVHD 0x11 to indicate x2APIC support (see
>> init_iommu_one in linux/drivers/iommu/amd/init.c)
>>
>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> 
> I'm concerned that switching to type 11 will break some older guests.
> It would be better if we could export both type 10 and type 11
> ivhd. A question however would be how does this interact
> with older guests. For example:
> https://lists.linuxfoundation.org/pipermail/iommu/2016-January/015310.html
> it looks like linux before 2016 only expected one ivhd entry?

Export both type 0x10 and 0x11 looks reasonable to me. Before the above 
commit, I see that Linux still loops through multiple ivhd but only 
handles one with type 0x10. On newer kernel, it will choose to handle 
the type that appears last corresponding the first devid, which is weird 
in my opinion.

+static u8 get_highest_supported_ivhd_type(struct acpi_table_header *ivrs)
+{
+	u8 *base = (u8 *)ivrs;
+	struct ivhd_header *ivhd = (struct ivhd_header *)
+					(base + IVRS_HEADER_LENGTH);
+	u8 last_type = ivhd->type;
+	u16 devid = ivhd->devid;
+
+	while (((u8 *)ivhd - base < ivrs->length) &&
+	       (ivhd->type <= ACPI_IVHD_TYPE_MAX_SUPPORTED)) {
+		u8 *p = (u8 *) ivhd;
+
+		if (ivhd->devid == devid)
+			last_type = ivhd->type;
+		ivhd = (struct ivhd_header *)(p + ivhd->length);
+	}
+
+	return last_type;
+}

So when exposing type 0x10 following by 0x11, old kernel only parses 
0x10 and does not support x2APIC while new kernel parse 0x11 and support 
x2APIC. I will expose both types in the next version.

> Some research and testing here would be benefitial.
> Similarly for windows guests.
> 
> Thanks!
> 
> 
> 
>> ---
>>   hw/i386/acpi-build.c | 28 ++++++++++++++--------------
>>   hw/i386/amd_iommu.c  | 21 +++++++++++++++++++--
>>   hw/i386/amd_iommu.h  | 16 +++++++++++-----
>>   3 files changed, 44 insertions(+), 21 deletions(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index ec857a117e..72d6bb2892 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2339,7 +2339,7 @@ static void
>>   build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>>                   const char *oem_table_id)
>>   {
>> -    int ivhd_table_len = 24;
>> +    int ivhd_table_len = 40;
>>       AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
>>       GArray *ivhd_blob = g_array_new(false, true, 1);
>>       AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
>> @@ -2349,18 +2349,19 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>>       /* IVinfo - IO virtualization information common to all
>>        * IOMMU units in a system
>>        */
>> -    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
>> +    build_append_int_noprefix(table_data,
>> +                             (1UL << 0) | /* EFRSup */
>> +                             (40UL << 8), /* PASize */
>> +                             4);
>>       /* reserved */
>>       build_append_int_noprefix(table_data, 0, 8);
>>   
>> -    /* IVHD definition - type 10h */
>> -    build_append_int_noprefix(table_data, 0x10, 1);
>> +    /* IVHD definition - type 11h */
>> +    build_append_int_noprefix(table_data, 0x11, 1);
>>       /* virtualization flags */
>>       build_append_int_noprefix(table_data,
>>                                (1UL << 0) | /* HtTunEn      */
>> -                             (1UL << 4) | /* iotblSup     */
> 
> btw this should have been iotlbsup?
> 
>> -                             (1UL << 6) | /* PrefSup      */
>> -                             (1UL << 7),  /* PPRSup       */
>> +                             (1UL << 4),  /* iotblSup     */
>>                                1);
>>   
>>       /*
> 
> hmm why are you removing these other flags?

According to the AMD IOMMU specification, the bit 6, 7 are reserved in 
type 0x11 which are PerfSup, PPRSup respectively in type 0x10 so I 
remove those flags when changing to type 0x11. In type 0x11, these 
feature are reported via the below EFR Register Image I believe.

> 
>> @@ -2404,13 +2405,12 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>>       build_append_int_noprefix(table_data, 0, 2);
>>       /* IOMMU info */
>>       build_append_int_noprefix(table_data, 0, 2);
>> -    /* IOMMU Feature Reporting */
>> -    build_append_int_noprefix(table_data,
>> -                             (48UL << 30) | /* HATS   */
>> -                             (48UL << 28) | /* GATS   */
>> -                             (1UL << 2)   | /* GTSup  */
>> -                             (1UL << 6),    /* GASup  */
>> -                             4);
>> +    /* IOMMU Attributes */
>> +    build_append_int_noprefix(table_data, 0, 4);
>> +    /* EFR Register Image */
>> +    build_append_int_noprefix(table_data, s->efr_reg, 8);
>> +    /* EFR Register Image 2 */
>> +    build_append_int_noprefix(table_data, 0, 8);
> 
> 
> here too. what's going on?

Same as the above, the structure of type 0x11 is different from 0x10. At 
offset 20 it is not IOMMU feature reporting but IOMMU attributes. And 
there are 2 more fields: EFR Register Image, EFR Register Image 2 before 
IVHD device entries.

>>   
>>       /* IVHD entries as found above */
>>       g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>> index bcd016f5c5..5dfa93d945 100644
>> --- a/hw/i386/amd_iommu.c
>> +++ b/hw/i386/amd_iommu.c
>> @@ -31,6 +31,7 @@
>>   #include "hw/i386/apic_internal.h"
>>   #include "trace.h"
>>   #include "hw/i386/apic-msidef.h"
>> +#include "hw/qdev-properties.h"
>>   
>>   /* used AMD-Vi MMIO registers */
>>   const char *amdvi_mmio_low[] = {
>> @@ -1155,7 +1156,12 @@ static int amdvi_int_remap_ga(AMDVIState *iommu,
>>       irq->vector = irte.hi.fields.vector;
>>       irq->dest_mode = irte.lo.fields_remap.dm;
>>       irq->redir_hint = irte.lo.fields_remap.rq_eoi;
>> -    irq->dest = irte.lo.fields_remap.destination;
>> +    if (!iommu->xtsup) {
>> +        irq->dest = irte.lo.fields_remap.destination & 0xff;
>> +    } else {
>> +        irq->dest = irte.lo.fields_remap.destination |
>> +                    (irte.hi.fields.destination_hi << 24);
>> +    }
> 
> Weird way to put it. Why not if (iommu->xtsup) ... ?

Okay, I'll fix this in next version.

>>   
>>       return 0;
>>   }
>> @@ -1503,10 +1509,15 @@ static void amdvi_init(AMDVIState *s)
>>       s->enabled = false;
>>       s->ats_enabled = false;
>>       s->cmdbuf_enabled = false;
>> +    s->efr_reg = AMDVI_DEFAULT_EXT_FEATURES;
>> +
>> +    if (s->xtsup) {
>> +        s->efr_reg |= AMDVI_FEATURE_XT;
>> +    }
>>   
>>       /* reset MMIO */
>>       memset(s->mmior, 0, AMDVI_MMIO_SIZE);
>> -    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, AMDVI_EXT_FEATURES,
>> +    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, s->efr_reg,
>>               0xffffffffffffffef, 0);
>>       amdvi_set_quad(s, AMDVI_MMIO_STATUS, 0, 0x98, 0x67);
>>   
>> @@ -1586,6 +1597,11 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
>>       amdvi_init(s);
>>   }
>>   
>> +static Property amdvi_properties[] = {
>> +    DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>>   static const VMStateDescription vmstate_amdvi_sysbus = {
>>       .name = "amd-iommu",
>>       .unmigratable = 1
>> @@ -1612,6 +1628,7 @@ static void amdvi_sysbus_class_init(ObjectClass *klass, void *data)
>>       dc->user_creatable = true;
>>       set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>       dc->desc = "AMD IOMMU (AMD-Vi) DMA Remapping device";
>> +    device_class_set_props(dc, amdvi_properties);
>>   }
>>   
>>   static const TypeInfo amdvi_sysbus = {
>> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
>> index 79d38a3e41..96df7b0400 100644
>> --- a/hw/i386/amd_iommu.h
>> +++ b/hw/i386/amd_iommu.h
>> @@ -154,6 +154,7 @@
>>   
>>   #define AMDVI_FEATURE_PREFETCH            (1ULL << 0) /* page prefetch       */
>>   #define AMDVI_FEATURE_PPR                 (1ULL << 1) /* PPR Support         */
>> +#define AMDVI_FEATURE_XT                  (1ULL << 2) /* x2APIC Support      */
>>   #define AMDVI_FEATURE_GT                  (1ULL << 4) /* Guest Translation   */
>>   #define AMDVI_FEATURE_IA                  (1ULL << 6) /* inval all support   */
>>   #define AMDVI_FEATURE_GA                  (1ULL << 7) /* guest VAPIC support */
>> @@ -173,8 +174,9 @@
>>   #define AMDVI_IOTLB_MAX_SIZE 1024
>>   #define AMDVI_DEVID_SHIFT    36
>>   
>> -/* extended feature support */
>> -#define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
>> +/* default extended feature */
>> +#define AMDVI_DEFAULT_EXT_FEATURES \
>> +        (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
>>           AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
>>           AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
>>   
>> @@ -278,8 +280,8 @@ union irte_ga_lo {
>>                   dm:1,
>>                   /* ------ */
>>                   guest_mode:1,
>> -                destination:8,
>> -                rsvd_1:48;
>> +                destination:24,
>> +                rsvd_1:32;
>>     } fields_remap;
>>   };
>>   
>> @@ -287,7 +289,8 @@ union irte_ga_hi {
>>     uint64_t val;
>>     struct {
>>         uint64_t  vector:8,
>> -                rsvd_2:56;
>> +                rsvd_2:48,
>> +                destination_hi:8;
>>     } fields;
>>   };
>>   
>> @@ -367,6 +370,9 @@ struct AMDVIState {
>>   
>>       /* Interrupt remapping */
>>       bool ga_enabled;
>> +    bool xtsup;
>> +
>> +    uint64_t efr_reg;            /* extended feature register */
>>   };
>>   
>>   #endif
>> -- 
>> 2.25.1
> 

Thanks,
Quang Minh.
Re: [REPOST PATCH v3 5/5] amd_iommu: report x2APIC support to the operating system
Posted by Bui Quang Minh 11 months, 2 weeks ago
On 5/14/23 15:55, Bui Quang Minh wrote:
> On 5/12/23 21:39, Michael S. Tsirkin wrote:
>> On Tue, Apr 11, 2023 at 09:24:40PM +0700, Bui Quang Minh wrote:
>>> This commit adds XTSup configuration to let user choose to whether 
>>> enable
>>> this feature or not. When XTSup is enabled, additional bytes in IRTE 
>>> with
>>> enabled guest virtual VAPIC are used to support 32-bit destination id.
>>>
>>> Additionally, this commit changes to use IVHD type 0x11 in ACPI table 
>>> for
>>> feature report to operating system. This is because Linux does not use
>>> XTSup in IOMMU Feature Reporting field of IVHD type 0x10 but only use 
>>> XTSup
>>> bit in EFR Register Image of IVHD 0x11 to indicate x2APIC support (see
>>> init_iommu_one in linux/drivers/iommu/amd/init.c)
>>>
>>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
>>
>> I'm concerned that switching to type 11 will break some older guests.
>> It would be better if we could export both type 10 and type 11
>> ivhd. A question however would be how does this interact
>> with older guests. For example:
>> https://lists.linuxfoundation.org/pipermail/iommu/2016-January/015310.html
>> it looks like linux before 2016 only expected one ivhd entry?
> 
> Export both type 0x10 and 0x11 looks reasonable to me. Before the above 
> commit, I see that Linux still loops through multiple ivhd but only 
> handles one with type 0x10. On newer kernel, it will choose to handle 
> the type that appears last corresponding the first devid, which is weird 
> in my opinion.
> 
> +static u8 get_highest_supported_ivhd_type(struct acpi_table_header *ivrs)
> +{
> +    u8 *base = (u8 *)ivrs;
> +    struct ivhd_header *ivhd = (struct ivhd_header *)
> +                    (base + IVRS_HEADER_LENGTH);
> +    u8 last_type = ivhd->type;
> +    u16 devid = ivhd->devid;
> +
> +    while (((u8 *)ivhd - base < ivrs->length) &&
> +           (ivhd->type <= ACPI_IVHD_TYPE_MAX_SUPPORTED)) {
> +        u8 *p = (u8 *) ivhd;
> +
> +        if (ivhd->devid == devid)
> +            last_type = ivhd->type;
> +        ivhd = (struct ivhd_header *)(p + ivhd->length);
> +    }
> +
> +    return last_type;
> +}
> 
> So when exposing type 0x10 following by 0x11, old kernel only parses 
> 0x10 and does not support x2APIC while new kernel parse 0x11 and support 
> x2APIC. I will expose both types in the next version.
> 
>> Some research and testing here would be benefitial.
>> Similarly for windows guests.
>>
>> Thanks!
>>
>>
>>
>>> ---
>>>   hw/i386/acpi-build.c | 28 ++++++++++++++--------------
>>>   hw/i386/amd_iommu.c  | 21 +++++++++++++++++++--
>>>   hw/i386/amd_iommu.h  | 16 +++++++++++-----
>>>   3 files changed, 44 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>>> index ec857a117e..72d6bb2892 100644
>>> --- a/hw/i386/acpi-build.c
>>> +++ b/hw/i386/acpi-build.c
>>> @@ -2339,7 +2339,7 @@ static void
>>>   build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char 
>>> *oem_id,
>>>                   const char *oem_table_id)
>>>   {
>>> -    int ivhd_table_len = 24;
>>> +    int ivhd_table_len = 40;
>>>       AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
>>>       GArray *ivhd_blob = g_array_new(false, true, 1);
>>>       AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
>>> @@ -2349,18 +2349,19 @@ build_amd_iommu(GArray *table_data, 
>>> BIOSLinker *linker, const char *oem_id,
>>>       /* IVinfo - IO virtualization information common to all
>>>        * IOMMU units in a system
>>>        */
>>> -    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
>>> +    build_append_int_noprefix(table_data,
>>> +                             (1UL << 0) | /* EFRSup */
>>> +                             (40UL << 8), /* PASize */
>>> +                             4);
>>>       /* reserved */
>>>       build_append_int_noprefix(table_data, 0, 8);
>>> -    /* IVHD definition - type 10h */
>>> -    build_append_int_noprefix(table_data, 0x10, 1);
>>> +    /* IVHD definition - type 11h */
>>> +    build_append_int_noprefix(table_data, 0x11, 1);
>>>       /* virtualization flags */
>>>       build_append_int_noprefix(table_data,
>>>                                (1UL << 0) | /* HtTunEn      */
>>> -                             (1UL << 4) | /* iotblSup     */
>>
>> btw this should have been iotlbsup?
>>
>>> -                             (1UL << 6) | /* PrefSup      */
>>> -                             (1UL << 7),  /* PPRSup       */
>>> +                             (1UL << 4),  /* iotblSup     */
>>>                                1);
>>>       /*
>>
>> hmm why are you removing these other flags?
> 
> According to the AMD IOMMU specification, the bit 6, 7 are reserved in 
> type 0x11 which are PerfSup, PPRSup respectively in type 0x10 so I 
> remove those flags when changing to type 0x11. In type 0x11, these 
> feature are reported via the below EFR Register Image I believe.
> 
>>
>>> @@ -2404,13 +2405,12 @@ build_amd_iommu(GArray *table_data, 
>>> BIOSLinker *linker, const char *oem_id,
>>>       build_append_int_noprefix(table_data, 0, 2);
>>>       /* IOMMU info */
>>>       build_append_int_noprefix(table_data, 0, 2);
>>> -    /* IOMMU Feature Reporting */
>>> -    build_append_int_noprefix(table_data,
>>> -                             (48UL << 30) | /* HATS   */
>>> -                             (48UL << 28) | /* GATS   */
>>> -                             (1UL << 2)   | /* GTSup  */
>>> -                             (1UL << 6),    /* GASup  */
>>> -                             4);
>>> +    /* IOMMU Attributes */
>>> +    build_append_int_noprefix(table_data, 0, 4);
>>> +    /* EFR Register Image */
>>> +    build_append_int_noprefix(table_data, s->efr_reg, 8);
>>> +    /* EFR Register Image 2 */
>>> +    build_append_int_noprefix(table_data, 0, 8);
>>
>>
>> here too. what's going on?
> 
> Same as the above, the structure of type 0x11 is different from 0x10. At 
> offset 20 it is not IOMMU feature reporting but IOMMU attributes. And 
> there are 2 more fields: EFR Register Image, EFR Register Image 2 before 
> IVHD device entries.
> 
>>>       /* IVHD entries as found above */
>>>       g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
>>> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
>>> index bcd016f5c5..5dfa93d945 100644
>>> --- a/hw/i386/amd_iommu.c
>>> +++ b/hw/i386/amd_iommu.c
>>> @@ -31,6 +31,7 @@
>>>   #include "hw/i386/apic_internal.h"
>>>   #include "trace.h"
>>>   #include "hw/i386/apic-msidef.h"
>>> +#include "hw/qdev-properties.h"
>>>   /* used AMD-Vi MMIO registers */
>>>   const char *amdvi_mmio_low[] = {
>>> @@ -1155,7 +1156,12 @@ static int amdvi_int_remap_ga(AMDVIState *iommu,
>>>       irq->vector = irte.hi.fields.vector;
>>>       irq->dest_mode = irte.lo.fields_remap.dm;
>>>       irq->redir_hint = irte.lo.fields_remap.rq_eoi;
>>> -    irq->dest = irte.lo.fields_remap.destination;
>>> +    if (!iommu->xtsup) {
>>> +        irq->dest = irte.lo.fields_remap.destination & 0xff;
>>> +    } else {
>>> +        irq->dest = irte.lo.fields_remap.destination |
>>> +                    (irte.hi.fields.destination_hi << 24);
>>> +    }
>>
>> Weird way to put it. Why not if (iommu->xtsup) ... ?
> 
> Okay, I'll fix this in next version.
> 
>>>       return 0;
>>>   }
>>> @@ -1503,10 +1509,15 @@ static void amdvi_init(AMDVIState *s)
>>>       s->enabled = false;
>>>       s->ats_enabled = false;
>>>       s->cmdbuf_enabled = false;
>>> +    s->efr_reg = AMDVI_DEFAULT_EXT_FEATURES;
>>> +
>>> +    if (s->xtsup) {
>>> +        s->efr_reg |= AMDVI_FEATURE_XT;
>>> +    }
>>>       /* reset MMIO */
>>>       memset(s->mmior, 0, AMDVI_MMIO_SIZE);
>>> -    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, AMDVI_EXT_FEATURES,
>>> +    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, s->efr_reg,
>>>               0xffffffffffffffef, 0);
>>>       amdvi_set_quad(s, AMDVI_MMIO_STATUS, 0, 0x98, 0x67);
>>> @@ -1586,6 +1597,11 @@ static void amdvi_sysbus_realize(DeviceState 
>>> *dev, Error **errp)
>>>       amdvi_init(s);
>>>   }
>>> +static Property amdvi_properties[] = {
>>> +    DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
>>> +    DEFINE_PROP_END_OF_LIST(),
>>> +};
>>> +
>>>   static const VMStateDescription vmstate_amdvi_sysbus = {
>>>       .name = "amd-iommu",
>>>       .unmigratable = 1
>>> @@ -1612,6 +1628,7 @@ static void amdvi_sysbus_class_init(ObjectClass 
>>> *klass, void *data)
>>>       dc->user_creatable = true;
>>>       set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>       dc->desc = "AMD IOMMU (AMD-Vi) DMA Remapping device";
>>> +    device_class_set_props(dc, amdvi_properties);
>>>   }
>>>   static const TypeInfo amdvi_sysbus = {
>>> diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
>>> index 79d38a3e41..96df7b0400 100644
>>> --- a/hw/i386/amd_iommu.h
>>> +++ b/hw/i386/amd_iommu.h
>>> @@ -154,6 +154,7 @@
>>>   #define AMDVI_FEATURE_PREFETCH            (1ULL << 0) /* page 
>>> prefetch       */
>>>   #define AMDVI_FEATURE_PPR                 (1ULL << 1) /* PPR 
>>> Support         */
>>> +#define AMDVI_FEATURE_XT                  (1ULL << 2) /* x2APIC 
>>> Support      */
>>>   #define AMDVI_FEATURE_GT                  (1ULL << 4) /* Guest 
>>> Translation   */
>>>   #define AMDVI_FEATURE_IA                  (1ULL << 6) /* inval all 
>>> support   */
>>>   #define AMDVI_FEATURE_GA                  (1ULL << 7) /* guest 
>>> VAPIC support */
>>> @@ -173,8 +174,9 @@
>>>   #define AMDVI_IOTLB_MAX_SIZE 1024
>>>   #define AMDVI_DEVID_SHIFT    36
>>> -/* extended feature support */
>>> -#define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | 
>>> AMDVI_FEATURE_PPR | \
>>> +/* default extended feature */
>>> +#define AMDVI_DEFAULT_EXT_FEATURES \
>>> +        (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
>>>           AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
>>>           AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
>>> @@ -278,8 +280,8 @@ union irte_ga_lo {
>>>                   dm:1,
>>>                   /* ------ */
>>>                   guest_mode:1,
>>> -                destination:8,
>>> -                rsvd_1:48;
>>> +                destination:24,
>>> +                rsvd_1:32;
>>>     } fields_remap;
>>>   };
>>> @@ -287,7 +289,8 @@ union irte_ga_hi {
>>>     uint64_t val;
>>>     struct {
>>>         uint64_t  vector:8,
>>> -                rsvd_2:56;
>>> +                rsvd_2:48,
>>> +                destination_hi:8;
>>>     } fields;
>>>   };
>>> @@ -367,6 +370,9 @@ struct AMDVIState {
>>>       /* Interrupt remapping */
>>>       bool ga_enabled;
>>> +    bool xtsup;
>>> +
>>> +    uint64_t efr_reg;            /* extended feature register */
>>>   };
>>>   #endif
>>> -- 
>>> 2.25.1

I have posted version 4 to export both IVHD types and flip the 
iommu->xtsup for readability.

I have tested with old Linux before this commit 
8c7142f56fedfc6824b5bca56fee1f443e01746b (iommu/amd: Use the most 
comprehensive IVHD type that the driver can support) (Linux 4.6-rc2) 
that does not support IVHD type 0x11, it can still detect AMD IOMMU 
device via IVHD 0x10 but cannot detect x2APIC support. With newer Linux, 
it successfully detects x2APIC support.

With Windows, I don't have much experience, I have tried to get the 
driver for reverse engineer but looks like the driver can be installed 
via the installer only. And I don't have the device so installer aborts 
the driver installation.

Thanks,
Quang Minh.

Re: [REPOST PATCH v3 5/5] amd_iommu: report x2APIC support to the operating system
Posted by Michael S. Tsirkin 11 months, 3 weeks ago
On Sun, May 14, 2023 at 03:55:11PM +0700, Bui Quang Minh wrote:
> On 5/12/23 21:39, Michael S. Tsirkin wrote:
> > On Tue, Apr 11, 2023 at 09:24:40PM +0700, Bui Quang Minh wrote:
> > > This commit adds XTSup configuration to let user choose to whether enable
> > > this feature or not. When XTSup is enabled, additional bytes in IRTE with
> > > enabled guest virtual VAPIC are used to support 32-bit destination id.
> > > 
> > > Additionally, this commit changes to use IVHD type 0x11 in ACPI table for
> > > feature report to operating system. This is because Linux does not use
> > > XTSup in IOMMU Feature Reporting field of IVHD type 0x10 but only use XTSup
> > > bit in EFR Register Image of IVHD 0x11 to indicate x2APIC support (see
> > > init_iommu_one in linux/drivers/iommu/amd/init.c)
> > > 
> > > Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
> > 
> > I'm concerned that switching to type 11 will break some older guests.
> > It would be better if we could export both type 10 and type 11
> > ivhd. A question however would be how does this interact
> > with older guests. For example:
> > https://lists.linuxfoundation.org/pipermail/iommu/2016-January/015310.html
> > it looks like linux before 2016 only expected one ivhd entry?
> 
> Export both type 0x10 and 0x11 looks reasonable to me. Before the above
> commit, I see that Linux still loops through multiple ivhd but only handles
> one with type 0x10. On newer kernel, it will choose to handle the type that
> appears last corresponding the first devid, which is weird in my opinion.
> +static u8 get_highest_supported_ivhd_type(struct acpi_table_header *ivrs)
> +{
> +	u8 *base = (u8 *)ivrs;
> +	struct ivhd_header *ivhd = (struct ivhd_header *)
> +					(base + IVRS_HEADER_LENGTH);
> +	u8 last_type = ivhd->type;
> +	u16 devid = ivhd->devid;
> +
> +	while (((u8 *)ivhd - base < ivrs->length) &&
> +	       (ivhd->type <= ACPI_IVHD_TYPE_MAX_SUPPORTED)) {
> +		u8 *p = (u8 *) ivhd;
> +
> +		if (ivhd->devid == devid)
> +			last_type = ivhd->type;
> +		ivhd = (struct ivhd_header *)(p + ivhd->length);
> +	}
> +
> +	return last_type;
> +}

Yes I don't get the logic here either.
Talk to kernel devs who wrote this?

commit 8c7142f56fedfc6824b5bca56fee1f443e01746b
Author: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Date:   Fri Apr 1 09:05:59 2016 -0400

    iommu/amd: Use the most comprehensive IVHD type that the driver can support
    
    The IVRS in more recent AMD system usually contains multiple
    IVHD block types (e.g. 0x10, 0x11, and 0x40) for each IOMMU.
    The newer IVHD types provide more information (e.g. new features
    specified in the IOMMU spec), while maintain compatibility with
    the older IVHD type.
    
    Having multiple IVHD type allows older IOMMU drivers to still function
    (e.g. using the older IVHD type 0x10) while the newer IOMMU driver can use
    the newer IVHD types (e.g. 0x11 and 0x40). Therefore, the IOMMU driver
    should only make use of the newest IVHD type that it can support.
    
    This patch adds new logic to determine the highest level of IVHD type
    it can support, and use it throughout the to initialize the driver.
    This requires adding another pass to the IVRS parsing to determine
    appropriate IVHD type (see function get_highest_supported_ivhd_type())
    before parsing the contents.
    
    [Vincent: fix the build error of IVHD_DEV_ACPI_HID flag not found]
    
    Signed-off-by: Wan Zongshun <vincent.wan@amd.com>
    Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
    Signed-off-by: Joerg Roedel <jroedel@suse.de>



> 
> So when exposing type 0x10 following by 0x11, old kernel only parses 0x10
> and does not support x2APIC while new kernel parse 0x11 and support x2APIC.
> I will expose both types in the next version.
> 
> > Some research and testing here would be benefitial.
> > Similarly for windows guests.
> > 
> > Thanks!
> > 
> > 
> > 
> > > ---
> > >   hw/i386/acpi-build.c | 28 ++++++++++++++--------------
> > >   hw/i386/amd_iommu.c  | 21 +++++++++++++++++++--
> > >   hw/i386/amd_iommu.h  | 16 +++++++++++-----
> > >   3 files changed, 44 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > > index ec857a117e..72d6bb2892 100644
> > > --- a/hw/i386/acpi-build.c
> > > +++ b/hw/i386/acpi-build.c
> > > @@ -2339,7 +2339,7 @@ static void
> > >   build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
> > >                   const char *oem_table_id)
> > >   {
> > > -    int ivhd_table_len = 24;
> > > +    int ivhd_table_len = 40;
> > >       AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
> > >       GArray *ivhd_blob = g_array_new(false, true, 1);
> > >       AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
> > > @@ -2349,18 +2349,19 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
> > >       /* IVinfo - IO virtualization information common to all
> > >        * IOMMU units in a system
> > >        */
> > > -    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
> > > +    build_append_int_noprefix(table_data,
> > > +                             (1UL << 0) | /* EFRSup */
> > > +                             (40UL << 8), /* PASize */
> > > +                             4);
> > >       /* reserved */
> > >       build_append_int_noprefix(table_data, 0, 8);
> > > -    /* IVHD definition - type 10h */
> > > -    build_append_int_noprefix(table_data, 0x10, 1);
> > > +    /* IVHD definition - type 11h */
> > > +    build_append_int_noprefix(table_data, 0x11, 1);
> > >       /* virtualization flags */
> > >       build_append_int_noprefix(table_data,
> > >                                (1UL << 0) | /* HtTunEn      */
> > > -                             (1UL << 4) | /* iotblSup     */
> > 
> > btw this should have been iotlbsup?
> > 
> > > -                             (1UL << 6) | /* PrefSup      */
> > > -                             (1UL << 7),  /* PPRSup       */
> > > +                             (1UL << 4),  /* iotblSup     */
> > >                                1);
> > >       /*
> > 
> > hmm why are you removing these other flags?
> 
> According to the AMD IOMMU specification, the bit 6, 7 are reserved in type
> 0x11 which are PerfSup, PPRSup respectively in type 0x10 so I remove those
> flags when changing to type 0x11. In type 0x11, these feature are reported
> via the below EFR Register Image I believe.
> 
> > 
> > > @@ -2404,13 +2405,12 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
> > >       build_append_int_noprefix(table_data, 0, 2);
> > >       /* IOMMU info */
> > >       build_append_int_noprefix(table_data, 0, 2);
> > > -    /* IOMMU Feature Reporting */
> > > -    build_append_int_noprefix(table_data,
> > > -                             (48UL << 30) | /* HATS   */
> > > -                             (48UL << 28) | /* GATS   */
> > > -                             (1UL << 2)   | /* GTSup  */
> > > -                             (1UL << 6),    /* GASup  */
> > > -                             4);
> > > +    /* IOMMU Attributes */
> > > +    build_append_int_noprefix(table_data, 0, 4);
> > > +    /* EFR Register Image */
> > > +    build_append_int_noprefix(table_data, s->efr_reg, 8);
> > > +    /* EFR Register Image 2 */
> > > +    build_append_int_noprefix(table_data, 0, 8);
> > 
> > 
> > here too. what's going on?
> 
> Same as the above, the structure of type 0x11 is different from 0x10. At
> offset 20 it is not IOMMU feature reporting but IOMMU attributes. And there
> are 2 more fields: EFR Register Image, EFR Register Image 2 before IVHD
> device entries.
> 
> > >       /* IVHD entries as found above */
> > >       g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
> > > diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> > > index bcd016f5c5..5dfa93d945 100644
> > > --- a/hw/i386/amd_iommu.c
> > > +++ b/hw/i386/amd_iommu.c
> > > @@ -31,6 +31,7 @@
> > >   #include "hw/i386/apic_internal.h"
> > >   #include "trace.h"
> > >   #include "hw/i386/apic-msidef.h"
> > > +#include "hw/qdev-properties.h"
> > >   /* used AMD-Vi MMIO registers */
> > >   const char *amdvi_mmio_low[] = {
> > > @@ -1155,7 +1156,12 @@ static int amdvi_int_remap_ga(AMDVIState *iommu,
> > >       irq->vector = irte.hi.fields.vector;
> > >       irq->dest_mode = irte.lo.fields_remap.dm;
> > >       irq->redir_hint = irte.lo.fields_remap.rq_eoi;
> > > -    irq->dest = irte.lo.fields_remap.destination;
> > > +    if (!iommu->xtsup) {
> > > +        irq->dest = irte.lo.fields_remap.destination & 0xff;
> > > +    } else {
> > > +        irq->dest = irte.lo.fields_remap.destination |
> > > +                    (irte.hi.fields.destination_hi << 24);
> > > +    }
> > 
> > Weird way to put it. Why not if (iommu->xtsup) ... ?
> 
> Okay, I'll fix this in next version.
> 
> > >       return 0;
> > >   }
> > > @@ -1503,10 +1509,15 @@ static void amdvi_init(AMDVIState *s)
> > >       s->enabled = false;
> > >       s->ats_enabled = false;
> > >       s->cmdbuf_enabled = false;
> > > +    s->efr_reg = AMDVI_DEFAULT_EXT_FEATURES;
> > > +
> > > +    if (s->xtsup) {
> > > +        s->efr_reg |= AMDVI_FEATURE_XT;
> > > +    }
> > >       /* reset MMIO */
> > >       memset(s->mmior, 0, AMDVI_MMIO_SIZE);
> > > -    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, AMDVI_EXT_FEATURES,
> > > +    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, s->efr_reg,
> > >               0xffffffffffffffef, 0);
> > >       amdvi_set_quad(s, AMDVI_MMIO_STATUS, 0, 0x98, 0x67);
> > > @@ -1586,6 +1597,11 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
> > >       amdvi_init(s);
> > >   }
> > > +static Property amdvi_properties[] = {
> > > +    DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
> > > +    DEFINE_PROP_END_OF_LIST(),
> > > +};
> > > +
> > >   static const VMStateDescription vmstate_amdvi_sysbus = {
> > >       .name = "amd-iommu",
> > >       .unmigratable = 1
> > > @@ -1612,6 +1628,7 @@ static void amdvi_sysbus_class_init(ObjectClass *klass, void *data)
> > >       dc->user_creatable = true;
> > >       set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > >       dc->desc = "AMD IOMMU (AMD-Vi) DMA Remapping device";
> > > +    device_class_set_props(dc, amdvi_properties);
> > >   }
> > >   static const TypeInfo amdvi_sysbus = {
> > > diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
> > > index 79d38a3e41..96df7b0400 100644
> > > --- a/hw/i386/amd_iommu.h
> > > +++ b/hw/i386/amd_iommu.h
> > > @@ -154,6 +154,7 @@
> > >   #define AMDVI_FEATURE_PREFETCH            (1ULL << 0) /* page prefetch       */
> > >   #define AMDVI_FEATURE_PPR                 (1ULL << 1) /* PPR Support         */
> > > +#define AMDVI_FEATURE_XT                  (1ULL << 2) /* x2APIC Support      */
> > >   #define AMDVI_FEATURE_GT                  (1ULL << 4) /* Guest Translation   */
> > >   #define AMDVI_FEATURE_IA                  (1ULL << 6) /* inval all support   */
> > >   #define AMDVI_FEATURE_GA                  (1ULL << 7) /* guest VAPIC support */
> > > @@ -173,8 +174,9 @@
> > >   #define AMDVI_IOTLB_MAX_SIZE 1024
> > >   #define AMDVI_DEVID_SHIFT    36
> > > -/* extended feature support */
> > > -#define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
> > > +/* default extended feature */
> > > +#define AMDVI_DEFAULT_EXT_FEATURES \
> > > +        (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
> > >           AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
> > >           AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
> > > @@ -278,8 +280,8 @@ union irte_ga_lo {
> > >                   dm:1,
> > >                   /* ------ */
> > >                   guest_mode:1,
> > > -                destination:8,
> > > -                rsvd_1:48;
> > > +                destination:24,
> > > +                rsvd_1:32;
> > >     } fields_remap;
> > >   };
> > > @@ -287,7 +289,8 @@ union irte_ga_hi {
> > >     uint64_t val;
> > >     struct {
> > >         uint64_t  vector:8,
> > > -                rsvd_2:56;
> > > +                rsvd_2:48,
> > > +                destination_hi:8;
> > >     } fields;
> > >   };
> > > @@ -367,6 +370,9 @@ struct AMDVIState {
> > >       /* Interrupt remapping */
> > >       bool ga_enabled;
> > > +    bool xtsup;
> > > +
> > > +    uint64_t efr_reg;            /* extended feature register */
> > >   };
> > >   #endif
> > > -- 
> > > 2.25.1
> > 
> 
> Thanks,
> Quang Minh.
Re: [REPOST PATCH v3 5/5] amd_iommu: report x2APIC support to the operating system
Posted by Bui Quang Minh 11 months, 3 weeks ago
On 5/15/23 03:44, Michael S. Tsirkin wrote:
> On Sun, May 14, 2023 at 03:55:11PM +0700, Bui Quang Minh wrote:
>> On 5/12/23 21:39, Michael S. Tsirkin wrote:
>>> On Tue, Apr 11, 2023 at 09:24:40PM +0700, Bui Quang Minh wrote:
>>>> This commit adds XTSup configuration to let user choose to whether enable
>>>> this feature or not. When XTSup is enabled, additional bytes in IRTE with
>>>> enabled guest virtual VAPIC are used to support 32-bit destination id.
>>>>
>>>> Additionally, this commit changes to use IVHD type 0x11 in ACPI table for
>>>> feature report to operating system. This is because Linux does not use
>>>> XTSup in IOMMU Feature Reporting field of IVHD type 0x10 but only use XTSup
>>>> bit in EFR Register Image of IVHD 0x11 to indicate x2APIC support (see
>>>> init_iommu_one in linux/drivers/iommu/amd/init.c)
>>>>
>>>> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
>>>
>>> I'm concerned that switching to type 11 will break some older guests.
>>> It would be better if we could export both type 10 and type 11
>>> ivhd. A question however would be how does this interact
>>> with older guests. For example:
>>> https://lists.linuxfoundation.org/pipermail/iommu/2016-January/015310.html
>>> it looks like linux before 2016 only expected one ivhd entry?
>>
>> Export both type 0x10 and 0x11 looks reasonable to me. Before the above
>> commit, I see that Linux still loops through multiple ivhd but only handles
>> one with type 0x10. On newer kernel, it will choose to handle the type that
>> appears last corresponding the first devid, which is weird in my opinion.
>> +static u8 get_highest_supported_ivhd_type(struct acpi_table_header *ivrs)
>> +{
>> +	u8 *base = (u8 *)ivrs;
>> +	struct ivhd_header *ivhd = (struct ivhd_header *)
>> +					(base + IVRS_HEADER_LENGTH);
>> +	u8 last_type = ivhd->type;
>> +	u16 devid = ivhd->devid;
>> +
>> +	while (((u8 *)ivhd - base < ivrs->length) &&
>> +	       (ivhd->type <= ACPI_IVHD_TYPE_MAX_SUPPORTED)) {
>> +		u8 *p = (u8 *) ivhd;
>> +
>> +		if (ivhd->devid == devid)
>> +			last_type = ivhd->type;
>> +		ivhd = (struct ivhd_header *)(p + ivhd->length);
>> +	}
>> +
>> +	return last_type;
>> +}
> 
> Yes I don't get the logic here either.
> Talk to kernel devs who wrote this?
> 
> commit 8c7142f56fedfc6824b5bca56fee1f443e01746b
> Author: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
> Date:   Fri Apr 1 09:05:59 2016 -0400
> 
>      iommu/amd: Use the most comprehensive IVHD type that the driver can support
>      
>      The IVRS in more recent AMD system usually contains multiple
>      IVHD block types (e.g. 0x10, 0x11, and 0x40) for each IOMMU.
>      The newer IVHD types provide more information (e.g. new features
>      specified in the IOMMU spec), while maintain compatibility with
>      the older IVHD type.
>      
>      Having multiple IVHD type allows older IOMMU drivers to still function
>      (e.g. using the older IVHD type 0x10) while the newer IOMMU driver can use
>      the newer IVHD types (e.g. 0x11 and 0x40). Therefore, the IOMMU driver
>      should only make use of the newest IVHD type that it can support.
>      
>      This patch adds new logic to determine the highest level of IVHD type
>      it can support, and use it throughout the to initialize the driver.
>      This requires adding another pass to the IVRS parsing to determine
>      appropriate IVHD type (see function get_highest_supported_ivhd_type())
>      before parsing the contents.
>      
>      [Vincent: fix the build error of IVHD_DEV_ACPI_HID flag not found]
>      
>      Signed-off-by: Wan Zongshun <vincent.wan@amd.com>
>      Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
>      Signed-off-by: Joerg Roedel <jroedel@suse.de>

I've sent a email to talk to kernel developers about this function. Here 
is the link to the email: 
https://lore.kernel.org/all/e8a87c2b-a29a-ccf9-49c6-3cfceaa208bb@gmail.com/