[PATCH v8 0/5] Support x2APIC mode with TCG accelerator

Bui Quang Minh posted 5 patches 7 months, 1 week ago
Failed in applying to current master (apply log)
There is a newer version of this series
hw/i386/acpi-build.c                 | 129 +++++----
hw/i386/amd_iommu.c                  |  29 +-
hw/i386/amd_iommu.h                  |  16 +-
hw/i386/intel_iommu.c                |   6 +-
hw/i386/x86.c                        |   6 +-
hw/intc/apic.c                       | 395 +++++++++++++++++++++------
hw/intc/apic_common.c                |  16 +-
hw/intc/trace-events                 |   4 +-
include/hw/i386/apic.h               |   6 +-
include/hw/i386/apic_internal.h      |   7 +-
target/i386/cpu-sysemu.c             |  18 +-
target/i386/cpu.c                    |   8 +-
target/i386/cpu.h                    |   9 +
target/i386/tcg/sysemu/misc_helper.c |  31 +++
14 files changed, 510 insertions(+), 170 deletions(-)
[PATCH v8 0/5] Support x2APIC mode with TCG accelerator
Posted by Bui Quang Minh 7 months, 1 week ago
Hi everyone,

This series implements x2APIC mode in userspace local APIC and the
RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
using either Intel or AMD iommu.

Testing to boot my own built Linux 6.3.0-rc2, the kernel successfully boot
with enabled x2APIC and can enumerate CPU with APIC ID 257

Using Intel IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device intel-iommu,intremap=on,eim=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

Using AMD IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device amd-iommu,intremap=on,xtsup=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

Testing the emulated userspace APIC with kvm-unit-tests, disable test
device with this patch

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)

        if ((str = getenv("TEST_DEVICE")))
                no_test_device = !atol(str);
+       no_test_device = true;

        if ((str = getenv("MEMLIMIT")))
                fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
./run_tests.sh -v -g apic

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2
-cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split FAIL
apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp
1 -cpu qemu64 -machine kernel_irqchip=split PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu
qemu64,+x2apic,+tsc-deadline FAIL x2apic (54 tests, 8 unexpected failures,
1 skipped) TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp
2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off FAIL xapic (43 tests,
6 unexpected failures, 2 skipped)

  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0
  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0
  FAIL: apicbase: relocate apic

These errors are because we don't disable MMIO region when switching to
x2APIC and don't support relocate MMIO region yet. This is a problem
because, MMIO region is the same for all CPUs, in order to support these we
need to figure out how to allocate and manage different MMIO regions for
each CPUs. This can be an improvement in the future.

  FAIL: nmi-after-sti
  FAIL: multiple nmi

These errors are in the way we handle CPU_INTERRUPT_NMI in core TCG.

  FAIL: TMCCT should stay at zero

This error is related to APIC timer which should be addressed in separate
patch.

Version 8 changes,
- Patch 2, 4:
  + Rebase to master and resolve conflicts in these 2 patches

Version 7 changes,
- Patch 4:
  + If eim=on, keep checking if kvm x2APIC is enabled when kernel-irqchip
  is split

Version 6 changes,
- Patch 5:
  + Make all places use the amdvi_extended_feature_register to get extended
  feature register

Version 5 changes,
- Patch 3:
  + Rebase to master and fix conflict
- Patch 5:
  + Create a helper function to get amdvi extended feature register instead
  of storing it in AMDVIState

Version 4 changes,
- Patch 5:
  + Instead of replacing IVHD type 0x10 with type 0x11, export both types
  for backward compatibility with old guest operating system
  + Flip the xtsup feature check condition in amdvi_int_remap_ga for
  readability

Version 3 changes,
- Patch 2:
  + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
  + Make physical destination mode IPI which has destination id 0xffffffff
  a broadcast to xAPIC CPUs
  + Make cluster address 0xf in cluster model of xAPIC logical destination
  mode a broadcast to all clusters
  + Create new extended_log_dest to store APIC_LDR information in x2APIC
  instead of extending log_dest for backward compatibility in vmstate

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC suuport
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
  i386/tcg: implement x2APIC registers MSR access
  apic: add support for x2APIC mode
  apic, i386/tcg: add x2apic transitions
  intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  amd_iommu: report x2APIC support to the operating system

 hw/i386/acpi-build.c                 | 129 +++++----
 hw/i386/amd_iommu.c                  |  29 +-
 hw/i386/amd_iommu.h                  |  16 +-
 hw/i386/intel_iommu.c                |   6 +-
 hw/i386/x86.c                        |   6 +-
 hw/intc/apic.c                       | 395 +++++++++++++++++++++------
 hw/intc/apic_common.c                |  16 +-
 hw/intc/trace-events                 |   4 +-
 include/hw/i386/apic.h               |   6 +-
 include/hw/i386/apic_internal.h      |   7 +-
 target/i386/cpu-sysemu.c             |  18 +-
 target/i386/cpu.c                    |   8 +-
 target/i386/cpu.h                    |   9 +
 target/i386/tcg/sysemu/misc_helper.c |  31 +++
 14 files changed, 510 insertions(+), 170 deletions(-)

-- 
2.25.1
Re: [PATCH v8 0/5] Support x2APIC mode with TCG accelerator
Posted by Bui Quang Minh 7 months, 1 week ago
On 9/26/23 23:06, Bui Quang Minh wrote:

> Version 8 changes,
> - Patch 2, 4:
>    + Rebase to master and resolve conflicts in these 2 patches

The conflicts when rebasing is due to the commit 9926cf34de5fa15da 
("target/i386: Allow elision of kvm_enable_x2apic()"). AFAIK, this 
commit adds kvm_enabled() before kvm_enable_x2apic() in the and (&&) 
expression so that when kvm_enabled() is known to be false at the 
compile time (CONFIG_KVM_IS_POSSIBLE is undefined), the compiler can 
omit the kvm_enable_x2apic() in the and expression.

In patch 2, I simply combine the change logic in patch 2 with logic in 
the commit 9926cf34de5fa15da.

In patch 4, the end result of version 8 is the same as version 7. I 
don't think we need to add the kvm_enabled() to make the expression become

	if (kvm_enabled() && kvm_irqchip_is_split() && !kvm_enable_x2apic())

Because when CONFIG_KVM_IS_POSSIBLE is undefined, kvm_irqchip_is_split() 
is known to be false at the compile time too so just keep the expression as

	if (kvm_irqchip_is_split() && !kvm_enable_x2apic())

is enough.

 > git range-diff feat/tcg-x2apic-v7~5..feat/tcg-x2apic-v7 
feat/tcg-x2apic-v8~5..feat/tcg-x2apic-v8

1:  c1d197a230 = 1:  f6e3918e0f i386/tcg: implement x2APIC registers MSR 
access
2:  dd96cb0238 ! 2:  54d44a15b6 apic: add support for x2APIC mode
     @@ Commit message

       ## hw/i386/x86.c ##
      @@ hw/i386/x86.c: void x86_cpus_init(X86MachineState *x86ms, int 
default_cpu_version)
     -      * Can we support APIC ID 255 or higher?
     -      *
     -      * Under Xen: yes.
     --     * With userspace emulated lapic: no
     -+     * With userspace emulated lapic: checked later in 
apic_common_set_id.
     -      * With KVM's in-kernel lapic: only if X2APIC API is enabled.
     +      * both in-kernel lapic and X2APIC userspace API.
            */
     -     if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
     +     if (x86ms->apic_id_limit > 255 && kvm_enabled() &&
      -        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
      +        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
               error_report("current -smp configuration requires kernel "
3:  31a5c555a6 = 3:  eb080d1e2c apic, i386/tcg: add x2apic transitions
4:  d78b5c43b4 ! 4:  59f028f119 intel_iommu: allow Extended Interrupt 
Mode when using userspace APIC
     @@ hw/i386/intel_iommu.c: static bool 
vtd_decide_config(IntelIOMMUState *s, Error *
      -            error_setg(errp, "eim=on requires 
accel=kvm,kernel-irqchip=split");
      -            return false;
      -        }
     --        if (!kvm_enable_x2apic()) {
     +-        if (kvm_enabled() && !kvm_enable_x2apic()) {
      +        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
                   error_setg(errp, "eim=on requires support on the KVM 
side"
                                    "(X2APIC_API, first shipped in v4.7)");
5:  51f558035d = 5:  bc95c3cb60 amd_iommu: report x2APIC support to the 
operating system

As the change is minor and does not change the main logic, I keep the 
Reviewed-by and Acked-by tags.

Thank you,
Quang Minh.
Re: [PATCH v8 0/5] Support x2APIC mode with TCG accelerator
Posted by Michael S. Tsirkin 7 months ago
On Tue, Sep 26, 2023 at 11:23:53PM +0700, Bui Quang Minh wrote:
> On 9/26/23 23:06, Bui Quang Minh wrote:
> 
> > Version 8 changes,
> > - Patch 2, 4:
> >    + Rebase to master and resolve conflicts in these 2 patches
> 
> The conflicts when rebasing is due to the commit 9926cf34de5fa15da
> ("target/i386: Allow elision of kvm_enable_x2apic()"). AFAIK, this commit
> adds kvm_enabled() before kvm_enable_x2apic() in the and (&&) expression so
> that when kvm_enabled() is known to be false at the compile time
> (CONFIG_KVM_IS_POSSIBLE is undefined), the compiler can omit the
> kvm_enable_x2apic() in the and expression.
> 
> In patch 2, I simply combine the change logic in patch 2 with logic in the
> commit 9926cf34de5fa15da.
> 
> In patch 4, the end result of version 8 is the same as version 7. I don't
> think we need to add the kvm_enabled() to make the expression become
> 
> 	if (kvm_enabled() && kvm_irqchip_is_split() && !kvm_enable_x2apic())
> 
> Because when CONFIG_KVM_IS_POSSIBLE is undefined, kvm_irqchip_is_split() is
> known to be false at the compile time too so just keep the expression as
> 
> 	if (kvm_irqchip_is_split() && !kvm_enable_x2apic())
> 
> is enough.
> 
> > git range-diff feat/tcg-x2apic-v7~5..feat/tcg-x2apic-v7
> feat/tcg-x2apic-v8~5..feat/tcg-x2apic-v8
> 
> 1:  c1d197a230 = 1:  f6e3918e0f i386/tcg: implement x2APIC registers MSR
> access
> 2:  dd96cb0238 ! 2:  54d44a15b6 apic: add support for x2APIC mode
>     @@ Commit message
> 
>       ## hw/i386/x86.c ##
>      @@ hw/i386/x86.c: void x86_cpus_init(X86MachineState *x86ms, int
> default_cpu_version)
>     -      * Can we support APIC ID 255 or higher?
>     -      *
>     -      * Under Xen: yes.
>     --     * With userspace emulated lapic: no
>     -+     * With userspace emulated lapic: checked later in
> apic_common_set_id.
>     -      * With KVM's in-kernel lapic: only if X2APIC API is enabled.
>     +      * both in-kernel lapic and X2APIC userspace API.
>            */
>     -     if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
>     +     if (x86ms->apic_id_limit > 255 && kvm_enabled() &&
>      -        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
>      +        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
>               error_report("current -smp configuration requires kernel "
> 3:  31a5c555a6 = 3:  eb080d1e2c apic, i386/tcg: add x2apic transitions
> 4:  d78b5c43b4 ! 4:  59f028f119 intel_iommu: allow Extended Interrupt Mode
> when using userspace APIC
>     @@ hw/i386/intel_iommu.c: static bool vtd_decide_config(IntelIOMMUState
> *s, Error *
>      -            error_setg(errp, "eim=on requires
> accel=kvm,kernel-irqchip=split");
>      -            return false;
>      -        }
>     --        if (!kvm_enable_x2apic()) {
>     +-        if (kvm_enabled() && !kvm_enable_x2apic()) {
>      +        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
>                   error_setg(errp, "eim=on requires support on the KVM side"
>                                    "(X2APIC_API, first shipped in v4.7)");
> 5:  51f558035d = 5:  bc95c3cb60 amd_iommu: report x2APIC support to the
> operating system
> 
> As the change is minor and does not change the main logic, I keep the
> Reviewed-by and Acked-by tags.
> 
> Thank you,
> Quang Minh.



Causes some build failures:

https://gitlab.com/mstredhat/qemu/-/jobs/5216377483
/builds/mstredhat/qemu/build/../hw/intc/apic.c:1023: undefined reference to `raise_exception_ra'

checkpatch warnings:
https://gitlab.com/mstredhat/qemu/-/jobs/5216377552
Re: [PATCH v8 0/5] Support x2APIC mode with TCG accelerator
Posted by Bui Quang Minh 7 months ago
On 10/4/23 13:51, Michael S. Tsirkin wrote:
> On Tue, Sep 26, 2023 at 11:23:53PM +0700, Bui Quang Minh wrote:
>> On 9/26/23 23:06, Bui Quang Minh wrote:
>>
>>> Version 8 changes,
>>> - Patch 2, 4:
>>>     + Rebase to master and resolve conflicts in these 2 patches
>>
>> The conflicts when rebasing is due to the commit 9926cf34de5fa15da
>> ("target/i386: Allow elision of kvm_enable_x2apic()"). AFAIK, this commit
>> adds kvm_enabled() before kvm_enable_x2apic() in the and (&&) expression so
>> that when kvm_enabled() is known to be false at the compile time
>> (CONFIG_KVM_IS_POSSIBLE is undefined), the compiler can omit the
>> kvm_enable_x2apic() in the and expression.
>>
>> In patch 2, I simply combine the change logic in patch 2 with logic in the
>> commit 9926cf34de5fa15da.
>>
>> In patch 4, the end result of version 8 is the same as version 7. I don't
>> think we need to add the kvm_enabled() to make the expression become
>>
>> 	if (kvm_enabled() && kvm_irqchip_is_split() && !kvm_enable_x2apic())
>>
>> Because when CONFIG_KVM_IS_POSSIBLE is undefined, kvm_irqchip_is_split() is
>> known to be false at the compile time too so just keep the expression as
>>
>> 	if (kvm_irqchip_is_split() && !kvm_enable_x2apic())
>>
>> is enough.
>>
>>> git range-diff feat/tcg-x2apic-v7~5..feat/tcg-x2apic-v7
>> feat/tcg-x2apic-v8~5..feat/tcg-x2apic-v8
>>
>> 1:  c1d197a230 = 1:  f6e3918e0f i386/tcg: implement x2APIC registers MSR
>> access
>> 2:  dd96cb0238 ! 2:  54d44a15b6 apic: add support for x2APIC mode
>>      @@ Commit message
>>
>>        ## hw/i386/x86.c ##
>>       @@ hw/i386/x86.c: void x86_cpus_init(X86MachineState *x86ms, int
>> default_cpu_version)
>>      -      * Can we support APIC ID 255 or higher?
>>      -      *
>>      -      * Under Xen: yes.
>>      --     * With userspace emulated lapic: no
>>      -+     * With userspace emulated lapic: checked later in
>> apic_common_set_id.
>>      -      * With KVM's in-kernel lapic: only if X2APIC API is enabled.
>>      +      * both in-kernel lapic and X2APIC userspace API.
>>             */
>>      -     if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
>>      +     if (x86ms->apic_id_limit > 255 && kvm_enabled() &&
>>       -        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
>>       +        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
>>                error_report("current -smp configuration requires kernel "
>> 3:  31a5c555a6 = 3:  eb080d1e2c apic, i386/tcg: add x2apic transitions
>> 4:  d78b5c43b4 ! 4:  59f028f119 intel_iommu: allow Extended Interrupt Mode
>> when using userspace APIC
>>      @@ hw/i386/intel_iommu.c: static bool vtd_decide_config(IntelIOMMUState
>> *s, Error *
>>       -            error_setg(errp, "eim=on requires
>> accel=kvm,kernel-irqchip=split");
>>       -            return false;
>>       -        }
>>      --        if (!kvm_enable_x2apic()) {
>>      +-        if (kvm_enabled() && !kvm_enable_x2apic()) {
>>       +        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
>>                    error_setg(errp, "eim=on requires support on the KVM side"
>>                                     "(X2APIC_API, first shipped in v4.7)");
>> 5:  51f558035d = 5:  bc95c3cb60 amd_iommu: report x2APIC support to the
>> operating system
>>
>> As the change is minor and does not change the main logic, I keep the
>> Reviewed-by and Acked-by tags.
>>
>> Thank you,
>> Quang Minh.
> 
> 
> 
> Causes some build failures:
> 
> https://gitlab.com/mstredhat/qemu/-/jobs/5216377483
> /builds/mstredhat/qemu/build/../hw/intc/apic.c:1023: undefined reference to `raise_exception_ra'

raise_exception_ra is tcg specific so the builds are failed as tcg is 
disabled. I will remove the use of raise_exception_ra, the invalid 
register read just returns 0, invalid register write has no effect 
without raising the exception anymore. The APIC state invalid transition 
does not raise exception either, just don't change the APIC state. As a 
side effect, we fail some more KVM unit test of invalid APIC state 
transition, as they expect to catch exception in these cases. I think 
it's not a big problem. What's your opinion?

Thank you,
Quang Minh.
Re: [PATCH v8 0/5] Support x2APIC mode with TCG accelerator
Posted by Michael S. Tsirkin 7 months ago
On Wed, Oct 04, 2023 at 11:40:43PM +0700, Bui Quang Minh wrote:
> On 10/4/23 13:51, Michael S. Tsirkin wrote:
> > On Tue, Sep 26, 2023 at 11:23:53PM +0700, Bui Quang Minh wrote:
> > > On 9/26/23 23:06, Bui Quang Minh wrote:
> > > 
> > > > Version 8 changes,
> > > > - Patch 2, 4:
> > > >     + Rebase to master and resolve conflicts in these 2 patches
> > > 
> > > The conflicts when rebasing is due to the commit 9926cf34de5fa15da
> > > ("target/i386: Allow elision of kvm_enable_x2apic()"). AFAIK, this commit
> > > adds kvm_enabled() before kvm_enable_x2apic() in the and (&&) expression so
> > > that when kvm_enabled() is known to be false at the compile time
> > > (CONFIG_KVM_IS_POSSIBLE is undefined), the compiler can omit the
> > > kvm_enable_x2apic() in the and expression.
> > > 
> > > In patch 2, I simply combine the change logic in patch 2 with logic in the
> > > commit 9926cf34de5fa15da.
> > > 
> > > In patch 4, the end result of version 8 is the same as version 7. I don't
> > > think we need to add the kvm_enabled() to make the expression become
> > > 
> > > 	if (kvm_enabled() && kvm_irqchip_is_split() && !kvm_enable_x2apic())
> > > 
> > > Because when CONFIG_KVM_IS_POSSIBLE is undefined, kvm_irqchip_is_split() is
> > > known to be false at the compile time too so just keep the expression as
> > > 
> > > 	if (kvm_irqchip_is_split() && !kvm_enable_x2apic())
> > > 
> > > is enough.
> > > 
> > > > git range-diff feat/tcg-x2apic-v7~5..feat/tcg-x2apic-v7
> > > feat/tcg-x2apic-v8~5..feat/tcg-x2apic-v8
> > > 
> > > 1:  c1d197a230 = 1:  f6e3918e0f i386/tcg: implement x2APIC registers MSR
> > > access
> > > 2:  dd96cb0238 ! 2:  54d44a15b6 apic: add support for x2APIC mode
> > >      @@ Commit message
> > > 
> > >        ## hw/i386/x86.c ##
> > >       @@ hw/i386/x86.c: void x86_cpus_init(X86MachineState *x86ms, int
> > > default_cpu_version)
> > >      -      * Can we support APIC ID 255 or higher?
> > >      -      *
> > >      -      * Under Xen: yes.
> > >      --     * With userspace emulated lapic: no
> > >      -+     * With userspace emulated lapic: checked later in
> > > apic_common_set_id.
> > >      -      * With KVM's in-kernel lapic: only if X2APIC API is enabled.
> > >      +      * both in-kernel lapic and X2APIC userspace API.
> > >             */
> > >      -     if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
> > >      +     if (x86ms->apic_id_limit > 255 && kvm_enabled() &&
> > >       -        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
> > >       +        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
> > >                error_report("current -smp configuration requires kernel "
> > > 3:  31a5c555a6 = 3:  eb080d1e2c apic, i386/tcg: add x2apic transitions
> > > 4:  d78b5c43b4 ! 4:  59f028f119 intel_iommu: allow Extended Interrupt Mode
> > > when using userspace APIC
> > >      @@ hw/i386/intel_iommu.c: static bool vtd_decide_config(IntelIOMMUState
> > > *s, Error *
> > >       -            error_setg(errp, "eim=on requires
> > > accel=kvm,kernel-irqchip=split");
> > >       -            return false;
> > >       -        }
> > >      --        if (!kvm_enable_x2apic()) {
> > >      +-        if (kvm_enabled() && !kvm_enable_x2apic()) {
> > >       +        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
> > >                    error_setg(errp, "eim=on requires support on the KVM side"
> > >                                     "(X2APIC_API, first shipped in v4.7)");
> > > 5:  51f558035d = 5:  bc95c3cb60 amd_iommu: report x2APIC support to the
> > > operating system
> > > 
> > > As the change is minor and does not change the main logic, I keep the
> > > Reviewed-by and Acked-by tags.
> > > 
> > > Thank you,
> > > Quang Minh.
> > 
> > 
> > 
> > Causes some build failures:
> > 
> > https://gitlab.com/mstredhat/qemu/-/jobs/5216377483
> > /builds/mstredhat/qemu/build/../hw/intc/apic.c:1023: undefined reference to `raise_exception_ra'
> 
> raise_exception_ra is tcg specific so the builds are failed as tcg is
> disabled. I will remove the use of raise_exception_ra, the invalid register
> read just returns 0, invalid register write has no effect without raising
> the exception anymore. The APIC state invalid transition does not raise
> exception either, just don't change the APIC state. As a side effect, we
> fail some more KVM unit test of invalid APIC state transition, as they
> expect to catch exception in these cases. I think it's not a big problem.
> What's your opinion?
> 
> Thank you,
> Quang Minh.

Hmm.  I think this will have to be addressed somehow so people
and ci systems are not confused. Paolo any feedback?

-- 
MST
Re: [PATCH v8 0/5] Support x2APIC mode with TCG accelerator
Posted by Bui Quang Minh 7 months ago
On 9/26/23 23:06, Bui Quang Minh wrote:
> Hi everyone,
> 
> This series implements x2APIC mode in userspace local APIC and the
> RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
> and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
> series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
> using either Intel or AMD iommu.
> 
> Testing to boot my own built Linux 6.3.0-rc2, the kernel successfully boot
> with enabled x2APIC and can enumerate CPU with APIC ID 257
> 
> Using Intel IOMMU
> 
> qemu/build/qemu-system-x86_64 \
>    -smp 2,maxcpus=260 \
>    -cpu qemu64,x2apic=on \
>    -machine q35 \
>    -device intel-iommu,intremap=on,eim=on \
>    -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
>    -m 2G \
>    -kernel $KERNEL_DIR \
>    -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
>    -drive file=$IMAGE_DIR,format=raw \
>    -nographic \
>    -s
> 
> Using AMD IOMMU
> 
> qemu/build/qemu-system-x86_64 \
>    -smp 2,maxcpus=260 \
>    -cpu qemu64,x2apic=on \
>    -machine q35 \
>    -device amd-iommu,intremap=on,xtsup=on \
>    -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
>    -m 2G \
>    -kernel $KERNEL_DIR \
>    -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
>    -drive file=$IMAGE_DIR,format=raw \
>    -nographic \
>    -s
> 
> Testing the emulated userspace APIC with kvm-unit-tests, disable test
> device with this patch
> 
> diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
> index 1734afb..f56fe1c 100644
> --- a/lib/x86/fwcfg.c
> +++ b/lib/x86/fwcfg.c
> @@ -27,6 +27,7 @@ static void read_cfg_override(void)
> 
>          if ((str = getenv("TEST_DEVICE")))
>                  no_test_device = !atol(str);
> +       no_test_device = true;
> 
>          if ((str = getenv("MEMLIMIT")))
>                  fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;
> 
> ~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
> ./run_tests.sh -v -g apic
> 
> TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2
> -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split FAIL
> apic-split (54 tests, 8 unexpected failures, 1 skipped)
> TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp
> 1 -cpu qemu64 -machine kernel_irqchip=split PASS ioapic-split (19 tests)
> TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu
> qemu64,+x2apic,+tsc-deadline FAIL x2apic (54 tests, 8 unexpected failures,
> 1 skipped) TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp
> 2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off FAIL xapic (43 tests,
> 6 unexpected failures, 2 skipped)
> 
>    FAIL: apic_disable: *0xfee00030: 50014
>    FAIL: apic_disable: *0xfee00080: f0
>    FAIL: apic_disable: *0xfee00030: 50014
>    FAIL: apic_disable: *0xfee00080: f0
>    FAIL: apicbase: relocate apic
> 
> These errors are because we don't disable MMIO region when switching to
> x2APIC and don't support relocate MMIO region yet. This is a problem
> because, MMIO region is the same for all CPUs, in order to support these we
> need to figure out how to allocate and manage different MMIO regions for
> each CPUs. This can be an improvement in the future.

I've tried to address these failed tests with the idea of creating 
separate APIC MMIO region per CPU. I've created a working patch with 
this approach and will send it in reply to this message, you can see the 
detail in the patch. However, it has a big drawback, it breaks MSI 
handler. With that patch, device needs to call apic_send_msi directly 
instead of writing to 0xfee00000 in system memory. Furthermore, I think 
APIC MMIO relocation is a very unusual use case and APIC MMIO disable is 
not much important for normal system software. I'm pleased to receive 
any comments on that patch.

Thank you,
Quang Minh.
[RFC PATCH] tcg, apic: create a separate root memory region for each CPU
Posted by Bui Quang Minh 7 months ago
Currently, by default, every TCG cpu has root memory region to the same
system memory region. In order to support APIC MMIO relocation as well as
correctly disable APIC MMIO in disabled and x2APIC state, in this commit,
we create a separate root memory region for every CPU. The system memory
region is added to this container root memory region, later an separate
APIC MMIO is added with higher priority than system memory . With separate
APIC MMIO region per CPU, APIC MMIO relocation and disable can be done per
CPU without interfering without others.

Because the MSI base address is the same as APIC MMIO, the MMIO region
currently serves 2 purposes: APIC register access and MSI handler. In case
no interrupt remapping device is used, devices send MSI by writing to
0xfee00000 in system memory. However, as we move APIC MMIO out of system
memory we need to change the device code to call apic_send_msi instead of
writing to system memory.

This commit passes the APIC MMIO relocation test in kvm-unit-tests, still
fails APIC disable, however, I think we should treat those as pass

Before:
	FAIL: apic_disable: *0xfee00030: 50014
	FAIL: apic_disable: *0xfee00080: f0
	FAIL: apic_disable: *0xfee00030: 50014
	FAIL: apic_disable: *0xfee00080: f0
After:
	FAIL: apic_disable: *0xfee00030: 0
	FAIL: apic_disable: *0xfee00080: 0
	FAIL: apic_disable: *0xfee00030: 0
	FAIL: apic_disable: *0xfee00080: 0

Before this commit, we still can read APIC register, after this commit we
cannot. However, the test memset disabled MMIO region with 0xff and expects
to read back the 0xff value. As we disable APIC MMIO memory region, the
write is dispatched to system memory which has unassigned read/write
operation, so read returns 0 and write has no effect.

This commit is tested on booting up Linux kernel with and without Intel/AMD
IOMMU on enabled/disabled x2APIC CPUs.

The memory region tree when running without interrupt remapping device

	~ info mtree
Before:
	address-space: cpu-memory-0
	address-space: cpu-memory-1
	address-space: memory
	  0000000000000000-ffffffffffffffff (prio 0, i/o): system
After:
	address-space: cpu-memory-0
	  0000000000000000-ffffffffffffffff (prio 0, i/o): memory
	    0000000000000000-ffffffffffffffff (prio 0, i/o): alias memory
	      @system 0000000000000000-ffffffffffffffff
	    00000000fee00000-00000000feefffff (prio 4096, i/o): apic-msi

	address-space: cpu-memory-1
	  0000000000000000-ffffffffffffffff (prio 0, i/o): memory
	    0000000000000000-ffffffffffffffff (prio 0, i/o): alias memory
	      @system 0000000000000000-ffffffffffffffff
	    00000000fee00000-00000000feefffff (prio 4096, i/o): apic-msi

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
  hw/intc/apic.c                   | 24 ++++++++++++++++++++++--
  hw/intc/ioapic.c                 | 32 +++++++++++++++++++++++++++-----
  hw/pci/pci.c                     | 25 +++++++++++++++++++++++--
  include/hw/i386/apic.h           |  1 +
  target/i386/cpu-sysemu.c         | 14 +++++++++++---
  target/i386/tcg/sysemu/tcg-cpu.c | 13 +++++++++++++
  6 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index b8f56836a6..0bd5b5d1f9 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -340,17 +340,34 @@ static void apic_set_base_check(APICCommonState 
*s, uint64_t val)
          raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
  }

+/* set base address and enable/disable APIC MMIO */
+static void apic_io_memory_set_base_enabled(APICCommonState *s,
+                                            hwaddr address,
+                                            bool enabled)
+{
+    if (tcg_enabled()) {
+        qemu_mutex_lock_iothread();
+        memory_region_set_address(&s->io_memory, address);
+        memory_region_set_enabled(&s->io_memory, enabled);
+        qemu_mutex_unlock_iothread();
+    }
+}
+
  static void apic_set_base(APICCommonState *s, uint64_t val)
  {
+    hwaddr new_base = val & MSR_IA32_APICBASE_BASE;
+    bool enabled = true;
+
      apic_set_base_check(s, val);

-    s->apicbase = (val & 0xfffff000) |
+    s->apicbase = new_base |
          (s->apicbase & (MSR_IA32_APICBASE_BSP | 
MSR_IA32_APICBASE_ENABLE));
      /* if disabled, cannot be enabled again */
      if (!(val & MSR_IA32_APICBASE_ENABLE)) {
          s->apicbase &= ~MSR_IA32_APICBASE_ENABLE;
          cpu_clear_apic_feature(&s->cpu->env);
          s->spurious_vec &= ~APIC_SV_ENABLE;
+        enabled = false;
      }

      /* Transition from disabled mode to xAPIC */
@@ -368,7 +385,10 @@ static void apic_set_base(APICCommonState *s, 
uint64_t val)

          s->log_dest = ((s->initial_apic_id & 0xffff0) << 16) |
                        (1 << (s->initial_apic_id & 0xf));
+        enabled = false;
      }
+
+    apic_io_memory_set_base_enabled(s, new_base, enabled);
  }

  static void apic_set_tpr(APICCommonState *s, uint8_t val)
@@ -889,7 +909,7 @@ static uint64_t apic_mem_read(void *opaque, hwaddr 
addr, unsigned size)
      return val;
  }

-static void apic_send_msi(MSIMessage *msi)
+void apic_send_msi(MSIMessage *msi)
  {
      uint64_t addr = msi->address;
      uint32_t data = msi->data;
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 716ffc8bbb..8e85a3165a 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -35,6 +35,7 @@
  #include "hw/i386/apic-msidef.h"
  #include "hw/i386/x86-iommu.h"
  #include "trace.h"
+#include "hw/i386/apic.h"

  #define APIC_DELIVERY_MODE_SHIFT 8
  #define APIC_POLARITY_SHIFT 14
@@ -131,11 +132,32 @@ static void ioapic_service(IOAPICCommonState *s)
                  }
  #endif

-                /* No matter whether IR is enabled, we translate
-                 * the IOAPIC message into a MSI one, and its
-                 * address space will decide whether we need a
-                 * translation. */
-                stl_le_phys(ioapic_as, info.addr, info.data);
+                /*
+                 * We reuse the APIC MMIO memory region for MSI because
+                 * by default, they are at the same base address but
+                 * the register addresses are not overlapped. However,
+                 * in TCG we try to support APIC MMIO relocation and
+                 * disable per cpu APIC MMIO when that CPU switches
+                 * to different APIC state. So in case, when using TCG
+                 * and the no interrupt remapping device is used, we
+                 * call apic_send_msi to send MSI to APIC directly
+                 * instead of writing to MSI address.
+                 */
+                if (tcg_enabled() && ioapic_as == &address_space_memory) {
+                    MSIMessage msg = {
+                        .address = info.addr,
+                        .data = info.data,
+                    };
+                    apic_send_msi(&msg);
+                } else {
+                    /*
+                     * No matter whether IR is enabled, we translate
+                     * the IOAPIC message into a MSI one, and its
+                     * address space will decide whether we need a
+                     * translation.
+                     */
+                    stl_le_phys(ioapic_as, info.addr, info.data);
+                }
              }
          }
      }
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 881d774fb6..b680c3869b 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -50,6 +50,9 @@
  #include "qemu/cutils.h"
  #include "pci-internal.h"

+#include "sysemu/tcg.h"
+#include "hw/i386/apic.h"
+
  #include "hw/xen/xen.h"
  #include "hw/i386/kvm/xen_evtchn.h"

@@ -346,6 +349,7 @@ void pci_device_deassert_intx(PCIDevice *dev)
  static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
  {
      MemTxAttrs attrs = {};
+    AddressSpace *dma_as = pci_device_iommu_address_space(dev);

      /*
       * Xen uses the high bits of the address to contain some of the bits
@@ -359,8 +363,25 @@ static void pci_msi_trigger(PCIDevice *dev, 
MSIMessage msg)
          return;
      }
      attrs.requester_id = pci_requester_id(dev);
-    address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
-                         attrs, NULL);
+
+
+    /*
+     * We reuse the APIC MMIO memory region for MSI because
+     * by default, they are at the same base address but
+     * the register addresses are not overlapped. However,
+     * in TCG we try to support APIC MMIO relocation and
+     * disable per cpu APIC MMIO when that CPU switches
+     * to different APIC state. So in case, when using TCG
+     * and the no interrupt remapping device is used, we
+     * call apic_send_msi to send MSI to APIC directly
+     * instead of writing to MSI address.
+     */
+    if (tcg_enabled() && dma_as == &address_space_memory) {
+        apic_send_msi(&msg);
+    } else {
+        address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
+                             attrs, NULL);
+    }
  }

  static void pci_reset_regions(PCIDevice *dev)
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index 12aad09f4c..045eb96d0e 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -20,6 +20,7 @@ int apic_get_highest_priority_irr(DeviceState *dev);
  uint64_t apic_register_read(int index);
  void apic_register_write(int index, uint64_t val);
  bool is_x2apic_mode(DeviceState *d);
+void apic_send_msi(MSIMessage *msi);

  /* pc.c */
  DeviceState *cpu_get_current_apic(void);
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index 7422096737..a4c10591ef 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -316,14 +316,22 @@ void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)

      /* Map APIC MMIO area */
      apic = APIC_COMMON(cpu->apic_state);
-    if (!apic_mmio_map_once) {
+    if (tcg_enabled()) {
+        /*
+         * Map the APIC MMIO to the private root memory of the CPU
+         */
+        memory_region_add_subregion_overlap(cpu->parent_obj.memory,
+                                            apic->apicbase &
+                                            MSR_IA32_APICBASE_BASE,
+                                            &apic->io_memory,
+                                            0x1000);
+    } else if (!apic_mmio_map_once) {
          memory_region_add_subregion_overlap(get_system_memory(),
                                              apic->apicbase &
                                              MSR_IA32_APICBASE_BASE,
                                              &apic->io_memory,
                                              0x1000);
-        apic_mmio_map_once = true;
-     }
+    }
  }

  GuestPanicInformation *x86_cpu_get_crash_info(CPUState *cs)
diff --git a/target/i386/tcg/sysemu/tcg-cpu.c 
b/target/i386/tcg/sysemu/tcg-cpu.c
index c223c0fe9b..7731c422b9 100644
--- a/target/i386/tcg/sysemu/tcg-cpu.c
+++ b/target/i386/tcg/sysemu/tcg-cpu.c
@@ -24,6 +24,7 @@
  #include "sysemu/sysemu.h"
  #include "qemu/units.h"
  #include "exec/address-spaces.h"
+#include "hw/i386/x86.h"

  #include "tcg/tcg-cpu.h"

@@ -46,6 +47,7 @@ static void tcg_cpu_machine_done(Notifier *n, void 
*unused)
  bool tcg_cpu_realizefn(CPUState *cs, Error **errp)
  {
      X86CPU *cpu = X86_CPU(cs);
+    MemoryRegion *root_memory, *system_memory_alias;

      /*
       * The realize order is important, since x86_cpu_realize() checks if
@@ -72,6 +74,17 @@ bool tcg_cpu_realizefn(CPUState *cs, Error **errp)
      memory_region_add_subregion_overlap(cpu->cpu_as_root, 0, 
cpu->cpu_as_mem, 0);
      memory_region_set_enabled(cpu->cpu_as_mem, true);

+    root_memory = g_new(MemoryRegion, 1);
+    memory_region_init(root_memory, OBJECT(cs), "memory", ~0ull);
+
+    object_property_set_link(OBJECT(cs), "memory", OBJECT(root_memory), 
errp);
+
+    system_memory_alias = g_new(MemoryRegion, 1);
+    memory_region_init_alias(system_memory_alias, OBJECT(cpu), "memory",
+                             get_system_memory(), 0, ~0ull);
+
+    memory_region_add_subregion_overlap(cs->memory, 0, 
system_memory_alias, 0);
+
      cs->num_ases = 2;
      cpu_address_space_init(cs, 0, "cpu-memory", cs->memory);
      cpu_address_space_init(cs, 1, "cpu-smm", cpu->cpu_as_root);
-- 
2.25.1
[PATCH v8 1/5] i386/tcg: implement x2APIC registers MSR access
Posted by Bui Quang Minh 7 months, 1 week ago
This commit refactors apic_mem_read/write to support both MMIO access in
xAPIC and MSR access in x2APIC.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/intc/apic.c                       | 79 ++++++++++++++++++----------
 hw/intc/trace-events                 |  4 +-
 include/hw/i386/apic.h               |  3 ++
 target/i386/cpu.h                    |  3 ++
 target/i386/tcg/sysemu/misc_helper.c | 27 ++++++++++
 5 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index ac3d47d231..cb8c20de93 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -288,6 +288,13 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+bool is_x2apic_mode(DeviceState *dev)
+{
+    APICCommonState *s = APIC(dev);
+
+    return s->apicbase & MSR_IA32_APICBASE_EXTD;
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
     s->apicbase = (val & 0xfffff000) |
@@ -636,16 +643,11 @@ static void apic_timer(void *opaque)
     apic_timer_update(s, s->next_time);
 }
 
-static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+uint64_t apic_register_read(int index)
 {
     DeviceState *dev;
     APICCommonState *s;
-    uint32_t val;
-    int index;
-
-    if (size < 4) {
-        return 0;
-    }
+    uint64_t val;
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -653,7 +655,6 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
     }
     s = APIC(dev);
 
-    index = (addr >> 4) & 0xff;
     switch(index) {
     case 0x02: /* id */
         val = s->id << 24;
@@ -720,7 +721,23 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
         val = 0;
         break;
     }
-    trace_apic_mem_readl(addr, val);
+
+    trace_apic_register_read(index, val);
+    return val;
+}
+
+static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint32_t val;
+    int index;
+
+    if (size < 4) {
+        return 0;
+    }
+
+    index = (addr >> 4) & 0xff;
+    val = (uint32_t)apic_register_read(index);
+
     return val;
 }
 
@@ -737,27 +754,10 @@ static void apic_send_msi(MSIMessage *msi)
     apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
 
-static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
-                           unsigned size)
+void apic_register_write(int index, uint64_t val)
 {
     DeviceState *dev;
     APICCommonState *s;
-    int index = (addr >> 4) & 0xff;
-
-    if (size < 4) {
-        return;
-    }
-
-    if (addr > 0xfff || !index) {
-        /* MSI and MMIO APIC are at the same memory location,
-         * but actually not on the global bus: MSI is on PCI bus
-         * APIC is connected directly to the CPU.
-         * Mapping them on the global bus happens to work because
-         * MSI registers are reserved in APIC MMIO and vice versa. */
-        MSIMessage msi = { .address = addr, .data = val };
-        apic_send_msi(&msi);
-        return;
-    }
 
     dev = cpu_get_current_apic();
     if (!dev) {
@@ -765,7 +765,7 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
     s = APIC(dev);
 
-    trace_apic_mem_writel(addr, val);
+    trace_apic_register_write(index, val);
 
     switch(index) {
     case 0x02:
@@ -843,6 +843,29 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
     }
 }
 
+static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
+                           unsigned size)
+{
+    int index = (addr >> 4) & 0xff;
+
+    if (size < 4) {
+        return;
+    }
+
+    if (addr > 0xfff || !index) {
+        /* MSI and MMIO APIC are at the same memory location,
+         * but actually not on the global bus: MSI is on PCI bus
+         * APIC is connected directly to the CPU.
+         * Mapping them on the global bus happens to work because
+         * MSI registers are reserved in APIC MMIO and vice versa. */
+        MSIMessage msi = { .address = addr, .data = val };
+        apic_send_msi(&msi);
+        return;
+    }
+
+    apic_register_write(index, val);
+}
+
 static void apic_pre_save(APICCommonState *s)
 {
     apic_sync_vapic(s, SYNC_FROM_VAPIC);
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 36ff71f947..1ef29d0256 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -14,8 +14,8 @@ cpu_get_apic_base(uint64_t val) "0x%016"PRIx64
 # apic.c
 apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d"
 apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d"
-apic_mem_readl(uint64_t addr, uint32_t val)  "0x%"PRIx64" = 0x%08x"
-apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x"
+apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
+apic_register_write(uint8_t reg, uint64_t val) "register 0x%02x = 0x%"PRIx64
 
 # ioapic.c
 ioapic_set_remote_irr(int n) "set remote irr for pin %d"
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index bdc15a7a73..2cebeb4faf 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -18,6 +18,9 @@ void apic_sipi(DeviceState *s);
 void apic_poll_irq(DeviceState *d);
 void apic_designate_bsp(DeviceState *d, bool bsp);
 int apic_get_highest_priority_irr(DeviceState *dev);
+uint64_t apic_register_read(int index);
+void apic_register_write(int index, uint64_t val);
+bool is_x2apic_mode(DeviceState *d);
 
 /* pc.c */
 DeviceState *cpu_get_current_apic(void);
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d3f377d48a..78489378ad 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -545,6 +545,9 @@ typedef enum X86Seg {
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x00000490
 #define MSR_IA32_VMX_VMFUNC             0x00000491
 
+#define MSR_APIC_START                  0x00000800
+#define MSR_APIC_END                    0x000008ff
+
 #define XSTATE_FP_BIT                   0
 #define XSTATE_SSE_BIT                  1
 #define XSTATE_YMM_BIT                  2
diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index e1528b7f80..1fce2076a3 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -25,6 +25,7 @@
 #include "exec/address-spaces.h"
 #include "exec/exec-all.h"
 #include "tcg/helper-tcg.h"
+#include "hw/i386/apic.h"
 
 void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
 {
@@ -289,6 +290,19 @@ void helper_wrmsr(CPUX86State *env)
         env->msr_bndcfgs = val;
         cpu_sync_bndcs_hflags(env);
         break;
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            goto error;
+        }
+
+        qemu_mutex_lock_iothread();
+        apic_register_write(index, val);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
@@ -455,6 +469,19 @@ void helper_rdmsr(CPUX86State *env)
         val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
         break;
     }
+    case MSR_APIC_START ... MSR_APIC_END: {
+        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
+
+        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
+            raise_exception_ra(env, EXCP0D_GPF, GETPC());
+        }
+
+        qemu_mutex_lock_iothread();
+        val = apic_register_read(index);
+        qemu_mutex_unlock_iothread();
+
+        break;
+    }
     default:
         if ((uint32_t)env->regs[R_ECX] >= MSR_MC0_CTL
             && (uint32_t)env->regs[R_ECX] < MSR_MC0_CTL +
-- 
2.25.1
Re: [PATCH v8 1/5] i386/tcg: implement x2APIC registers MSR access
Posted by Phil Dennis-Jordan 6 months, 1 week ago
I can confirm that this works. The build issue obviously needs fixing,
but once that's fixed, this improves on the status quo.

I've tested this and patch 2/5 with x2apic CPUID bit enabled with the
hvf backend on macOS. To make it work in hvf mode, I used the attached
additional minimal patch to wire it up, but with that in place it
noticeably improves guest OS performance. (This patch doesn't yet
implement raising exceptions or checking for x2apic mode, more on that
in my comments below.)

Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>

On Tue, 26 Sept 2023 at 18:08, Bui Quang Minh <minhquangbui99@gmail.com> wrote:
> @@ -455,6 +469,19 @@ void helper_rdmsr(CPUX86State *env)
>          val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
>          break;
>      }
> +    case MSR_APIC_START ... MSR_APIC_END: {
> +        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
> +
> +        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
> +            raise_exception_ra(env, EXCP0D_GPF, GETPC());
> +        }
> +
> +        qemu_mutex_lock_iothread();
> +        val = apic_register_read(index);
> +        qemu_mutex_unlock_iothread();

Shouldn't the x2apic mode check technically be inside the lock?
Furthermore, we need the mode check logic in each accelerator whose
MSR read and write we wire up. Finally, there's the exception raising
issue which Michael noted.

So my suggestion would be to wrap the x2apic mode check and the call
to the lower level apic_register_read into a standalone
apic_x2apic_msr_read() or similar, and the equivalent for writes.
These functions should then also return success or failure, the latter
indicating an exception should be raised. Raising the exception can
then also be implemented for each accelerator at the relevant call
site. That contains the raise_exception_ra call in the TCG specific
code, and I can do the equivalent on the hvf side.

It may also be cleaner to only implement the shared xAPIC and x2APIC
functionality in the apic_register_{read|write} functions, and put the
code that's specific to MMIO and MSR paths in the
apic_mem_{read|write} and apic_x2apic_msr_{read|write} wrappers,
respectively? Not sure.
From f4661f2b8dc03ded0de186c8b3e49debb59876fc Mon Sep 17 00:00:00 2001
From: Phil Dennis-Jordan <phil@philjordan.eu>
Date: Sun, 22 Oct 2023 15:15:12 +0200
Subject: [PATCH] i386: hvf: Wires up MSRs and enables x2apic mode when using
 hvf

This change adds support for recently implemented x2apic MSR reads and
writes when using macOS Hypervisor.framework. The corresponding CPUID
flag is likewise allow-listed when using the accelerator.

Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
 target/i386/hvf/x86_cpuid.c | 4 ++--
 target/i386/hvf/x86_emu.c   | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/i386/hvf/x86_cpuid.c b/target/i386/hvf/x86_cpuid.c
index e56cd8411b..4f260d46a8 100644
--- a/target/i386/hvf/x86_cpuid.c
+++ b/target/i386/hvf/x86_cpuid.c
@@ -64,8 +64,8 @@ uint32_t hvf_get_supported_cpuid(uint32_t func, uint32_t idx,
              CPUID_PAT | CPUID_PSE36 | CPUID_CLFLUSH | CPUID_MMX |
              CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS;
         ecx &= CPUID_EXT_SSE3 | CPUID_EXT_PCLMULQDQ | CPUID_EXT_SSSE3 |
-             CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID |
-             CPUID_EXT_SSE41 | CPUID_EXT_SSE42 | CPUID_EXT_MOVBE |
+             CPUID_EXT_FMA | CPUID_EXT_CX16 | CPUID_EXT_PCID | CPUID_EXT_SSE41 |
+             CPUID_EXT_SSE42 | CPUID_EXT_X2APIC | CPUID_EXT_MOVBE |
              CPUID_EXT_POPCNT | CPUID_EXT_AES | CPUID_EXT_XSAVE |
              CPUID_EXT_AVX | CPUID_EXT_F16C | CPUID_EXT_RDRAND;
         ecx |= CPUID_EXT_HYPERVISOR;
diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c
index e84831e6c2..a764a564df 100644
--- a/target/i386/hvf/x86_emu.c
+++ b/target/i386/hvf/x86_emu.c
@@ -750,6 +750,9 @@ void simulate_rdmsr(struct CPUState *cpu)
         val = cs->nr_threads * cs->nr_cores; /* thread count, bits 15..0 */
         val |= ((uint32_t)cs->nr_cores << 16); /* core count, bits 31..16 */
         break;
+    case MSR_APIC_START ... MSR_APIC_END:
+        val = apic_register_read(msr - MSR_APIC_START);
+        break;
     default:
         /* fprintf(stderr, "%s: unknown msr 0x%x\n", __func__, msr); */
         val = 0;
@@ -844,6 +847,9 @@ void simulate_wrmsr(struct CPUState *cpu)
     case MSR_MTRRdefType:
         env->mtrr_deftype = data;
         break;
+    case MSR_APIC_START ... MSR_APIC_END:
+        apic_register_write(msr - MSR_APIC_START, data);
+        break;
     default:
         break;
     }
-- 
2.36.1

Re: [PATCH v8 1/5] i386/tcg: implement x2APIC registers MSR access
Posted by Bui Quang Minh 6 months, 1 week ago
On 10/22/23 20:59, Phil Dennis-Jordan wrote:
> I can confirm that this works. The build issue obviously needs fixing,
> but once that's fixed, this improves on the status quo.
> 
> I've tested this and patch 2/5 with x2apic CPUID bit enabled with the
> hvf backend on macOS. To make it work in hvf mode, I used the attached
> additional minimal patch to wire it up, but with that in place it
> noticeably improves guest OS performance. (This patch doesn't yet
> implement raising exceptions or checking for x2apic mode, more on that
> in my comments below.)
> 
> Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
> 
> On Tue, 26 Sept 2023 at 18:08, Bui Quang Minh <minhquangbui99@gmail.com> wrote:
>> @@ -455,6 +469,19 @@ void helper_rdmsr(CPUX86State *env)
>>           val = (cs->nr_threads * cs->nr_cores) | (cs->nr_cores << 16);
>>           break;
>>       }
>> +    case MSR_APIC_START ... MSR_APIC_END: {
>> +        int index = (uint32_t)env->regs[R_ECX] - MSR_APIC_START;
>> +
>> +        if (!is_x2apic_mode(env_archcpu(env)->apic_state)) {
>> +            raise_exception_ra(env, EXCP0D_GPF, GETPC());
>> +        }
>> +
>> +        qemu_mutex_lock_iothread();
>> +        val = apic_register_read(index);
>> +        qemu_mutex_unlock_iothread();
> 
> Shouldn't the x2apic mode check technically be inside the lock?
> Furthermore, we need the mode check logic in each accelerator whose
> MSR read and write we wire up. Finally, there's the exception raising
> issue which Michael noted.
> 
> So my suggestion would be to wrap the x2apic mode check and the call
> to the lower level apic_register_read into a standalone
> apic_x2apic_msr_read() or similar, and the equivalent for writes.
> These functions should then also return success or failure, the latter
> indicating an exception should be raised. Raising the exception can
> then also be implemented for each accelerator at the relevant call
> site. That contains the raise_exception_ra call in the TCG specific
> code, and I can do the equivalent on the hvf side.

Thanks a lot for your suggestion, I've taken this approach and 
implemented an apic_msr_read/write wrapper as you suggested in version 9 
(https://lore.kernel.org/qemu-devel/20231024152105.35942-1-minhquangbui99@gmail.com/)

Thank you,
Quang Minh.
[PATCH v8 2/5] apic: add support for x2APIC mode
Posted by Bui Quang Minh 7 months, 1 week ago
This commit extends the APIC ID to 32-bit long and remove the 255 max APIC
ID limit in userspace APIC. The array that manages local APICs is now
dynamically allocated based on the max APIC ID of created x86 machine.
Also, new x2APIC IPI destination determination scheme, self IPI and x2APIC
mode register access are supported.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/x86.c                   |   6 +-
 hw/intc/apic.c                  | 266 ++++++++++++++++++++++++--------
 hw/intc/apic_common.c           |   9 ++
 include/hw/i386/apic.h          |   3 +-
 include/hw/i386/apic_internal.h |   7 +-
 target/i386/cpu-sysemu.c        |   8 +-
 6 files changed, 230 insertions(+), 69 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index f034df8bf6..88534203c9 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -133,7 +133,7 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
      * both in-kernel lapic and X2APIC userspace API.
      */
     if (x86ms->apic_id_limit > 255 && kvm_enabled() &&
-        (!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
+        kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
         error_report("current -smp configuration requires kernel "
                      "irqchip and X2APIC API support.");
         exit(EXIT_FAILURE);
@@ -143,6 +143,10 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
         kvm_set_max_apic_id(x86ms->apic_id_limit);
     }
 
+    if (!kvm_irqchip_in_kernel()) {
+        apic_set_max_apic_id(x86ms->apic_id_limit);
+    }
+
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < ms->smp.cpus; i++) {
         x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index cb8c20de93..9f741794a7 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -31,15 +31,15 @@
 #include "hw/i386/apic-msidef.h"
 #include "qapi/error.h"
 #include "qom/object.h"
-
-#define MAX_APICS 255
-#define MAX_APIC_WORDS 8
+#include "tcg/helper-tcg.h"
 
 #define SYNC_FROM_VAPIC                 0x1
 #define SYNC_TO_VAPIC                   0x2
 #define SYNC_ISR_IRR_TO_VAPIC           0x4
 
-static APICCommonState *local_apics[MAX_APICS + 1];
+static APICCommonState **local_apics;
+static uint32_t max_apics;
+static uint32_t max_apic_words;
 
 #define TYPE_APIC "apic"
 /*This is reusing the APICCommonState typedef from APIC_COMMON */
@@ -49,7 +49,19 @@ DECLARE_INSTANCE_CHECKER(APICCommonState, APIC,
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
 static void apic_update_irq(APICCommonState *s);
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-                                      uint8_t dest, uint8_t dest_mode);
+                                      uint32_t dest, uint8_t dest_mode);
+
+void apic_set_max_apic_id(uint32_t max_apic_id)
+{
+    int word_size = 32;
+
+    /* round up the max apic id to next multiple of words */
+    max_apics = (max_apic_id + word_size - 1) & ~(word_size - 1);
+
+    local_apics = g_malloc0(sizeof(*local_apics) * max_apics);
+    max_apic_words = max_apics >> 5;
+}
+
 
 /* Find first bit starting from msb */
 static int apic_fls_bit(uint32_t value)
@@ -199,7 +211,7 @@ static void apic_external_nmi(APICCommonState *s)
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
     int __i, __j;\
-    for(__i = 0; __i < MAX_APIC_WORDS; __i++) {\
+    for(__i = 0; __i < max_apic_words; __i++) {\
         uint32_t __mask = deliver_bitmask[__i];\
         if (__mask) {\
             for(__j = 0; __j < 32; __j++) {\
@@ -226,7 +238,7 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
             {
                 int i, d;
                 d = -1;
-                for(i = 0; i < MAX_APIC_WORDS; i++) {
+                for(i = 0; i < max_apic_words; i++) {
                     if (deliver_bitmask[i]) {
                         d = i * 32 + apic_ffs_bit(deliver_bitmask[i]);
                         break;
@@ -276,16 +288,18 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
                  apic_set_irq(apic_iter, vector_num, trigger_mode) );
 }
 
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
-                      uint8_t vector_num, uint8_t trigger_mode)
+static void apic_deliver_irq(uint32_t dest, uint8_t dest_mode,
+                             uint8_t delivery_mode, uint8_t vector_num,
+                             uint8_t trigger_mode)
 {
-    uint32_t deliver_bitmask[MAX_APIC_WORDS];
+    uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t));
 
     trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num,
                            trigger_mode);
 
     apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
+    g_free(deliver_bitmask);
 }
 
 bool is_x2apic_mode(DeviceState *dev)
@@ -442,57 +456,121 @@ static void apic_eoi(APICCommonState *s)
     apic_update_irq(s);
 }
 
-static int apic_find_dest(uint8_t dest)
+static bool apic_match_dest(APICCommonState *apic, uint32_t dest)
 {
-    APICCommonState *apic = local_apics[dest];
+    if (is_x2apic_mode(&apic->parent_obj)) {
+        return apic->initial_apic_id == dest;
+    } else {
+        return apic->id == (uint8_t)dest;
+    }
+}
+
+static void apic_find_dest(uint32_t *deliver_bitmask, uint32_t dest)
+{
+    APICCommonState *apic = NULL;
     int i;
 
-    if (apic && apic->id == dest)
-        return dest;  /* shortcut in case apic->id == local_apics[dest]->id */
-
-    for (i = 0; i < MAX_APICS; i++) {
+    for (i = 0; i < max_apics; i++) {
         apic = local_apics[i];
-        if (apic && apic->id == dest)
-            return i;
-        if (!apic)
-            break;
+        if (apic && apic_match_dest(apic, dest)) {
+            apic_set_bit(deliver_bitmask, i);
+        }
     }
+}
 
-    return -1;
+/*
+ * Deliver interrupt to x2APIC CPUs if it is x2APIC broadcast.
+ * Otherwise, deliver interrupt to xAPIC CPUs if it is xAPIC
+ * broadcast.
+ */
+static void apic_get_broadcast_bitmask(uint32_t *deliver_bitmask,
+                                       bool is_x2apic_broadcast)
+{
+    int i;
+    APICCommonState *apic_iter;
+
+    for (i = 0; i < max_apics; i++) {
+        apic_iter = local_apics[i];
+        if (apic_iter) {
+            bool apic_in_x2apic = is_x2apic_mode(&apic_iter->parent_obj);
+
+            if (is_x2apic_broadcast && apic_in_x2apic) {
+                apic_set_bit(deliver_bitmask, i);
+            } else if (!is_x2apic_broadcast && !apic_in_x2apic) {
+                apic_set_bit(deliver_bitmask, i);
+            }
+        }
+    }
 }
 
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-                                      uint8_t dest, uint8_t dest_mode)
+                                      uint32_t dest, uint8_t dest_mode)
 {
     APICCommonState *apic_iter;
     int i;
 
-    if (dest_mode == 0) {
+    memset(deliver_bitmask, 0x00, max_apic_words * sizeof(uint32_t));
+
+    /*
+     * x2APIC broadcast is delivered to all x2APIC CPUs regardless of
+     * destination mode. In case the destination mode is physical, it is
+     * broadcasted to all xAPIC CPUs too. Otherwise, if the destination
+     * mode is logical, we need to continue checking if xAPIC CPUs accepts
+     * the interrupt.
+     */
+    if (dest == 0xffffffff) {
+        if (dest_mode == APIC_DESTMODE_PHYSICAL) {
+            memset(deliver_bitmask, 0xff, max_apic_words * sizeof(uint32_t));
+            return;
+        } else {
+            apic_get_broadcast_bitmask(deliver_bitmask, true);
+        }
+    }
+
+    if (dest_mode == APIC_DESTMODE_PHYSICAL) {
+        apic_find_dest(deliver_bitmask, dest);
+        /* Any APIC in xAPIC mode will interpret 0xFF as broadcast */
         if (dest == 0xff) {
-            memset(deliver_bitmask, 0xff, MAX_APIC_WORDS * sizeof(uint32_t));
-        } else {
-            int idx = apic_find_dest(dest);
-            memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
-            if (idx >= 0)
-                apic_set_bit(deliver_bitmask, idx);
+            apic_get_broadcast_bitmask(deliver_bitmask, false);
         }
     } else {
-        /* XXX: cluster mode */
-        memset(deliver_bitmask, 0x00, MAX_APIC_WORDS * sizeof(uint32_t));
-        for(i = 0; i < MAX_APICS; i++) {
+        /* XXX: logical mode */
+        for(i = 0; i < max_apics; i++) {
             apic_iter = local_apics[i];
             if (apic_iter) {
-                if (apic_iter->dest_mode == 0xf) {
-                    if (dest & apic_iter->log_dest)
-                        apic_set_bit(deliver_bitmask, i);
-                } else if (apic_iter->dest_mode == 0x0) {
-                    if ((dest & 0xf0) == (apic_iter->log_dest & 0xf0) &&
-                        (dest & apic_iter->log_dest & 0x0f)) {
+                /* x2APIC logical mode */
+                if (apic_iter->apicbase & MSR_IA32_APICBASE_EXTD) {
+                    if ((dest & 0xffff0000) == (apic_iter->extended_log_dest & 0xffff0000) &&
+                        (dest & apic_iter->extended_log_dest & 0xffff)) {
                         apic_set_bit(deliver_bitmask, i);
                     }
+                } else {
+                    dest = (uint8_t)dest;
+                    if (apic_iter->dest_mode == APIC_DESTMODE_LOGICAL_FLAT) {
+                        if (dest & apic_iter->log_dest) {
+                            apic_set_bit(deliver_bitmask, i);
+                        }
+                    } else if (apic_iter->dest_mode == APIC_DESTMODE_LOGICAL_CLUSTER) {
+                        /*
+                         * In cluster model of xAPIC logical mode IPI, 4 higher
+                         * bits are used as cluster address, 4 lower bits are
+                         * the bitmask for local APICs in the cluster. The IPI
+                         * is delivered to an APIC if the cluster address
+                         * matches and the APIC's address bit in the cluster is
+                         * set in bitmask of destination ID in IPI.
+                         *
+                         * The cluster address ranges from 0 - 14, the cluster
+                         * address 15 (0xf) is the broadcast address to all
+                         * clusters.
+                         */
+                        if ((dest & 0xf0) == 0xf0 ||
+                            (dest & 0xf0) == (apic_iter->log_dest & 0xf0)) {
+                            if (dest & apic_iter->log_dest & 0x0f) {
+                                apic_set_bit(deliver_bitmask, i);
+                            }
+                        }
+                    }
                 }
-            } else {
-                break;
             }
         }
     }
@@ -516,29 +594,36 @@ void apic_sipi(DeviceState *dev)
     s->wait_for_sipi = 0;
 }
 
-static void apic_deliver(DeviceState *dev, uint8_t dest, uint8_t dest_mode,
+static void apic_deliver(DeviceState *dev, uint32_t dest, uint8_t dest_mode,
                          uint8_t delivery_mode, uint8_t vector_num,
-                         uint8_t trigger_mode)
+                         uint8_t trigger_mode, uint8_t dest_shorthand)
 {
     APICCommonState *s = APIC(dev);
-    uint32_t deliver_bitmask[MAX_APIC_WORDS];
-    int dest_shorthand = (s->icr[0] >> 18) & 3;
     APICCommonState *apic_iter;
+    uint32_t deliver_bitmask_size = max_apic_words * sizeof(uint32_t);
+    uint32_t *deliver_bitmask = g_malloc(deliver_bitmask_size);
+    uint32_t current_apic_id;
+
+    if (is_x2apic_mode(dev)) {
+        current_apic_id = s->initial_apic_id;
+    } else {
+        current_apic_id = s->id;
+    }
 
     switch (dest_shorthand) {
     case 0:
         apic_get_delivery_bitmask(deliver_bitmask, dest, dest_mode);
         break;
     case 1:
-        memset(deliver_bitmask, 0x00, sizeof(deliver_bitmask));
-        apic_set_bit(deliver_bitmask, s->id);
+        memset(deliver_bitmask, 0x00, deliver_bitmask_size);
+        apic_set_bit(deliver_bitmask, current_apic_id);
         break;
     case 2:
-        memset(deliver_bitmask, 0xff, sizeof(deliver_bitmask));
+        memset(deliver_bitmask, 0xff, deliver_bitmask_size);
         break;
     case 3:
-        memset(deliver_bitmask, 0xff, sizeof(deliver_bitmask));
-        apic_reset_bit(deliver_bitmask, s->id);
+        memset(deliver_bitmask, 0xff, deliver_bitmask_size);
+        apic_reset_bit(deliver_bitmask, current_apic_id);
         break;
     }
 
@@ -562,6 +647,7 @@ static void apic_deliver(DeviceState *dev, uint8_t dest, uint8_t dest_mode,
     }
 
     apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
+    g_free(deliver_bitmask);
 }
 
 static bool apic_check_pic(APICCommonState *s)
@@ -657,7 +743,11 @@ uint64_t apic_register_read(int index)
 
     switch(index) {
     case 0x02: /* id */
-        val = s->id << 24;
+        if (is_x2apic_mode(dev)) {
+            val = s->initial_apic_id;
+        } else {
+            val = s->id << 24;
+        }
         break;
     case 0x03: /* version */
         val = s->version | ((APIC_LVT_NB - 1) << 16);
@@ -680,9 +770,17 @@ uint64_t apic_register_read(int index)
         val = 0;
         break;
     case 0x0d:
-        val = s->log_dest << 24;
+        if (is_x2apic_mode(dev)) {
+            val = s->extended_log_dest;
+        } else {
+            val = s->log_dest << 24;
+        }
         break;
     case 0x0e:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         val = (s->dest_mode << 28) | 0xfffffff;
         break;
     case 0x0f:
@@ -745,7 +843,12 @@ static void apic_send_msi(MSIMessage *msi)
 {
     uint64_t addr = msi->address;
     uint32_t data = msi->data;
-    uint8_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    uint32_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+    /*
+     * The higher 3 bytes of destination id is stored in higher word of
+     * msi address. See x86_iommu_irq_to_msi_message()
+     */
+    dest = dest | (addr >> 32);
     uint8_t vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
     uint8_t dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
     uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
@@ -769,6 +872,10 @@ void apic_register_write(int index, uint64_t val)
 
     switch(index) {
     case 0x02:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->id = (val >> 24);
         break;
     case 0x03:
@@ -788,9 +895,17 @@ void apic_register_write(int index, uint64_t val)
         apic_eoi(s);
         break;
     case 0x0d:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->log_dest = val >> 24;
         break;
     case 0x0e:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->dest_mode = val >> 28;
         break;
     case 0x0f:
@@ -802,13 +917,27 @@ void apic_register_write(int index, uint64_t val)
     case 0x20 ... 0x27:
     case 0x28:
         break;
-    case 0x30:
+    case 0x30: {
+        uint32_t dest;
+
         s->icr[0] = val;
-        apic_deliver(dev, (s->icr[1] >> 24) & 0xff, (s->icr[0] >> 11) & 1,
+        if (is_x2apic_mode(dev)) {
+            s->icr[1] = val >> 32;
+            dest = s->icr[1];
+        } else {
+            dest = (s->icr[1] >> 24) & 0xff;
+        }
+
+        apic_deliver(dev, dest, (s->icr[0] >> 11) & 1,
                      (s->icr[0] >> 8) & 7, (s->icr[0] & 0xff),
-                     (s->icr[0] >> 15) & 1);
+                     (s->icr[0] >> 15) & 1, (s->icr[0] >> 18) & 3);
         break;
+    }
     case 0x31:
+        if (is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
         s->icr[1] = val;
         break;
     case 0x32 ... 0x37:
@@ -837,6 +966,23 @@ void apic_register_write(int index, uint64_t val)
             s->count_shift = (v + 1) & 7;
         }
         break;
+    case 0x3f: {
+        int vector = val & 0xff;
+
+        if (!is_x2apic_mode(dev)) {
+            raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+        }
+
+        /*
+         * Self IPI is identical to IPI with
+         * - Destination shorthand: 1 (Self)
+         * - Trigger mode: 0 (Edge)
+         * - Delivery mode: 0 (Fixed)
+         */
+        apic_deliver(dev, 0, 0, APIC_DM_FIXED, vector, 0, 1);
+
+        break;
+    }
     default:
         s->esr |= APIC_ESR_ILLEGAL_ADDRESS;
         break;
@@ -894,12 +1040,6 @@ static void apic_realize(DeviceState *dev, Error **errp)
 {
     APICCommonState *s = APIC(dev);
 
-    if (s->id >= MAX_APICS) {
-        error_setg(errp, "%s initialization failed. APIC ID %d is invalid",
-                   object_get_typename(OBJECT(dev)), s->id);
-        return;
-    }
-
     if (kvm_enabled()) {
         warn_report("Userspace local APIC is deprecated for KVM.");
         warn_report("Do not use kernel-irqchip except for the -M isapc machine type.");
@@ -916,7 +1056,7 @@ static void apic_realize(DeviceState *dev, Error **errp)
     s->io_memory.disable_reentrancy_guard = true;
 
     s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s);
-    local_apics[s->id] = s;
+    local_apics[s->initial_apic_id] = s;
 
     msi_nonbroken = true;
 }
@@ -926,7 +1066,7 @@ static void apic_unrealize(DeviceState *dev)
     APICCommonState *s = APIC(dev);
 
     timer_free(s->timer);
-    local_apics[s->id] = NULL;
+    local_apics[s->initial_apic_id] = NULL;
 }
 
 static void apic_class_init(ObjectClass *klass, void *data)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 68ad30e2f5..ac8ec00eef 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -283,6 +283,10 @@ static void apic_common_realize(DeviceState *dev, Error **errp)
     }
     vmstate_register_with_alias_id(NULL, instance_id, &vmstate_apic_common,
                                    s, -1, 0, NULL);
+
+    /* APIC LDR in x2APIC mode */
+    s->extended_log_dest = ((s->initial_apic_id & 0xffff0) << 16) |
+                            (1 << (s->initial_apic_id & 0xf));
 }
 
 static void apic_common_unrealize(DeviceState *dev)
@@ -423,6 +427,11 @@ static void apic_common_set_id(Object *obj, Visitor *v, const char *name,
         return;
     }
 
+    if (value >= 255 && !cpu_has_x2apic_feature(&s->cpu->env)) {
+        error_setg(errp, "APIC ID %d requires x2APIC feature in CPU", value);
+        return;
+    }
+
     s->initial_apic_id = value;
     s->id = (uint8_t)value;
 }
diff --git a/include/hw/i386/apic.h b/include/hw/i386/apic.h
index 2cebeb4faf..12aad09f4c 100644
--- a/include/hw/i386/apic.h
+++ b/include/hw/i386/apic.h
@@ -3,8 +3,7 @@
 
 
 /* apic.c */
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
-                      uint8_t vector_num, uint8_t trigger_mode);
+void apic_set_max_apic_id(uint32_t max_apic_id);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
 void apic_deliver_nmi(DeviceState *d);
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 5f2ba24bfc..e796e6cae3 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -46,8 +46,10 @@
 #define APIC_DM_EXTINT                  7
 
 /* APIC destination mode */
-#define APIC_DESTMODE_FLAT              0xf
-#define APIC_DESTMODE_CLUSTER           1
+#define APIC_DESTMODE_PHYSICAL          0
+#define APIC_DESTMODE_LOGICAL           1
+#define APIC_DESTMODE_LOGICAL_FLAT      0xf
+#define APIC_DESTMODE_LOGICAL_CLUSTER   0
 
 #define APIC_TRIGGER_EDGE               0
 #define APIC_TRIGGER_LEVEL              1
@@ -187,6 +189,7 @@ struct APICCommonState {
     DeviceState *vapic;
     hwaddr vapic_paddr; /* note: persistence via kvmvapic */
     bool legacy_instance_id;
+    uint32_t extended_log_dest;
 };
 
 typedef struct VAPICState {
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index 2375e48178..0e0f8cf8ad 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -281,11 +281,17 @@ void x86_cpu_apic_create(X86CPU *cpu, Error **errp)
                               OBJECT(cpu->apic_state));
     object_unref(OBJECT(cpu->apic_state));
 
-    qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);
     /* TODO: convert to link<> */
     apic = APIC_COMMON(cpu->apic_state);
     apic->cpu = cpu;
     apic->apicbase = APIC_DEFAULT_ADDRESS | MSR_IA32_APICBASE_ENABLE;
+
+    /*
+     * apic_common_set_id needs to check if the CPU has x2APIC
+     * feature in case APIC ID >= 255, so we need to set apic->cpu
+     * before setting APIC ID
+     */
+    qdev_prop_set_uint32(cpu->apic_state, "id", cpu->apic_id);
 }
 
 void x86_cpu_apic_realize(X86CPU *cpu, Error **errp)
-- 
2.25.1
[PATCH v8 3/5] apic, i386/tcg: add x2apic transitions
Posted by Bui Quang Minh 7 months, 1 week ago
This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/intc/apic.c                       | 50 ++++++++++++++++++++++++++++
 hw/intc/apic_common.c                |  7 ++--
 target/i386/cpu-sysemu.c             | 10 ++++++
 target/i386/cpu.c                    |  8 ++---
 target/i386/cpu.h                    |  6 ++++
 target/i386/tcg/sysemu/misc_helper.c |  4 +++
 6 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 9f741794a7..b8f56836a6 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -309,8 +309,41 @@ bool is_x2apic_mode(DeviceState *dev)
     return s->apicbase & MSR_IA32_APICBASE_EXTD;
 }
 
+static void apic_set_base_check(APICCommonState *s, uint64_t val)
+{
+    /* Enable x2apic when x2apic is not supported by CPU */
+    if (!cpu_has_x2apic_feature(&s->cpu->env) &&
+        val & MSR_IA32_APICBASE_EXTD)
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /*
+     * Transition into invalid state
+     * (s->apicbase & MSR_IA32_APICBASE_ENABLE == 0) &&
+     * (s->apicbase & MSR_IA32_APICBASE_EXTD) == 1
+     */
+    if (!(val & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /* Invalid transition from disabled mode to x2APIC */
+    if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        !(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+    /* Invalid transition from x2APIC to xAPIC */
+    if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        (s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_ENABLE) &&
+        !(val & MSR_IA32_APICBASE_EXTD))
+        raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
+    apic_set_base_check(s, val);
+
     s->apicbase = (val & 0xfffff000) |
         (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
     /* if disabled, cannot be enabled again */
@@ -319,6 +352,23 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
         cpu_clear_apic_feature(&s->cpu->env);
         s->spurious_vec &= ~APIC_SV_ENABLE;
     }
+
+    /* Transition from disabled mode to xAPIC */
+    if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+        (val & MSR_IA32_APICBASE_ENABLE)) {
+        s->apicbase |= MSR_IA32_APICBASE_ENABLE;
+        cpu_set_apic_feature(&s->cpu->env);
+    }
+
+    /* Transition from xAPIC to x2APIC */
+    if (cpu_has_x2apic_feature(&s->cpu->env) &&
+        !(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+        (val & MSR_IA32_APICBASE_EXTD)) {
+        s->apicbase |= MSR_IA32_APICBASE_EXTD;
+
+        s->log_dest = ((s->initial_apic_id & 0xffff0) << 16) |
+                      (1 << (s->initial_apic_id & 0xf));
+    }
 }
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index ac8ec00eef..c6b10af88f 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -42,11 +42,8 @@ void cpu_set_apic_base(DeviceState *dev, uint64_t val)
     if (dev) {
         APICCommonState *s = APIC_COMMON(dev);
         APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
-        /* switching to x2APIC, reset possibly modified xAPIC ID */
-        if (!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
-            (val & MSR_IA32_APICBASE_EXTD)) {
-            s->id = s->initial_apic_id;
-        }
+        /* Reset possibly modified xAPIC ID */
+        s->id = s->initial_apic_id;
         info->set_base(s, val);
     }
 }
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index 0e0f8cf8ad..7422096737 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -235,6 +235,16 @@ void cpu_clear_apic_feature(CPUX86State *env)
     env->features[FEAT_1_EDX] &= ~CPUID_APIC;
 }
 
+void cpu_set_apic_feature(CPUX86State *env)
+{
+    env->features[FEAT_1_EDX] |= CPUID_APIC;
+}
+
+bool cpu_has_x2apic_feature(CPUX86State *env)
+{
+    return env->features[FEAT_1_ECX] & CPUID_EXT_X2APIC;
+}
+
 bool cpu_is_bsp(X86CPU *cpu)
 {
     return cpu_get_apic_base(cpu->apic_state) & MSR_IA32_APICBASE_BSP;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7836aa6692..1c6e0dc2f3 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -630,8 +630,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
  * in CPL=3; remove them if they are ever implemented for system emulation.
  */
 #if defined CONFIG_USER_ONLY
-#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CPUID_EXT_TSC_DEADLINE_TIMER | \
-                                 CPUID_EXT_X2APIC)
+#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CPUID_EXT_TSC_DEADLINE_TIMER)
 #else
 #define CPUID_EXT_KERNEL_FEATURES 0
 #endif
@@ -641,12 +640,13 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
           CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */   \
           CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \
           CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C | \
-          CPUID_EXT_FMA | CPUID_EXT_KERNEL_FEATURES)
+          CPUID_EXT_FMA | CPUID_EXT_X2APIC | CPUID_EXT_KERNEL_FEATURES)
           /* missing:
           CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, CPUID_EXT_SMX,
           CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID,
           CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, CPUID_EXT_DCA,
-          CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER */
+          CPUID_EXT_TSC_DEADLINE_TIMER
+          */
 
 #ifdef TARGET_X86_64
 #define TCG_EXT2_X86_64_FEATURES CPUID_EXT2_LM
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 78489378ad..270e320704 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -379,6 +379,10 @@ typedef enum X86Seg {
 #define MSR_IA32_APICBASE_ENABLE        (1<<11)
 #define MSR_IA32_APICBASE_EXTD          (1 << 10)
 #define MSR_IA32_APICBASE_BASE          (0xfffffU<<12)
+#define MSR_IA32_APICBASE_RESERVED \
+        (~(uint64_t)(MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE \
+                     | MSR_IA32_APICBASE_EXTD | MSR_IA32_APICBASE_BASE))
+
 #define MSR_IA32_FEATURE_CONTROL        0x0000003a
 #define MSR_TSC_ADJUST                  0x0000003b
 #define MSR_IA32_SPEC_CTRL              0x48
@@ -2202,8 +2206,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                    uint32_t *eax, uint32_t *ebx,
                    uint32_t *ecx, uint32_t *edx);
 void cpu_clear_apic_feature(CPUX86State *env);
+void cpu_set_apic_feature(CPUX86State *env);
 void host_cpuid(uint32_t function, uint32_t count,
                 uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
+bool cpu_has_x2apic_feature(CPUX86State *env);
 
 /* helper.c */
 void x86_cpu_set_a20(X86CPU *cpu, int a20_state);
diff --git a/target/i386/tcg/sysemu/misc_helper.c b/target/i386/tcg/sysemu/misc_helper.c
index 1fce2076a3..91a58d4d97 100644
--- a/target/i386/tcg/sysemu/misc_helper.c
+++ b/target/i386/tcg/sysemu/misc_helper.c
@@ -159,6 +159,10 @@ void helper_wrmsr(CPUX86State *env)
         env->sysenter_eip = val;
         break;
     case MSR_IA32_APICBASE:
+        if (val & MSR_IA32_APICBASE_RESERVED) {
+            goto error;
+        }
+
         cpu_set_apic_base(env_archcpu(env)->apic_state, val);
         break;
     case MSR_EFER:
-- 
2.25.1
[PATCH v8 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC
Posted by Bui Quang Minh 7 months, 1 week ago
As userspace APIC now supports x2APIC, intel interrupt remapping
hardware can be set to EIM mode when userspace local APIC is used.

Suggested-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/intel_iommu.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c0ce896668..a3e4bf5497 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4049,11 +4049,7 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
                                               ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
     if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
-        if (!kvm_irqchip_is_split()) {
-            error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
-            return false;
-        }
-        if (kvm_enabled() && !kvm_enable_x2apic()) {
+        if (kvm_irqchip_is_split() && !kvm_enable_x2apic()) {
             error_setg(errp, "eim=on requires support on the KVM side"
                              "(X2APIC_API, first shipped in v4.7)");
             return false;
-- 
2.25.1
[PATCH v8 5/5] amd_iommu: report x2APIC support to the operating system
Posted by Bui Quang Minh 7 months, 1 week ago
This commit adds XTSup configuration to let user choose to whether enable
this feature or not. When XTSup is enabled, additional bytes in IRTE with
enabled guest virtual VAPIC are used to support 32-bit destination id.

Additionally, this commit exports IVHD type 0x11 besides the old IVHD type
0x10 in ACPI table. IVHD type 0x10 does not report full set of IOMMU
features only the legacy ones, so operating system (e.g. Linux) may only
detects x2APIC support if IVHD type 0x11 is available. The IVHD type 0x10
is kept so that old operating system that only parses type 0x10 can detect
the IOMMU device.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
---
 hw/i386/acpi-build.c | 129 +++++++++++++++++++++++++++----------------
 hw/i386/amd_iommu.c  |  29 +++++++++-
 hw/i386/amd_iommu.h  |  16 ++++--
 3 files changed, 117 insertions(+), 57 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 4d2d40bab5..aacd35b926 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2337,30 +2337,23 @@ static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
                 const char *oem_table_id)
 {
-    int ivhd_table_len = 24;
     AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
     GArray *ivhd_blob = g_array_new(false, true, 1);
     AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
                         .oem_table_id = oem_table_id };
+    uint64_t feature_report;
 
     acpi_table_begin(&table, table_data);
     /* IVinfo - IO virtualization information common to all
      * IOMMU units in a system
      */
-    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
+    build_append_int_noprefix(table_data,
+                             (1UL << 0) | /* EFRSup */
+                             (40UL << 8), /* PASize */
+                             4);
     /* reserved */
     build_append_int_noprefix(table_data, 0, 8);
 
-    /* IVHD definition - type 10h */
-    build_append_int_noprefix(table_data, 0x10, 1);
-    /* virtualization flags */
-    build_append_int_noprefix(table_data,
-                             (1UL << 0) | /* HtTunEn      */
-                             (1UL << 4) | /* iotblSup     */
-                             (1UL << 6) | /* PrefSup      */
-                             (1UL << 7),  /* PPRSup       */
-                             1);
-
     /*
      * A PCI bus walk, for each PCI host bridge, is necessary to create a
      * complete set of IVHD entries.  Do this into a separate blob so that we
@@ -2380,56 +2373,94 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
         build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
     }
 
-    ivhd_table_len += ivhd_blob->len;
-
     /*
      * When interrupt remapping is supported, we add a special IVHD device
-     * for type IO-APIC.
-     */
-    if (x86_iommu_ir_supported(x86_iommu_get_default())) {
-        ivhd_table_len += 8;
-    }
-
-    /* IVHD length */
-    build_append_int_noprefix(table_data, ivhd_table_len, 2);
-    /* DeviceID */
-    build_append_int_noprefix(table_data,
-                              object_property_get_int(OBJECT(&s->pci), "addr",
-                                                      &error_abort), 2);
-    /* Capability offset */
-    build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
-    /* IOMMU base address */
-    build_append_int_noprefix(table_data, s->mmio.addr, 8);
-    /* PCI Segment Group */
-    build_append_int_noprefix(table_data, 0, 2);
-    /* IOMMU info */
-    build_append_int_noprefix(table_data, 0, 2);
-    /* IOMMU Feature Reporting */
-    build_append_int_noprefix(table_data,
-                             (48UL << 30) | /* HATS   */
-                             (48UL << 28) | /* GATS   */
-                             (1UL << 2)   | /* GTSup  */
-                             (1UL << 6),    /* GASup  */
-                             4);
-
-    /* IVHD entries as found above */
-    g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
-    g_array_free(ivhd_blob, TRUE);
-
-    /*
-     * Add a special IVHD device type.
+     * for type IO-APIC
      * Refer to spec - Table 95: IVHD device entry type codes
      *
      * Linux IOMMU driver checks for the special IVHD device (type IO-APIC).
      * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059'
      */
     if (x86_iommu_ir_supported(x86_iommu_get_default())) {
-        build_append_int_noprefix(table_data,
+        build_append_int_noprefix(ivhd_blob,
                                  (0x1ull << 56) |           /* type IOAPIC */
                                  (IOAPIC_SB_DEVID << 40) |  /* IOAPIC devid */
                                  0x48,                      /* special device */
                                  8);
     }
+
+    /* IVHD definition - type 10h */
+    build_append_int_noprefix(table_data, 0x10, 1);
+    /* virtualization flags */
+    build_append_int_noprefix(table_data,
+                             (1UL << 0) | /* HtTunEn      */
+                             (1UL << 4) | /* iotblSup     */
+                             (1UL << 6) | /* PrefSup      */
+                             (1UL << 7),  /* PPRSup       */
+                             1);
+
+    /* IVHD length */
+    build_append_int_noprefix(table_data, ivhd_blob->len + 24, 2);
+    /* DeviceID */
+    build_append_int_noprefix(table_data,
+                              object_property_get_int(OBJECT(&s->pci), "addr",
+                                                      &error_abort), 2);
+    /* Capability offset */
+    build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
+    /* IOMMU base address */
+    build_append_int_noprefix(table_data, s->mmio.addr, 8);
+    /* PCI Segment Group */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* IOMMU info */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* IOMMU Feature Reporting */
+    feature_report = (48UL << 30) | /* HATS   */
+                     (48UL << 28) | /* GATS   */
+                     (1UL << 2)   | /* GTSup  */
+                     (1UL << 6);    /* GASup  */
+    if (s->xtsup) {
+        feature_report |= (1UL << 0); /* XTSup */
+    }
+    build_append_int_noprefix(table_data, feature_report, 4);
+
+    /* IVHD entries as found above */
+    g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
+
+   /* IVHD definition - type 11h */
+    build_append_int_noprefix(table_data, 0x11, 1);
+    /* virtualization flags */
+    build_append_int_noprefix(table_data,
+                             (1UL << 0) | /* HtTunEn      */
+                             (1UL << 4),  /* iotblSup     */
+                             1);
+
+    /* IVHD length */
+    build_append_int_noprefix(table_data, ivhd_blob->len + 40, 2);
+    /* DeviceID */
+    build_append_int_noprefix(table_data,
+                              object_property_get_int(OBJECT(&s->pci), "addr",
+                                                      &error_abort), 2);
+    /* Capability offset */
+    build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
+    /* IOMMU base address */
+    build_append_int_noprefix(table_data, s->mmio.addr, 8);
+    /* PCI Segment Group */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* IOMMU info */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* IOMMU Attributes */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* EFR Register Image */
+    build_append_int_noprefix(table_data,
+                              amdvi_extended_feature_register(s),
+                              8);
+    /* EFR Register Image 2 */
+    build_append_int_noprefix(table_data, 0, 8);
+
+    /* IVHD entries as found above */
+    g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
+
+    g_array_free(ivhd_blob, TRUE);
     acpi_table_end(linker, &table);
 }
 
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index c98a3c6e11..27109dd56a 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -31,6 +31,7 @@
 #include "hw/i386/apic_internal.h"
 #include "trace.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/qdev-properties.h"
 
 /* used AMD-Vi MMIO registers */
 const char *amdvi_mmio_low[] = {
@@ -74,6 +75,16 @@ typedef struct AMDVIIOTLBEntry {
     uint64_t page_mask;         /* physical page size  */
 } AMDVIIOTLBEntry;
 
+uint64_t amdvi_extended_feature_register(AMDVIState *s)
+{
+    uint64_t feature = AMDVI_DEFAULT_EXT_FEATURES;
+    if (s->xtsup) {
+        feature |= AMDVI_FEATURE_XT;
+    }
+
+    return feature;
+}
+
 /* configure MMIO registers at startup/reset */
 static void amdvi_set_quad(AMDVIState *s, hwaddr addr, uint64_t val,
                            uint64_t romask, uint64_t w1cmask)
@@ -1155,7 +1166,12 @@ static int amdvi_int_remap_ga(AMDVIState *iommu,
     irq->vector = irte.hi.fields.vector;
     irq->dest_mode = irte.lo.fields_remap.dm;
     irq->redir_hint = irte.lo.fields_remap.rq_eoi;
-    irq->dest = irte.lo.fields_remap.destination;
+    if (iommu->xtsup) {
+        irq->dest = irte.lo.fields_remap.destination |
+                    (irte.hi.fields.destination_hi << 24);
+    } else {
+        irq->dest = irte.lo.fields_remap.destination & 0xff;
+    }
 
     return 0;
 }
@@ -1506,8 +1522,9 @@ static void amdvi_init(AMDVIState *s)
 
     /* reset MMIO */
     memset(s->mmior, 0, AMDVI_MMIO_SIZE);
-    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES, AMDVI_EXT_FEATURES,
-            0xffffffffffffffef, 0);
+    amdvi_set_quad(s, AMDVI_MMIO_EXT_FEATURES,
+                   amdvi_extended_feature_register(s),
+                   0xffffffffffffffef, 0);
     amdvi_set_quad(s, AMDVI_MMIO_STATUS, 0, 0x98, 0x67);
 }
 
@@ -1591,6 +1608,11 @@ static void amdvi_sysbus_realize(DeviceState *dev, Error **errp)
     amdvi_init(s);
 }
 
+static Property amdvi_properties[] = {
+    DEFINE_PROP_BOOL("xtsup", AMDVIState, xtsup, false),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static const VMStateDescription vmstate_amdvi_sysbus = {
     .name = "amd-iommu",
     .unmigratable = 1
@@ -1617,6 +1639,7 @@ static void amdvi_sysbus_class_init(ObjectClass *klass, void *data)
     dc->user_creatable = true;
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     dc->desc = "AMD IOMMU (AMD-Vi) DMA Remapping device";
+    device_class_set_props(dc, amdvi_properties);
 }
 
 static const TypeInfo amdvi_sysbus = {
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 6da893ee57..3d430434fe 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -154,6 +154,7 @@
 
 #define AMDVI_FEATURE_PREFETCH            (1ULL << 0) /* page prefetch       */
 #define AMDVI_FEATURE_PPR                 (1ULL << 1) /* PPR Support         */
+#define AMDVI_FEATURE_XT                  (1ULL << 2) /* x2APIC Support      */
 #define AMDVI_FEATURE_GT                  (1ULL << 4) /* Guest Translation   */
 #define AMDVI_FEATURE_IA                  (1ULL << 6) /* inval all support   */
 #define AMDVI_FEATURE_GA                  (1ULL << 7) /* guest VAPIC support */
@@ -173,8 +174,9 @@
 #define AMDVI_IOTLB_MAX_SIZE 1024
 #define AMDVI_DEVID_SHIFT    36
 
-/* extended feature support */
-#define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
+/* default extended feature */
+#define AMDVI_DEFAULT_EXT_FEATURES \
+        (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
         AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
         AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
 
@@ -278,8 +280,8 @@ union irte_ga_lo {
                 dm:1,
                 /* ------ */
                 guest_mode:1,
-                destination:8,
-                rsvd_1:48;
+                destination:24,
+                rsvd_1:32;
   } fields_remap;
 };
 
@@ -287,7 +289,8 @@ union irte_ga_hi {
   uint64_t val;
   struct {
       uint64_t  vector:8,
-                rsvd_2:56;
+                rsvd_2:48,
+                destination_hi:8;
   } fields;
 };
 
@@ -366,6 +369,9 @@ struct AMDVIState {
 
     /* Interrupt remapping */
     bool ga_enabled;
+    bool xtsup;
 };
 
+uint64_t amdvi_extended_feature_register(AMDVIState *s);
+
 #endif
-- 
2.25.1