[RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag

Teddy Astie posted 1 patch 3 months, 1 week ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/d498a50f6187b362ac5da3c6a7a7c348f35dc4b3.1761761288.git.teddy.astie@vates.tech
[RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Teddy Astie 3 months, 1 week ago
Introduce a new flag that forces x2APIC to be enabled and prevents a
guest from switching the LAPIC back to xAPIC mode.

The semantics of this mode are based on the IA32_XAPIC_DISABLE_STATUS
architectural MSR from the Intel specification.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
This feature can be useful for several reasons, starting with SEV: MMIO
is complicated to handle there (especially with SEV-ES), and legacy
xAPIC is one thing that needs MMIO intercepts (Linux uses it during
boot unless x2APIC is initially enabled, even if it switches to
x2APIC afterward). It could also help reduce the attack surface of the
hypervisor (by only exposing x2APIC to the guest).

As it allows MMIO-less guests (using PVH), it may be enough to avoid
the problematic cases of virtualized INVLPGB (when we have it).

In my testing, Linux, FreeBSD and PV-shim work fine with it; OVMF
freezes for some reason, and NetBSD doesn't support it (no x2APIC
support as a Xen guest). HVM BIOS guests get stuck in SeaBIOS, as it
expects to boot with xAPIC.

On Intel platforms, it would be better to expose the
IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to the
guest, but that is non-trivial as it needs to be properly exposed
through IA32_ARCH_CAPABILITIES, which is currently passed through.

 docs/man/xl.cfg.5.pod.in              |  7 +++++++
 tools/libs/light/libxl_types.idl      |  1 +
 tools/libs/light/libxl_x86.c          |  4 ++++
 tools/xl/xl_parse.c                   |  1 +
 xen/arch/x86/domain.c                 |  2 +-
 xen/arch/x86/hvm/hvm.c                |  2 ++
 xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
 xen/arch/x86/include/asm/domain.h     |  2 ++
 xen/arch/x86/include/asm/hvm/domain.h |  3 +++
 xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
 10 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index ad1553c5e9..01b41d93c0 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3198,6 +3198,13 @@ option.
 
 If using this option is necessary to fix an issue, please report a bug.
 
+=item B<force_x2apic=BOOLEAN>
+
+Force the LAPIC in x2APIC mode and prevent the guest from disabling
+it or switching to xAPIC mode.
+
+This option is disabled by default.
+
 =back
 
 =head1 SEE ALSO
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index d64a573ff3..b95278007e 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -738,6 +738,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                ("arm_sci", libxl_arm_sci),
                               ])),
     ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
+                               ("force_x2apic", libxl_defbool)
                               ])),
     # Alternate p2m is not bound to any architecture or guest type, as it is
     # supported by x86 HVM and ARM support is planned.
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index 60d4e8661c..2e0205d2a2 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -26,6 +26,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
     if (libxl_defbool_val(d_config->b_info.arch_x86.msr_relaxed))
         config->arch.misc_flags |= XEN_X86_MSR_RELAXED;
 
+    if (libxl_defbool_val(d_config->b_info.arch_x86.force_x2apic))
+        config->arch.misc_flags |= XEN_X86_FORCE_X2APIC;
+
     if (libxl_defbool_val(d_config->b_info.trap_unmapped_accesses)) {
             LOG(ERROR, "trap_unmapped_accesses is not supported on x86\n");
             return ERROR_FAIL;
@@ -818,6 +821,7 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
 {
     libxl_defbool_setdefault(&b_info->acpi, true);
     libxl_defbool_setdefault(&b_info->arch_x86.msr_relaxed, false);
+    libxl_defbool_setdefault(&b_info->arch_x86.force_x2apic, false);
     libxl_defbool_setdefault(&b_info->trap_unmapped_accesses, false);
 
     if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index af86d3186d..d84ab7c823 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -3041,6 +3041,7 @@ skip_usbdev:
                     "If it fixes an issue you are having please report to "
                     "xen-devel@lists.xenproject.org.\n");
 
+    xlu_cfg_get_defbool(config, "force_x2apic", &b_info->arch_x86.force_x2apic, 0);
     xlu_cfg_get_defbool(config, "vpmu", &b_info->vpmu, 0);
 
     xlu_cfg_destroy(config);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 19fd86ce88..02f650a614 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -704,7 +704,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
         return -EINVAL;
     }
 
-    if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
+    if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | XEN_X86_FORCE_X2APIC) )
     {
         dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
                 config->arch.misc_flags);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0c60faa39d..73cbac0f22 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -616,6 +616,8 @@ int hvm_domain_initialise(struct domain *d,
     INIT_LIST_HEAD(&d->arch.hvm.mmcfg_regions);
     INIT_LIST_HEAD(&d->arch.hvm.msix_tables);
 
+    d->arch.hvm.force_x2apic = config->arch.misc_flags & XEN_X86_FORCE_X2APIC;
+
     rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
     if ( rc )
         goto fail;
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 993e972cd7..ae8df70d2e 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
     if ( !has_vlapic(v->domain) )
         return X86EMUL_EXCEPTION;
 
+    if ( has_force_x2apic(v->domain) )
+    {
+        /*
+         * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
+         * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
+         * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
+         */
+        if ( !(val & APIC_BASE_EXTD) )
+        {
+            gprintk(XENLOG_WARNING, "tried to disable x2APIC while forced on\n");
+            return X86EMUL_EXCEPTION;
+        }
+    }
+
     /* Attempting to set reserved bits? */
     if ( val & ~(APIC_BASE_ADDR_MASK | APIC_BASE_ENABLE | APIC_BASE_BSP |
                  (cp->basic.x2apic ? APIC_BASE_EXTD : 0)) )
@@ -1474,7 +1488,14 @@ void vlapic_reset(struct vlapic *vlapic)
     if ( v->vcpu_id == 0 )
         vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
 
-    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
+    if ( has_force_x2apic(v->domain) )
+    {
+        vlapic->hw.apic_base_msr |= APIC_BASE_EXTD;
+        set_x2apic_id(vlapic);
+    }
+    else
+        vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
+
     vlapic_do_init(vlapic);
 }
 
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index 5df8c78253..771992d156 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -509,6 +509,8 @@ struct arch_domain
 #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
 #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
 
+#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)
+
 #define gdt_ldt_pt_idx(v) \
       ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
 #define pv_gdt_ptes(v) \
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index 333501d5f2..b56fa08b73 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -108,6 +108,9 @@ struct hvm_domain {
     /* Compatibility setting for a bug in x2APIC LDR */
     bool bug_x2apic_ldr_vcpu_id;
 
+    /* LAPIC is forced in x2APIC mode */
+    bool force_x2apic;
+
     /* hypervisor intercepted msix table */
     struct list_head       msixtbl_list;
 
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index b99a691706..75aa31d9ed 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -309,11 +309,21 @@ struct xen_arch_domainconfig {
  * doesn't allow the guest to read or write to the underlying MSR.
  */
 #define XEN_X86_MSR_RELAXED (1u << 0)
+
+/*
+ * This option forces the LAPIC to be in x2APIC mode (IA32_APIC_BASE.EXTD = 1)
+ * using the same semantics as IA32_XAPIC_DISABLE_STATUS:LEGACY_XAPIC_DISABLED
+ *
+ * Attempts by the guest to clear IA32_APIC_BASE.EXTD (i.e. disable x2APIC)
+ * will inject #GP into the guest.
+ */
+#define XEN_X86_FORCE_X2APIC (1U << 1)
+
     uint32_t misc_flags;
 };
 
 /* Max  XEN_X86_* constant. Used for ABI checking. */
-#define XEN_X86_MISC_FLAGS_MAX XEN_X86_MSR_RELAXED
+#define XEN_X86_MISC_FLAGS_MAX XEN_X86_FORCE_X2APIC
 
 #endif
 
-- 
2.51.2



--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Teddy Astie 2 months, 4 weeks ago
On 29/10/2025 at 19:26, Teddy Astie wrote:
> Introduce a new flag to force the x2APIC enabled and preventing a
> guest from switching back LAPIC to xAPIC mode.
> 
> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
> architectural MSR of Intel specification.
> 
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
> This feature can be useful for various reasons, starting with SEV as
> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
> boot unless x2APIC is initially enabled, even if it switches to
> x2apic afterward). It could also be interesting to reduce the attack
> surface of the hypervisor (by only exposing x2apic to the guest).
> 
> As it can allow to have MMIO-less guest (using PVH), perhaps it can
> be enough for avoiding the problematic cases of virtualized INVLPGB
> (when we have it).
> 
> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
> freezes for some reason, NetBSD doesn't support it (no x2apic support
> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
> with xAPIC.
> 
> On Intel platforms, it would be better to expose the
> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
> guest, but it's non-trivial as it needs to be properly exposed
> through IA32_ARCH_CAPABILITIES which is currently passed-through.
> 
>   docs/man/xl.cfg.5.pod.in              |  7 +++++++
>   tools/libs/light/libxl_types.idl      |  1 +
>   tools/libs/light/libxl_x86.c          |  4 ++++
>   tools/xl/xl_parse.c                   |  1 +
>   xen/arch/x86/domain.c                 |  2 +-
>   xen/arch/x86/hvm/hvm.c                |  2 ++
>   xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
>   xen/arch/x86/include/asm/domain.h     |  2 ++
>   xen/arch/x86/include/asm/hvm/domain.h |  3 +++
>   xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
>   10 files changed, 54 insertions(+), 3 deletions(-)
> 

I guess that for now it would be preferable overall to:
- just add a way to enable x2APIC by default, without locking the guest
into x2APIC mode
- introduce the ability to lock it down (i.e. disable xAPIC at compile
time) separately

I haven't completely decided on the name of the option; maybe something
like:
x2apic_mode = <default> | <pre_enable> (or just enable?)

`default` would keep the current behavior, or force x2APIC if xAPIC is
disabled at compile time; `pre_enable` would enable x2APIC by default, but
the OS may be able to go back to xAPIC mode if supported.
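
To make the options above concrete, here is a minimal C sketch of how the
behaviours discussed so far differ at reset and when the guest writes
IA32_APIC_BASE. The enum, its member names and both helper functions are
hypothetical (they are not part of the patch or of Xen); only the EXTD bit
position (bit 10) is architectural. The compile-time "xAPIC disabled" case is
not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural: IA32_APIC_BASE bit 10 selects x2APIC mode. */
#define APIC_BASE_EXTD (UINT64_C(1) << 10)

/* Hypothetical policy values, named after the options floated in this thread. */
enum x2apic_mode {
    X2APIC_MODE_DEFAULT,    /* reset in xAPIC mode, guest may switch freely */
    X2APIC_MODE_PRE_ENABLE, /* reset in x2APIC mode, guest may switch back */
    X2APIC_MODE_FORCE,      /* reset in x2APIC mode, clearing EXTD faults */
};

/* Does the virtual LAPIC come out of reset with EXTD already set? */
static bool resets_in_x2apic(enum x2apic_mode mode)
{
    return mode != X2APIC_MODE_DEFAULT;
}

/*
 * Would a guest write of 'val' to IA32_APIC_BASE be accepted under this
 * policy?  Only the EXTD lock is modelled; reserved-bit checks are omitted.
 */
static bool apic_base_write_allowed(enum x2apic_mode mode, uint64_t val)
{
    if ( mode == X2APIC_MODE_FORCE && !(val & APIC_BASE_EXTD) )
        return false; /* the RFC patch injects #GP in this case */

    return true;
}
```

With this split, `pre_enable` only changes the reset state, while the lock-down
behaviour stays a separate decision.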

Teddy


Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Andrew Cooper 2 months, 4 weeks ago
On 12/11/2025 10:35 am, Teddy Astie wrote:
> On 29/10/2025 at 19:26, Teddy Astie wrote:
>> Introduce a new flag to force the x2APIC enabled and preventing a
>> guest from switching back LAPIC to xAPIC mode.
>>
>> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
>> architectural MSR of Intel specification.
>>
>> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
>> ---
>> This feature can be useful for various reasons, starting with SEV as
>> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
>> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
>> boot unless x2APIC is initially enabled, even if it switches to
>> x2apic afterward). It could also be interesting to reduce the attack
>> surface of the hypervisor (by only exposing x2apic to the guest).
>>
>> As it can allow to have MMIO-less guest (using PVH), perhaps it can
>> be enough for avoiding the problematic cases of virtualized INVLPGB
>> (when we have it).
>>
>> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
>> freezes for some reason, NetBSD doesn't support it (no x2apic support
>> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
>> with xAPIC.
>>
>> On Intel platforms, it would be better to expose the
>> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
>> guest, but it's non-trivial as it needs to be properly exposed
>> through IA32_ARCH_CAPABILITIES which is currently passed-through.
>>
>>   docs/man/xl.cfg.5.pod.in              |  7 +++++++
>>   tools/libs/light/libxl_types.idl      |  1 +
>>   tools/libs/light/libxl_x86.c          |  4 ++++
>>   tools/xl/xl_parse.c                   |  1 +
>>   xen/arch/x86/domain.c                 |  2 +-
>>   xen/arch/x86/hvm/hvm.c                |  2 ++
>>   xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
>>   xen/arch/x86/include/asm/domain.h     |  2 ++
>>   xen/arch/x86/include/asm/hvm/domain.h |  3 +++
>>   xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
>>   10 files changed, 54 insertions(+), 3 deletions(-)
>>
> I guess for now, it would be preferable overall to :
> - just add a way to enable it by default, not lock in it in x2apic mode
> - the ability to lock it down (i.e disable xAPIC at compile time) could 
> be introduced separately
>
> I'm not completely decided on the naming of the option, maybe something 
> like :
> x2apic_mode = <default> | <pre_enable> (or just enable ?)
>
> `default` will keep the current behavior, or force x2apic if xAPIC is 
> disabled at compile time; `pre_enable` will enable it by default, but OS 
> may be able to go back to xAPIC mode if supported.

You don't need any new hypercalls.  Just set the state correctly in a
LAPIC record in libxg's vcpu_hvm().

~Andrew

Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Andrew Cooper 2 months, 4 weeks ago
On 29/10/2025 6:26 pm, Teddy Astie wrote:
> Introduce a new flag to force the x2APIC enabled and preventing a
> guest from switching back LAPIC to xAPIC mode.
>
> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
> architectural MSR of Intel specification.
>
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>

You can do what you want by simply starting the VM in x2APIC mode.

OSes don't tend to switch out of x2APIC mode, especially if it was set
by firmware.

IA32_XAPIC_DISABLE_STATUS is garbage.  It was an emergency "fix" for the
fact that the entire L2 cache datastream was architecturally visible in
the xAPIC MMIO window, including decrypted SGX contents.  Furthermore,
upon this being discussed, and it being pointed out that the proper
place to put the lock bit would be in MSR_APIC_BASE itself, Intel
declined citing "too much effort to qualify".  So we're left with this
instead.

We do virtualise one Intel control on AMD for the benefit of L1, but AMD
have finally grown CPUID Faulting into an architectural feature so we
can see about retiring the old bodge.

But the Local APIC is far more complicated, and which mode you want
depends more on which hardware acceleration is available to you, and
there's a huge amount of work needed to get our x2APIC support
into better shape.

Either way, start simple by starting the guest in x2APIC mode.  It will
probably be sufficient for your needs.
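
As an illustration of what "starting in x2APIC mode" means for the initial
IA32_APIC_BASE value, here is a small sketch. The bit positions and the
default base address are architectural; the macros are redefined locally so
the snippet stands alone, and `initial_apic_base()` is a hypothetical helper,
not an existing Xen or libxg function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_APIC_BASE layout (redefined locally for this sketch). */
#define APIC_BASE_BSP          (UINT64_C(1) << 8)   /* bootstrap processor */
#define APIC_BASE_EXTD         (UINT64_C(1) << 10)  /* x2APIC mode */
#define APIC_BASE_ENABLE       (UINT64_C(1) << 11)  /* APIC globally enabled */
#define APIC_DEFAULT_PHYS_BASE UINT64_C(0xfee00000)

/* Hypothetical helper: the APIC_BASE value a vCPU would start with. */
static uint64_t initial_apic_base(unsigned int vcpu_id, bool start_in_x2apic)
{
    uint64_t msr = APIC_DEFAULT_PHYS_BASE | APIC_BASE_ENABLE;

    if ( vcpu_id == 0 )
        msr |= APIC_BASE_BSP;

    /* x2APIC mode requires both ENABLE and EXTD to be set. */
    if ( start_in_x2apic )
        msr |= APIC_BASE_EXTD;

    return msr;
}
```

Nothing in this value stops the guest from later clearing EXTD, which is the
difference from the "force" semantics in the patch.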

~Andrew

Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Alejandro Vallejo 2 months, 4 weeks ago
Hi,

On Wed Oct 29, 2025 at 7:26 PM CET, Teddy Astie wrote:
> Introduce a new flag to force the x2APIC enabled and preventing a
> guest from switching back LAPIC to xAPIC mode.

I don't think you can really do this on AMD without advertising it somehow.

And there's no architectural way to do so.

>
> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
> architectural MSR of Intel specification.

Yes, I can see this being usable and a good idea on Intel hardware.

>
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
> This feature can be useful for various reasons, starting with SEV as
> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
> boot unless x2APIC is initially enabled, even if it switches to
> x2apic afterward). It could also be interesting to reduce the attack
> surface of the hypervisor (by only exposing x2apic to the guest).

On AMD (again, AFAIK) you do have to implement xAPIC support to provide a true
AMD-like system. Anything else would be a Xen-specific extension.

The intended way to get around trap-and-emulate for xAPIC access is to bite the
bullet and implement accelerated AVIC. That has explicit provisions to enable
SEV operation and would have the neat benefit of eliding certain VMEXITs (e.g.
EOI). It'd also simplify MSI delivery on non-oversubscribed CPUs.

I assume you already looked at it and concluded it was more work than you could
afford, but thought I'd bring it up anyway.

>
> As it can allow to have MMIO-less guest (using PVH), perhaps it can
> be enough for avoiding the problematic cases of virtualized INVLPGB
> (when we have it).
>
> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
> freezes for some reason, NetBSD doesn't support it (no x2apic support
> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
> with xAPIC.
>
> On Intel platforms, it would be better to expose the
> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
> guest, but it's non-trivial as it needs to be properly exposed
> through IA32_ARCH_CAPABILITIES which is currently passed-through.

ARCH_CAPS is part of the CPU policy. You can have toolstack set the bit and
have Xen take the hint. Then it'd also be sent on the migrate stream.

Granted, that wouldn't help you on AMD hardware, but it'd be perfectly
spec-compliant on Intel. A different take might be to have a Xen-specific bit
in the hypervisor leaves, mirroring the arch_caps bit.

I think SeaBIOS, OVMF and NetBSD failing to boot gives you a hint that, while
this might be a good idea for some cases, you do need xAPIC for a general
purpose VM. IMO, at least.
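
For reference, this is roughly how a guest would discover the lock on Intel if
the bit were exposed through the policy. It is a sketch only: the MSR indices
and bit positions are the architectural ones as I understand them
(ARCH_CAPABILITIES bit 21, XAPIC_DISABLE_STATUS bit 0), while the `rdmsr_fn`
callback, `xapic_locked_out()` and the two `demo_rdmsr_*` stubs are
hypothetical names used so the logic can be exercised outside a kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_XAPIC_DISABLE_STATUS 0x000000bdu
#define MSR_IA32_ARCH_CAPABILITIES    0x0000010au
#define ARCH_CAP_XAPIC_DISABLE        (UINT64_C(1) << 21)
#define LEGACY_XAPIC_DISABLED         (UINT64_C(1) << 0)

/* Hypothetical MSR accessor so the check can run with stubbed values. */
typedef uint64_t (*rdmsr_fn)(uint32_t msr);

/*
 * 'has_arch_caps' stands for CPUID.(EAX=7,ECX=0):EDX[29], which enumerates
 * IA32_ARCH_CAPABILITIES.  Each MSR is read only once its presence is known.
 */
static bool xapic_locked_out(bool has_arch_caps, rdmsr_fn rdmsr)
{
    if ( !has_arch_caps )
        return false;

    if ( !(rdmsr(MSR_IA32_ARCH_CAPABILITIES) & ARCH_CAP_XAPIC_DISABLE) )
        return false;

    return (rdmsr(MSR_IA32_XAPIC_DISABLE_STATUS) & LEGACY_XAPIC_DISABLED) != 0;
}

/* Stub: MSR is enumerated and the lock bit is set. */
static uint64_t demo_rdmsr_locked(uint32_t msr)
{
    if ( msr == MSR_IA32_ARCH_CAPABILITIES )
        return ARCH_CAP_XAPIC_DISABLE;
    if ( msr == MSR_IA32_XAPIC_DISABLE_STATUS )
        return LEGACY_XAPIC_DISABLED;
    return 0;
}

/* Stub: MSR is enumerated but the lock bit is clear. */
static uint64_t demo_rdmsr_unlocked(uint32_t msr)
{
    return msr == MSR_IA32_ARCH_CAPABILITIES ? ARCH_CAP_XAPIC_DISABLE : 0;
}
```

A Xen-specific bit in the hypervisor leaves would need an equivalent check on
the guest side, which is where the AMD story gets harder.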

>
>  docs/man/xl.cfg.5.pod.in              |  7 +++++++
>  tools/libs/light/libxl_types.idl      |  1 +
>  tools/libs/light/libxl_x86.c          |  4 ++++
>  tools/xl/xl_parse.c                   |  1 +
>  xen/arch/x86/domain.c                 |  2 +-
>  xen/arch/x86/hvm/hvm.c                |  2 ++
>  xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
>  xen/arch/x86/include/asm/domain.h     |  2 ++
>  xen/arch/x86/include/asm/hvm/domain.h |  3 +++
>  xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
>  10 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index ad1553c5e9..01b41d93c0 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -3198,6 +3198,13 @@ option.
>  
>  If using this option is necessary to fix an issue, please report a bug.
>  
> +=item B<force_x2apic=BOOLEAN>

nit: I'd say "x2apic_only" to show not only that it starts in x2apic mode, but
also that it must stay that way. But tomato-tomahto.

> +
> +Force the LAPIC in x2APIC mode and prevent the guest from disabling
> +it or switching to xAPIC mode.

The "or switching to xAPIC mode" part is redundant. The means to transition to
xAPIC mode is through disabling it.

> +
> +This option is disabled by default.
> +
>  =back
>  
>  =head1 SEE ALSO
> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> index d64a573ff3..b95278007e 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -738,6 +738,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                 ("arm_sci", libxl_arm_sci),
>                                ])),
>      ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
> +                               ("force_x2apic", libxl_defbool)
>                                ])),
>      # Alternate p2m is not bound to any architecture or guest type, as it is
>      # supported by x86 HVM and ARM support is planned.
> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> index 60d4e8661c..2e0205d2a2 100644
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> @@ -26,6 +26,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>      if (libxl_defbool_val(d_config->b_info.arch_x86.msr_relaxed))
>          config->arch.misc_flags |= XEN_X86_MSR_RELAXED;
>  
> +    if (libxl_defbool_val(d_config->b_info.arch_x86.force_x2apic))
> +        config->arch.misc_flags |= XEN_X86_FORCE_X2APIC;
> +
>      if (libxl_defbool_val(d_config->b_info.trap_unmapped_accesses)) {
>              LOG(ERROR, "trap_unmapped_accesses is not supported on x86\n");
>              return ERROR_FAIL;
> @@ -818,6 +821,7 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
>  {
>      libxl_defbool_setdefault(&b_info->acpi, true);
>      libxl_defbool_setdefault(&b_info->arch_x86.msr_relaxed, false);
> +    libxl_defbool_setdefault(&b_info->arch_x86.force_x2apic, false);
>      libxl_defbool_setdefault(&b_info->trap_unmapped_accesses, false);
>  
>      if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index af86d3186d..d84ab7c823 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -3041,6 +3041,7 @@ skip_usbdev:
>                      "If it fixes an issue you are having please report to "
>                      "xen-devel@lists.xenproject.org.\n");
>  
> +    xlu_cfg_get_defbool(config, "force_x2apic", &b_info->arch_x86.force_x2apic, 0);
>      xlu_cfg_get_defbool(config, "vpmu", &b_info->vpmu, 0);
>  
>      xlu_cfg_destroy(config);
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 19fd86ce88..02f650a614 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -704,7 +704,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>          return -EINVAL;
>      }
>  
> -    if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
> +    if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | XEN_X86_FORCE_X2APIC) )

As I said, I'd reuse the bit in ARCH_CAPS in the CPU policy. That also means it
can be properly migrated and you wouldn't need an extra boolean in the domain.

>      {
>          dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
>                  config->arch.misc_flags);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 0c60faa39d..73cbac0f22 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -616,6 +616,8 @@ int hvm_domain_initialise(struct domain *d,
>      INIT_LIST_HEAD(&d->arch.hvm.mmcfg_regions);
>      INIT_LIST_HEAD(&d->arch.hvm.msix_tables);
>  
> +    d->arch.hvm.force_x2apic = config->arch.misc_flags & XEN_X86_FORCE_X2APIC;
> +
>      rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
>      if ( rc )
>          goto fail;
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 993e972cd7..ae8df70d2e 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>      if ( !has_vlapic(v->domain) )
>          return X86EMUL_EXCEPTION;
>  
> +    if ( has_force_x2apic(v->domain) )
> +    {
> +        /*
> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
> +        */
> +        if ( !(val & APIC_BASE_EXTD) )
> +        {
> +            gprintk(XENLOG_WARNING, "tried to disable x2APIC while forced on\n");

This is intended behaviour, not a warning. I'd remove the printk.

> +            return X86EMUL_EXCEPTION;
> +        }
> +    }
> +
>      /* Attempting to set reserved bits? */
>      if ( val & ~(APIC_BASE_ADDR_MASK | APIC_BASE_ENABLE | APIC_BASE_BSP |
>                   (cp->basic.x2apic ? APIC_BASE_EXTD : 0)) )
> @@ -1474,7 +1488,14 @@ void vlapic_reset(struct vlapic *vlapic)
>      if ( v->vcpu_id == 0 )
>          vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>  
> -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> +    if ( has_force_x2apic(v->domain) )
> +    {
> +        vlapic->hw.apic_base_msr |= APIC_BASE_EXTD;
> +        set_x2apic_id(vlapic);
> +    }
> +    else
> +        vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> +
>      vlapic_do_init(vlapic);
>  }
>  
> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> index 5df8c78253..771992d156 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -509,6 +509,8 @@ struct arch_domain
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>  #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
>  
> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)

This would be a check on the CPU policy instead with my proposed change.

> +
>  #define gdt_ldt_pt_idx(v) \
>        ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
>  #define pv_gdt_ptes(v) \
> diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> index 333501d5f2..b56fa08b73 100644
> --- a/xen/arch/x86/include/asm/hvm/domain.h
> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> @@ -108,6 +108,9 @@ struct hvm_domain {
>      /* Compatibility setting for a bug in x2APIC LDR */
>      bool bug_x2apic_ldr_vcpu_id;
>  
> +    /* LAPIC is forced in x2APIC mode */
> +    bool force_x2apic;
> +
>      /* hypervisor intercepted msix table */
>      struct list_head       msixtbl_list;
>  
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index b99a691706..75aa31d9ed 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -309,11 +309,21 @@ struct xen_arch_domainconfig {
>   * doesn't allow the guest to read or write to the underlying MSR.
>   */
>  #define XEN_X86_MSR_RELAXED (1u << 0)
> +
> +/*
> + * This option forces the LAPIC to be in X2APIC mode (IA32_APIC_BASE.EXTD = 1)
> + * using the same semantics as IA32_XAPIC_DISABLE_STATUS:LEGACY_XAPIC_DISABLED
> + *
> + * Attempts by the guest to clear IA32_APIC_BASE.EXTD (e.g disable X2APIC) will
> + * inject #GP in the guest.
> + */
> +#define XEN_X86_FORCE_X2APIC (1U << 1)
> +
>      uint32_t misc_flags;
>  };
>  
>  /* Max  XEN_X86_* constant. Used for ABI checking. */
> -#define XEN_X86_MISC_FLAGS_MAX XEN_X86_MSR_RELAXED
> +#define XEN_X86_MISC_FLAGS_MAX XEN_X86_FORCE_X2APIC
>  
>  #endif
>  
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Jan Beulich 3 months, 1 week ago
On 29.10.2025 19:26, Teddy Astie wrote:
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>      if ( !has_vlapic(v->domain) )
>          return X86EMUL_EXCEPTION;
>  
> +    if ( has_force_x2apic(v->domain) )
> +    {
> +        /*
> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
> +        */

The MSR aspect should be implemented by using the MSR. Beyond that imo our treatment
shouldn't be different from that when firmware pre-enables x2APIC: While not
advisable, aiui OSes could still switch back to xAPIC mode. At which point the guest
config level control may also want calling "pre-enable", not "force".

Jan
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Teddy Astie 3 months, 1 week ago
On 30/10/2025 at 08:56, Jan Beulich wrote:
> On 29.10.2025 19:26, Teddy Astie wrote:
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>>       if ( !has_vlapic(v->domain) )
>>           return X86EMUL_EXCEPTION;
>>   
>> +    if ( has_force_x2apic(v->domain) )
>> +    {
>> +        /*
>> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
>> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
>> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
>> +        */
> 
> The MSR aspect should be implemented by using the MSR. Beyond that imo our treatment
> shouldn't be different from that when firmware pre-enables x2APIC: While not
> advisable, aiui OSes could still switch back to xAPIC mode. At which point the guest
> config level control may also want calling "pre-enable", not "force".
> 

One advantage of forcing x2APIC to be enabled is that it simplifies
support for LAPIC IDs over 255.

While just pre-enabling x2APIC could be an alternative (for cases where we
don't want the OS to use xAPIC merely because it is there), things still
get tricky when supporting more vCPUs. We would need to clarify the behavior
of enabling xAPIC on a vCPU that has LAPIC_ID > 254; the Intel and AMD
specifications don't define anything, aside from this in Intel's:
> If a BIOS transfers control to OS in xAPIC mode, then the BIOS must ensure that only logical processors with
> CPUID.0BH.EDX value less than 255 are enabled.

And I guess Intel's MSR_IA32_XAPIC_DISABLE_STATUS exists to prevent
such a case from occurring once the OS has booted.

> Jan
> 
Teddy


Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Jan Beulich 3 months ago
On 31.10.2025 15:33, Teddy Astie wrote:
> On 30/10/2025 at 08:56, Jan Beulich wrote:
>> On 29.10.2025 19:26, Teddy Astie wrote:
>>> --- a/xen/arch/x86/hvm/vlapic.c
>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>>>       if ( !has_vlapic(v->domain) )
>>>           return X86EMUL_EXCEPTION;
>>>   
>>> +    if ( has_force_x2apic(v->domain) )
>>> +    {
>>> +        /*
>>> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
>>> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
>>> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
>>> +        */
>>
>> The MSR aspect should be implemented by using the MSR. Beyond that imo our treatment
>> shouldn't be different from that when firmware pre-enables x2APIC: While not
>> advisable, aiui OSes could still switch back to xAPIC mode. At which point the guest
>> config level control may also want calling "pre-enable", not "force".
>>
> 
> One advantage of forcing x2APIC enabled is that it simplifies the 
> support for LAPIC IDs over 255.
> 
> While that could be a alternative to just pre-enable x2apic (in cases we 
> don't want the OS to use xAPIC because it is there), things still gets 
> tricky for supporting more vCPUs. We would need to clarify the behavior 
> of enabling xAPIC on a vCPU that has LAPIC_ID > 254, Intel and AMD 
> specification don't define anything aside for Intel :
>> If a BIOS transfers control to OS in xAPIC mode, then the BIOS must ensure that only logical processors with
>> CPUID.0BH.EDX value less than 255 are enabled.

Well, this falls into the much wider topic of making more than 128 vCPU-s
available for HVM / PVH, doesn't it?
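
For illustration, a minimal sketch of where the 128 figure comes from. It
assumes the (vcpu_id * 2) APIC ID assignment visible in the vlapic_reset()
hunk of the patch, and that xAPIC IDs are 8 bits wide with 0xFF reserved
for broadcast. The helper names are made up for this sketch and are not Xen
functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* xAPIC IDs are 8 bits wide; 0xFF is the broadcast destination. */
#define XAPIC_ID_BROADCAST 0xFFu

/* Mirrors the (v->vcpu_id * 2) mapping used in the patch's vlapic_reset(). */
static uint32_t vcpu_to_apic_id(uint32_t vcpu_id)
{
    return vcpu_id * 2;
}

/* True if this vCPU's APIC ID can still be expressed in xAPIC mode. */
static bool apic_id_fits_xapic(uint32_t vcpu_id)
{
    return vcpu_to_apic_id(vcpu_id) < XAPIC_ID_BROADCAST;
}
```

Under these assumptions vCPU 127 (APIC ID 254) is the last one that fits,
so anything beyond 128 vCPUs needs x2APIC IDs.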

Jan

Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Grygorii Strashko 3 months, 1 week ago

On 29.10.25 20:26, Teddy Astie wrote:
> Introduce a new flag to force the x2APIC enabled and preventing a
> guest from switching back LAPIC to xAPIC mode.
> 
> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
> architectural MSR of Intel specification.
> 
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> ---
> This feature can be useful for various reasons, starting with SEV as
> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
> boot unless x2APIC is initially enabled, even if it switches to
> x2apic afterward). It could also be interesting to reduce the attack
> surface of the hypervisor (by only exposing x2apic to the guest).
> 
> As it can allow to have MMIO-less guest (using PVH), perhaps it can
> be enough for avoiding the problematic cases of virtualized INVLPGB
> (when we have it).
> 
> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
> freezes for some reason, NetBSD doesn't support it (no x2apic support
> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
> with xAPIC.
> 
> On Intel platforms, it would be better to expose the
> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
> guest, but it's non-trivial as it needs to be properly exposed
> through IA32_ARCH_CAPABILITIES which is currently passed-through.
> 
>   docs/man/xl.cfg.5.pod.in              |  7 +++++++
>   tools/libs/light/libxl_types.idl      |  1 +
>   tools/libs/light/libxl_x86.c          |  4 ++++
>   tools/xl/xl_parse.c                   |  1 +
>   xen/arch/x86/domain.c                 |  2 +-
>   xen/arch/x86/hvm/hvm.c                |  2 ++
>   xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
>   xen/arch/x86/include/asm/domain.h     |  2 ++
>   xen/arch/x86/include/asm/hvm/domain.h |  3 +++
>   xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
>   10 files changed, 54 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index ad1553c5e9..01b41d93c0 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -3198,6 +3198,13 @@ option.
>   
>   If using this option is necessary to fix an issue, please report a bug.
>   
> +=item B<force_x2apic=BOOLEAN>
> +
> +Force the LAPIC in x2APIC mode and prevent the guest from disabling
> +it or switching to xAPIC mode.
> +
> +This option is disabled by default.
> +
>   =back
>   
>   =head1 SEE ALSO

[...]

>   
> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> index 5df8c78253..771992d156 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -509,6 +509,8 @@ struct arch_domain
>   #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>   #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
>   
> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)

Would it be possible for you to consider having a Kconfig option to make
such a configuration global and static?
   

-- 
Best regards,
-grygorii
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Jan Beulich 3 months, 1 week ago
On 30.10.2025 00:08, Grygorii Strashko wrote:
> On 29.10.25 20:26, Teddy Astie wrote:
>> --- a/xen/arch/x86/include/asm/domain.h
>> +++ b/xen/arch/x86/include/asm/domain.h
>> @@ -509,6 +509,8 @@ struct arch_domain
>>   #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>   #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
>>   
>> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)
> 
> Would it be possible for you to consider having Kconfig option to make
> such configuration global, static?

Especially considering the post-commit-message remarks, I don't think this can be
anything other than a per-guest setting.

Jan
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Alejandro Vallejo 2 months, 4 weeks ago
On Thu Oct 30, 2025 at 8:46 AM CET, Jan Beulich wrote:
> On 30.10.2025 00:08, Grygorii Strashko wrote:
>> On 29.10.25 20:26, Teddy Astie wrote:
>>> --- a/xen/arch/x86/include/asm/domain.h
>>> +++ b/xen/arch/x86/include/asm/domain.h
>>> @@ -509,6 +509,8 @@ struct arch_domain
>>>   #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>>   #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
>>>   
>>> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)
>> 
>> Would it be possible for you to consider having Kconfig option to make
>> such configuration global, static?
>
> Especially considering the post-commit-message remarks I don't think this can be
> other than a per-guest setting.
>
> Jan

It'd certainly be of use to us to compile out the entirety of xAPIC emulation
in favour of x2APIC.

Granted, it imposes restrictions on what guests are able to run and how, but
that might be acceptable in the interest of a leaner hypervisor.

It is fairly annoying there's no architectural means to signal it.
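
A rough sketch of how a build-time switch and the per-guest flag could
coexist. CONFIG_X2APIC_ONLY is a hypothetical symbol (no such Kconfig
option exists in the patch), and struct domain_model is a stand-in for the
relevant part of Xen's domain structure.

```c
#include <stdbool.h>

/* Stand-in for the relevant part of Xen's struct domain (illustrative). */
struct domain_model {
    bool force_x2apic;   /* per-guest setting, as in the patch */
};

/*
 * CONFIG_X2APIC_ONLY is a hypothetical Kconfig symbol.  When it is defined
 * the predicate becomes a compile-time constant, so the compiler can drop
 * xAPIC-only paths guarded by it.  Otherwise the per-guest flag decides.
 */
#ifdef CONFIG_X2APIC_ONLY
#define has_force_x2apic(d) ((void)(d), true)
#else
#define has_force_x2apic(d) ((d)->force_x2apic)
#endif

enum apic_mode { APIC_MODE_XAPIC, APIC_MODE_X2APIC };

/* Mode a vLAPIC would come out of reset in, following the patch's logic. */
static enum apic_mode reset_apic_mode(const struct domain_model *d)
{
    return has_force_x2apic(d) ? APIC_MODE_X2APIC : APIC_MODE_XAPIC;
}
```

With the symbol undefined, behaviour stays per-guest as in the posted patch.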

Cheers,
Alejandro
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Roger Pau Monné 3 months, 1 week ago
On Wed, Oct 29, 2025 at 06:26:14PM +0000, Teddy Astie wrote:
> Introduce a new flag to force the x2APIC enabled and preventing a
> guest from switching back LAPIC to xAPIC mode.
> 
> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
> architectural MSR of Intel specification.
> 
> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>

Thanks, some initial comments below.

> ---
> This feature can be useful for various reasons, starting with SEV as
> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
> boot unless x2APIC is initially enabled, even if it switches to
> x2apic afterward). It could also be interesting to reduce the attack
> surface of the hypervisor (by only exposing x2apic to the guest).
> 
> As it can allow to have MMIO-less guest (using PVH), perhaps it can
> be enough for avoiding the problematic cases of virtualized INVLPGB
> (when we have it).
> 
> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
> freezes for some reason, NetBSD doesn't support it (no x2apic support
> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
> with xAPIC.
> 
> On Intel platforms, it would be better to expose the
> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
> guest, but it's non-trivial as it needs to be properly exposed
> through IA32_ARCH_CAPABILITIES which is currently passed-through.
> 
>  docs/man/xl.cfg.5.pod.in              |  7 +++++++
>  tools/libs/light/libxl_types.idl      |  1 +
>  tools/libs/light/libxl_x86.c          |  4 ++++
>  tools/xl/xl_parse.c                   |  1 +
>  xen/arch/x86/domain.c                 |  2 +-
>  xen/arch/x86/hvm/hvm.c                |  2 ++
>  xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
>  xen/arch/x86/include/asm/domain.h     |  2 ++
>  xen/arch/x86/include/asm/hvm/domain.h |  3 +++
>  xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
>  10 files changed, 54 insertions(+), 3 deletions(-)

Seeing there are no changes to the ACPI tables exposed to the guest,
do we want to start exposing X2APIC MADT entries instead of the plain
APIC entries?

The ACPI spec seems to suggest that you can expose APIC entries for
APIC IDs below 255, for compatibility reasons.  But given that we would
force the guest to use X2APIC mode it would certainly need to
understand how to process X2APIC MADT entries anyway.

Not sure it makes much of a difference, but wondering whether OSes
expect X2APIC MADT entries if the mode is locked to X2APIC.
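
For reference, a sketch of the two MADT entry layouts being compared, as I
read them in the ACPI specification (type 0, Processor Local APIC, and
type 9, Processor Local x2APIC). The struct and helper names are
illustrative and do not come from Xen's ACPI builder; the vcpu_id * 2 ID
assignment is carried over from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* MADT entry types and the "enabled" flag, per the ACPI specification. */
#define ACPI_MADT_TYPE_LOCAL_APIC   0
#define ACPI_MADT_TYPE_LOCAL_X2APIC 9
#define ACPI_MADT_ENABLED           (1u << 0)

/* Type 0: the APIC ID is a single byte, so it cannot describe IDs > 255. */
struct madt_local_apic {
    uint8_t  type;
    uint8_t  length;
    uint8_t  acpi_processor_uid;
    uint8_t  apic_id;
    uint32_t flags;
};

/* Type 9: 32-bit x2APIC ID and 32-bit processor UID. */
struct madt_local_x2apic {
    uint8_t  type;
    uint8_t  length;
    uint16_t reserved;
    uint32_t x2apic_id;
    uint32_t flags;
    uint32_t acpi_processor_uid;
};

static_assert(sizeof(struct madt_local_apic) == 8, "type 0 entry is 8 bytes");
static_assert(sizeof(struct madt_local_x2apic) == 16, "type 9 entry is 16 bytes");

/* Build an enabled x2APIC entry for a vCPU (illustrative helper). */
static struct madt_local_x2apic make_x2apic_entry(uint32_t vcpu_id)
{
    struct madt_local_x2apic e = {
        .type = ACPI_MADT_TYPE_LOCAL_X2APIC,
        .length = sizeof(struct madt_local_x2apic),
        .x2apic_id = vcpu_id * 2,
        .flags = ACPI_MADT_ENABLED,
        .acpi_processor_uid = vcpu_id,
    };

    return e;
}
```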

> 
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index ad1553c5e9..01b41d93c0 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -3198,6 +3198,13 @@ option.
>  
>  If using this option is necessary to fix an issue, please report a bug.
>  
> +=item B<force_x2apic=BOOLEAN>
> +
> +Force the LAPIC in x2APIC mode and prevent the guest from disabling
> +it or switching to xAPIC mode.
> +
> +This option is disabled by default.
> +
>  =back
>  
>  =head1 SEE ALSO
> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> index d64a573ff3..b95278007e 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -738,6 +738,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>                                 ("arm_sci", libxl_arm_sci),
>                                ])),
>      ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
> +                               ("force_x2apic", libxl_defbool)

This addition needs a new define in libxl.h to signal its presence,
see LIBXL_HAVE_* defines in there.

>                                ])),
>      # Alternate p2m is not bound to any architecture or guest type, as it is
>      # supported by x86 HVM and ARM support is planned.
> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> index 60d4e8661c..2e0205d2a2 100644
> --- a/tools/libs/light/libxl_x86.c
> +++ b/tools/libs/light/libxl_x86.c
> @@ -26,6 +26,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>      if (libxl_defbool_val(d_config->b_info.arch_x86.msr_relaxed))
>          config->arch.misc_flags |= XEN_X86_MSR_RELAXED;
>  
> +    if (libxl_defbool_val(d_config->b_info.arch_x86.force_x2apic))
> +        config->arch.misc_flags |= XEN_X86_FORCE_X2APIC;
> +
>      if (libxl_defbool_val(d_config->b_info.trap_unmapped_accesses)) {
>              LOG(ERROR, "trap_unmapped_accesses is not supported on x86\n");
>              return ERROR_FAIL;
> @@ -818,6 +821,7 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
>  {
>      libxl_defbool_setdefault(&b_info->acpi, true);
>      libxl_defbool_setdefault(&b_info->arch_x86.msr_relaxed, false);
> +    libxl_defbool_setdefault(&b_info->arch_x86.force_x2apic, false);
>      libxl_defbool_setdefault(&b_info->trap_unmapped_accesses, false);
>  
>      if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> index af86d3186d..d84ab7c823 100644
> --- a/tools/xl/xl_parse.c
> +++ b/tools/xl/xl_parse.c
> @@ -3041,6 +3041,7 @@ skip_usbdev:
>                      "If it fixes an issue you are having please report to "
>                      "xen-devel@lists.xenproject.org.\n");
>  
> +    xlu_cfg_get_defbool(config, "force_x2apic", &b_info->arch_x86.force_x2apic, 0);
>      xlu_cfg_get_defbool(config, "vpmu", &b_info->vpmu, 0);
>  
>      xlu_cfg_destroy(config);
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 19fd86ce88..02f650a614 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -704,7 +704,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>          return -EINVAL;
>      }
>  
> -    if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
> +    if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | XEN_X86_FORCE_X2APIC) )
>      {
>          dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
>                  config->arch.misc_flags);
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 0c60faa39d..73cbac0f22 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -616,6 +616,8 @@ int hvm_domain_initialise(struct domain *d,
>      INIT_LIST_HEAD(&d->arch.hvm.mmcfg_regions);
>      INIT_LIST_HEAD(&d->arch.hvm.msix_tables);
>  
> +    d->arch.hvm.force_x2apic = config->arch.misc_flags & XEN_X86_FORCE_X2APIC;
> +
>      rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
>      if ( rc )
>          goto fail;
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 993e972cd7..ae8df70d2e 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>      if ( !has_vlapic(v->domain) )
>          return X86EMUL_EXCEPTION;
>  
> +    if ( has_force_x2apic(v->domain) )
> +    {
> +        /*
> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
> +        */
> +        if ( !(val & APIC_BASE_EXTD) )
> +        {
> +            gprintk(XENLOG_WARNING, "tried to disable x2APIC while forced on\n");
> +            return X86EMUL_EXCEPTION;
> +        }
> +    }
> +
>      /* Attempting to set reserved bits? */
>      if ( val & ~(APIC_BASE_ADDR_MASK | APIC_BASE_ENABLE | APIC_BASE_BSP |
>                   (cp->basic.x2apic ? APIC_BASE_EXTD : 0)) )
> @@ -1474,7 +1488,14 @@ void vlapic_reset(struct vlapic *vlapic)
>      if ( v->vcpu_id == 0 )
>          vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>  
> -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> +    if ( has_force_x2apic(v->domain) )
> +    {
> +        vlapic->hw.apic_base_msr |= APIC_BASE_EXTD;
> +        set_x2apic_id(vlapic);
> +    }
> +    else
> +        vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> +
>      vlapic_do_init(vlapic);
>  }
>  
> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> index 5df8c78253..771992d156 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -509,6 +509,8 @@ struct arch_domain
>  #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>  #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
>  
> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)
> +
>  #define gdt_ldt_pt_idx(v) \
>        ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
>  #define pv_gdt_ptes(v) \
> diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> index 333501d5f2..b56fa08b73 100644
> --- a/xen/arch/x86/include/asm/hvm/domain.h
> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> @@ -108,6 +108,9 @@ struct hvm_domain {
>      /* Compatibility setting for a bug in x2APIC LDR */
>      bool bug_x2apic_ldr_vcpu_id;
>  
> +    /* LAPIC is forced in x2APIC mode */
> +    bool force_x2apic;

This should be a field in the vlapic struct, but seeing this I wonder
whether we want to virtualize the MSR_IA32_XAPIC_DISABLE_STATUS MSR and
set the bit there.  This would also help with migrating the option, as
you could then migrate the "locked" status simply by migrating the
contents of MSR_IA32_XAPIC_DISABLE_STATUS.
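
To make the idea concrete, a standalone model of what keeping the lock in
a virtualized MSR could look like. This is only a sketch, not Xen code:
struct vlapic_model and both helpers are invented for illustration, and
the MSR index (0xBD) and LEGACY_XAPIC_DISABLED bit position (bit 0) are
taken from my reading of the Intel SDM.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_XAPIC_DISABLE_STATUS 0x000000bdu
#define LEGACY_XAPIC_DISABLED         (1ull << 0)
#define APIC_BASE_EXTD                (1ull << 10)
#define APIC_BASE_ENABLE              (1ull << 11)

/* Per-vLAPIC state; the lock lives in the MSR value, so saving and
 * restoring that value is enough to carry it across migration. */
struct vlapic_model {
    uint64_t apic_base_msr;
    uint64_t xapic_disable_status;
};

/* Guest read of the status MSR; returns false for any other index. */
static bool model_rdmsr(const struct vlapic_model *l, uint32_t msr,
                        uint64_t *val)
{
    if ( msr != MSR_IA32_XAPIC_DISABLE_STATUS )
        return false;

    *val = l->xapic_disable_status;
    return true;
}

/* Guest write of IA32_APIC_BASE; false means "inject #GP". */
static bool model_wrmsr_apic_base(struct vlapic_model *l, uint64_t val)
{
    if ( (l->xapic_disable_status & LEGACY_XAPIC_DISABLED) &&
         !(val & APIC_BASE_EXTD) )
        return false;

    l->apic_base_msr = val;
    return true;
}
```

The check is the same as the one in the guest_wrmsr_apic_base() hunk, just
keyed on the MSR bit rather than on a domain-wide boolean.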

> +
>      /* hypervisor intercepted msix table */
>      struct list_head       msixtbl_list;
>  
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index b99a691706..75aa31d9ed 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -309,11 +309,21 @@ struct xen_arch_domainconfig {
>   * doesn't allow the guest to read or write to the underlying MSR.
>   */
>  #define XEN_X86_MSR_RELAXED (1u << 0)
> +
> +/*
> + * This option forces the LAPIC to be in X2APIC mode (IA32_APIC_BASE.EXTD = 1)
> + * using the same semantics as IA32_XAPIC_DISABLE_STATUS:LEGACY_XAPIC_DISABLED
> + *
> + * Attempts by the guest to clear IA32_APIC_BASE.EXTD (e.g disable X2APIC) will
> + * inject #GP in the guest.
> + */
> +#define XEN_X86_FORCE_X2APIC (1U << 1)
> +
>      uint32_t misc_flags;

If we go the MSR route we won't need a new misc_flag, as the toolstack
could set the initial value of MSR_IA32_XAPIC_DISABLE_STATUS using
the existing way to load vCPU context.

Thanks, Roger.
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Jan Beulich 3 months, 1 week ago
On 29.10.2025 19:52, Roger Pau Monné wrote:
> Seeing there are no changes to the ACPI tables exposed to the guest,
> do we want to start exposing X2APIC MADT entries instead of the plain
> APIC entries?
> 
> The ACPI spec seems to suggest that you can expose APIC entries for
> APICs below 255, for compatibility reasons.  But given that we would
> force the guest to use X2APIC mode it would certainly need to
> understand how to process X2APIC MADT entries anyway.

+1

Jan

Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Teddy Astie 3 months, 1 week ago
On 29/10/2025 at 19:55, Roger Pau Monné wrote:
> On Wed, Oct 29, 2025 at 06:26:14PM +0000, Teddy Astie wrote:
>> Introduce a new flag to force the x2APIC enabled and preventing a
>> guest from switching back LAPIC to xAPIC mode.
>>
>> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
>> architectural MSR of Intel specification.
>>
>> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> 
> Thanks, some initial comments below.
> 
>> ---
>> This feature can be useful for various reasons, starting with SEV as
>> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
>> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
>> boot unless x2APIC is initially enabled, even if it switches to
>> x2apic afterward). It could also be interesting to reduce the attack
>> surface of the hypervisor (by only exposing x2apic to the guest).
>>
>> As it can allow to have MMIO-less guest (using PVH), perhaps it can
>> be enough for avoiding the problematic cases of virtualized INVLPGB
>> (when we have it).
>>
>> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
>> freezes for some reason, NetBSD doesn't support it (no x2apic support
>> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
>> with xAPIC.
>>
>> On Intel platforms, it would be better to expose the
>> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
>> guest, but it's non-trivial as it needs to be properly exposed
>> through IA32_ARCH_CAPABILITIES which is currently passed-through.
>>
>>   docs/man/xl.cfg.5.pod.in              |  7 +++++++
>>   tools/libs/light/libxl_types.idl      |  1 +
>>   tools/libs/light/libxl_x86.c          |  4 ++++
>>   tools/xl/xl_parse.c                   |  1 +
>>   xen/arch/x86/domain.c                 |  2 +-
>>   xen/arch/x86/hvm/hvm.c                |  2 ++
>>   xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
>>   xen/arch/x86/include/asm/domain.h     |  2 ++
>>   xen/arch/x86/include/asm/hvm/domain.h |  3 +++
>>   xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
>>   10 files changed, 54 insertions(+), 3 deletions(-)
> 
> Seeing there are no changes to the ACPI tables exposed to the guest,
> do we want to start exposing X2APIC MADT entries instead of the plain
> APIC entries?
> 
> The ACPI spec seems to suggest that you can expose APIC entries for
> APICs below 255, for compatibility reasons.  But given that we would
> force the guest to use X2APIC mode it would certainly need to
> understand how to process X2APIC MADT entries anyway.
> 
> Not sure it makes much of a difference, but wondering whether OSes
> expect X2APIC MADT entries if the mode is locked to X2APIC.
> 

In all the OSes I checked, x2APIC MADT entries are treated as just a
different format for LAPIC entries, and are not really tied to whether
x2APIC is in use or not.

But I think it's safe to assume that every OS that supports x2APIC also
supports x2APIC MADT entries, which could make ACPI table generation
simpler (especially when dealing with LAPIC IDs over 255).

>>
>> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
>> index ad1553c5e9..01b41d93c0 100644
>> --- a/docs/man/xl.cfg.5.pod.in
>> +++ b/docs/man/xl.cfg.5.pod.in
>> @@ -3198,6 +3198,13 @@ option.
>>   
>>   If using this option is necessary to fix an issue, please report a bug.
>>   
>> +=item B<force_x2apic=BOOLEAN>
>> +
>> +Force the LAPIC in x2APIC mode and prevent the guest from disabling
>> +it or switching to xAPIC mode.
>> +
>> +This option is disabled by default.
>> +
>>   =back
>>   
>>   =head1 SEE ALSO
>> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
>> index d64a573ff3..b95278007e 100644
>> --- a/tools/libs/light/libxl_types.idl
>> +++ b/tools/libs/light/libxl_types.idl
>> @@ -738,6 +738,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>>                                  ("arm_sci", libxl_arm_sci),
>>                                 ])),
>>       ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
>> +                               ("force_x2apic", libxl_defbool)
> 
> This addition needs a new define in libxl.h to signal it's presence,
> see LIBXL_HAVE_* defines in there.
> 

Something like LIBXL_HAVE_FORCE_X2APIC?
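
A sketch of how a libxl consumer might use such a define, under the
assumption that the name ends up being LIBXL_HAVE_FORCE_X2APIC. The define
and struct build_info_model are stand-ins so the snippet compiles on its
own; a real caller would include <libxl.h> and operate on
libxl_domain_build_info instead.

```c
#include <stdbool.h>

/* Stand-in for the proposed define; not present in libxl.h today. */
#define LIBXL_HAVE_FORCE_X2APIC 1

/* Stand-in for the arch_x86 part of libxl_domain_build_info. */
struct build_info_model {
    bool msr_relaxed;
#ifdef LIBXL_HAVE_FORCE_X2APIC
    bool force_x2apic;
#endif
};

/*
 * Returns true if the request can be honoured by the libxl being built
 * against; without the define, only "don't force" can be satisfied.
 */
static bool request_force_x2apic(struct build_info_model *b, bool want)
{
#ifdef LIBXL_HAVE_FORCE_X2APIC
    b->force_x2apic = want;
    return true;
#else
    (void)b;
    return !want;
#endif
}
```

This lets out-of-tree toolstacks build against both older and newer libxl
versions without a version check.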

>>                                 ])),
>>       # Alternate p2m is not bound to any architecture or guest type, as it is
>>       # supported by x86 HVM and ARM support is planned.
>> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
>> index 60d4e8661c..2e0205d2a2 100644
>> --- a/tools/libs/light/libxl_x86.c
>> +++ b/tools/libs/light/libxl_x86.c
>> @@ -26,6 +26,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>>       if (libxl_defbool_val(d_config->b_info.arch_x86.msr_relaxed))
>>           config->arch.misc_flags |= XEN_X86_MSR_RELAXED;
>>   
>> +    if (libxl_defbool_val(d_config->b_info.arch_x86.force_x2apic))
>> +        config->arch.misc_flags |= XEN_X86_FORCE_X2APIC;
>> +
>>       if (libxl_defbool_val(d_config->b_info.trap_unmapped_accesses)) {
>>               LOG(ERROR, "trap_unmapped_accesses is not supported on x86\n");
>>               return ERROR_FAIL;
>> @@ -818,6 +821,7 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
>>   {
>>       libxl_defbool_setdefault(&b_info->acpi, true);
>>       libxl_defbool_setdefault(&b_info->arch_x86.msr_relaxed, false);
>> +    libxl_defbool_setdefault(&b_info->arch_x86.force_x2apic, false);
>>       libxl_defbool_setdefault(&b_info->trap_unmapped_accesses, false);
>>   
>>       if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
>> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
>> index af86d3186d..d84ab7c823 100644
>> --- a/tools/xl/xl_parse.c
>> +++ b/tools/xl/xl_parse.c
>> @@ -3041,6 +3041,7 @@ skip_usbdev:
>>                       "If it fixes an issue you are having please report to "
>>                       "xen-devel@lists.xenproject.org.\n");
>>   
>> +    xlu_cfg_get_defbool(config, "force_x2apic", &b_info->arch_x86.force_x2apic, 0);
>>       xlu_cfg_get_defbool(config, "vpmu", &b_info->vpmu, 0);
>>   
>>       xlu_cfg_destroy(config);
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 19fd86ce88..02f650a614 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -704,7 +704,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>           return -EINVAL;
>>       }
>>   
>> -    if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
>> +    if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | XEN_X86_FORCE_X2APIC) )
>>       {
>>           dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
>>                   config->arch.misc_flags);
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 0c60faa39d..73cbac0f22 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -616,6 +616,8 @@ int hvm_domain_initialise(struct domain *d,
>>       INIT_LIST_HEAD(&d->arch.hvm.mmcfg_regions);
>>       INIT_LIST_HEAD(&d->arch.hvm.msix_tables);
>>   
>> +    d->arch.hvm.force_x2apic = config->arch.misc_flags & XEN_X86_FORCE_X2APIC;
>> +
>>       rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
>>       if ( rc )
>>           goto fail;
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 993e972cd7..ae8df70d2e 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>>       if ( !has_vlapic(v->domain) )
>>           return X86EMUL_EXCEPTION;
>>   
>> +    if ( has_force_x2apic(v->domain) )
>> +    {
>> +        /*
>> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
>> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
>> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
>> +        */
>> +        if ( !(val & APIC_BASE_EXTD) )
>> +        {
>> +            gprintk(XENLOG_WARNING, "tried to disable x2APIC while forced on\n");
>> +            return X86EMUL_EXCEPTION;
>> +        }
>> +    }
>> +
>>       /* Attempting to set reserved bits? */
>>       if ( val & ~(APIC_BASE_ADDR_MASK | APIC_BASE_ENABLE | APIC_BASE_BSP |
>>                    (cp->basic.x2apic ? APIC_BASE_EXTD : 0)) )
>> @@ -1474,7 +1488,14 @@ void vlapic_reset(struct vlapic *vlapic)
>>       if ( v->vcpu_id == 0 )
>>           vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>>   
>> -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
>> +    if ( has_force_x2apic(v->domain) )
>> +    {
>> +        vlapic->hw.apic_base_msr |= APIC_BASE_EXTD;
>> +        set_x2apic_id(vlapic);
>> +    }
>> +    else
>> +        vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
>> +
>>       vlapic_do_init(vlapic);
>>   }
>>   
>> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
>> index 5df8c78253..771992d156 100644
>> --- a/xen/arch/x86/include/asm/domain.h
>> +++ b/xen/arch/x86/include/asm/domain.h
>> @@ -509,6 +509,8 @@ struct arch_domain
>>   #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
>>   #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
>>   
>> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)
>> +
>>   #define gdt_ldt_pt_idx(v) \
>>         ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
>>   #define pv_gdt_ptes(v) \
>> diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
>> index 333501d5f2..b56fa08b73 100644
>> --- a/xen/arch/x86/include/asm/hvm/domain.h
>> +++ b/xen/arch/x86/include/asm/hvm/domain.h
>> @@ -108,6 +108,9 @@ struct hvm_domain {
>>       /* Compatibility setting for a bug in x2APIC LDR */
>>       bool bug_x2apic_ldr_vcpu_id;
>>   
>> +    /* LAPIC is forced in x2APIC mode */
>> +    bool force_x2apic;
> 
> This should be a field in the vlapic struct, but seeing this I wonder
> whether we want to virtualize MSR_IA32_XAPIC_DISABLE_STATUS MSR and
> set the bit there.  This would also help with migrating the option, as
> you could then migrate the "locked" status easily by just migrating
> the contents of the MSR_IA32_XAPIC_DISABLE_STATUS MSR.
> 

One issue with MSR_IA32_XAPIC_DISABLE_STATUS is that it is only
meaningful on Intel platforms (unless we also virtualize it on AMD?),
and I haven't found an AMD-specific mechanism for exposing it.
Most operating systems don't try to disable x2APIC (unless told to do
so) if it is initially enabled ("enabled by firmware").

>> +
>>       /* hypervisor intercepted msix table */
>>       struct list_head       msixtbl_list;
>>   
>> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
>> index b99a691706..75aa31d9ed 100644
>> --- a/xen/include/public/arch-x86/xen.h
>> +++ b/xen/include/public/arch-x86/xen.h
>> @@ -309,11 +309,21 @@ struct xen_arch_domainconfig {
>>    * doesn't allow the guest to read or write to the underlying MSR.
>>    */
>>   #define XEN_X86_MSR_RELAXED (1u << 0)
>> +
>> +/*
>> + * This option forces the LAPIC to be in X2APIC mode (IA32_APIC_BASE.EXTD = 1)
>> + * using the same semantics as IA32_XAPIC_DISABLE_STATUS:LEGACY_XAPIC_DISABLED
>> + *
>> + * Attempts by the guest to clear IA32_APIC_BASE.EXTD (e.g disable X2APIC) will
>> + * inject #GP in the guest.
>> + */
>> +#define XEN_X86_FORCE_X2APIC (1U << 1)
>> +
>>       uint32_t misc_flags;
> 
> If we go the MSR route we won't need a new misc_flag, as the toolstack
> could set the initial value of the MSR_IA32_XAPIC_DISABLE_STATUS using
> the existing way to load vCPU context.
> 
> Thanks, Roger.
> 

Teddy


--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [RFC PATCH for-4.22] x86/hvm: Introduce force_x2apic flag
Posted by Roger Pau Monné 3 months, 1 week ago
On Wed, Oct 29, 2025 at 08:06:36PM +0000, Teddy Astie wrote:
> On 29/10/2025 at 19:55, Roger Pau Monné wrote:
> > On Wed, Oct 29, 2025 at 06:26:14PM +0000, Teddy Astie wrote:
> >> Introduce a new flag to force the x2APIC enabled and preventing a
> >> guest from switching back LAPIC to xAPIC mode.
> >>
> >> The semantics of this mode are based IA32_XAPIC_DISABLE_STATUS
> >> architectural MSR of Intel specification.
> >>
> >> Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
> > 
> > Thanks, some initial comments below.
> > 
> >> ---
> >> This feature can be useful for various reasons, starting with SEV as
> >> it is complicated (especially with SEV-ES) to handle MMIO, and legacy
> >> xAPIC is one thing that needs MMIO intercepts (and Linux uses it during
> >> boot unless x2APIC is initially enabled, even if it switches to
> >> x2apic afterward). It could also be interesting to reduce the attack
> >> surface of the hypervisor (by only exposing x2apic to the guest).
> >>
> >> As it can allow to have MMIO-less guest (using PVH), perhaps it can
> >> be enough for avoiding the problematic cases of virtualized INVLPGB
> >> (when we have it).
> >>
> >> In my testing, Linux, FreeBSD and PV-shim works fine with it; OVMF
> >> freezes for some reason, NetBSD doesn't support it (no x2apic support
> >> as Xen guest). HVM BIOS gets stuck at SeaBIOS as it expects booting
> >> with xAPIC.
> >>
> >> On Intel platforms, it would be better to expose the
> >> IA32_XAPIC_DISABLE_STATUS architectural MSR to advertise this to
> >> guest, but it's non-trivial as it needs to be properly exposed
> >> through IA32_ARCH_CAPABILITIES which is currently passed-through.
> >>
> >>   docs/man/xl.cfg.5.pod.in              |  7 +++++++
> >>   tools/libs/light/libxl_types.idl      |  1 +
> >>   tools/libs/light/libxl_x86.c          |  4 ++++
> >>   tools/xl/xl_parse.c                   |  1 +
> >>   xen/arch/x86/domain.c                 |  2 +-
> >>   xen/arch/x86/hvm/hvm.c                |  2 ++
> >>   xen/arch/x86/hvm/vlapic.c             | 23 ++++++++++++++++++++++-
> >>   xen/arch/x86/include/asm/domain.h     |  2 ++
> >>   xen/arch/x86/include/asm/hvm/domain.h |  3 +++
> >>   xen/include/public/arch-x86/xen.h     | 12 +++++++++++-
> >>   10 files changed, 54 insertions(+), 3 deletions(-)
> > 
> > Seeing there are no changes to the ACPI tables exposed to the guest,
> > do we want to start exposing X2APIC MADT entries instead of the plain
> > APIC entries?
> > 
> > The ACPI spec seems to suggest that you can expose APIC entries for
> > APICs below 255, for compatibility reasons.  But given that we would
> > force the guest to use X2APIC mode it would certainly need to
> > understand how to process X2APIC MADT entries anyway.
> > 
> > Not sure it makes much of a difference, but wondering whether OSes
> > expect X2APIC MADT entries if the mode is locked to X2APIC.
> > 
> 
> All the OSes I checked treat x2APIC MADT entries as just a different
> format for LAPIC entries and don't really tie them to whether x2APIC
> is in use.
> 
> But I think it's safe to assume that every OS that supports x2APIC also
> supports x2APIC MADT entries, which could make ACPI table generation
> simpler (especially when dealing with LAPIC IDs over 255).
> 
> >>
> >> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> >> index ad1553c5e9..01b41d93c0 100644
> >> --- a/docs/man/xl.cfg.5.pod.in
> >> +++ b/docs/man/xl.cfg.5.pod.in
> >> @@ -3198,6 +3198,13 @@ option.
> >>   
> >>   If using this option is necessary to fix an issue, please report a bug.
> >>   
> >> +=item B<force_x2apic=BOOLEAN>
> >> +
> >> +Force the LAPIC in x2APIC mode and prevent the guest from disabling
> >> +it or switching to xAPIC mode.
> >> +
> >> +This option is disabled by default.
> >> +
> >>   =back
> >>   
> >>   =head1 SEE ALSO
> >> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> >> index d64a573ff3..b95278007e 100644
> >> --- a/tools/libs/light/libxl_types.idl
> >> +++ b/tools/libs/light/libxl_types.idl
> >> @@ -738,6 +738,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
> >>                                  ("arm_sci", libxl_arm_sci),
> >>                                 ])),
> >>       ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
> >> +                               ("force_x2apic", libxl_defbool)
> > 
> > > This addition needs a new define in libxl.h to signal its presence,
> > > see LIBXL_HAVE_* defines in there.
> > 
> 
> Something like LIBXL_HAVE_FORCE_X2APIC ?

Yes, something like that.  Not sure we want to add X86 somewhere in
there, but X2APIC is already x86-specific so unlikely to have any
meaning for other arches.
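
For reference, the usual LIBXL_HAVE_* pattern would look roughly like the
sketch below.  The define name is only the one suggested above and is not
final, and caller_can_set_force_x2apic() is a made-up helper standing in
for a libxl consumer, not an existing function.

```c
#include <stdbool.h>

/*
 * Hypothetical addition to libxl.h (name not final): signals that
 * libxl_domain_build_info has the arch_x86.force_x2apic field.
 */
#define LIBXL_HAVE_FORCE_X2APIC 1

/*
 * Made-up helper standing in for a libxl consumer: such code can only
 * touch the new field when built against a libxl that defines the macro.
 */
static bool caller_can_set_force_x2apic(void)
{
#ifdef LIBXL_HAVE_FORCE_X2APIC
    return true;
#else
    return false;
#endif
}
```

Consumers that must build against older libxl versions would wrap any use
of the new field in the same #ifdef.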

> >>                                 ])),
> >>       # Alternate p2m is not bound to any architecture or guest type, as it is
> >>       # supported by x86 HVM and ARM support is planned.
> >> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
> >> index 60d4e8661c..2e0205d2a2 100644
> >> --- a/tools/libs/light/libxl_x86.c
> >> +++ b/tools/libs/light/libxl_x86.c
> >> @@ -26,6 +26,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
> >>       if (libxl_defbool_val(d_config->b_info.arch_x86.msr_relaxed))
> >>           config->arch.misc_flags |= XEN_X86_MSR_RELAXED;
> >>   
> >> +    if (libxl_defbool_val(d_config->b_info.arch_x86.force_x2apic))
> >> +        config->arch.misc_flags |= XEN_X86_FORCE_X2APIC;
> >> +
> >>       if (libxl_defbool_val(d_config->b_info.trap_unmapped_accesses)) {
> >>               LOG(ERROR, "trap_unmapped_accesses is not supported on x86\n");
> >>               return ERROR_FAIL;
> >> @@ -818,6 +821,7 @@ int libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
> >>   {
> >>       libxl_defbool_setdefault(&b_info->acpi, true);
> >>       libxl_defbool_setdefault(&b_info->arch_x86.msr_relaxed, false);
> >> +    libxl_defbool_setdefault(&b_info->arch_x86.force_x2apic, false);
> >>       libxl_defbool_setdefault(&b_info->trap_unmapped_accesses, false);
> >>   
> >>       if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
> >> diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
> >> index af86d3186d..d84ab7c823 100644
> >> --- a/tools/xl/xl_parse.c
> >> +++ b/tools/xl/xl_parse.c
> >> @@ -3041,6 +3041,7 @@ skip_usbdev:
> >>                       "If it fixes an issue you are having please report to "
> >>                       "xen-devel@lists.xenproject.org.\n");
> >>   
> >> +    xlu_cfg_get_defbool(config, "force_x2apic", &b_info->arch_x86.force_x2apic, 0);
> >>       xlu_cfg_get_defbool(config, "vpmu", &b_info->vpmu, 0);
> >>   
> >>       xlu_cfg_destroy(config);
> >> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> >> index 19fd86ce88..02f650a614 100644
> >> --- a/xen/arch/x86/domain.c
> >> +++ b/xen/arch/x86/domain.c
> >> @@ -704,7 +704,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>           return -EINVAL;
> >>       }
> >>   
> >> -    if ( config->arch.misc_flags & ~XEN_X86_MSR_RELAXED )
> >> +    if ( config->arch.misc_flags & ~(XEN_X86_MSR_RELAXED | XEN_X86_FORCE_X2APIC) )
> >>       {
> >>           dprintk(XENLOG_INFO, "Invalid arch misc flags %#x\n",
> >>                   config->arch.misc_flags);
> >> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >> index 0c60faa39d..73cbac0f22 100644
> >> --- a/xen/arch/x86/hvm/hvm.c
> >> +++ b/xen/arch/x86/hvm/hvm.c
> >> @@ -616,6 +616,8 @@ int hvm_domain_initialise(struct domain *d,
> >>       INIT_LIST_HEAD(&d->arch.hvm.mmcfg_regions);
> >>       INIT_LIST_HEAD(&d->arch.hvm.msix_tables);
> >>   
> >> +    d->arch.hvm.force_x2apic = config->arch.misc_flags & XEN_X86_FORCE_X2APIC;
> >> +
> >>       rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
> >>       if ( rc )
> >>           goto fail;
> >> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> >> index 993e972cd7..ae8df70d2e 100644
> >> --- a/xen/arch/x86/hvm/vlapic.c
> >> +++ b/xen/arch/x86/hvm/vlapic.c
> >> @@ -1116,6 +1116,20 @@ int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
> >>       if ( !has_vlapic(v->domain) )
> >>           return X86EMUL_EXCEPTION;
> >>   
> >> +    if ( has_force_x2apic(v->domain) )
> >> +    {
> >> +        /*
> >> +        * We implement the same semantics as MSR_IA32_XAPIC_DISABLE_STATUS:
> >> +        * LEGACY_XAPIC_DISABLED which rejects any attempt at clearing
> >> +        * IA32_APIC_BASE.EXTD, thus forcing the LAPIC in x2APIC mode.
> >> +        */
> >> +        if ( !(val & APIC_BASE_EXTD) )
> >> +        {
> >> +            gprintk(XENLOG_WARNING, "tried to disable x2APIC while forced on\n");
> >> +            return X86EMUL_EXCEPTION;
> >> +        }
> >> +    }
> >> +
> >>       /* Attempting to set reserved bits? */
> >>       if ( val & ~(APIC_BASE_ADDR_MASK | APIC_BASE_ENABLE | APIC_BASE_BSP |
> >>                    (cp->basic.x2apic ? APIC_BASE_EXTD : 0)) )
> >> @@ -1474,7 +1488,14 @@ void vlapic_reset(struct vlapic *vlapic)
> >>       if ( v->vcpu_id == 0 )
> >>           vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
> >>   
> >> -    vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> >> +    if ( has_force_x2apic(v->domain) )
> >> +    {
> >> +        vlapic->hw.apic_base_msr |= APIC_BASE_EXTD;
> >> +        set_x2apic_id(vlapic);
> >> +    }
> >> +    else
> >> +        vlapic_set_reg(vlapic, APIC_ID, (v->vcpu_id * 2) << 24);
> >> +
> >>       vlapic_do_init(vlapic);
> >>   }
> >>   
> >> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> >> index 5df8c78253..771992d156 100644
> >> --- a/xen/arch/x86/include/asm/domain.h
> >> +++ b/xen/arch/x86/include/asm/domain.h
> >> @@ -509,6 +509,8 @@ struct arch_domain
> >>   #define has_pirq(d)        (!!((d)->arch.emulation_flags & X86_EMU_USE_PIRQ))
> >>   #define has_vpci(d)        (!!((d)->arch.emulation_flags & X86_EMU_VPCI))
> >>   
> >> +#define has_force_x2apic(d) ((d)->arch.hvm.force_x2apic)
> >> +
> >>   #define gdt_ldt_pt_idx(v) \
> >>         ((v)->vcpu_id >> (PAGETABLE_ORDER - GDT_LDT_VCPU_SHIFT))
> >>   #define pv_gdt_ptes(v) \
> >> diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> >> index 333501d5f2..b56fa08b73 100644
> >> --- a/xen/arch/x86/include/asm/hvm/domain.h
> >> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> >> @@ -108,6 +108,9 @@ struct hvm_domain {
> >>       /* Compatibility setting for a bug in x2APIC LDR */
> >>       bool bug_x2apic_ldr_vcpu_id;
> >>   
> >> +    /* LAPIC is forced in x2APIC mode */
> >> +    bool force_x2apic;
> > 
> > This should be a field in the vlapic struct, but seeing this I wonder
> > whether we want to virtualize MSR_IA32_XAPIC_DISABLE_STATUS MSR and
> > set the bit there.  This would also help with migrating the option, as
> > you could then migrate the "locked" status easily by just migrating
> > the contents of the MSR_IA32_XAPIC_DISABLE_STATUS MSR.
> > 
> 
> One issue with MSR_IA32_XAPIC_DISABLE_STATUS is that it is only
> meaningful on Intel platforms (unless we also virtualize it on AMD?),
> and I haven't found an AMD-specific mechanism for exposing it.
> Most operating systems don't try to disable x2APIC (unless told to do
> so) if it is initially enabled ("enabled by firmware").

Yeah, I see the availability of MSR_IA32_XAPIC_DISABLE_STATUS is
exposed in MSR_ARCH_CAPABILITIES, which is only present on Intel
platforms.

I also haven't been able to find a way to expose that the APIC is locked
to x2APIC mode in any of the AMD manuals, which is a shame.

For Intel we should expose this when possible in
MSR_IA32_XAPIC_DISABLE_STATUS.  And we need to migrate the selection
in the stream as part of the APIC data.
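
On the Intel side, guest-visible detection would then be roughly as below.
This is only a sketch: the bit positions are assumed to match what Linux
uses in msr-index.h (ARCH_CAPABILITIES bit 21, LEGACY_XAPIC_DISABLED bit 0)
and should be double-checked against the SDM, and x2apic_locked() is an
illustrative helper that takes already-read MSR values, not an existing
function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the SDM before relying on them. */
#define ARCH_CAPS_XAPIC_DISABLE_STATUS (UINT64_C(1) << 21) /* IA32_ARCH_CAPABILITIES */
#define LEGACY_XAPIC_DISABLED          (UINT64_C(1) << 0)  /* IA32_XAPIC_DISABLE_STATUS */

/*
 * Illustrative helper: takes MSR values the caller has already read.
 * The status MSR is only meaningful when ARCH_CAPABILITIES advertises it.
 */
static bool x2apic_locked(uint64_t arch_caps, uint64_t xapic_disable_status)
{
    if ( !(arch_caps & ARCH_CAPS_XAPIC_DISABLE_STATUS) )
        return false;

    return xapic_disable_status & LEGACY_XAPIC_DISABLED;
}
```

Without the ARCH_CAPABILITIES bit (the AMD case discussed above) the helper
reports "not locked", which is exactly the gap on that side.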

Thanks, Roger.