KVM: x86: Support APX feature for guests

[PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Chang S. Bae 3 months ago

Define the VMCS field offset for the extended instruction information and
handle it for nested VMX.

When EGPRs are available, VMX provides a new 64-bit field to extend the
legacy instruction information, allowing access to the higher register
indices. Then, nested VMX needs to propagate this field between L1 and
L2.

The EGPR checker will be implemented later.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
---
RFC note:
While drafting, I initially introduced the offset definition mainly for
non-nested VMX. I then realized that the switching helper affects the
nVMX code anyway. Because of this dependency, this change is placed
first, together with the offset definition.
---
 arch/x86/include/asm/vmx.h | 2 ++
 arch/x86/kvm/vmx/nested.c  | 2 ++
 arch/x86/kvm/vmx/vmcs12.c  | 1 +
 arch/x86/kvm/vmx/vmcs12.h  | 3 ++-
 arch/x86/kvm/vmx/vmx.h     | 2 ++
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index c85c50019523..ab0684948c56 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -264,6 +264,8 @@ enum vmcs_field {
 	PID_POINTER_TABLE_HIGH		= 0x00002043,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
+	EXTENDED_INSTRUCTION_INFO	= 0x00002406,
+	EXTENDED_INSTRUCTION_INFO_HIGH	= 0x00002407,
 	VMCS_LINK_POINTER               = 0x00002800,
 	VMCS_LINK_POINTER_HIGH          = 0x00002801,
 	GUEST_IA32_DEBUGCTL             = 0x00002802,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 97ec8e594155..3442610a6b70 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4798,6 +4798,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 		vmcs12->vm_exit_intr_info = exit_intr_info;
 		vmcs12->vm_exit_instruction_len = exit_insn_len;
 		vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+		if (vmx_egpr_enabled(vcpu))
+			vmcs12->extended_instruction_info = vmcs_read64(EXTENDED_INSTRUCTION_INFO);
 
 		/*
 		 * According to spec, there's no need to store the guest's
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 4233b5ca9461..ea2b690a419e 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -53,6 +53,7 @@ const unsigned short vmcs12_field_offsets[] = {
 	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
 	FIELD64(ENCLS_EXITING_BITMAP, encls_exiting_bitmap),
 	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
+	FIELD64(EXTENDED_INSTRUCTION_INFO, extended_instruction_info),
 	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
 	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
 	FIELD64(GUEST_IA32_PAT, guest_ia32_pat),
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 4ad6b16525b9..2146e45aaade 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -71,7 +71,7 @@ struct __packed vmcs12 {
 	u64 pml_address;
 	u64 encls_exiting_bitmap;
 	u64 tsc_multiplier;
-	u64 padding64[1]; /* room for future expansion */
+	u64 extended_instruction_info;
 	/*
 	 * To allow migration of L1 (complete with its L2 guests) between
 	 * machines of different natural widths (32 or 64 bit), we cannot have
@@ -261,6 +261,7 @@ static inline void vmx_check_vmcs12_offsets(void)
 	CHECK_OFFSET(pml_address, 312);
 	CHECK_OFFSET(encls_exiting_bitmap, 320);
 	CHECK_OFFSET(tsc_multiplier, 328);
+	CHECK_OFFSET(extended_instruction_info, 336);
 	CHECK_OFFSET(cr0_guest_host_mask, 344);
 	CHECK_OFFSET(cr4_guest_host_mask, 352);
 	CHECK_OFFSET(cr0_read_shadow, 360);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 64a0772c883c..b8da6ebc35dc 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -372,6 +372,8 @@ struct vmx_insn_info {
 	union insn_info info;
 };
 
+static inline bool vmx_egpr_enabled(struct kvm_vcpu *vcpu __maybe_unused) { return false; }
+
 static inline struct vmx_insn_info vmx_get_insn_info(struct kvm_vcpu *vcpu __maybe_unused)
 {
 	struct vmx_insn_info insn;
-- 
2.51.0
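
Editorial sketch of the layout change above: the patch reuses the
8-byte padding64[1] slot in struct vmcs12 for the new field, so
everything after it keeps its offset (tsc_multiplier at 328, the new
field at 336, cr0_guest_host_mask at 344). The structure below is a
hypothetical, heavily shortened stand-in, and the toy_* names and
offsets are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stand-in for a slice of struct vmcs12 (hypothetical, heavily
 * shortened). The fourth slot used to be "u64 padding64[1]" and now
 * carries the extended instruction info, so no later field moves.
 */
struct toy_vmcs12 {
	uint64_t pml_address;
	uint64_t encls_exiting_bitmap;
	uint64_t tsc_multiplier;
	uint64_t extended_instruction_info;	/* was: padding64[1] */
	uint64_t cr0_guest_host_mask;
} __attribute__((packed));

/* Compile-time layout check, in the spirit of CHECK_OFFSET(). */
#define TOY_CHECK_OFFSET(field, loc)					\
	static_assert(offsetof(struct toy_vmcs12, field) == (loc),	\
		      "offset of " #field " in toy_vmcs12 has changed")

TOY_CHECK_OFFSET(tsc_multiplier, 16);
TOY_CHECK_OFFSET(extended_instruction_info, 24);
TOY_CHECK_OFFSET(cr0_guest_host_mask, 32);

/* Returns the byte offset of the reused slot, for run-time inspection. */
static inline size_t toy_eii_offset(void)
{
	return offsetof(struct toy_vmcs12, extended_instruction_info);
}
```

If the new member were inserted rather than swapped in for the padding,
the third check would fail at build time, which is the kind of drift
the offset checks in vmcs12.h are there to catch.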

Re: [PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Paolo Bonzini 2 months, 4 weeks ago

On 11/10/25 19:01, Chang S. Bae wrote:
> Define the VMCS field offset for the extended instruction information and
> handle it for nested VMX.
> 
> When EGPRs are available, VMX provides a new 64-bit field to extend the
> legacy instruction information, allowing access to the higher register
> indices. Then, nested VMX needs to propagate this field between L1 and
> L2.
> 
> The EGPR checker will be implemented later.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> ---
> RFC note:
> While drafting, I initially introduced the offset definition mainly for
> non-nested VMX. I then realized that the switching helper affects the
> nVMX code anyway. Because of this dependency, this change is placed
> first, together with the offset definition.
> ---
>   arch/x86/include/asm/vmx.h | 2 ++
>   arch/x86/kvm/vmx/nested.c  | 2 ++
>   arch/x86/kvm/vmx/vmcs12.c  | 1 +
>   arch/x86/kvm/vmx/vmcs12.h  | 3 ++-
>   arch/x86/kvm/vmx/vmx.h     | 2 ++
>   5 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index c85c50019523..ab0684948c56 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -264,6 +264,8 @@ enum vmcs_field {
>   	PID_POINTER_TABLE_HIGH		= 0x00002043,
>   	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
>   	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
> +	EXTENDED_INSTRUCTION_INFO	= 0x00002406,
> +	EXTENDED_INSTRUCTION_INFO_HIGH	= 0x00002407,
>   	VMCS_LINK_POINTER               = 0x00002800,
>   	VMCS_LINK_POINTER_HIGH          = 0x00002801,
>   	GUEST_IA32_DEBUGCTL             = 0x00002802,
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 97ec8e594155..3442610a6b70 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4798,6 +4798,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>   		vmcs12->vm_exit_intr_info = exit_intr_info;
>   		vmcs12->vm_exit_instruction_len = exit_insn_len;
>   		vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
> +		if (vmx_egpr_enabled(vcpu))
> +			vmcs12->extended_instruction_info = vmcs_read64(EXTENDED_INSTRUCTION_INFO);

 From patch 17:

+static inline bool vmx_egpr_enabled(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.xcr0 & XFEATURE_MASK_APX && is_64_bit_mode(vcpu);
+}

but here you must not check XCR0, the extended instruction information 
field is always available.  The spec says "A non-Intel® APX enabled VMM 
is free to continue using the legacy definition of the field, since lack 
of Intel® APX enabling will guarantee that regIDs are only 4-bits, 
maximum" but you can also use the extended instruction information field 
if you want.  So, I'd make this also static_cpu_has(X86_FEATURE_APX).

Paolo

>   
>   		/*
>   		 * According to spec, there's no need to store the guest's
> diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
> index 4233b5ca9461..ea2b690a419e 100644
> --- a/arch/x86/kvm/vmx/vmcs12.c
> +++ b/arch/x86/kvm/vmx/vmcs12.c
> @@ -53,6 +53,7 @@ const unsigned short vmcs12_field_offsets[] = {
>   	FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
>   	FIELD64(ENCLS_EXITING_BITMAP, encls_exiting_bitmap),
>   	FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
> +	FIELD64(EXTENDED_INSTRUCTION_INFO, extended_instruction_info),
>   	FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
>   	FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
>   	FIELD64(GUEST_IA32_PAT, guest_ia32_pat),
> diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
> index 4ad6b16525b9..2146e45aaade 100644
> --- a/arch/x86/kvm/vmx/vmcs12.h
> +++ b/arch/x86/kvm/vmx/vmcs12.h
> @@ -71,7 +71,7 @@ struct __packed vmcs12 {
>   	u64 pml_address;
>   	u64 encls_exiting_bitmap;
>   	u64 tsc_multiplier;
> -	u64 padding64[1]; /* room for future expansion */
> +	u64 extended_instruction_info;
>   	/*
>   	 * To allow migration of L1 (complete with its L2 guests) between
>   	 * machines of different natural widths (32 or 64 bit), we cannot have
> @@ -261,6 +261,7 @@ static inline void vmx_check_vmcs12_offsets(void)
>   	CHECK_OFFSET(pml_address, 312);
>   	CHECK_OFFSET(encls_exiting_bitmap, 320);
>   	CHECK_OFFSET(tsc_multiplier, 328);
> +	CHECK_OFFSET(extended_instruction_info, 336);
>   	CHECK_OFFSET(cr0_guest_host_mask, 344);
>   	CHECK_OFFSET(cr4_guest_host_mask, 352);
>   	CHECK_OFFSET(cr0_read_shadow, 360);
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 64a0772c883c..b8da6ebc35dc 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -372,6 +372,8 @@ struct vmx_insn_info {
>   	union insn_info info;
>   };
>   
> +static inline bool vmx_egpr_enabled(struct kvm_vcpu *vcpu __maybe_unused) { return false; }
> +
>   static inline struct vmx_insn_info vmx_get_insn_info(struct kvm_vcpu *vcpu __maybe_unused)
>   {
>   	struct vmx_insn_info insn;

Re: [PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Chang S. Bae 2 months, 3 weeks ago

On 11/11/2025 9:48 AM, Paolo Bonzini wrote:
> 
> but here you must not check XCR0, the extended instruction information 
> field is always available.  The spec says "A non-Intel® APX enabled VMM 
> is free to continue using the legacy definition of the field, since lack 
> of Intel® APX enabling will guarantee that regIDs are only 4-bits, 
> maximum" but you can also use the extended instruction information field 
> if you want.  So, I'd make this also static_cpu_has(X86_FEATURE_APX).

I just got confirmation from the hardware folks. The CPUID enumeration 
alone is sufficient to indicate the presence and usability of the 
extended instruction information field, regardless of the XCR0 state.
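
Editorial sketch of the distinction discussed above: whether the guest
can currently use EGPRs (XCR0 plus 64-bit mode) is a different question
from whether the extended instruction information field exists (host
enumeration only). The types and toy_* names are hypothetical stand-ins,
and the XCR0 bit position used for APX is an assumption made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed XCR0 bit for the APX state component; illustrative only. */
#define TOY_XFEATURE_MASK_APX	(1ULL << 19)

/* Hypothetical, minimal vCPU state; not the kernel's struct kvm_vcpu. */
struct toy_vcpu {
	uint64_t xcr0;
	bool in_64bit_mode;
};

/* Can the guest currently execute instructions that name EGPRs? */
static inline bool toy_egpr_enabled(const struct toy_vcpu *vcpu)
{
	return (vcpu->xcr0 & TOY_XFEATURE_MASK_APX) && vcpu->in_64bit_mode;
}

/* Is the extended field present? Depends on host enumeration only. */
static inline bool toy_ext_insn_info_available(bool host_has_apx)
{
	return host_has_apx;
}

/* The two predicates can disagree: field present, EGPRs not enabled. */
static inline void toy_predicates_demo(void)
{
	struct toy_vcpu vcpu = { .xcr0 = 0, .in_64bit_mode = true };

	assert(toy_ext_insn_info_available(true));
	assert(!toy_egpr_enabled(&vcpu));
}
```

Keeping the two checks as separate helpers lets each call site pick the
question it actually needs answered.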

Re: [PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Chao Gao 2 months, 4 weeks ago

>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -4798,6 +4798,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>>   		vmcs12->vm_exit_intr_info = exit_intr_info;
>>   		vmcs12->vm_exit_instruction_len = exit_insn_len;
>>   		vmcs12->vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
>> +		if (vmx_egpr_enabled(vcpu))
>> +			vmcs12->extended_instruction_info = vmcs_read64(EXTENDED_INSTRUCTION_INFO);
>
>From patch 17:
>
>+static inline bool vmx_egpr_enabled(struct kvm_vcpu *vcpu)
>+{
>+	return vcpu->arch.xcr0 & XFEATURE_MASK_APX && is_64_bit_mode(vcpu);
>+}
>
>but here you must not check XCR0, the extended instruction information field
>is always available.  The spec says "A non-Intel® APX enabled VMM is free to
>continue using the legacy definition of the field, since lack of Intel® APX
>enabling will guarantee that regIDs are only 4-bits, maximum" but you can
>also use the extended instruction information field if you want.  So, I'd
>make this also static_cpu_has(X86_FEATURE_APX).

Shouldn't we check guest's capabilities rather than host's,

i.e., guest_cpu_cap_has(X86_FEATURE_APX)?

Re: [PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Paolo Bonzini 2 months, 3 weeks ago

On Wed, Nov 12, 2025, 02:54 Chao Gao <chao.gao@intel.com> wrote:
>
> Shouldn't we check guest's capabilities rather than host's,
>
> i.e., guest_cpu_cap_has(X86_FEATURE_APX)?

As the manual says, you're free to use the extended field if
available, and it's faster.

Paolo

Re: [PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Chao Gao 2 months, 3 weeks ago

On Tue, Nov 18, 2025 at 12:29:19AM +0100, Paolo Bonzini wrote:
>On Wed, Nov 12, 2025, 02:54 Chao Gao <chao.gao@intel.com> wrote:
>>
>> Shouldn't we check guest's capabilities rather than host's,
>>
>> i.e., guest_cpu_cap_has(X86_FEATURE_APX)?
>
>As the manual says, you're free to use the extended field if
>available, and it's faster.

The point is, from the guest's perspective, the field is available iff the vCPU
supports APX. KVM (L0) doesn't need to virtualize VMCS12's EII field if the vCPU
doesn't have APX.

For other call sites of vmx_egpr_enabled(), I agree we should just check host
capabilities.

Re: [PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Paolo Bonzini 2 months, 3 weeks ago

On Tue, Nov 18, 2025 at 2:39 AM Chao Gao <chao.gao@intel.com> wrote:
>
> On Tue, Nov 18, 2025 at 12:29:19AM +0100, Paolo Bonzini wrote:
> >On Wed, Nov 12, 2025, 02:54 Chao Gao <chao.gao@intel.com> wrote:
> >>
> >> Shouldn't we check guest's capabilities rather than host's,
> >>
> >> i.e., guest_cpu_cap_has(X86_FEATURE_APX)?
> >
> >As the manual says, you're free to use the extended field if
> >available, and it's faster.
>
> The point is, from the guest's perspective, the field is available iff the vCPU
> supports APX. KVM (L0) doesn't need to virtualize VMCS12's EII field if the vCPU
> doesn't have APX.

Well, it would be faster to just do it. But you're right, checking the
guest CPUID is consistent with other code, for example

        if (nested_cpu_has_vid(vmcs12))
                vmcs12->guest_intr_status = vmcs_read16(GUEST_INTR_STATUS);

Paolo
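
Editorial sketch of the guest-gated variant agreed on above, following
the nested_cpu_has_vid() pattern: the legacy 32-bit field is always
propagated into vmcs12, while the extended field is copied only when
the vCPU itself enumerates APX. The structures and toy_* names are
hypothetical stand-ins, and the boolean parameter takes the place of a
guest capability check such as guest_cpu_cap_has(X86_FEATURE_APX).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hardware VMCS and for vmcs12. */
struct toy_hw_vmcs {
	uint32_t vmx_instruction_info;
	uint64_t extended_instruction_info;
};

struct toy_nested_vmcs12 {
	uint32_t vmx_instruction_info;
	uint64_t extended_instruction_info;
};

/* Copy exit information to vmcs12; the extended field is guest-gated. */
static inline void toy_sync_insn_info(struct toy_nested_vmcs12 *vmcs12,
				      const struct toy_hw_vmcs *hw,
				      bool guest_has_apx)
{
	vmcs12->vmx_instruction_info = hw->vmx_instruction_info;
	if (guest_has_apx)
		vmcs12->extended_instruction_info =
			hw->extended_instruction_info;
}

/* Without guest APX the extended field in vmcs12 is left untouched. */
static inline void toy_sync_demo(void)
{
	const struct toy_hw_vmcs hw = { 0x1234, 0xabcdULL };
	struct toy_nested_vmcs12 vmcs12 = { 0, 0 };

	toy_sync_insn_info(&vmcs12, &hw, false);
	assert(vmcs12.extended_instruction_info == 0);

	toy_sync_insn_info(&vmcs12, &hw, true);
	assert(vmcs12.extended_instruction_info == 0xabcdULL);
}
```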

Re: [PATCH RFC v1 07/20] KVM: nVMX: Support the extended instruction info field

Posted by Chang S. Bae 2 months, 3 weeks ago

On 11/11/2025 5:54 PM, Chao Gao wrote:
> 
> Shouldn't we check guest's capabilities rather than host's,
> 
> i.e., guest_cpu_cap_has(X86_FEATURE_APX)?

Perhaps adding a comment like this would help clarify it:

   /*
    * The APX enumeration guarantees the presence of the extended
    * field. The host CPUID bit alone is sufficient to rely on it.
    */
   static inline bool vmx_ext_insn_info_available(void)
   {
           return static_cpu_has(X86_FEATURE_APX);
   }