[PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel

Sean Christopherson posted 6 patches 2 months ago
Posted by Sean Christopherson 2 months ago
From: Xin Li <xin@zytor.com>

Add support for the immediate forms of RDMSR and WRMSRNS (currently
Intel-only).  The immediate variants are only valid in 64-bit mode, and
use a single general purpose register for the data (the register is also
encoded in the instruction, i.e. not implicit like regular RDMSR/WRMSR).
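To make the register difference concrete, here is a small C model of where the MSR value ends up in each form. The `struct gprs` register file and both helper names are hypothetical illustrations, not KVM code. Register indices are assumed to follow the usual x86 encoding (0 = RAX, 2 = RDX), which KVM's VCPU_REGS_* numbering also uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of the 16 x86-64 GPRs. */
enum { REG_RAX = 0, REG_RDX = 2, NR_GPRS = 16 };

struct gprs {
	uint64_t r[NR_GPRS];
};

/* Legacy RDMSR: the 64-bit MSR value is split across EDX:EAX, and in
 * 64-bit mode the upper 32 bits of RAX and RDX are cleared. */
static void rdmsr_legacy_result(struct gprs *g, uint64_t data)
{
	g->r[REG_RAX] = data & 0xffffffffull;
	g->r[REG_RDX] = data >> 32;
}

/* Immediate form: the whole 64-bit value lands in the one GPR that is
 * encoded in the instruction (a 4-bit field, so always 0..15). */
static void rdmsr_imm_result(struct gprs *g, int reg, uint64_t data)
{
	assert(reg >= 0 && reg < NR_GPRS);
	g->r[reg] = data;
}
```

This mirrors the `reg < 0` versus explicit-register split in `__kvm_emulate_rdmsr()` in the x86.c hunk of the patch.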

The immediate variants are primarily motivated by performance, not code
size: by having the MSR index in an immediate, it is available *much*
earlier in the CPU pipeline, which allows hardware much more leeway about
how a particular MSR is handled.

Intel VMX support for the immediate forms of MSR accesses communicates
exit information to the host as follows:

  1) The immediate form of RDMSR uses VM-Exit Reason 84.

  2) The immediate form of WRMSRNS uses VM-Exit Reason 85.

  3) For both VM-Exit reasons 84 and 85, the Exit Qualification field is
     set to the MSR index that triggered the VM-Exit.

  4) Bits 3-6 of the VM-Exit Instruction Information field are set to
     the register encoding used by the immediate form of the instruction,
     i.e. the destination register for RDMSR, and the source for WRMSRNS.

  5) The VM-Exit Instruction Length field records the size of the
     immediate form of the MSR instruction.

To deal with userspace RDMSR exits, stash the destination register in a
new kvm_vcpu_arch field, similar to cui_linear_rip, pio, etc.
Alternatively, the register could be saved in kvm_run.msr or re-retrieved
from the VMCS, but the former would require sanitizing the value to ensure
userspace doesn't clobber the value to an out-of-bounds index, and the
latter would require a new one-off kvm_x86_ops hook.
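A simplified sketch of the stash-and-complete pattern described above. `struct vcpu` and its `run_msr_*` fields are hypothetical stand-ins for struct kvm_vcpu and kvm_run.msr. Unlike the patch, which records the register in kvm_emulate_rdmsr_imm() before attempting the read and handles #GP injection through complete_fast_msr_access(), this model only stashes the register on the userspace-exit path and does not model fault injection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_GPRS 16

/* Hypothetical, heavily simplified vCPU: only what the sketch needs. */
struct vcpu {
	uint64_t gprs[NR_GPRS];
	int cui_rdmsr_imm_reg;          /* stashed destination register */
	int (*complete_userspace_io)(struct vcpu *vcpu);
	/* Stand-ins for kvm_run.msr.error / kvm_run.msr.data. */
	uint8_t run_msr_error;
	uint64_t run_msr_data;
};

/* Runs on re-entry after userspace has filled in run_msr_*.  The register
 * index comes from kernel-owned state, so userspace cannot redirect the
 * write to an out-of-bounds index. */
static int complete_rdmsr_imm(struct vcpu *vcpu)
{
	if (!vcpu->run_msr_error)
		vcpu->gprs[vcpu->cui_rdmsr_imm_reg] = vcpu->run_msr_data;
	return 1;  /* 1 == resume the guest */
}

/* The MSR could not be handled in the kernel: stash the register and arm
 * the completion callback before exiting to userspace. */
static int rdmsr_imm_exit_to_userspace(struct vcpu *vcpu, int reg)
{
	assert(reg >= 0 && reg < NR_GPRS);  /* 4-bit encoding, always in range */
	vcpu->cui_rdmsr_imm_reg = reg;
	vcpu->complete_userspace_io = complete_rdmsr_imm;
	return 0;  /* 0 == exit to userspace */
}
```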

Don't bother adding support for the instructions in KVM's emulator, as the
only way for the emulator to encounter them is if KVM is emulating large
swaths of code due to invalid guest state, and a vCPU cannot have invalid
guest state while in 64-bit mode.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
[sean: minor tweaks, massage and expand changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  3 ++
 arch/x86/include/uapi/asm/vmx.h |  6 +++-
 arch/x86/kvm/vmx/nested.c       | 13 ++++++--
 arch/x86/kvm/vmx/vmx.c          | 21 +++++++++++++
 arch/x86/kvm/vmx/vmx.h          |  5 +++
 arch/x86/kvm/x86.c              | 55 +++++++++++++++++++++++++++------
 6 files changed, 90 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d7680612ba1e..dbdec6025fde 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -929,6 +929,7 @@ struct kvm_vcpu_arch {
 	bool emulate_regs_need_sync_from_vcpu;
 	int (*complete_userspace_io)(struct kvm_vcpu *vcpu);
 	unsigned long cui_linear_rip;
+	int cui_rdmsr_imm_reg;
 
 	gpa_t time;
 	s8  pvclock_tsc_shift;
@@ -2158,7 +2159,9 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, bool host_initiat
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data);
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu);
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg);
 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu);
 int kvm_emulate_invd(struct kvm_vcpu *vcpu);
 int kvm_emulate_mwait(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index f0f4a4cf84a7..9792e329343e 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -94,6 +94,8 @@
 #define EXIT_REASON_BUS_LOCK            74
 #define EXIT_REASON_NOTIFY              75
 #define EXIT_REASON_TDCALL              77
+#define EXIT_REASON_MSR_READ_IMM        84
+#define EXIT_REASON_MSR_WRITE_IMM       85
 
 #define VMX_EXIT_REASONS \
 	{ EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
@@ -158,7 +160,9 @@
 	{ EXIT_REASON_TPAUSE,                "TPAUSE" }, \
 	{ EXIT_REASON_BUS_LOCK,              "BUS_LOCK" }, \
 	{ EXIT_REASON_NOTIFY,                "NOTIFY" }, \
-	{ EXIT_REASON_TDCALL,                "TDCALL" }
+	{ EXIT_REASON_TDCALL,                "TDCALL" }, \
+	{ EXIT_REASON_MSR_READ_IMM,          "MSR_READ_IMM" }, \
+	{ EXIT_REASON_MSR_WRITE_IMM,         "MSR_WRITE_IMM" }
 
 #define VMX_EXIT_REASON_FLAGS \
 	{ VMX_EXIT_REASONS_FAILED_VMENTRY,	"FAILED_VMENTRY" }
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index db2fd4eedc90..798776dddd43 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6216,19 +6216,26 @@ static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
 					struct vmcs12 *vmcs12,
 					union vmx_exit_reason exit_reason)
 {
-	u32 msr_index = kvm_rcx_read(vcpu);
+	u32 msr_index;
 	gpa_t bitmap;
 
 	if (!nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS))
 		return true;
 
+	if (exit_reason.basic == EXIT_REASON_MSR_READ_IMM ||
+	    exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
+		msr_index = vmx_get_exit_qual(vcpu);
+	else
+		msr_index = kvm_rcx_read(vcpu);
+
 	/*
 	 * The MSR_BITMAP page is divided into four 1024-byte bitmaps,
 	 * for the four combinations of read/write and low/high MSR numbers.
 	 * First we need to figure out which of the four to use:
 	 */
 	bitmap = vmcs12->msr_bitmap;
-	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
+	if (exit_reason.basic == EXIT_REASON_MSR_WRITE ||
+	    exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
 		bitmap += 2048;
 	if (msr_index >= 0xc0000000) {
 		msr_index -= 0xc0000000;
@@ -6527,6 +6534,8 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
 		return nested_cpu_has2(vmcs12, SECONDARY_EXEC_DESC);
 	case EXIT_REASON_MSR_READ:
 	case EXIT_REASON_MSR_WRITE:
+	case EXIT_REASON_MSR_READ_IMM:
+	case EXIT_REASON_MSR_WRITE_IMM:
 		return nested_vmx_exit_handled_msr(vcpu, vmcs12, exit_reason);
 	case EXIT_REASON_INVALID_STATE:
 		return true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ae2c8c10e5d2..44423d5f0e27 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6003,6 +6003,23 @@ static int handle_notify(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int vmx_get_msr_imm_reg(struct kvm_vcpu *vcpu)
+{
+	return vmx_get_instr_info_reg(vmcs_read32(VMX_INSTRUCTION_INFO));
+}
+
+static int handle_rdmsr_imm(struct kvm_vcpu *vcpu)
+{
+	return kvm_emulate_rdmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
+				     vmx_get_msr_imm_reg(vcpu));
+}
+
+static int handle_wrmsr_imm(struct kvm_vcpu *vcpu)
+{
+	return kvm_emulate_wrmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
+				     vmx_get_msr_imm_reg(vcpu));
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -6061,6 +6078,8 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_ENCLS]		      = handle_encls,
 	[EXIT_REASON_BUS_LOCK]                = handle_bus_lock_vmexit,
 	[EXIT_REASON_NOTIFY]		      = handle_notify,
+	[EXIT_REASON_MSR_READ_IMM]            = handle_rdmsr_imm,
+	[EXIT_REASON_MSR_WRITE_IMM]           = handle_wrmsr_imm,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -6495,6 +6514,8 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 #ifdef CONFIG_MITIGATION_RETPOLINE
 	if (exit_reason.basic == EXIT_REASON_MSR_WRITE)
 		return kvm_emulate_wrmsr(vcpu);
+	else if (exit_reason.basic == EXIT_REASON_MSR_WRITE_IMM)
+		return handle_wrmsr_imm(vcpu);
 	else if (exit_reason.basic == EXIT_REASON_PREEMPTION_TIMER)
 		return handle_preemption_timer(vcpu);
 	else if (exit_reason.basic == EXIT_REASON_INTERRUPT_WINDOW)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d3389baf3ab3..24d65dac5e89 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -706,6 +706,11 @@ static inline bool vmx_guest_state_valid(struct kvm_vcpu *vcpu)
 
 void dump_vmcs(struct kvm_vcpu *vcpu);
 
+static inline int vmx_get_instr_info_reg(u32 vmx_instr_info)
+{
+	return (vmx_instr_info >> 3) & 0xf;
+}
+
 static inline int vmx_get_instr_info_reg2(u32 vmx_instr_info)
 {
 	return (vmx_instr_info >> 28) & 0xf;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6470f0ab2060..79c3074dbd60 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1990,6 +1990,15 @@ static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
 	return complete_fast_msr_access(vcpu);
 }
 
+static int complete_fast_rdmsr_imm(struct kvm_vcpu *vcpu)
+{
+	if (!vcpu->run->msr.error)
+		kvm_register_write(vcpu, vcpu->arch.cui_rdmsr_imm_reg,
+				   vcpu->run->msr.data);
+
+	return complete_fast_msr_access(vcpu);
+}
+
 static u64 kvm_msr_reason(int r)
 {
 	switch (r) {
@@ -2024,39 +2033,53 @@ static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
 	return 1;
 }
 
-int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
+static int __kvm_emulate_rdmsr(struct kvm_vcpu *vcpu, u32 msr, int reg,
+			       int (*complete_rdmsr)(struct kvm_vcpu *))
 {
-	u32 msr = kvm_rcx_read(vcpu);
 	u64 data;
 	int r;
 
 	r = kvm_get_msr_with_filter(vcpu, msr, &data);
-
 	if (!r) {
 		trace_kvm_msr_read(msr, data);
 
-		kvm_rax_write(vcpu, data & -1u);
-		kvm_rdx_write(vcpu, (data >> 32) & -1u);
+		if (reg < 0) {
+			kvm_rax_write(vcpu, data & -1u);
+			kvm_rdx_write(vcpu, (data >> 32) & -1u);
+		} else {
+			kvm_register_write(vcpu, reg, data);
+		}
 	} else {
 		/* MSR read failed? See if we should ask user space */
 		if (kvm_msr_user_space(vcpu, msr, KVM_EXIT_X86_RDMSR, 0,
-				       complete_fast_rdmsr, r))
+				       complete_rdmsr, r))
 			return 0;
 		trace_kvm_msr_read_ex(msr);
 	}
 
 	return kvm_x86_call(complete_emulated_msr)(vcpu, r);
 }
+
+int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
+{
+	return __kvm_emulate_rdmsr(vcpu, kvm_rcx_read(vcpu), -1,
+				   complete_fast_rdmsr);
+}
 EXPORT_SYMBOL_GPL(kvm_emulate_rdmsr);
 
-int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
+int kvm_emulate_rdmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+	vcpu->arch.cui_rdmsr_imm_reg = reg;
+
+	return __kvm_emulate_rdmsr(vcpu, msr, reg, complete_fast_rdmsr_imm);
+}
+EXPORT_SYMBOL_GPL(kvm_emulate_rdmsr_imm);
+
+static int __kvm_emulate_wrmsr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
-	u32 msr = kvm_rcx_read(vcpu);
-	u64 data = kvm_read_edx_eax(vcpu);
 	int r;
 
 	r = kvm_set_msr_with_filter(vcpu, msr, data);
-
 	if (!r) {
 		trace_kvm_msr_write(msr, data);
 	} else {
@@ -2072,8 +2095,20 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 
 	return kvm_x86_call(complete_emulated_msr)(vcpu, r);
 }
+
+int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
+{
+	return __kvm_emulate_wrmsr(vcpu, kvm_rcx_read(vcpu),
+				   kvm_read_edx_eax(vcpu));
+}
 EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
 
+int kvm_emulate_wrmsr_imm(struct kvm_vcpu *vcpu, u32 msr, int reg)
+{
+	return __kvm_emulate_wrmsr(vcpu, msr, kvm_register_read(vcpu, reg));
+}
+EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr_imm);
+
 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu)
 {
 	return kvm_skip_emulated_instruction(vcpu);
-- 
2.50.1.565.gc32cd1483b-goog
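For reference, a standalone sketch of the L1 MSR-bitmap lookup that the nested.c hunk in this patch extends: the read-immediate exit shares the read half of the bitmap and the write-immediate exit shares the write half, and only the source of the MSR index changes (Exit Qualification instead of RCX). `l1_intercepts_msr()` is a hypothetical helper that takes the 4 KiB bitmap contents directly instead of reading vmcs12->msr_bitmap from guest memory.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_MSR_WRITE     32
#define EXIT_REASON_MSR_WRITE_IMM 85

/* Returns true if L1 wants the exit: the bit for this MSR is set in L1's
 * MSR bitmap, or the MSR is outside both bitmap ranges. */
static bool l1_intercepts_msr(const uint8_t bitmap[4096], uint16_t basic_reason,
			      uint32_t msr)
{
	unsigned int base = 0;

	/* Writes use the upper 2 KiB, for WRMSR and WRMSRNS w/ immediate alike. */
	if (basic_reason == EXIT_REASON_MSR_WRITE ||
	    basic_reason == EXIT_REASON_MSR_WRITE_IMM)
		base += 2048;

	/* High MSRs (0xc0000000..0xc0001fff) use the second 1 KiB of each half. */
	if (msr >= 0xc0000000) {
		msr -= 0xc0000000;
		base += 1024;
	}

	/* Each 1 KiB region covers 8192 MSRs; anything else always exits. */
	if (msr >= 1024 * 8)
		return true;

	return (bitmap[base + msr / 8] >> (msr & 7)) & 1;
}
```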
Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Xiaoyao Li 1 month ago
On 8/6/2025 4:22 AM, Sean Christopherson wrote:
> +static int handle_rdmsr_imm(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_emulate_rdmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
> +				     vmx_get_msr_imm_reg(vcpu));
> +}
> +
> +static int handle_wrmsr_imm(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_emulate_wrmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
> +				     vmx_get_msr_imm_reg(vcpu));
> +}

We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Binbin Wu 1 month ago

On 9/1/2025 12:13 PM, Xiaoyao Li wrote:
> On 8/6/2025 4:22 AM, Sean Christopherson wrote:
>> +static int handle_rdmsr_imm(struct kvm_vcpu *vcpu)
>> +{
>> +    return kvm_emulate_rdmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
>> +                     vmx_get_msr_imm_reg(vcpu));
>> +}
>> +
>> +static int handle_wrmsr_imm(struct kvm_vcpu *vcpu)
>> +{
>> +    return kvm_emulate_wrmsr_imm(vcpu, vmx_get_exit_qual(vcpu),
>> +                     vmx_get_msr_imm_reg(vcpu));
>> +}
>
> We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
>

Indeed.

There is a virtualization hole in this feature for accesses to MSRs that
are not intercepted. IIUIC, there is no other control in VMX for this
feature. If the feature is supported in hardware, the guest's accesses to
non-intercepted MSRs will succeed even when the feature is not exposed to
the guest, but its accesses to intercepted MSRs will get #UD if KVM injects
#UD.

But I guess this is the guest's fault for not following CPUID, and KVM
should still follow the spec?
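The inconsistency being described can be modeled in a few lines. `guest_outcome()` and its inputs are hypothetical, and the model assumes the hardware supports the immediate forms with no VMX control to disable them. "EXECUTES" for an intercepted MSR means KVM emulates the access, which can still fail with #GP.

```c
#include <stdbool.h>

/* What the guest observes when it executes an immediate-form MSR access. */
enum outcome { EXECUTES, RAISES_UD };

static enum outcome guest_outcome(bool msr_intercepted, bool cpuid_has_msr_imm,
				  bool vmm_injects_ud)
{
	/* Pass-through MSR: no VM-exit, so the CPU simply executes it. */
	if (!msr_intercepted)
		return EXECUTES;

	/* Intercepted MSR: the VMM decides, e.g. by injecting #UD. */
	if (vmm_injects_ud && !cpuid_has_msr_imm)
		return RAISES_UD;

	return EXECUTES;
}
```

With #UD injection enabled and the feature hidden from CPUID, the same instruction executes for a pass-through MSR but faults for an intercepted one, which is the guest-visible inconsistency weighed later in the thread.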
Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Xin Li 1 month ago
On 8/31/2025 11:34 PM, Binbin Wu wrote:
>> We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
>>
> 
> Indeed.

Good catch!

> 
> There is a virtualization hole in this feature for accesses to MSRs that
> are not intercepted. IIUIC, there is no other control in VMX for this
> feature. If the feature is supported in hardware, the guest's accesses to
> non-intercepted MSRs will succeed even when the feature is not exposed to
> the guest, but its accesses to intercepted MSRs will get #UD if KVM injects
> #UD.

hpa mentioned this when I just started the work.  But I managed to forget
it later... Sigh!

> 
> But I guess this is the guest's fault for not following CPUID, and KVM
> should still follow the spec?

I think we should still inject #UD when an MSR is intercepted by KVM.
Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Xiaoyao Li 1 month ago
On 9/1/2025 3:04 PM, Xin Li wrote:
> On 8/31/2025 11:34 PM, Binbin Wu wrote:
>>> We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
>>>
>>
>> Indeed.
> 
> Good catch!
> 
>>
>> There is a virtualization hole in this feature for accesses to MSRs that
>> are not intercepted. IIUIC, there is no other control in VMX for this
>> feature. If the feature is supported in hardware, the guest's accesses to
>> non-intercepted MSRs will succeed even when the feature is not exposed to
>> the guest, but its accesses to intercepted MSRs will get #UD if KVM injects
>> #UD.
> 
> hpa mentioned this when I just started the work.  But I managed to forget
> it later... Sigh!
> 
>>
>> But I guess this is the guest's fault for not following CPUID, and KVM
>> should still follow the spec?
> 
> I think we should still inject #UD when an MSR is intercepted by KVM.
> 

For handle_wrmsr_imm(), it seems we need to check
guest_cpu_cap_has(X86_FEATURE_WRMSRNS) as well, since the immediate form of
MSR write is only supported on the WRMSRNS instruction.

It leads to another topic: do we need to bother checking the opcode of the
instruction on EXIT_REASON_MSR_WRITE and injecting #UD when it is a WRMSRNS
instruction and !guest_cpu_cap_has(X86_FEATURE_WRMSRNS)?

WRMSRNS has a virtualization hole as well, but KVM can at least emulate the
architectural behavior when writes to the MSRs are not passed through.
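A sketch of the checks proposed here, written as a standalone C function rather than against real KVM helpers. `struct guest_caps` stands in for guest_cpu_cap_has(), `should_inject_ud()` is a hypothetical name, and the caller is assumed to have already fetched the guest's instruction bytes for the legacy MSR_WRITE case. WRMSRNS is encoded as 0F 01 C6 (legacy WRMSR is 0F 30); the byte compare ignores prefixes, so a real decoder would need to be stricter. Later replies in this thread argue against injecting #UD at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXIT_REASON_MSR_WRITE     32
#define EXIT_REASON_MSR_READ_IMM  84
#define EXIT_REASON_MSR_WRITE_IMM 85

/* Hypothetical guest CPUID bits, standing in for guest_cpu_cap_has(). */
struct guest_caps {
	bool msr_imm;
	bool wrmsrns;
};

static const uint8_t wrmsrns_opcode[] = { 0x0f, 0x01, 0xc6 };

static bool is_wrmsrns(const uint8_t *insn, size_t len)
{
	return len >= sizeof(wrmsrns_opcode) &&
	       !memcmp(insn, wrmsrns_opcode, sizeof(wrmsrns_opcode));
}

/* Should this intercepted MSR access be answered with #UD because guest
 * CPUID lacks the feature the instruction depends on? */
static bool should_inject_ud(uint16_t basic_reason, const struct guest_caps *caps,
			     const uint8_t *insn, size_t len)
{
	switch (basic_reason) {
	case EXIT_REASON_MSR_READ_IMM:
		return !caps->msr_imm;
	case EXIT_REASON_MSR_WRITE_IMM:
		/* The immediate write form exists only as WRMSRNS. */
		return !caps->msr_imm || !caps->wrmsrns;
	case EXIT_REASON_MSR_WRITE:
		/* Needs the guest's instruction bytes to tell WRMSRNS from WRMSR. */
		return !caps->wrmsrns && is_wrmsrns(insn, len);
	default:
		return false;
	}
}
```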
Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Sean Christopherson 1 month ago
On Mon, Sep 01, 2025, Xiaoyao Li wrote:
> On 9/1/2025 3:04 PM, Xin Li wrote:
> > On 8/31/2025 11:34 PM, Binbin Wu wrote:
> > > > We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
> > > > 
> > > 
> > > Indeed.
> > 
> > Good catch!
> > 
> > > 
> > > There is a virtualization hole in this feature for accesses to MSRs that
> > > are not intercepted. IIUIC, there is no other control in VMX for this
> > > feature. If the feature is supported in hardware, the guest's accesses to
> > > non-intercepted MSRs will succeed even when the feature is not exposed to
> > > the guest, but its accesses to intercepted MSRs will get #UD if KVM
> > > injects #UD.
> > 
> > hpa mentioned this when I just started the work.  But I managed to forget
> > it later... Sigh!
> > 
> > > 
> > > But I guess this is the guest's fault for not following CPUID, and KVM
> > > should still follow the spec?
> > 
> > I think we should still inject #UD when an MSR is intercepted by KVM.

Hmm, no, inconsistent behavior (from the guest's perspective) is likely worse
than eating the virtualization hole.  Practically speaking, the only guest
that's going to be surprised by the hole is a guest that's fuzzing opcodes, and
a guest that's fuzzing opcodes at CPL0 is going to create an inherently
unstable environment no matter what.

Though that raises the question of whether or not KVM should emulate WRMSRNS and
whatever the official name for the "RDMSR with immediate" instruction is (I can't
find it in the SDM).  I'm leaning "no", because outside of forced emulation, KVM
should only "need" to emulate the instructions if Unrestricted Guest is disabled,
the instructions should only be supported on CPUs with unrestricted guest, there's
no sane reason (other than testing) to run a guest without Unrestricted Guest,
and using the instructions in Big RM would be quite bizarre.  On the other hand,
adding emulation support should be quite easy...

Side topic, does RDMSRLIST have any VMX controls?

> For handle_wrmsr_imm(), it seems we need to check
> guest_cpu_cap_has(X86_FEATURE_WRMSRNS) as well, since the immediate form of
> MSR write is only supported on the WRMSRNS instruction.
> 
> It leads to another topic: do we need to bother checking the opcode of the
> instruction on EXIT_REASON_MSR_WRITE and injecting #UD when it is a WRMSRNS
> instruction and !guest_cpu_cap_has(X86_FEATURE_WRMSRNS)?
> 
> WRMSRNS has a virtualization hole as well, but KVM can at least emulate the
> architectural behavior when writes to the MSRs are not passed through.
Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Xiaoyao Li 1 month ago
On 9/2/2025 10:50 PM, Sean Christopherson wrote:
> On Mon, Sep 01, 2025, Xiaoyao Li wrote:
>> On 9/1/2025 3:04 PM, Xin Li wrote:
>>> On 8/31/2025 11:34 PM, Binbin Wu wrote:
>>>>> We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
>>>>>
>>>>
>>>> Indeed.
>>>
>>> Good catch!
>>>
>>>>
>>>> There is a virtualization hole in this feature for accesses to MSRs that
>>>> are not intercepted. IIUIC, there is no other control in VMX for this
>>>> feature. If the feature is supported in hardware, the guest's accesses to
>>>> non-intercepted MSRs will succeed even when the feature is not exposed to
>>>> the guest, but its accesses to intercepted MSRs will get #UD if KVM
>>>> injects #UD.
>>>
>>> hpa mentioned this when I just started the work.  But I managed to forget
>>> it later... Sigh!
>>>
>>>>
>>>> But I guess this is the guest's fault for not following CPUID, and KVM
>>>> should still follow the spec?
>>>
>>> I think we should still inject #UD when an MSR is intercepted by KVM.
> 
> Hmm, no, inconsistent behavior (from the guest's perspective) is likely worse
> than eating the virtualization hole.

Then could we document this design decision somewhere?

I believe people reading the code won't stop wondering why KVM doesn't
inject #UD when the feature isn't in guest CPUID.

> Practically speaking, the only guest
> that's going to be surprised by the hole is a guest that's fuzzing opcodes, and
> a guest that's fuzzing opcodes at CPL0 is going to create an inherently
> unstable environment no matter what.
> 
> Though that raises the question of whether or not KVM should emulate WRMSRNS and
> whatever the official name for the "RDMSR with immediate" instruction is (I can't
> find it in the SDM).  

Do you mean that, because the guest might be able to use the immediate form
of MSR access even if CPUID doesn't advertise it, KVM should emulate it on
platforms that don't support it, to make sure the immediate form of MSR
access is always supported?

> I'm leaning "no", because outside of forced emulation, KVM
> should only "need" to emulate the instructions if Unrestricted Guest is disabled,
> the instructions should only be supported on CPUs with unrestricted guest, there's
> no sane reason (other than testing) to run a guest without Unrestricted Guest,
> and using the instructions in Big RM would be quite bizarre.  On the other hand,
> adding emulation support should be quite easy...
> 
> Side topic, does RDMSRLIST have any VMX controls?
> 
>> For handle_wrmsr_imm(), it seems we need to check
>> guest_cpu_cap_has(X86_FEATURE_WRMSRNS) as well, since the immediate form of
>> MSR write is only supported on the WRMSRNS instruction.
>>
>> It leads to another topic: do we need to bother checking the opcode of the
>> instruction on EXIT_REASON_MSR_WRITE and injecting #UD when it is a WRMSRNS
>> instruction and !guest_cpu_cap_has(X86_FEATURE_WRMSRNS)?
>>
>> WRMSRNS has a virtualization hole as well, but KVM can at least emulate the
>> architectural behavior when writes to the MSRs are not passed through.

Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Sean Christopherson 2 weeks, 2 days ago
On Wed, Sep 03, 2025, Xiaoyao Li wrote:
> On 9/2/2025 10:50 PM, Sean Christopherson wrote:
> > On Mon, Sep 01, 2025, Xiaoyao Li wrote:
> > > On 9/1/2025 3:04 PM, Xin Li wrote:
> > > > On 8/31/2025 11:34 PM, Binbin Wu wrote:
> > > > > > We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
> > > > > > 
> > > > > 
> > > > > Indeed.
> > > > 
> > > > Good catch!
> > > > 
> > > > > 
> > > > > There is a virtualization hole in this feature for accesses to MSRs
> > > > > that are not intercepted. IIUIC, there is no other control in VMX for
> > > > > this feature. If the feature is supported in hardware, the guest's
> > > > > accesses to non-intercepted MSRs will succeed even when the feature is
> > > > > not exposed to the guest, but its accesses to intercepted MSRs will
> > > > > get #UD if KVM injects #UD.
> > > > 
> > > > hpa mentioned this when I just started the work.  But I managed to forget
> > > > it later... Sigh!
> > > > 
> > > > > 
> > > > > But I guess this is the guest's fault for not following CPUID, and
> > > > > KVM should still follow the spec?
> > > > 
> > > > I think we should still inject #UD when an MSR is intercepted by KVM.
> > 
> > Hmm, no, inconsistent behavior (from the guest's perspective) is likely worse
> > than eating the virtualization hole.
> 
> Then could we document this design decision somewhere?

Yeah, Documentation/virt/kvm/x86/errata.rst would be the place for that.

> I believe people reading the code won't stop wondering why KVM doesn't
> inject #UD when the feature isn't in guest CPUID.
> 
> > Practically speaking, the only guest that's going to be surprised by the
> > hole is a guest that's fuzzing opcodes, and a guest that's fuzzing opcodes
> > at CPL0 is going to create an inherently unstable environment no
> > matter what.
> > 
> > Though that raises the question of whether or not KVM should emulate WRMSRNS and
> > whatever the official name for the "RDMSR with immediate" instruction is (I can't
> > find it in the SDM).
> 
> Do you mean that, because the guest might be able to use the immediate form
> of MSR access even if CPUID doesn't advertise it, KVM should emulate it on
> platforms that don't support it, to make sure the immediate form of MSR
> access is always supported?

No, I'm calling out that as implemented, KVM doesn't support emulating WRMSRNS
at all.  Most of me just doesn't care.  The only way to encounter WRMSRNS would
be to disable unrestricted guest and execute WRMSRNS in an odd context, or to
force emulation.  Adding emulation support just for forced emulation is absurd,
and adding WRMSRNS emulation for !URG is almost as ridiculous.
Re: [PATCH v3 4/6] KVM: x86: Add support for RDMSR/WRMSRNS w/ immediate on Intel
Posted by Xin Li 1 month ago
On 9/2/2025 7:50 AM, Sean Christopherson wrote:
> On Mon, Sep 01, 2025, Xiaoyao Li wrote:
>> On 9/1/2025 3:04 PM, Xin Li wrote:
>>> On 8/31/2025 11:34 PM, Binbin Wu wrote:
>>>>> We need to inject #UD for !guest_cpu_has(X86_FEATURE_MSR_IMM)
>>>>>
>>>>
>>>> Indeed.
>>>
>>> Good catch!
>>>
>>>>
>>>> There is a virtualization hole in this feature for accesses to MSRs that
>>>> are not intercepted. IIUIC, there is no other control in VMX for this
>>>> feature. If the feature is supported in hardware, the guest's accesses to
>>>> non-intercepted MSRs will succeed even when the feature is not exposed to
>>>> the guest, but its accesses to intercepted MSRs will get #UD if KVM
>>>> injects #UD.
>>>
>>> hpa mentioned this when I just started the work.  But I managed to forget
>>> it later... Sigh!
>>>
>>>>
>>>> But I guess this is the guest's fault for not following CPUID, and KVM
>>>> should still follow the spec?
>>>
>>> I think we should still inject #UD when an MSR is intercepted by KVM.
> 
> Hmm, no, inconsistent behavior (from the guest's perspective) is likely worse
> than eating the virtualization hole.  Practically speaking, the only guest
> that's going to be surprised by the hole is a guest that's fuzzing opcodes, and
> a guest that's fuzzing opcodes at CPL0 is going to create an inherently
> unstable environment no matter what.

Hmm, a malicious guest could *smartly* avoid causing such vmexits, so
injecting #UD would be more or less wasted effort.  No objection here.

> 
> Though that raises the question of whether or not KVM should emulate WRMSRNS and
> whatever the official name for the "RDMSR with immediate" instruction is (I can't
> find it in the SDM).  I'm leaning "no", because outside of forced emulation, KVM
> should only "need" to emulate the instructions if Unrestricted Guest is disabled,
> the instructions should only be supported on CPUs with unrestricted guest, there's
> no sane reason (other than testing) to run a guest without Unrestricted Guest,
> and using the instructions in Big RM would be quite bizarre.  On the other hand,
> adding emulation support should be quite easy...
> 
> Side topic, does RDMSRLIST have any VMX controls?

Yes, bit 6 of the Tertiary Processor-Based VM-Execution Controls.  Below is
copied and pasted from the Intel SDM:

Enable MSR-list instructions control: if this control is 0, any execution 
of RDMSRLIST or WRMSRLIST causes a #UD.

If the control is 1, the instruction commences normally, writing one MSR at
a time. Writes to certain MSRs are treated specially as described above for
WRMSR and WRMSRNS. In addition, attempts to access specific MSRs may cause 
VM exits.

For RDMSRLIST and WRMSRLIST, the exit qualification depends on the setting
of the “use MSR bitmaps” VM-execution control. If the control is 0, the
exit qualification is zero. If the control is 1, the exit qualification
is the index of the MSR whose access caused the VM exit.
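The two rules quoted above, expressed as a small C sketch. `TERTIARY_EXEC_ENABLE_MSR_LIST` is a made-up name for bit 6 of the tertiary controls, and both functions are illustrative only, not KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 6 of the tertiary processor-based VM-execution controls, per the
 * SDM excerpt; the macro name is hypothetical. */
#define TERTIARY_EXEC_ENABLE_MSR_LIST (1ull << 6)

/* True if RDMSRLIST/WRMSRLIST raise #UD in the guest. */
static bool msr_list_raises_ud(uint64_t tertiary_controls)
{
	return !(tertiary_controls & TERTIARY_EXEC_ENABLE_MSR_LIST);
}

/* Exit qualification for a VM exit caused by RDMSRLIST/WRMSRLIST: zero
 * without MSR bitmaps, otherwise the index of the offending MSR. */
static uint64_t msr_list_exit_qual(bool use_msr_bitmaps, uint32_t msr_index)
{
	return use_msr_bitmaps ? msr_index : 0;
}
```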