Architecturally, VMRUN/VMLOAD/VMSAVE should generate a #GP if the
physical address in RAX is not supported. check_svme_pa() hardcodes this
to checking that bits 63-48 are not set. This is incorrect on HW
supporting 52 bits of physical address space, so use maxphyaddr instead.
Note that the host's maxphyaddr is used, not the guest, because the
emulator path for VMLOAD/VMSAVE is generally used when virtual
VMLOAD/VMSAVE is enabled AND a #NPF is generated. If a #NPF is not
generated, the CPU will inject a #GP based on the host's maxphyaddr. So
this keeps the behavior consistent.
If KVM wants to consistently inject a #GP based on the guest's
maxphyaddr, it would need to disable virtual VMLOAD/VMSAVE and
intercept all VMLOAD/VMSAVE instructions to do the check.
Also, emulating a smaller maxphyaddr for the guest than the host
generally doesn't work well, so it's not worth handling this.
Fixes: 01de8b09e606 ("KVM: SVM: Add intercept checks for SVM instructions")
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
arch/x86/kvm/emulate.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6145dac4a605a..9ea2584dda912 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3887,8 +3887,7 @@ static int check_svme_pa(struct x86_emulate_ctxt *ctxt)
 {
 	u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
 
-	/* Valid physical address? */
-	if (rax & 0xffff000000000000ULL)
+	if (rax & rsvd_bits(kvm_host.maxphyaddr, 63))
 		return emulate_gp(ctxt, 0);
 
 	return check_svme(ctxt);
--
2.53.0.473.g4a7958ca14-goog
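For readers following the masking change, here is a standalone sketch (not kernel code) of the reserved-bit logic; rsvd_bits() below is a local reimplementation mirroring the kernel helper's semantics, and the two check functions are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's rsvd_bits() helper: builds a mask of
 * bits [s, e], i.e. the address bits that must be clear. */
static uint64_t rsvd_bits(int s, int e)
{
    return ((2ULL << (e - s)) - 1) << s;
}

/* On a CPU with maxphyaddr == 52, bit 51 is a legal physical address
 * bit, but the old hardcoded mask (bits 63:48) flags it and injects a
 * spurious #GP. */
static int old_check_rejects(uint64_t rax)
{
    return (rax & 0xffff000000000000ULL) != 0;
}

static int new_check_rejects(uint64_t rax, int maxphyaddr)
{
    return (rax & rsvd_bits(maxphyaddr, 63)) != 0;
}
```

With maxphyaddr = 52, an address such as 1ULL << 51 is rejected by the old hardcoded check but accepted by the maxphyaddr-based one, which is exactly the behavior difference the commit message describes.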
On Fri, Mar 6, 2026 at 1:09 PM Yosry Ahmed <yosry@kernel.org> wrote:
> Also, emulating a smaller maxphyaddr for the guest than the host
> generally doesn't work well, so it's not worth handling this.

If we're going to throw in the towel on allow_smaller_maxphyaddr, the
code should be removed.

In any case, the check should logically be against the guest's
maxphyaddr, because the VMLOAD/VMSAVE instruction executes in guest
context.

Note that virtual VMLOAD/VMSAVE cannot be used if the guest's
maxphyaddr doesn't match the host's maxphyaddr.
On Fri, Mar 6, 2026 at 2:27 PM Jim Mattson <jmattson@google.com> wrote:
> In any case, the check should logically be against the guest's
> maxphyaddr, because the VMLOAD/VMSAVE instruction executes in guest
> context.

Right, but I am trying to have the #GP check for VMLOAD/VMSAVE behave
consistently with vls=1, whether it's done by the hardware or the
emulator.

> Note that virtual VMLOAD/VMSAVE cannot be used if the guest's
> maxphyaddr doesn't match the host's maxphyaddr.

Not sure what you mean? Do you mean it wouldn't be correct to use it?
AFAICT that doesn't prevent it from being enabled.
On Fri, Mar 6, 2026 at 2:37 PM Yosry Ahmed <yosry@kernel.org> wrote:
> Right, but I am trying to have the #GP check for VMLOAD/VMSAVE behave
> consistently with vls=1, whether it's done by the hardware or the
> emulator.

Consistency should not be an issue, since VLS cannot be enabled when
the MAXPHYADDRs differ. VLS doesn't work in that scenario.

> Not sure what you mean? Do you mean it wouldn't be correct to use it?
> AFAICT that doesn't prevent it from being enabled.

It is incorrect to use VLS when it doesn't work.
On Fri, Mar 06, 2026, Jim Mattson wrote:
> > > Note that virtual VMLOAD/VMSAVE cannot be used if the guest's
> > > maxphyaddr doesn't match the host's maxphyaddr.
> >
> > Not sure what you mean? Do you mean it wouldn't be correct to use it?
> > AFAICT that doesn't prevent it from being enabled.
It does, actually. KVM doesn't support allow_smaller_maxphyaddr when NPT is
enabled, because AMD CPUs (and now some Intel CPUs, lolz) do A/D updates before
signalling the reserved #NPF.
allow_smaller_maxphyaddr = !npt_enabled;
And vls is disabled if NPT is disabled, for all the reasons Jim is pointing out.
	if (vls) {
		if (!npt_enabled ||
		    !boot_cpu_has(X86_FEATURE_V_VMSAVE_VMLOAD) ||
		    !IS_ENABLED(CONFIG_X86_64)) {
			vls = false;
		} else {
			pr_info("Virtual VMLOAD VMSAVE supported\n");
		}
	}
Thus running with allow_smaller_maxphyaddr+vls is impossible.
> Thus running with allow_smaller_maxphyaddr+vls is impossible.
Oh, well never mind then.. :)
Will switch to cpuid_maxphyaddr(vcpu) in the next version (assuming
there's one).
> > Right, but I am trying to have the #GP check for VMLOAD/VMSAVE behave
> > consistently with vls=1, whether it's done by the hardware or the
> > emulator.
>
> Consistency should not be an issue, since VLS cannot be enabled when
> the MAXPHYADDRs differ. VLS doesn't work in that scenario.

Why? It's only broken if VMLOAD/VMSAVE is executed with a GPA that
exceeds the guest's MAXPHYADDR, but not the host's, right? So only
broken if the guest is misbehaving.

Taking a step back, I am not disagreeing that VLS should not be used
with different MAXPHYADDRs, I am just saying it might be.

All that being said, I am fine with using cpuid_maxphyaddr(vcpu)
instead of kvm_host.maxphyaddr. Will wait for Sean's feedback to
figure out if a new version is needed.
On Fri, Mar 06, 2026, Yosry Ahmed wrote:
> Taking a step back, I am not disagreeing that VLS should not be used
> with different MAXPHYADDRs, I am just saying it might be.

KVM straight up doesn't support that (see my other reply).

> All that being said, I am fine with using cpuid_maxphyaddr(vcpu)
> instead of kvm_host.maxphyaddr. Will wait for Sean's feedback to
> figure out if a new version is needed.

LOL, Jim and I are of one mind when it comes to guest.MAXPHYADDR.
On Fri, Mar 6, 2026 at 4:32 PM Sean Christopherson <seanjc@google.com> wrote:
> > Taking a step back, I am not disagreeing that VLS should not be used
> > with different MAXPHYADDRs, I am just saying it might be.
>
> KVM straight up doesn't support that (see my other reply).

Sean, I intend to send a new version today with 2 main diffs:
- Use cpuid_maxphyaddr() here instead of kvm_host.maxphyaddr.
- Use a common helper for checking RAX for SVM instructions for both
the emulator and gp_interception() (see response on patch 4).

Holler if you want me to wait for further feedback.
On Wed, Mar 11, 2026 at 11:31 AM Yosry Ahmed <yosry@kernel.org> wrote:
> Sean, I intend to send a new version today with 2 main diffs:
> - Use cpuid_maxphyaddr() here instead of kvm_host.maxphyaddr.
> - Use a common helper for checking RAX for SVM instructions for both
> the emulator and gp_interception() (see response on patch 4).
>
> Holler if you want me to wait for further feedback.
I just realized I cannot just do cpuid_maxphyaddr(ctxt->vcpu) in
check_svme_pa(), because vcpu is defined as a void pointer in
x86_emulate_ctxt. Looking at commit c9b8b07cded5 ("KVM: x86:
Dynamically allocate per-vCPU emulation context"), I cannot tell why.
I was going to move emul_to_vcpu() to arch/x86/kvm/kvm_emulate.h, but
maybe we should just make this a normal struct kvm_vcpu pointer and
drop emul_to_vcpu() completely?
On Wed, Mar 11, 2026, Yosry Ahmed wrote:
> I just realized I cannot just do cpuid_maxphyaddr(ctxt->vcpu) in
> check_svme_pa(), because vcpu is defined as a void pointer in
> x86_emulate_ctxt. Looking at commit c9b8b07cded5 ("KVM: x86:
> Dynamically allocate per-vCPU emulation context"), I cannot tell why.
To prevent dereferencing the vcpu object in emulator code. It's kinda silly
because common KVM is tightly coupled to the emulator, but we try to contain
what the emulator can do.
> I was going to move emul_to_vcpu() to arch/x86/kvm/kvm_emulate.h, but
> maybe we should just make this a normal struct kvm_vpcu pointer and
> drop emul_to_vcpu() completely?
Heh, talk about scope creep, that'll open a much bigger can of worms and subsequent
discussion.
Honestly, why bother keeping check_svme_pa()? Can't we just do the checks in
svm_check_intercept()? E.g. vmx_check_intercept() already "injects" #UD for RDTSCP.
Ha! And dropping check_svme_pa() would technically be a bug fix, because the #GP
is supposed to have lower priority than the #UD due to EFER.SVME=0.
> Heh, talk about scope creep, that'll open a much bigger can of worms and subsequent
> discussion.
Ack.
> Honestly, why bother keeping check_svme_pa()? Can't we just do the checks in
> svm_check_intercept()? E.g. vmx_check_intercept() already "injects" #UD for RDTSCP.
Hmm svm_check_intercept() isn't semantically the right place AFAICT,
and more importantly, it's only called if the instruction is executed
in guest mode (i.e. in L2+).
> Ha! And dropping check_svme_pa() would technically be a bug fix, because the #GP
> is supposed to have lower priority than the #UD due to EFER.SVME=0.
That can probably be fixed by calling check_svme() before checking RAX
in check_svme_pa().
I think we should keep check_svme_pa(), but we'll need to extract the
vCPU from the emulation context one way or another to check
MAXPHYADDR. We can cast ctxt->vcpu or pull emul_to_vcpu() into
arch/x86/kvm/kvm_emulate.h and use it.
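As a side note for readers, the void-pointer arrangement under discussion can be sketched standalone. The struct fields and helper below are simplified stand-ins for illustration, not the actual kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the vCPU object. */
struct kvm_vcpu {
    int maxphyaddr; /* stand-in for the guest's CPUID-reported MAXPHYADDR */
};

/* The emulation context deliberately stores the vCPU as a void pointer,
 * so emulator code cannot casually dereference it. */
struct x86_emulate_ctxt {
    void *vcpu;
};

/* Sketch of an emul_to_vcpu()-style accessor: a plain cast back to the
 * real type, kept in one place so the coupling stays explicit. */
static struct kvm_vcpu *emul_to_vcpu(struct x86_emulate_ctxt *ctxt)
{
    return (struct kvm_vcpu *)ctxt->vcpu;
}
```

The opaque-pointer-plus-accessor pattern keeps the dependency one-directional: only code that includes the accessor can recover the concrete type, which is why moving the helper (or dropping the void pointer entirely) is a visible API decision rather than a mechanical change.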
On Wed, Mar 11, 2026, Yosry Ahmed wrote:
> > Honestly, why bother keeping check_svme_pa()? Can't we just do the checks in
> > svm_check_intercept()? E.g. vmx_check_intercept() already "injects" #UD for RDTSCP.
>
> Hmm svm_check_intercept() isn't semantically the right place AFAICT,
Actually, I think it is. Per the APM:
Generally, instruction intercepts are checked after simple exceptions (such as
#GP—when CPL is incorrect—or #UD) have been checked, but before exceptions
related to memory accesses (such as page faults) and exceptions based on
specific operand values.
This #GP is operand specific.
> and more importantly, it's only called if the instruction is executed
> in guest mode (i.e. in L2+).
Hold up, we're getting ahead of ourselves.
The only legitimate reason the emulator is at all aware of VMSAVE, VMLOAD, and
VMRUN is to deal with #GP due to the RAX check, because hardware checks the GPA
against the host's physical address space. See commit 82a11e9c6fa2 ("KVM: SVM:
Add emulation support for #GP triggered by SVM instructions").
The emulator "support" was originally added by commit 01de8b09e606 ("KVM: SVM:
Add intercept checks for SVM instructions"), but AFAICT, for all intents and
purposes that was dead code when it was added, because the emulator doesn't
actually _emulate_ the instructions. I assume if they aren't intercepted, and
KVM is full on emulating instead of just decoding, they end up at EMULATION_FAILED
and get a #UD or something.
Outside of forced emulation or code stream rewriting, KVM should _never_ fully
emulate any of the SVM instructions except VMMCALL (and that is a super special
case). KVM does need to _decode_ the instruction, and it needs to get the
pre-intercept exception checks correct so that KVM correctly injects e.g. #GP
instead of synthesizing a #VMEXIT for the CPL check, but KVM doesn't need to do
*all* of the checks.
Note, for L2, the SVME check is meaningless, as EFER.SVME has to be set for L2
to be active, i.e. it's L1's responsibility to handle that check.
Back to the physical address thing, KVM _already_ handles that check in the #GP
path, it's just wrong too:
	/* All SVM instructions expect page aligned RAX */
	if (svm->vmcb->save.rax & ~PAGE_MASK)
		goto reinject;
So I think what we want is to
(a) fix the RAX check in gp_interception()
(b) drop the RAX check in the emulator
(c) add a CPL check in the emulator (because the intercepted #GP could have
been due to L2 executing at CPL>0, not due to a bad-but-good RAX).
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c8e292e9a24d..74df977a38ca 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3867,18 +3867,10 @@ static int check_svme(struct x86_emulate_ctxt *ctxt)
if (!(efer & EFER_SVME))
return emulate_ud(ctxt);
- return X86EMUL_CONTINUE;
-}
-
-static int check_svme_pa(struct x86_emulate_ctxt *ctxt)
-{
- u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
-
- /* Valid physical address? */
- if (rax & 0xffff000000000000ULL)
+ if (ctxt->ops->cpl(ctxt))
return emulate_gp(ctxt, 0);
- return check_svme(ctxt);
+ return X86EMUL_CONTINUE;
}
static int check_rdtsc(struct x86_emulate_ctxt *ctxt)
@@ -3984,10 +3976,10 @@ static const struct opcode group7_rm2[] = {
};
static const struct opcode group7_rm3[] = {
- DIP(SrcNone | Prot | Priv, vmrun, check_svme_pa),
+ DIP(SrcNone | Prot | Priv, vmrun, check_svme),
II(SrcNone | Prot | EmulateOnUD, em_hypercall, vmmcall),
- DIP(SrcNone | Prot | Priv, vmload, check_svme_pa),
- DIP(SrcNone | Prot | Priv, vmsave, check_svme_pa),
+ DIP(SrcNone | Prot | Priv, vmload, check_svme),
+ DIP(SrcNone | Prot | Priv, vmsave, check_svme),
DIP(SrcNone | Prot | Priv, stgi, check_svme),
DIP(SrcNone | Prot | Priv, clgi, check_svme),
DIP(SrcNone | Prot | Priv, skinit, check_svme),
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e6691c044913..e1223c07593b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2294,7 +2294,7 @@ static int gp_interception(struct kvm_vcpu *vcpu)
EMULTYPE_VMWARE_GP | EMULTYPE_NO_DECODE);
} else {
/* All SVM instructions expect page aligned RAX */
- if (svm->vmcb->save.rax & ~PAGE_MASK)
+ if (!page_address_valid(vcpu, svm->vmcb->save.rax))
goto reinject;
return emulate_svm_instr(vcpu, opcode);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> [...]
> } else {
> /* All SVM instructions expect page aligned RAX */
> - if (svm->vmcb->save.rax & ~PAGE_MASK)
> + if (!page_address_valid(vcpu, svm->vmcb->save.rax))
> goto reinject;
Final observation (hopefully), this check needs to be moved to the
VMRUN/VMLOAD/VMSAVE interception functions. As kvm_vcpu_map() failures
will stop injecting #GP, we still need to handle the case where
allow_smaller_maxphyaddr is used and the GPA is illegal from the
vCPU's perspective but not the host. In this case, the CPU does not
inject a #GP and the instructions are intercepted, so we need to make
sure the interception functions do the legality check on RAX.
I am doing this in a separate patch after fixing the check, and
opportunistically cleaning up gp_interception() by doing an early
return for the non-SVM insn case to reduce indentation. Will send v3
out after I am done testing.
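The legality check being moved into the interception handlers could look roughly like the standalone sketch below. svm_rax_is_valid() is a hypothetical name used only here; KVM's page_address_valid(), as used in the thread, appears to combine the same two conditions (page alignment and no bits above the guest's MAXPHYADDR):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Sketch (not kernel code) of the RAX legality check the thread
 * proposes doing in the VMRUN/VMLOAD/VMSAVE interception handlers:
 * RAX must be page aligned and must not set any bit at or above the
 * guest's MAXPHYADDR. */
static int svm_rax_is_valid(uint64_t rax, int guest_maxphyaddr)
{
    if (rax & (PAGE_SIZE - 1))      /* not page aligned */
        return 0;
    if (rax >> guest_maxphyaddr)    /* bits above MAXPHYADDR are reserved */
        return 0;
    return 1;
}
```

Doing the check in the interception functions (rather than only in gp_interception()) covers the case where the instruction is intercepted before any #GP is raised, so the handler itself must decide whether to inject #GP.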
On Thu, Mar 12, 2026, Yosry Ahmed wrote:
> Final observation (hopefully), this check needs to be moved to the
> VMRUN/VMLOAD/VMSAVE interception functions.
Gah, yeah. I noticed that when initially typing up my response, but lost track
of it when I got distracted by all the emulator crud.
> As kvm_vcpu_map() failures will stop injecting #GP, we still need to handle
> the case where allow_smaller_maxphyaddr is used and the GPA is illegal from
> the vCPU's perspective but not the host.
allow_smaller_maxphyaddr is irrelevant. My read of the APM is that the intercept
has priority over the #GP due to a bad RAX. So with vls=0, KVM needs to check
RAX irrespective of allow_smaller_maxphyaddr.
> allow_smaller_maxphyaddr is irrelevant. My read of the APM is that the intercept
> has priority over the #GP due to a bad RAX. So with vls=0, KVM needs to check
> RAX irrespective of allow_smaller_maxphyaddr.
Oh yeah, good point.
> Actually, I think it is. Per the APM:
>
> Generally, instruction intercepts are checked after simple exceptions (such as
> #GP—when CPL is incorrect—or #UD) have been checked, but before exceptions
> related to memory accesses (such as page faults) and exceptions based on
> specific operand values.
>
> This #GP is operand specific.
Yes, but from KVM's perspective, svm_check_intercept() is meant to
check if the instruction executed by L2 needs to be intercepted by L1.
What this patch is doing is not specific to instructions executed by
L2.
Anyway, that's irrelevant now (see below).
> Back to the physical address thing, KVM _already_ handles that check in the #GP
> path,
I guess if KVM is not intercepting #GP, then the hardware injects the
#GP and the emulator still doesn't have to worry about it -- because
we don't support the case where RAX can be legal from the host's
perspective but not the guest's. Makes sense.
> it's just wrong too:
>
> /* All SVM instructions expect page aligned RAX */
> if (svm->vmcb->save.rax & ~PAGE_MASK)
> goto reinject;
>
> So I think what we want is to
>
> (a) fix the RAX check in gp_interception()
> (b) drop the RAX check in the emulator
> (c) add a CPL check in the emulator (because the intercepted #GP could have
> been due to L2 executing at CPL>0, not due to a bad-but-good RAX).
Agree with this (also thanks for pointing out page_address_valid()).
Will add 3 patches doing this at the beginning of the series instead
of this one.
Actually, I don't think (c) is needed. In the path where KVM
intercepts #GP, it doesn't go through the emulation path which ends up
calling check_svme(), it only uses the emulator to decode the
instruction.
AFAICT, we can end up in the emulator only when the CPU does not
produce a #GP, e.g. when we get a #NPF on the address in RAX. In this
case, the CPU will have already checked the CPL for us, and the
validity of the address. The emulator's EFER.SVME check is
probably also useless, because with Kevin's patches we should always
be intercepting VMLOAD/VMSAVE when EFER.SVME is disabled by the guest
and checking EFER.SVME there anyway.
Anyway, I want to touch the emulator as little as possible tbh, so I
will still do (b) because it unblocks this series (removes the wrong
GPA check that injects #GP), but will defer any further cleanups.
On Wed, Mar 11, 2026, Yosry Ahmed wrote:
>
> Actually, I don't think (c) is needed. In the path where KVM
> intercepts #GP, it doesn't go through the emulation path which ends up
> calling check_svme(), it only uses the emulator to decode the
> instruction.
Oh, dagnabbit, that's stupidly obvious in hindsight.
> AFAICT, we can end up in the emulator only when the CPU does not
> produce a #GP, e.g. when we get a #NPF on the address in RAX. In this
> case, the CPU will have already checked the CPL for us, and the
> validity of the address. The emulator checking EFER.SVME check is
> probably also useless, because with Kevin's patches we should always
> be intercepting VMLOAD/VMSAVE when EFER.SVME is disabled by the guest
> and checking EFER.SVME there anyway.
>
> Anyway, I want to touch the emulator as little as possible tbh, so I
> will still do (b) because it unblocks this series (removes the wrong
> GPA check that injects #GP), but will defer any further cleanups.
Yeah, works for me.
On Fri, Mar 6, 2026 at 3:20 PM Yosry Ahmed <yosry@kernel.org> wrote:
> > > Right, but I am trying to have the #GP check for VMLOAD/VMSAVE behave
> > > consistently with vls=1, whether it's done by the hardware or the
> > > emulator.
> >
> > Consistency should not be an issue, since VLS cannot be enabled when
> > the MAXPHYADDRs differ. VLS doesn't work in that scenario.
>
> Why? It's only broken if VMLOAD/VMSAVE is executed with a GPA that
> exceeds the guest's MAXPHYADDR, but not the host's, right? So only
> broken if the guest is misbehaving.
"Misbehaving" is a tad pejorative. Faulting behavior is part of the
architectural specification. A less biased assessment is that VLS is
partially correct when the MAXPHYADDRs don't match. People thought it
was a big deal when FDIV produced incorrect results one out of 10
billion times.
> Taking a step back, I am not disagreeing that VLS should not be used
> with different MAXPHYADDRs, I am just saying it might be.
That would be wrong.