[PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE

Yosry Ahmed posted 13 patches 2 months, 4 weeks ago
There is a newer version of this series
[PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Yosry Ahmed 2 months, 4 weeks ago
KVM currenty fails a nested VMRUN and injects VMEXIT_INVALID (aka
SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
On first glance, it seems like the check should actually be for
guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
host to support NPTs but the guest CPUID to not advertise it.

However, the consistency check is not architectural to begin with. The
APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
if X86_FEATURE_NPT is not available for L1. Apart from the consistency
check, this is currently the case because NP_ENABLE is actually copied
from VMCB01 to VMCB02, not from VMCB12.

On the other hand, the APM does mention two other consistency checks for
NP_ENABLE, both of which are missing (paraphrased):

In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):

  If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
  1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)

In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):

  When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
  following conditions are considered illegal state combinations, in
  addition to those mentioned in “Canonicalization and Consistency
  Checks”:
    • Any MBZ bit of nCR3 is set.
    • Any G_PAT.PA field has an unsupported type encoding or any
    reserved field in G_PAT has a nonzero value.

Replace the existing consistency check with consistency checks on
hCR0.PG and nCR3. Only perform the consistency checks if L1 has
X86_FEATURE_NPT and NP_ENABLE is set in VMCB12. The G_PAT consistency
check will be addressed separately.

As it is now possible for an L1 to run L2 with NP_ENABLE set but
ignored, also check that L1 has X86_FEATURE_NPT in nested_npt_enabled().

Pass L1's CR0 to __nested_vmcb_check_controls(). In
nested_vmcb_check_controls(), L1's CR0 is available through
kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
through nested_vmcb02_prepare_save() -> svm_set_cr0().

In svm_set_nested_state(), L1's CR0 is available in the captured save
area, as svm_get_nested_state() captures L1's save area when running L2,
and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
nested_svm_vmrun()).

Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
 arch/x86/kvm/svm/svm.h    |  3 ++-
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 74211c5c68026..87bcc5eff96e8 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
 }
 
 static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
-					 struct vmcb_ctrl_area_cached *control)
+					 struct vmcb_ctrl_area_cached *control,
+					 unsigned long l1_cr0)
 {
 	if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
 		return false;
@@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 	if (CC(control->asid == 0))
 		return false;
 
-	if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
-		return false;
+	if (nested_npt_enabled(to_svm(vcpu))) {
+		if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
+			return false;
+		if (CC(!(l1_cr0 & X86_CR0_PG)))
+			return false;
+	}
 
 	if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
 					   MSRPM_SIZE)))
@@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
 
-	return __nested_vmcb_check_controls(vcpu, ctl);
+	/*
+	 * Make sure we did not enter guest mode yet, in which case
+	 * kvm_read_cr0() could return L2's CR0.
+	 */
+	WARN_ON_ONCE(is_guest_mode(vcpu));
+	return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
 }
 
 static
@@ -1831,7 +1841,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
 
 	ret = -EINVAL;
 	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
-	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
+	/* 'save' contains L1 state saved from before VMRUN */
+	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
 		goto out_free;
 
 	/*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f6fb70ddf7272..3e805a43ffcdb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
 
 static inline bool nested_npt_enabled(struct vcpu_svm *svm)
 {
-	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
+	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
+		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
 }
 
 static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
-- 
2.51.2.1041.gc1ab5b90ca-goog

Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Sean Christopherson 2 months ago
On Mon, Nov 10, 2025, Yosry Ahmed wrote:
> KVM currenty fails a nested VMRUN and injects VMEXIT_INVALID (aka
> SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
> On first glance, it seems like the check should actually be for
> guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
> host to support NPTs but the guest CPUID to not advertise it.
> 
> However, the consistency check is not architectural to begin with. The
> APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
> that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
> if X86_FEATURE_NPT is not available for L1. Apart from the consistency
> check, this is currently the case because NP_ENABLE is actually copied
> from VMCB01 to VMCB02, not from VMCB12.
> 
> On the other hand, the APM does mention two other consistency checks for
> NP_ENABLE, both of which are missing (paraphrased):
> 
> In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):
> 
>   If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
>   1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)
> 
> In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
> 
>   When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
>   following conditions are considered illegal state combinations, in
>   addition to those mentioned in “Canonicalization and Consistency
>   Checks”:
>     • Any MBZ bit of nCR3 is set.
>     • Any G_PAT.PA field has an unsupported type encoding or any
>     reserved field in G_PAT has a nonzero value.

This should be three patches, one each for the new consistency checks, and one
to the made-up check.  Shortlogs like "Fix all the bugs" are strong hints that
a patch is doing too much.

> Replace the existing consistency check with consistency checks on
> hCR0.PG and nCR3. Only perform the consistency checks if L1 has
> X86_FEATURE_NPT and NP_ENABLE is set in VMCB12. The G_PAT consistency
> check will be addressed separately.
> 
> As it is now possible for an L1 to run L2 with NP_ENABLE set but
> ignored, also check that L1 has X86_FEATURE_NPT in nested_npt_enabled().
> 
> Pass L1's CR0 to __nested_vmcb_check_controls(). In
> nested_vmcb_check_controls(), L1's CR0 is available through
> kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
> through nested_vmcb02_prepare_save() -> svm_set_cr0().
> 
> In svm_set_nested_state(), L1's CR0 is available in the captured save
> area, as svm_get_nested_state() captures L1's save area when running L2,
> and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
> nested_svm_vmrun()).
> 
> Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
> Cc: stable@vger.kernel.org
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
>  arch/x86/kvm/svm/svm.h    |  3 ++-
>  2 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 74211c5c68026..87bcc5eff96e8 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
>  }
>  
>  static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> -					 struct vmcb_ctrl_area_cached *control)
> +					 struct vmcb_ctrl_area_cached *control,
> +					 unsigned long l1_cr0)
>  {
>  	if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
>  		return false;
> @@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
>  	if (CC(control->asid == 0))
>  		return false;
>  
> -	if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
> -		return false;
> +	if (nested_npt_enabled(to_svm(vcpu))) {
> +		if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
> +			return false;
> +		if (CC(!(l1_cr0 & X86_CR0_PG)))
> +			return false;
> +	}
>  
>  	if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
>  					   MSRPM_SIZE)))
> @@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
>  
> -	return __nested_vmcb_check_controls(vcpu, ctl);
> +	/*
> +	 * Make sure we did not enter guest mode yet, in which case

No pronouns.

> +	 * kvm_read_cr0() could return L2's CR0.
> +	 */
> +	WARN_ON_ONCE(is_guest_mode(vcpu));
> +	return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
>  }
>  
>  static
> @@ -1831,7 +1841,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>  
>  	ret = -EINVAL;
>  	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
> -	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
> +	/* 'save' contains L1 state saved from before VMRUN */
> +	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
>  		goto out_free;
>  
>  	/*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index f6fb70ddf7272..3e805a43ffcdb 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
>  
>  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>  {
> -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;

I would rather rely on Kevin's patch to clear unsupported features.

>  }
>  
>  static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
> -- 
> 2.51.2.1041.gc1ab5b90ca-goog
> 
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Yosry Ahmed 2 months ago
On Tue, Dec 09, 2025 at 08:27:39AM -0800, Sean Christopherson wrote:
> On Mon, Nov 10, 2025, Yosry Ahmed wrote:
> > KVM currenty fails a nested VMRUN and injects VMEXIT_INVALID (aka
> > SVM_EXIT_ERR) if L1 sets NP_ENABLE and the host does not support NPTs.
> > On first glance, it seems like the check should actually be for
> > guest_cpu_cap_has(X86_FEATURE_NPT) instead, as it is possible for the
> > host to support NPTs but the guest CPUID to not advertise it.
> > 
> > However, the consistency check is not architectural to begin with. The
> > APM does not mention VMEXIT_INVALID if NP_ENABLE is set on a processor
> > that does not have X86_FEATURE_NPT. Hence, NP_ENABLE should be ignored
> > if X86_FEATURE_NPT is not available for L1. Apart from the consistency
> > check, this is currently the case because NP_ENABLE is actually copied
> > from VMCB01 to VMCB02, not from VMCB12.
> > 
> > On the other hand, the APM does mention two other consistency checks for
> > NP_ENABLE, both of which are missing (paraphrased):
> > 
> > In Volume #2, 15.25.3 (24593—Rev. 3.42—March 2024):
> > 
> >   If VMRUN is executed with hCR0.PG cleared to zero and NP_ENABLE set to
> >   1, VMRUN terminates with #VMEXIT(VMEXIT_INVALID)
> > 
> > In Volume #2, 15.25.4 (24593—Rev. 3.42—March 2024):
> > 
> >   When VMRUN is executed with nested paging enabled (NP_ENABLE = 1), the
> >   following conditions are considered illegal state combinations, in
> >   addition to those mentioned in “Canonicalization and Consistency
> >   Checks”:
> >     • Any MBZ bit of nCR3 is set.
> >     • Any G_PAT.PA field has an unsupported type encoding or any
> >     reserved field in G_PAT has a nonzero value.
> 
> This should be three patches, one each for the new consistency checks, and one
> to the made-up check.  Shortlogs like "Fix all the bugs" are strong hints that
> a patch is doing too much.

Ack, will split it.

> 
> > Replace the existing consistency check with consistency checks on
> > hCR0.PG and nCR3. Only perform the consistency checks if L1 has
> > X86_FEATURE_NPT and NP_ENABLE is set in VMCB12. The G_PAT consistency
> > check will be addressed separately.
> > 
> > As it is now possible for an L1 to run L2 with NP_ENABLE set but
> > ignored, also check that L1 has X86_FEATURE_NPT in nested_npt_enabled().
> > 
> > Pass L1's CR0 to __nested_vmcb_check_controls(). In
> > nested_vmcb_check_controls(), L1's CR0 is available through
> > kvm_read_cr0(), as vcpu->arch.cr0 is not updated to L2's CR0 until later
> > through nested_vmcb02_prepare_save() -> svm_set_cr0().
> > 
> > In svm_set_nested_state(), L1's CR0 is available in the captured save
> > area, as svm_get_nested_state() captures L1's save area when running L2,
> > and L1's CR0 is stashed in VMCB01 on nested VMRUN (in
> > nested_svm_vmrun()).
> > 
> > Fixes: 4b16184c1cca ("KVM: SVM: Initialize Nested Nested MMU context on VMRUN")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/nested.c | 21 ++++++++++++++++-----
> >  arch/x86/kvm/svm/svm.h    |  3 ++-
> >  2 files changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 74211c5c68026..87bcc5eff96e8 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -325,7 +325,8 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
> >  }
> >  
> >  static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> > -					 struct vmcb_ctrl_area_cached *control)
> > +					 struct vmcb_ctrl_area_cached *control,
> > +					 unsigned long l1_cr0)
> >  {
> >  	if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
> >  		return false;
> > @@ -333,8 +334,12 @@ static bool __nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
> >  	if (CC(control->asid == 0))
> >  		return false;
> >  
> > -	if (CC((control->nested_ctl & SVM_NESTED_CTL_NP_ENABLE) && !npt_enabled))
> > -		return false;
> > +	if (nested_npt_enabled(to_svm(vcpu))) {
> > +		if (CC(!kvm_vcpu_is_legal_gpa(vcpu, control->nested_cr3)))
> > +			return false;
> > +		if (CC(!(l1_cr0 & X86_CR0_PG)))
> > +			return false;
> > +	}
> >  
> >  	if (CC(!nested_svm_check_bitmap_pa(vcpu, control->msrpm_base_pa,
> >  					   MSRPM_SIZE)))
> > @@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> >  	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
> >  
> > -	return __nested_vmcb_check_controls(vcpu, ctl);
> > +	/*
> > +	 * Make sure we did not enter guest mode yet, in which case
> 
> No pronouns.

I thought that rule was for commit logs. There are plenty of 'we's in
the KVM x86 code (and all x86 code for that matter) :P

> 
> > +	 * kvm_read_cr0() could return L2's CR0.
> > +	 */
> > +	WARN_ON_ONCE(is_guest_mode(vcpu));
> > +	return __nested_vmcb_check_controls(vcpu, ctl, kvm_read_cr0(vcpu));
> >  }
> >  
> >  static
> > @@ -1831,7 +1841,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
> >  
> >  	ret = -EINVAL;
> >  	__nested_copy_vmcb_control_to_cache(vcpu, &ctl_cached, ctl);
> > -	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached))
> > +	/* 'save' contains L1 state saved from before VMRUN */
> > +	if (!__nested_vmcb_check_controls(vcpu, &ctl_cached, save->cr0))
> >  		goto out_free;
> >  
> >  	/*
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index f6fb70ddf7272..3e805a43ffcdb 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
> >  
> >  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> >  {
> > -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> > +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> 
> I would rather rely on Kevin's patch to clear unsupported features.

Not sure how Kevin's patch is relevant here, could you please clarify?

This is to account for removing the artifical consistency check. It's
now possible to have SVM_NESTED_CTL_NP_ENABLE set and ignored, so we
need to also check that the guest can actually use NPTs here.

> 
> >  }
> >  
> >  static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
> > -- 
> > 2.51.2.1041.gc1ab5b90ca-goog
> > 
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Sean Christopherson 2 months ago
On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> On Tue, Dec 09, 2025 at 08:27:39AM -0800, Sean Christopherson wrote:
> > On Mon, Nov 10, 2025, Yosry Ahmed wrote:
> > > @@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
> > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > >  	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
> > >  
> > > -	return __nested_vmcb_check_controls(vcpu, ctl);
> > > +	/*
> > > +	 * Make sure we did not enter guest mode yet, in which case
> > 
> > No pronouns.
> 
> I thought that rule was for commit logs. 

In KVM x86, it's a rule everywhere.  Pronouns often add ambiguity, and it's much
easier to have a hard "no pronouns" rule than to try and enforce an inherently
subjective "is this ambiguous or not" rule.

> There are plenty of 'we's in the KVM x86 code (and all x86 code for that
> matter) :P

Ya, KVM is an 18+ year old code base.  There's also a ton of bare "unsigned" usage,
and other things that are frowned upon and/or flagged by checkpatch.  I'm all for
cleaning things up when touching the code, but I'm staunchly against "tree"-wide
cleanups just to make checkpatch happy, and so there's quite a few historical
violations of the current "rules".

> > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > index f6fb70ddf7272..3e805a43ffcdb 100644
> > > --- a/arch/x86/kvm/svm/svm.h
> > > +++ b/arch/x86/kvm/svm/svm.h
> > > @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
> > >  
> > >  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > >  {
> > > -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> > > +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > 
> > I would rather rely on Kevin's patch to clear unsupported features.
> 
> Not sure how Kevin's patch is relevant here, could you please clarify?

Doh, Kevin's patch only touches intercepts.  What I was trying to say is that I
would rather sanitize the snapshot (the approach Kevin's patch takes with the
intercepts), as opposed to guarding the accessor.  That way we can't have bugs
where KVM checks svm->nested.ctl.nested_ctl directly and bypasses the caps check.
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Yosry Ahmed 2 months ago
On Tue, Dec 09, 2025 at 10:26:31AM -0800, Sean Christopherson wrote:
> On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > On Tue, Dec 09, 2025 at 08:27:39AM -0800, Sean Christopherson wrote:
> > > On Mon, Nov 10, 2025, Yosry Ahmed wrote:
> > > > @@ -400,7 +405,12 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu)
> > > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > > >  	struct vmcb_ctrl_area_cached *ctl = &svm->nested.ctl;
> > > >  
> > > > -	return __nested_vmcb_check_controls(vcpu, ctl);
> > > > +	/*
> > > > +	 * Make sure we did not enter guest mode yet, in which case
> > > 
> > > No pronouns.
> > 
> > I thought that rule was for commit logs. 
> 
> In KVM x86, it's a rule everywhere.  Pronouns often add ambiguity, and it's much
> easier to have a hard "no pronouns" rule than to try and enforce an inherently
> subjective "is this ambiguous or not" rule.
> 
> > There are plenty of 'we's in the KVM x86 code (and all x86 code for that
> > matter) :P
> 
> Ya, KVM is an 18+ year old code base.  There's also a ton of bare "unsigned" usage,
> and other things that are frowned upon and/or flagged by checkpatch.  I'm all for
> cleaning things up when touching the code, but I'm staunchly against "tree"-wide
> cleanups just to make checkpatch happy, and so there's quite a few historical
> violations of the current "rules".

Ack.

> 
> > > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > > index f6fb70ddf7272..3e805a43ffcdb 100644
> > > > --- a/arch/x86/kvm/svm/svm.h
> > > > +++ b/arch/x86/kvm/svm/svm.h
> > > > @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
> > > >  
> > > >  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > > >  {
> > > > -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> > > > +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > 
> > > I would rather rely on Kevin's patch to clear unsupported features.
> > 
> > Not sure how Kevin's patch is relevant here, could you please clarify?
> 
> Doh, Kevin's patch only touches intercepts.  What I was trying to say is that I
> would rather sanitize the snapshot (the approach Kevin's patch takes with the
> intercepts), as opposed to guarding the accessor.  That way we can't have bugs
> where KVM checks svm->nested.ctl.nested_ctl directly and bypasses the caps check.

I see, so clear SVM_NESTED_CTL_NP_ENABLE in
__nested_copy_vmcb_control_to_cache() instead.

If I drop the guest_cpu_cap_has() check here I will want to leave a
comment so that it's obvious to readers that SVM_NESTED_CTL_NP_ENABLE is
sanitized elsewhere if the guest cannot use NPTs. Alternatively, I can
just keep the guest_cpu_cap_has() check as documentation and a second
line of defense.

Any preferences?
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Sean Christopherson 2 months ago
On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> On Tue, Dec 09, 2025 at 10:26:31AM -0800, Sean Christopherson wrote:
> > On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > > > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > > > index f6fb70ddf7272..3e805a43ffcdb 100644
> > > > > --- a/arch/x86/kvm/svm/svm.h
> > > > > +++ b/arch/x86/kvm/svm/svm.h
> > > > > @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
> > > > >  
> > > > >  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > > > >  {
> > > > > -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > > +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> > > > > +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > 
> > > > I would rather rely on Kevin's patch to clear unsupported features.
> > > 
> > > Not sure how Kevin's patch is relevant here, could you please clarify?
> > 
> > Doh, Kevin's patch only touches intercepts.  What I was trying to say is that I
> > would rather sanitize the snapshot (the approach Kevin's patch takes with the
> > intercepts), as opposed to guarding the accessor.  That way we can't have bugs
> > where KVM checks svm->nested.ctl.nested_ctl directly and bypasses the caps check.
> 
> I see, so clear SVM_NESTED_CTL_NP_ENABLE in
> __nested_copy_vmcb_control_to_cache() instead.
> 
> If I drop the guest_cpu_cap_has() check here I will want to leave a
> comment so that it's obvious to readers that SVM_NESTED_CTL_NP_ENABLE is
> sanitized elsewhere if the guest cannot use NPTs. Alternatively, I can
> just keep the guest_cpu_cap_has() check as documentation and a second
> line of defense.
> 
> Any preferences?

Honestly, do nothing.  I want to solidify sanitizing the cache as standard behavior,
at which point adding a comment implies that nested_npt_enabled() is somehow special,
i.e. that it _doesn't_ follow the standard.
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Yosry Ahmed 2 months ago
On Tue, Dec 09, 2025 at 10:42:21AM -0800, Sean Christopherson wrote:
> On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > On Tue, Dec 09, 2025 at 10:26:31AM -0800, Sean Christopherson wrote:
> > > On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > > > > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > > > > index f6fb70ddf7272..3e805a43ffcdb 100644
> > > > > > --- a/arch/x86/kvm/svm/svm.h
> > > > > > +++ b/arch/x86/kvm/svm/svm.h
> > > > > > @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
> > > > > >  
> > > > > >  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > > > > >  {
> > > > > > -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > > > +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> > > > > > +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > > 
> > > > > I would rather rely on Kevin's patch to clear unsupported features.
> > > > 
> > > > Not sure how Kevin's patch is relevant here, could you please clarify?
> > > 
> > > Doh, Kevin's patch only touches intercepts.  What I was trying to say is that I
> > > would rather sanitize the snapshot (the approach Kevin's patch takes with the
> > > intercepts), as opposed to guarding the accessor.  That way we can't have bugs
> > > where KVM checks svm->nested.ctl.nested_ctl directly and bypasses the caps check.
> > 
> > I see, so clear SVM_NESTED_CTL_NP_ENABLE in
> > __nested_copy_vmcb_control_to_cache() instead.
> > 
> > If I drop the guest_cpu_cap_has() check here I will want to leave a
> > comment so that it's obvious to readers that SVM_NESTED_CTL_NP_ENABLE is
> > sanitized elsewhere if the guest cannot use NPTs. Alternatively, I can
> > just keep the guest_cpu_cap_has() check as documentation and a second
> > line of defense.
> > 
> > Any preferences?
> 
> Honestly, do nothing.  I want to solidify sanitizing the cache as standard behavior,
> at which point adding a comment implies that nested_npt_enabled() is somehow special,
> i.e. that it _doesn't_ follow the standard.

Does this apply to patch 12 as well? In that patch I int_vector,
int_state, and event_inj when copying them to VMCB02 in
nested_vmcb02_prepare_control(). Mainly because
nested_vmcb02_prepare_control() already kinda filters what to copy from
VMCB12 (e.g. int_ctl), so it seemed like a better fit.

Do I keep that as-is, or do you prefer that I also sanitize these fields
when copying to the cache in nested_copy_vmcb_control_to_cache()?
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Sean Christopherson 1 month, 4 weeks ago
On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> On Tue, Dec 09, 2025 at 10:42:21AM -0800, Sean Christopherson wrote:
> > On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > > On Tue, Dec 09, 2025 at 10:26:31AM -0800, Sean Christopherson wrote:
> > > > On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > > > > > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > > > > > index f6fb70ddf7272..3e805a43ffcdb 100644
> > > > > > > --- a/arch/x86/kvm/svm/svm.h
> > > > > > > +++ b/arch/x86/kvm/svm/svm.h
> > > > > > > @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
> > > > > > >  
> > > > > > >  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > > > > > >  {
> > > > > > > -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > > > > +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> > > > > > > +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > > > 
> > > > > > I would rather rely on Kevin's patch to clear unsupported features.
> > > > > 
> > > > > Not sure how Kevin's patch is relevant here, could you please clarify?
> > > > 
> > > > Doh, Kevin's patch only touches intercepts.  What I was trying to say is that I
> > > > would rather sanitize the snapshot (the approach Kevin's patch takes with the
> > > > intercepts), as opposed to guarding the accessor.  That way we can't have bugs
> > > > where KVM checks svm->nested.ctl.nested_ctl directly and bypasses the caps check.
> > > 
> > > I see, so clear SVM_NESTED_CTL_NP_ENABLE in
> > > __nested_copy_vmcb_control_to_cache() instead.
> > > 
> > > If I drop the guest_cpu_cap_has() check here I will want to leave a
> > > comment so that it's obvious to readers that SVM_NESTED_CTL_NP_ENABLE is
> > > sanitized elsewhere if the guest cannot use NPTs. Alternatively, I can
> > > just keep the guest_cpu_cap_has() check as documentation and a second
> > > line of defense.
> > > 
> > > Any preferences?
> > 
> > Honestly, do nothing.  I want to solidify sanitizing the cache as standard behavior,
> > at which point adding a comment implies that nested_npt_enabled() is somehow special,
> > i.e. that it _doesn't_ follow the standard.
> 
> Does this apply to patch 12 as well? In that patch I int_vector,

I <something>?

> int_state, and event_inj when copying them to VMCB02 in
> nested_vmcb02_prepare_control(). Mainly because
> nested_vmcb02_prepare_control() already kinda filters what to copy from
> VMCB12 (e.g. int_ctl), so it seemed like a better fit.
> 
> Do I keep that as-is, or do you prefer that I also sanitize these fields
> when copying to the cache in nested_copy_vmcb_control_to_cache()?

I don't think I follow.  What would the sanitization look like?  Note, I don't
think we need to completely sanitize _every_ field.  The key fields are ones
where KVM consumes and/or acts on the field.
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Yosry Ahmed 1 month, 4 weeks ago
On Fri, Dec 12, 2025 at 10:32:23AM -0800, Sean Christopherson wrote:
> On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > On Tue, Dec 09, 2025 at 10:42:21AM -0800, Sean Christopherson wrote:
> > > On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > > > On Tue, Dec 09, 2025 at 10:26:31AM -0800, Sean Christopherson wrote:
> > > > > On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > > > > > > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > > > > > > index f6fb70ddf7272..3e805a43ffcdb 100644
> > > > > > > > --- a/arch/x86/kvm/svm/svm.h
> > > > > > > > +++ b/arch/x86/kvm/svm/svm.h
> > > > > > > > @@ -552,7 +552,8 @@ static inline bool gif_set(struct vcpu_svm *svm)
> > > > > > > >  
> > > > > > > >  static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > > > > > > >  {
> > > > > > > > -	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > > > > > +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_NPT) &&
> > > > > > > > +		svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > > > > > > 
> > > > > > > I would rather rely on Kevin's patch to clear unsupported features.
> > > > > > 
> > > > > > Not sure how Kevin's patch is relevant here, could you please clarify?
> > > > > 
> > > > > Doh, Kevin's patch only touches intercepts.  What I was trying to say is that I
> > > > > would rather sanitize the snapshot (the approach Kevin's patch takes with the
> > > > > intercepts), as opposed to guarding the accessor.  That way we can't have bugs
> > > > > where KVM checks svm->nested.ctl.nested_ctl directly and bypasses the caps check.
> > > > 
> > > > I see, so clear SVM_NESTED_CTL_NP_ENABLE in
> > > > __nested_copy_vmcb_control_to_cache() instead.
> > > > 
> > > > If I drop the guest_cpu_cap_has() check here I will want to leave a
> > > > comment so that it's obvious to readers that SVM_NESTED_CTL_NP_ENABLE is
> > > > sanitized elsewhere if the guest cannot use NPTs. Alternatively, I can
> > > > just keep the guest_cpu_cap_has() check as documentation and a second
> > > > line of defense.
> > > > 
> > > > Any preferences?
> > > 
> > > Honestly, do nothing.  I want to solidify sanitizing the cache as standard behavior,
> > > at which point adding a comment implies that nested_npt_enabled() is somehow special,
> > > i.e. that it _doesn't_ follow the standard.
> > 
> > Does this apply to patch 12 as well? In that patch I int_vector,
> 
> I <something>?

I "sanitize" int_vector..

Sorry :D

> 
> > int_state, and event_inj when copying them to VMCB02 in
> > nested_vmcb02_prepare_control(). Mainly because
> > nested_vmcb02_prepare_control() already kinda filters what to copy from
> > VMCB12 (e.g. int_ctl), so it seemed like a better fit.
> > 
> > Do I keep that as-is, or do you prefer that I also sanitize these fields
> > when copying to the cache in nested_copy_vmcb_control_to_cache()?
> 
> I don't think I follow.  What would the sanitization look like?  Note, I don't
> think we need to completely sanitize _every_ field.  The key fields are ones
> where KVM consumes and/or acts on the field.

Patch 12 currently sanitizes what is copied from VMCB12 to VMCB02 for
int_vector, int_state, and event_inj in nested_vmcb02_prepare_control():

@@ -890,9 +893,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 		(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
 		(vmcb01->control.int_ctl & int_ctl_vmcb01_bits);

-	vmcb02->control.int_vector          = svm->nested.ctl.int_vector;
-	vmcb02->control.int_state           = svm->nested.ctl.int_state;
-	vmcb02->control.event_inj           = svm->nested.ctl.event_inj;
+	vmcb02->control.int_vector          = svm->nested.ctl.int_vector & SVM_INT_VECTOR_MASK;
+	vmcb02->control.int_state           = svm->nested.ctl.int_state & SVM_INTERRUPT_SHADOW_MASK;
+	vmcb02->control.event_inj           = svm->nested.ctl.event_inj & ~SVM_EVTINJ_RESERVED_BITS;
 	vmcb02->control.event_inj_err       = svm->nested.ctl.event_inj_err;

My question was: given this:

> I want to solidify sanitizing the cache as standard behavior

Do you prefer that I move this sanitization when copying from L1's
VMCB12 to the cached VMCB12 in nested_copy_vmcb_control_to_cache()?

I initially made it part of nested_vmcb02_prepare_control() as it
already filters what to pick from the VMCB12 for some other related
fields like int_ctl based on what features are exposed to the guest.
Re: [PATCH v2 04/13] KVM: nSVM: Fix consistency checks for NP_ENABLE
Posted by Sean Christopherson 1 month, 3 weeks ago
On Fri, Dec 12, 2025, Yosry Ahmed wrote:
> On Fri, Dec 12, 2025 at 10:32:23AM -0800, Sean Christopherson wrote:
> > On Tue, Dec 09, 2025, Yosry Ahmed wrote:
> > > Do I keep that as-is, or do you prefer that I also sanitize these fields
> > > when copying to the cache in nested_copy_vmcb_control_to_cache()?
> > 
> > I don't think I follow.  What would the sanitization look like?  Note, I don't
> > think we need to completely sanitize _every_ field.  The key fields are ones
> > where KVM consumes and/or acts on the field.
> 
> Patch 12 currently sanitizes what is copied from VMCB12 to VMCB02 for
> int_vector, int_state, and event_inj in nested_vmcb02_prepare_control():
> 
> @@ -890,9 +893,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  		(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
>  		(vmcb01->control.int_ctl & int_ctl_vmcb01_bits);
> 
> -	vmcb02->control.int_vector          = svm->nested.ctl.int_vector;
> -	vmcb02->control.int_state           = svm->nested.ctl.int_state;
> -	vmcb02->control.event_inj           = svm->nested.ctl.event_inj;
> +	vmcb02->control.int_vector          = svm->nested.ctl.int_vector & SVM_INT_VECTOR_MASK;
> +	vmcb02->control.int_state           = svm->nested.ctl.int_state & SVM_INTERRUPT_SHADOW_MASK;
> +	vmcb02->control.event_inj           = svm->nested.ctl.event_inj & ~SVM_EVTINJ_RESERVED_BITS;
>  	vmcb02->control.event_inj_err       = svm->nested.ctl.event_inj_err;
> 
> My question was: given this:
> 
> > I want to solidify sanitizing the cache as standard behavior
> 
> Do you prefer that I move this sanitization when copying from L1's
> VMCB12 to the cached VMCB12 in nested_copy_vmcb_control_to_cache()?

Hmm, good question.  Probably?  If the main motivation for sanitizing is to
guard against effectively exposing new features unintentionally via vmcs12, then
it seems like the safest option is to ensure the "bad" bits are _never_ set in
KVM-controlled state.

> I initially made it part of nested_vmcb02_prepare_control() as it
> already filters what to pick from the VMCB12 for some other related
> fields like int_ctl based on what features are exposed to the guest.