[RFC PATCH v2 3/6] KVM: x86/mmu: Pass round full 64-bit error code for the KVM page fault

isaku.yamahata@intel.com posted 6 patches 2 years, 7 months ago
There is a newer version of this series
[RFC PATCH v2 3/6] KVM: x86/mmu: Pass round full 64-bit error code for the KVM page fault
Posted by isaku.yamahata@intel.com 2 years, 7 months ago
From: Isaku Yamahata <isaku.yamahata@intel.com>

Because the full 64-bit error code, i.e. more information about the fault,
will be needed for the KVM page fault of protected VMs (TDX and SEV-SNP),
update kvm_mmu_do_page_fault() to accept the 64-bit value so it can pass it
on to the callbacks.

The upper 32 bits of the error code are discarded at kvm_mmu_page_fault()
by lower_32_bits().  Now it is passed down as the full 64 bits.
Because only FNAME(page_fault) depends on it, move lower_32_bits() into
FNAME(page_fault).

The accesses of fault->error_code are as follows:
- FNAME(page_fault): change to explicitly use lower_32_bits()
- kvm_mmu_page_fault(): explicitly mask with PFERR_RSVD_MASK and
                        PFERR_NESTED_GUEST_PAGE
- mmutrace: change u32 -> u64
- pgprintk(): change %x -> %llx

No functional change is intended.  This is a preparation to pass on more
info with page fault error code.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v1 -> v2:
- no change
---
 arch/x86/kvm/mmu/mmu.c          | 5 ++---
 arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
 arch/x86/kvm/mmu/mmutrace.h     | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 4 ++--
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index dc2b9a2f717c..b8ba7f11c3cb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4510,7 +4510,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
 				struct kvm_page_fault *fault)
 {
-	pgprintk("%s: gva %lx error %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: gva %llx error %llx\n", __func__, fault->addr, fault->error_code);
 
 	/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
 	fault->max_level = PG_LEVEL_2M;
@@ -5820,8 +5820,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}
 
 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false,
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
 					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f1786698ae00..7f9ec1e5b136 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -191,7 +191,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 struct kvm_page_fault {
 	/* arguments to kvm_mmu_do_page_fault.  */
 	const gpa_t addr;
-	const u32 error_code;
+	const u64 error_code;
 	const bool prefetch;
 
 	/* Derived from error_code.  */
@@ -283,7 +283,7 @@ enum {
 };
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch, int *emulation_type)
+					u64 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2d7555381955..2e77883c92f6 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -261,7 +261,7 @@ TRACE_EVENT(
 	TP_STRUCT__entry(
 		__field(int, vcpu_id)
 		__field(gpa_t, cr2_or_gpa)
-		__field(u32, error_code)
+		__field(u64, error_code)
 		__field(u64 *, sptep)
 		__field(u64, old_spte)
 		__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0662e0278e70..ee4b881c5b39 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -758,7 +758,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	struct guest_walker walker;
 	int r;
 
-	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: addr %llx err %llx\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);
 
 	/*
@@ -767,7 +767,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * The bit needs to be cleared before walking guest page tables.
 	 */
 	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
-			     fault->error_code & ~PFERR_RSVD_MASK);
+			     lower_32_bits(fault->error_code) & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
-- 
2.25.1
Re: [RFC PATCH v2 3/6] KVM: x86/mmu: Pass round full 64-bit error code for the KVM page fault
Posted by Huang, Kai 2 years, 7 months ago
On Thu, 2023-06-22 at 16:16 -0700, Yamahata, Isaku wrote:
> The upper 32 bits of the error code are discarded at kvm_mmu_page_fault()
> by lower_32_bits().  Now it is passed down as the full 64 bits.
> Because only FNAME(page_fault) depends on it, move lower_32_bits() into
> FNAME(page_fault).

I haven't looked into the code yet, but the last sentence around
FNAME(page_fault) doesn't make a lot of sense IIUC?

For instance, we can have a shadow EPT table when EPT is enabled in L0 and
exposed to L1.  If we want to pass the 64-bit error code to the handler, how can
FNAME(page_fault)() depend on the lower 32-bit value?
Re: [RFC PATCH v2 3/6] KVM: x86/mmu: Pass round full 64-bit error code for the KVM page fault
Posted by Isaku Yamahata 2 years, 7 months ago
On Thu, Jun 22, 2023 at 11:28:22PM +0000,
"Huang, Kai" <kai.huang@intel.com> wrote:

> On Thu, 2023-06-22 at 16:16 -0700, Yamahata, Isaku wrote:
> > The upper 32 bits of the error code are discarded at kvm_mmu_page_fault()
> > by lower_32_bits().  Now it is passed down as the full 64 bits.
> > Because only FNAME(page_fault) depends on it, move lower_32_bits() into
> > FNAME(page_fault).
> 
> I haven't looked into the code yet, but the last sentence around
> FNAME(page_fault) doesn't make a lot of sense IIUC?
> 
> For instance, we can have a shadow EPT table when EPT is enabled in L0 and
> exposed to L1.  If we want to pass the 64-bit error code to the handler, how can
> FNAME(page_fault)() depend on the lower 32-bit value?

Probably "depend" was too strong.  In short, I wanted to avoid changing the value
passed down as error_code from FNAME(page_fault).

FNAME(page_fault) calls helper functions to walk page tables.  Some of them
check PFERR_IMPLICIT_ACCESS_MASK (bit 48).  If we don't mask with
lower_32_bits(), the bit can accidentally be passed through.  Maybe we can
audit the code carefully to check whether the IMPLICIT_ACCESS bit matters, or
fix it.  But I don't want to do that in this patch series.
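
As a standalone sketch of the concern (the walker helper below is hypothetical
and not taken from KVM; only the bit-48 position mirrors KVM's
PFERR_IMPLICIT_ACCESS definition):

/*
 * Standalone sketch, not part of the patch: the KVM-defined implicit-access
 * flag lives at bit 48, so a helper that tests the full 64-bit error code
 * would see a hardware-set bit 48 as an implicit access, while masking with
 * lower_32_bits() drops it.
 */
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr)		(1ULL << (nr))
#define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)

static inline uint32_t lower_32_bits(uint64_t val)
{
	return (uint32_t)val;
}

/* Hypothetical stand-in for a page-table-walk helper's permission check. */
static bool walker_sees_implicit_access(uint64_t error_code)
{
	return error_code & PFERR_IMPLICIT_ACCESS;
}

/*
 * walker_sees_implicit_access(error_code) can return true for a raw hardware
 * error code with bit 48 set, whereas
 * walker_sees_implicit_access(lower_32_bits(error_code)) never can.
 */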
-- 
Isaku Yamahata <isaku.yamahata@gmail.com>
Re: [RFC PATCH v2 3/6] KVM: x86/mmu: Pass round full 64-bit error code for the KVM page fault
Posted by Sean Christopherson 2 years, 7 months ago
On Thu, Jun 22, 2023, Isaku Yamahata wrote:
> On Thu, Jun 22, 2023 at 11:28:22PM +0000,
> Kai Huang <kai.huang@intel.com> wrote:
> 
> > On Thu, 2023-06-22 at 16:16 -0700, Yamahata, Isaku wrote:
> > > The upper 32 bits of the error code are discarded at kvm_mmu_page_fault()
> > > by lower_32_bits().  Now it is passed down as the full 64 bits.
> > > Because only FNAME(page_fault) depends on it, move lower_32_bits() into
> > > FNAME(page_fault).
> > 
> > I haven't looked into the code yet, but the last sentence around
> > FNAME(page_fault) doesn't make a lot of sense IIUC?
> > 
> > For instance, we can have a shadow EPT table when EPT is enabled in L0 and
> > exposed to L1.  If we want to pass the 64-bit error code to the handler, how can
> > FNAME(page_fault)() depend on the lower 32-bit value?
> 
> Probably "depend" was too strong.  In short, I wanted to avoid changing the value
> passed down as error_code from FNAME(page_fault).
> 
> FNAME(page_fault) calls helper functions to walk page tables.  Some of them
> check PFERR_IMPLICIT_ACCESS_MASK (bit 48).

Argh, IMPLICIT_ACCESS is a KVM-defined value without any sanity checks against it
being used by hardware.

> If we don't mask with lower_32_bits(), the bit can accidentally be passed
> through.  Maybe we can audit the code carefully to check whether the
> IMPLICIT_ACCESS bit matters, or fix it.  But I don't want to do that in this
> patch series.

No.  Going halfway on something like this is exactly how we end up with code that
no one understands and exists only because of "hysterical raisins".

There's no reason not to fix this properly.  It's obviously not your fault that
commit 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes")
took a shortcut, but that doesn't mean it's ok to keep punting the proper fix to
a future developer.

One of the things I've been trying to drill into people's heads by harping on
writing tests is that trying to do the least amount of work to get some newfangled
feature merged is not a winning strategy.  The more work that is done upfront by
developers, the less work that needs to be done by reviewers and maintainers.  It's
definitely not a 1:1 ratio, i.e. you doing an extra hour of work doesn't save me an
hour of work, but since reviewers and maintainers are a relatively scarce resource,
shifting work to developers almost always results in moving faster overall.  And
many times, doing more work upfront *does* require less effort overall, e.g. in
this case, we'll probably have spent more time discussing this than it would have
taken to audit the code in the first place.

As for IMPLICIT_ACCESS, that can and should be handled by asserting that the flag
isn't set by hardware, and if it is, suppressing the bit so that KVM doesn't
misinterpret an unknown hardware flag as IMPLICIT_ACCESS.

I'll test the below and post it separately, but feel free to incorporate it into
this series; I'll make sure everything gets ordered correctly if/when all of this
gets applied.

--
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 23 Jun 2023 09:46:38 -0700
Subject: [PATCH] KVM: x86/mmu: Guard against collision with KVM-defined
 PFERR_IMPLICIT_ACCESS

Add an assertion in kvm_mmu_page_fault() to ensure the error code provided
by hardware doesn't conflict with KVM's software-defined IMPLICIT_ACCESS
flag.  In the unlikely scenario that future hardware starts using bit 48
for a hardware-defined flag, preserving the bit could result in KVM
incorrectly interpreting the unknown flag as KVM's IMPLICIT_ACCESS flag.

WARN so that any such conflict can be surfaced to KVM developers and
resolved, but otherwise ignore the bit as KVM can't possibly rely on a
flag it knows nothing about.

Fixes: 4f4aa80e3b88 ("KVM: X86: Handle implicit supervisor access with SMAP")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 60397a1beda3..228a483d3746 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5742,6 +5742,17 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	int r, emulation_type = EMULTYPE_PF;
 	bool direct = vcpu->arch.mmu->root_role.direct;
 
+	/*
+	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
+	 * checks when emulating instructions that trigger implicit accesses.
+	 * WARN if hardware generates a fault with an error code that collides
+	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
+	 * don't terminate the VM, as KVM can't possibly be relying on a flag
+	 * that KVM doesn't know about.
+	 */
+	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
+		error_code &= ~PFERR_IMPLICIT_ACCESS;
+
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
 

base-commit: 293375bf2fd333e5563dd80b894725b90cd84c5d
--
Re: [RFC PATCH v2 3/6] KVM: x86/mmu: Pass round full 64-bit error code for the KVM page fault
Posted by Huang, Kai 2 years, 7 months ago
> From: Sean Christopherson <seanjc@google.com>
> Date: Fri, 23 Jun 2023 09:46:38 -0700
> Subject: [PATCH] KVM: x86/mmu: Guard against collision with KVM-defined
>  PFERR_IMPLICIT_ACCESS
> 
> Add an assertion in kvm_mmu_page_fault() to ensure the error code provided
> by hardware doesn't conflict with KVM's software-defined IMPLICIT_ACCESS
> flag.  In the unlikely scenario that future hardware starts using bit 48
> for a hardware-defined flag, preserving the bit could result in KVM
> incorrectly interpreting the unknown flag as KVM's IMPLICIT_ACCESS flag.
> 
> WARN so that any such conflict can be surfaced to KVM developers and
> resolved, but otherwise ignore the bit as KVM can't possibly rely on a
> flag it knows nothing about.

I think the fundamental problem is we mix synthetic bit(s) with the hardware
error code together into a single 'u64'.  Given there's no guarantee from
hardware vendors (Intel/AMD) that some bits will always be reserved for software
use, there's no guarantee the synthetic bit(s) won't conflict with those
hardware-defined bits.

Perhaps a fundamental fix is to use a new 'u64' parameter for the software-
defined error code and pass it along all relevant code paths.
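
As a rough sketch of that direction (the struct, field, and flag names below
are invented purely for illustration, not taken from KVM):

/*
 * Rough sketch: keep the hardware-provided error code and the KVM-synthetic
 * flags in separate fields, so a bit that hardware starts using can never
 * collide with a software-defined flag.
 */
#include <stdbool.h>
#include <stdint.h>

struct fault_info {
	uint64_t hw_error_code;	/* exactly what hardware reported */
	uint64_t kvm_flags;	/* software-defined bits only */
};

/* Software flag no longer tied to an architectural bit position. */
#define KVM_FAULT_IMPLICIT_ACCESS	(1ULL << 0)

static bool is_implicit_access(const struct fault_info *fault)
{
	/* Only KVM itself ever sets kvm_flags, so no hardware collision. */
	return fault->kvm_flags & KVM_FAULT_IMPLICIT_ACCESS;
}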

But I think your fix (or detection) below should be good enough, perhaps for a
long time, and even if such a conflict emerges in the future, we can move the
synthetic bit to another bit.  The only problem is that we will probably need
the relevant patch(es) back-ported to stable kernels.

> 
> Fixes: 4f4aa80e3b88 ("KVM: X86: Handle implicit supervisor access with SMAP")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 60397a1beda3..228a483d3746 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5742,6 +5742,17 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	int r, emulation_type = EMULTYPE_PF;
>  	bool direct = vcpu->arch.mmu->root_role.direct;
>  
> +	/*
> +	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> +	 * checks when emulating instructions that trigger implicit accesses.
> +	 * WARN if hardware generates a fault with an error code that collides
> +	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> +	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> +	 * that KVM doesn't know about.
> +	 */
> +	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> +		error_code &= ~PFERR_IMPLICIT_ACCESS;
> +
>  	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>  		return RET_PF_RETRY;
>  
> 
> base-commit: 293375bf2fd333e5563dd80b894725b90cd84c5d
> --

Re: [RFC PATCH v2 3/6] KVM: x86/mmu: Pass round full 64-bit error code for the KVM page fault
Posted by Sean Christopherson 2 years, 7 months ago
On Sat, Jun 24, 2023, Kai Huang wrote:
> 
> > From: Sean Christopherson <seanjc@google.com>
> > Date: Fri, 23 Jun 2023 09:46:38 -0700
> > Subject: [PATCH] KVM: x86/mmu: Guard against collision with KVM-defined
> >  PFERR_IMPLICIT_ACCESS
> > 
> > Add an assertion in kvm_mmu_page_fault() to ensure the error code provided
> > by hardware doesn't conflict with KVM's software-defined IMPLICIT_ACCESS
> > flag.  In the unlikely scenario that future hardware starts using bit 48
> > for a hardware-defined flag, preserving the bit could result in KVM
> > incorrectly interpreting the unknown flag as KVM's IMPLICIT_ACCESS flag.
> > 
> > WARN so that any such conflict can be surfaced to KVM developers and
> > resolved, but otherwise ignore the bit as KVM can't possibly rely on a
> > flag it knows nothing about.
> 
> I think the fundamental problem is we mix synthetic bit(s) with the hardware
> error code together into a single 'u64'.  Given there's no guarantee from
> hardware vendors (Intel/AMD) that some bits will always be reserved for software
> use, there's no guarantee the synthetic bit(s) won't conflict with those
> hardware-defined bits.
> 
> Perhaps a fundamental fix is to use a new 'u64' parameter for the software-
> defined error code and pass it along all relevant code paths.

Yeah, in an ideal world KVM wouldn't usurp error code bits.  But I don't know
that it's worth plumbing in an extra param to all the affected helpers.  From a
functional perspective, unless someone runs with panic_on_warn=1 in production,
or I'm missing something, the warn-and-clear approach is sufficient.  If we get
more synthetic "access" bits, then we should revisit this, but I think for now
it's ok.

> But I think your fix (or detection) below should be good enough, perhaps for a
> long time, and even if such a conflict emerges in the future, we can move the
> synthetic bit to another bit.  The only problem is that we will probably need
> the relevant patch(es) back-ported to stable kernels.