[v1] KVM: x86/mmu: Page fault and MMIO cleanups

[PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Sean Christopherson 1 year, 11 months ago

Add and use a synthetic, KVM-defined page fault error code to indicate
whether a fault is to private vs. shared memory.  TDX and SNP have
different mechanisms for reporting private vs. shared, and KVM's
software-protected VMs have no mechanism at all.  Usurp an error code
flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
and friends.

Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
for TDX and software-protected VMs as appropriate, but that would require
*clearing* the flag for SEV and SEV-ES VMs, which support encrypted
memory at the hardware layer, but don't utilize private memory at the
KVM layer.

Opportunistically add a comment to call out that the logic for software-
protected VMs is (and was before this commit) broken for nested MMUs, i.e.
for nested TDP, as the GPA is an L2 GPA.  Punt on trying to play nice with
nested MMUs as there is a _lot_ of functionality that simply doesn't work
for software-protected VMs, e.g. all of the paths where KVM accesses guest
memory need to be updated to be aware of private vs. shared memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 11 +++++++++++
 arch/x86/kvm/mmu/mmu.c          | 26 +++++++++++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h |  2 +-
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1e69743ef0fb..4077c46c61ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -267,7 +267,18 @@ enum x86_intercept_stage;
 #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
 #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
 #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
+
+/*
+ * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
+ * when emulating instructions that triggers implicit access.
+ */
 #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
+/*
+ * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred
+ * when the guest was accessing private memory.
+ */
+#define PFERR_PRIVATE_ACCESS	BIT_ULL(49)
+#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
 				 PFERR_WRITE_MASK |		\
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 408969ac1291..7807bdcd87e8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	bool direct = vcpu->arch.mmu->root_role.direct;
 
 	/*
-	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
-	 * checks when emulating instructions that triggers implicit access.
 	 * WARN if hardware generates a fault with an error code that collides
-	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
-	 * don't terminate the VM, as KVM can't possibly be relying on a flag
-	 * that KVM doesn't know about.
+	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
+	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
+	 * flag that KVM doesn't know about.
 	 */
-	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
-		error_code &= ~PFERR_IMPLICIT_ACCESS;
+	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
+		error_code &= ~PFERR_SYNTHETIC_MASK;
 
 	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;
 
+	/*
+	 * Except for reserved faults (emulated MMIO is shared-only), set the
+	 * private flag for software-protected VMs based on the gfn's current
+	 * attributes, which are the source of truth for such VMs.  Note, this
+	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
+	 * currently supported nested virtualization (among many other things)
+	 * for software-protected VMs.
+	 */
+	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
+	    !(error_code & PFERR_RSVD_MASK) &&
+	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
+	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+		error_code |= PFERR_PRIVATE_ACCESS;
+
 	r = RET_PF_INVALID;
 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
 		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 1fab1f2359b5..d7c10d338f14 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
+		.is_private = err & PFERR_PRIVATE_ACCESS,
 	};
 	int r;
 
-- 
2.44.0.278.ge034bb2e1d-goog

Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Binbin Wu 1 year, 11 months ago


On 2/28/2024 10:41 AM, Sean Christopherson wrote:
> Add and use a synthetic, KVM-defined page fault error code to indicate
> whether a fault is to private vs. shared memory.  TDX and SNP have
> different mechanisms for reporting private vs. shared, and KVM's
> software-protected VMs have no mechanism at all.  Usurp an error code
> flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
> and friends.
>
> Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
> for TDX and software-protected VMs as appropriate, but that would require
> *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
> memory at the hardware layer, but don't utilize private memory at the
> KVM layer.
>
> Opportunistically add a comment to call out that the logic for software-
> protected VMs is (and was before this commit) broken for nested MMUs, i.e.
> for nested TDP, as the GPA is an L2 GPA.  Punt on trying to play nice with
> nested MMUs as there is a _lot_ of functionality that simply doesn't work
> for software-protected VMs, e.g. all of the paths where KVM accesses guest
> memory need to be updated to be aware of private vs. shared memory.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h | 11 +++++++++++
>   arch/x86/kvm/mmu/mmu.c          | 26 +++++++++++++++++++-------
>   arch/x86/kvm/mmu/mmu_internal.h |  2 +-
>   3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1e69743ef0fb..4077c46c61ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -267,7 +267,18 @@ enum x86_intercept_stage;
>   #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
>   #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
>   #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
> +
> +/*
> + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
> + * when emulating instructions that triggers implicit access.
> + */
>   #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
> +/*
> + * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred

Here "us" is a typo? should be used?

> + * when the guest was accessing private memory.
> + */
> +#define PFERR_PRIVATE_ACCESS	BIT_ULL(49)
> +#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
>   
>   #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
>   				 PFERR_WRITE_MASK |		\
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 408969ac1291..7807bdcd87e8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>   	bool direct = vcpu->arch.mmu->root_role.direct;
>   
>   	/*
> -	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> -	 * checks when emulating instructions that triggers implicit access.
>   	 * WARN if hardware generates a fault with an error code that collides
> -	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> -	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> -	 * that KVM doesn't know about.
> +	 * with KVM-defined sythentic flags.  Clear the flags and continue on,

:s/sythentic/synthetic/

Others,
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

> +	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> +	 * flag that KVM doesn't know about.
>   	 */
> -	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> -		error_code &= ~PFERR_IMPLICIT_ACCESS;
> +	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> +		error_code &= ~PFERR_SYNTHETIC_MASK;
>   
>   	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>   		return RET_PF_RETRY;
>   
> +	/*
> +	 * Except for reserved faults (emulated MMIO is shared-only), set the
> +	 * private flag for software-protected VMs based on the gfn's current
> +	 * attributes, which are the source of truth for such VMs.  Note, this
> +	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> +	 * currently supported nested virtualization (among many other things)
> +	 * for software-protected VMs.
> +	 */
> +	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> +	    !(error_code & PFERR_RSVD_MASK) &&
> +	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> +	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> +		error_code |= PFERR_PRIVATE_ACCESS;
> +
>   	r = RET_PF_INVALID;
>   	if (unlikely(error_code & PFERR_RSVD_MASK)) {
>   		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 1fab1f2359b5..d7c10d338f14 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>   		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
>   		.req_level = PG_LEVEL_4K,
>   		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +		.is_private = err & PFERR_PRIVATE_ACCESS,
>   	};
>   	int r;
>

Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Xu Yilun 1 year, 11 months ago

On Tue, Feb 27, 2024 at 06:41:36PM -0800, Sean Christopherson wrote:
> Add and use a synthetic, KVM-defined page fault error code to indicate
> whether a fault is to private vs. shared memory.  TDX and SNP have
> different mechanisms for reporting private vs. shared, and KVM's
> software-protected VMs have no mechanism at all.  Usurp an error code
> flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
> and friends.
> 
> Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
> for TDX and software-protected VMs as appropriate, but that would require
> *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
> memory at the hardware layer, but don't utilize private memory at the
> KVM layer.

I see this alternative in other patchset.  And I still don't understand why
synthetic way is better after reading patch #5-7.  I assume for SEV(-ES) if
there is spurious PFERR_GUEST_ENC flag set in error code when private memory
is not used in KVM, then it is a HW issue or SW bug that needs to be caught
and warned, and by the way cleared.

Thanks,
Yilun

> 
> Opportunistically add a comment to call out that the logic for software-
> protected VMs is (and was before this commit) broken for nested MMUs, i.e.
> for nested TDP, as the GPA is an L2 GPA.  Punt on trying to play nice with
> nested MMUs as there is a _lot_ of functionality that simply doesn't work
> for software-protected VMs, e.g. all of the paths where KVM accesses guest
> memory need to be updated to be aware of private vs. shared memory.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 11 +++++++++++
>  arch/x86/kvm/mmu/mmu.c          | 26 +++++++++++++++++++-------
>  arch/x86/kvm/mmu/mmu_internal.h |  2 +-
>  3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1e69743ef0fb..4077c46c61ab 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -267,7 +267,18 @@ enum x86_intercept_stage;
>  #define PFERR_GUEST_ENC_MASK	BIT_ULL(34)
>  #define PFERR_GUEST_SIZEM_MASK	BIT_ULL(35)
>  #define PFERR_GUEST_VMPL_MASK	BIT_ULL(36)
> +
> +/*
> + * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP checks
> + * when emulating instructions that triggers implicit access.
> + */
>  #define PFERR_IMPLICIT_ACCESS	BIT_ULL(48)
> +/*
> + * PRIVATE_ACCESS is a KVM-defined flag us to indicate that a fault occurred
> + * when the guest was accessing private memory.
> + */
> +#define PFERR_PRIVATE_ACCESS	BIT_ULL(49)
> +#define PFERR_SYNTHETIC_MASK	(PFERR_IMPLICIT_ACCESS | PFERR_PRIVATE_ACCESS)
>  
>  #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
>  				 PFERR_WRITE_MASK |		\
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 408969ac1291..7807bdcd87e8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	bool direct = vcpu->arch.mmu->root_role.direct;
>  
>  	/*
> -	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> -	 * checks when emulating instructions that triggers implicit access.
>  	 * WARN if hardware generates a fault with an error code that collides
> -	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> -	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> -	 * that KVM doesn't know about.
> +	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
> +	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> +	 * flag that KVM doesn't know about.
>  	 */
> -	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> -		error_code &= ~PFERR_IMPLICIT_ACCESS;
> +	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> +		error_code &= ~PFERR_SYNTHETIC_MASK;
>  
>  	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>  		return RET_PF_RETRY;
>  
> +	/*
> +	 * Except for reserved faults (emulated MMIO is shared-only), set the
> +	 * private flag for software-protected VMs based on the gfn's current
> +	 * attributes, which are the source of truth for such VMs.  Note, this
> +	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> +	 * currently supported nested virtualization (among many other things)
> +	 * for software-protected VMs.
> +	 */
> +	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> +	    !(error_code & PFERR_RSVD_MASK) &&
> +	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> +	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> +		error_code |= PFERR_PRIVATE_ACCESS;
> +
>  	r = RET_PF_INVALID;
>  	if (unlikely(error_code & PFERR_RSVD_MASK)) {
>  		r = handle_mmio_page_fault(vcpu, cr2_or_gpa, direct);
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index 1fab1f2359b5..d7c10d338f14 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -306,7 +306,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
>  		.req_level = PG_LEVEL_4K,
>  		.goal_level = PG_LEVEL_4K,
> -		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> +		.is_private = err & PFERR_PRIVATE_ACCESS,
>  	};
>  	int r;
>  
> -- 
> 2.44.0.278.ge034bb2e1d-goog
> 
>

Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Sean Christopherson 1 year, 11 months ago

On Wed, Mar 06, 2024, Xu Yilun wrote:
> On Tue, Feb 27, 2024 at 06:41:36PM -0800, Sean Christopherson wrote:
> > Add and use a synthetic, KVM-defined page fault error code to indicate
> > whether a fault is to private vs. shared memory.  TDX and SNP have
> > different mechanisms for reporting private vs. shared, and KVM's
> > software-protected VMs have no mechanism at all.  Usurp an error code
> > flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
> > and friends.
> > 
> > Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
> > for TDX and software-protected VMs as appropriate, but that would require
> > *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
> > memory at the hardware layer, but don't utilize private memory at the
> > KVM layer.
> 
> I see this alternative in other patchset.  And I still don't understand why
> synthetic way is better after reading patch #5-7.  I assume for SEV(-ES) if
> there is spurious PFERR_GUEST_ENC flag set in error code when private memory
> is not used in KVM, then it is a HW issue or SW bug that needs to be caught
> and warned, and by the way cleared.

The conundrum is that SEV(-ES) support _encrypted_ memory, but cannot support
what KVM calls "private" memory.  In quotes because SEV(-ES) memory encryption
does provide confidentially/privacy, but in KVM they don't support memslots that
can be switched between private and shared, e.g. will return false for
kvm_arch_has_private_mem().

And KVM _can't_ sanely use private/shared memslots for SEV(-ES), because it's
impossible to intercept implicit conversions by the guest, i.e. KVM can't prevent
the guest from encrypting a page that KVM thinks is private, and vice versa.

But from hardware's perspective, while the APM is a bit misleading and I don't
love the behavior, I can't really argue that it's a hardware bug if the CPU sets
PFERR_GUEST_ENC on a fault where the guest access had C-bit=1, because the access
_was_ indeed to encrypted memory.

Which is a long-winded way of saying the unwanted PFERR_GUEST_ENC faults aren't
really spurious, nor are they hardware or software bugs, just another unfortunate
side effect of the fact that the hypervisor can't intercept shared<->encrypted
conversions for SEV(-ES) guests.  And that means that KVM can't WARN, because
literally every SNP-capable CPU would yell when running SEV(-ES) guests.

I also don't like the idea of usurping PFERR_GUEST_ENC to have it mean something
different in KVM as compared to how hardware defines/uses the flag.

Lastly, the approach in Paolo's series to propagate PFERR_GUEST_ENC to .is_private
iff the VM has private memory is brittle, in that it would be too easy for KVM
code that has access to the error code to do the wrong thing and interpret
PFERR_GUEST_ENC has meaning "private".

Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Xu Yilun 1 year, 11 months ago

On Wed, Mar 06, 2024 at 06:45:30AM -0800, Sean Christopherson wrote:
> On Wed, Mar 06, 2024, Xu Yilun wrote:
> > On Tue, Feb 27, 2024 at 06:41:36PM -0800, Sean Christopherson wrote:
> > > Add and use a synthetic, KVM-defined page fault error code to indicate
> > > whether a fault is to private vs. shared memory.  TDX and SNP have
> > > different mechanisms for reporting private vs. shared, and KVM's
> > > software-protected VMs have no mechanism at all.  Usurp an error code
> > > flag to avoid having to plumb another parameter to kvm_mmu_page_fault()
> > > and friends.
> > > 
> > > Alternatively, KVM could borrow AMD's PFERR_GUEST_ENC_MASK, i.e. set it
> > > for TDX and software-protected VMs as appropriate, but that would require
> > > *clearing* the flag for SEV and SEV-ES VMs, which support encrypted
> > > memory at the hardware layer, but don't utilize private memory at the
> > > KVM layer.
> > 
> > I see this alternative in other patchset.  And I still don't understand why
> > synthetic way is better after reading patch #5-7.  I assume for SEV(-ES) if
> > there is spurious PFERR_GUEST_ENC flag set in error code when private memory
> > is not used in KVM, then it is a HW issue or SW bug that needs to be caught
> > and warned, and by the way cleared.
> 
> The conundrum is that SEV(-ES) support _encrypted_ memory, but cannot support
> what KVM calls "private" memory.  In quotes because SEV(-ES) memory encryption
> does provide confidentially/privacy, but in KVM they don't support memslots that

I see.  I searched the basic knowledge of SEV(-ES/SNP) and finally understand
the difference of encrypted vs. private.  For SEV(-ES) only encrypted.  For SEV-SNP
both encrypted & private(ownership) supported, but seems now we are trying
to make encrypted & private equal, there is no "encrypted but shared" or
"plain but private" memory from KVM's perspective.

> can be switched between private and shared, e.g. will return false for
> kvm_arch_has_private_mem().
> 
> And KVM _can't_ sanely use private/shared memslots for SEV(-ES), because it's
> impossible to intercept implicit conversions by the guest, i.e. KVM can't prevent
> the guest from encrypting a page that KVM thinks is private, and vice versa.

Is it because there is no #NPF for RMP violation?

> 
> But from hardware's perspective, while the APM is a bit misleading and I don't
> love the behavior, I can't really argue that it's a hardware bug if the CPU sets
> PFERR_GUEST_ENC on a fault where the guest access had C-bit=1, because the access
> _was_ indeed to encrypted memory.
> 
> Which is a long-winded way of saying the unwanted PFERR_GUEST_ENC faults aren't
> really spurious, nor are they hardware or software bugs, just another unfortunate
> side effect of the fact that the hypervisor can't intercept shared<->encrypted
> conversions for SEV(-ES) guests.  And that means that KVM can't WARN, because
> literally every SNP-capable CPU would yell when running SEV(-ES) guests.
> 
> I also don't like the idea of usurping PFERR_GUEST_ENC to have it mean something
> different in KVM as compared to how hardware defines/uses the flag.

Thanks for your clue.  I agree PFERR_GUEST_ENC just for encrypted and a
synthetic flag for private.

Yilun

> 
> Lastly, the approach in Paolo's series to propagate PFERR_GUEST_ENC to .is_private
> iff the VM has private memory is brittle, in that it would be too easy for KVM
> code that has access to the error code to do the wrong thing and interpret
> PFERR_GUEST_ENC has meaning "private".

Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Sean Christopherson 1 year, 11 months ago

On Thu, Mar 07, 2024, Xu Yilun wrote:
> On Wed, Mar 06, 2024 at 06:45:30AM -0800, Sean Christopherson wrote:
> > can be switched between private and shared, e.g. will return false for
> > kvm_arch_has_private_mem().
> > 
> > And KVM _can't_ sanely use private/shared memslots for SEV(-ES), because it's
> > impossible to intercept implicit conversions by the guest, i.e. KVM can't prevent
> > the guest from encrypting a page that KVM thinks is private, and vice versa.
> 
> Is it because there is no #NPF for RMP violation?

Yep, there is no RMP, thus no way for the host to express its view of shared vs.
private to hardware.  As a result, KVM can't block conversions, and the given
state of a page is completely unkown at any given time.  E.g. when memory is
reclaimed from an SEV(-ES) guest, KVM has to assume that the page is encrypted
and thus needs to be flushed (see sev_guest_memory_reclaimed()).

Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Huang, Kai 1 year, 11 months ago

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 408969ac1291..7807bdcd87e8 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
>  	bool direct = vcpu->arch.mmu->root_role.direct;
>  
>  	/*
> -	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> -	 * checks when emulating instructions that triggers implicit access.
>  	 * WARN if hardware generates a fault with an error code that collides
> -	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> -	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> -	 * that KVM doesn't know about.
> +	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
> +	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> +	 * flag that KVM doesn't know about.
>  	 */
> -	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> -		error_code &= ~PFERR_IMPLICIT_ACCESS;
> +	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> +		error_code &= ~PFERR_SYNTHETIC_MASK;
>  

Hmm.. I thought for TDX the caller -- handle_ept_violation() -- should
explicitly set the PFERR_PRIVATE_ACCESS so that here the fault handler can
figure out the fault is private.

Now it seems the caller should never pass PFERR_PRIVATE_ACCESS, then ...

>  	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
>  		return RET_PF_RETRY;
>  
> +	/*
> +	 * Except for reserved faults (emulated MMIO is shared-only), set the
> +	 * private flag for software-protected VMs based on the gfn's current
> +	 * attributes, which are the source of truth for such VMs.  Note, this
> +	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> +	 * currently supported nested virtualization (among many other things)
> +	 * for software-protected VMs.
> +	 */
> +	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> +	    !(error_code & PFERR_RSVD_MASK) &&
> +	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> +	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> +		error_code |= PFERR_PRIVATE_ACCESS;
> +
> 

... I am wondering how we figure out whether a fault is private for TDX?

Re: [PATCH 05/16] KVM: x86/mmu: Use synthetic page fault error code to indicate private faults

Posted by Sean Christopherson 1 year, 11 months ago

On Thu, Feb 29, 2024, Kai Huang wrote:
> 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 408969ac1291..7807bdcd87e8 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5839,19 +5839,31 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
> >  	bool direct = vcpu->arch.mmu->root_role.direct;
> >  
> >  	/*
> > -	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
> > -	 * checks when emulating instructions that triggers implicit access.
> >  	 * WARN if hardware generates a fault with an error code that collides
> > -	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
> > -	 * don't terminate the VM, as KVM can't possibly be relying on a flag
> > -	 * that KVM doesn't know about.
> > +	 * with KVM-defined sythentic flags.  Clear the flags and continue on,
> > +	 * i.e. don't terminate the VM, as KVM can't possibly be relying on a
> > +	 * flag that KVM doesn't know about.
> >  	 */
> > -	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
> > -		error_code &= ~PFERR_IMPLICIT_ACCESS;
> > +	if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
> > +		error_code &= ~PFERR_SYNTHETIC_MASK;
> >  
> 
> Hmm.. I thought for TDX the caller -- handle_ept_violation() -- should
> explicitly set the PFERR_PRIVATE_ACCESS so that here the fault handler can
> figure out the fault is private.
> 
> Now it seems the caller should never pass PFERR_PRIVATE_ACCESS, then ...
> 
> >  	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
> >  		return RET_PF_RETRY;
> >  
> > +	/*
> > +	 * Except for reserved faults (emulated MMIO is shared-only), set the
> > +	 * private flag for software-protected VMs based on the gfn's current
> > +	 * attributes, which are the source of truth for such VMs.  Note, this
> > +	 * wrong for nested MMUs as the GPA is an L2 GPA, but KVM doesn't
> > +	 * currently supported nested virtualization (among many other things)
> > +	 * for software-protected VMs.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) &&
> > +	    !(error_code & PFERR_RSVD_MASK) &&
> > +	    vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM &&
> > +	    kvm_mem_is_private(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> > +		error_code |= PFERR_PRIVATE_ACCESS;
> > +
> > 
> 
> ... I am wondering how we figure out whether a fault is private for TDX?

Read the next few patches :-)

The sanity check gets moved to the legacy #PF handler (any error code with bits
63:32!=0 yells) and SVM's #NPF handler (error code with synthetic bits set yells),
leaving VMX free and clear to stuff PFERR_PRIVATE_ACCESS as appropriate.