[v2] KVM: Guest Memory Pre-Population API

[PATCH v2 05/10] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory

Posted by isaku.yamahata@intel.com 1 year, 10 months ago

From: Isaku Yamahata <isaku.yamahata@intel.com>

Introduce a helper function to call the KVM fault handler.  It allows a new
ioctl to invoke the KVM fault handler to populate guest memory without
seeing RET_PF_* enums or other KVM MMU internal definitions, because
RET_PF_* are internal to the x86 KVM MMU.  The implementation is restricted
to two-dimensional paging (TDP) for simplicity: shadow paging faults on GVA
instead of L1 GPA, which makes the API difficult to use.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
v2:
- Make the helper function two-dimensional paging specific. (David)
- Return error when vcpu is in guest mode. (David)
- Rename goal_level to level in kvm_tdp_mmu_map_page(). (Sean)
- Update return code conversion. Don't check pfn.
  RET_PF_EMULATE => EINVAL, RET_PF_CONTINUE => EIO (Sean)
- Add WARN_ON_ONCE on RET_PF_CONTINUE and RET_PF_INVALID. (Sean)
- Drop unnecessary EXPORT_SYMBOL_GPL(). (Sean)
---
 arch/x86/kvm/mmu.h     |  3 +++
 arch/x86/kvm/mmu/mmu.c | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e8b620a85627..51ff4f67e115 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -183,6 +183,9 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
 	__kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
 }
 
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+		     u8 *level);
+
 /*
  * Check if a given access (described through the I/D, W/R and U/S bits of a
  * page fault error code pfec) causes a permission fault with the given PTE
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91dd4c44b7d8..a34f4af44cbd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4687,6 +4687,38 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	return direct_page_fault(vcpu, fault);
 }
 
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+		     u8 *level)
+{
+	int r;
+
+	/* Restrict to TDP page fault. */
+	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
+		return -EINVAL;
+
+	r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, false, NULL, level);
+	if (r < 0)
+		return r;
+
+	switch (r) {
+	case RET_PF_RETRY:
+		return -EAGAIN;
+
+	case RET_PF_FIXED:
+	case RET_PF_SPURIOUS:
+		return 0;
+
+	case RET_PF_EMULATE:
+		return -EINVAL;
+
+	case RET_PF_CONTINUE:
+	case RET_PF_INVALID:
+	default:
+		WARN_ON_ONCE(r);
+		return -EIO;
+	}
+}
+
 static void nonpaging_init_context(struct kvm_mmu *context)
 {
 	context->page_fault = nonpaging_page_fault;
-- 
2.43.2
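
The return-code conversion in the diff above can be modelled in userspace
roughly as follows.  This is a minimal sketch, not the kernel code:
map_ret_pf() and map_ret_pf_selftest() are hypothetical names, and the
numeric RET_PF_* values are placeholders because the real enum is internal
to the x86 KVM MMU and does not appear in this patch.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder values (assumption); the real RET_PF_* enum is KVM-internal. */
enum toy_ret_pf {
	RET_PF_CONTINUE = 0,
	RET_PF_RETRY,
	RET_PF_EMULATE,
	RET_PF_INVALID,
	RET_PF_FIXED,
	RET_PF_SPURIOUS,
};

/* Hypothetical helper mirroring the switch in kvm_tdp_map_page(). */
static int map_ret_pf(int r)
{
	if (r < 0)
		return r;	/* negative errno values pass through unchanged */

	switch (r) {
	case RET_PF_RETRY:
		return -EAGAIN;	/* caller may simply try again */
	case RET_PF_FIXED:
	case RET_PF_SPURIOUS:
		return 0;	/* mapping is in place */
	case RET_PF_EMULATE:
		return -EINVAL;
	case RET_PF_CONTINUE:
	case RET_PF_INVALID:
	default:
		return -EIO;	/* the patch additionally does WARN_ON_ONCE(r) */
	}
}

/* Usage example: each class of result maps to one errno-style code. */
static void map_ret_pf_selftest(void)
{
	assert(map_ret_pf(RET_PF_FIXED) == 0);
	assert(map_ret_pf(RET_PF_SPURIOUS) == 0);
	assert(map_ret_pf(RET_PF_RETRY) == -EAGAIN);
	assert(map_ret_pf(RET_PF_EMULATE) == -EINVAL);
	assert(map_ret_pf(RET_PF_CONTINUE) == -EIO);
	assert(map_ret_pf(-ENOMEM) == -ENOMEM);
}
```

Note that if RET_PF_CONTINUE is in fact 0, as assumed in the placeholder
enum, WARN_ON_ONCE(r) would not fire for that case.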

Re: [PATCH v2 05/10] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory

Posted by Chao Gao 1 year, 9 months ago

On Wed, Apr 10, 2024 at 03:07:31PM -0700, isaku.yamahata@intel.com wrote:
>+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
>+		     u8 *level)
>+{
>+	int r;
>+
>+	/* Restrict to TDP page fault. */

Need to explain why (just as you do in the changelog).

>+	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)

The page fault handlers (i.e., vcpu->arch.mmu->page_fault()) will be called
eventually anyway. Why not let the page fault handlers reject the request and
get rid of this ad-hoc check? We would just need to plumb a flag into the
handlers indicating that this is a pre-population request. I think this way
is clearer.

What do you think?
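
The suggestion above could look roughly like the following toy model.  It is
only a sketch under stated assumptions: the toy_* types and functions are
stand-ins for the kernel structures, and the "prepopulate" flag is a
hypothetical field that does not exist in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct kvm_page_fault; 'prepopulate' is hypothetical. */
struct toy_page_fault {
	unsigned long long addr;
	bool prepopulate;	/* set only for pre-population requests */
};

typedef int (*toy_fault_handler)(struct toy_page_fault *fault);

/* TDP handler: addr is a GPA, so pre-population is acceptable. */
static int toy_tdp_page_fault(struct toy_page_fault *fault)
{
	(void)fault;
	return 0;
}

/* Shadow handler: addr is a GVA, so it rejects pre-population itself. */
static int toy_shadow_page_fault(struct toy_page_fault *fault)
{
	if (fault->prepopulate)
		return -EINVAL;
	return 0;
}

/* The map helper no longer compares handler pointers; it only sets the flag. */
static int toy_map_page(toy_fault_handler handler, unsigned long long gpa)
{
	struct toy_page_fault fault = { .addr = gpa, .prepopulate = true };

	return handler(&fault);
}

/* Usage example. */
static void toy_flag_demo(void)
{
	struct toy_page_fault normal = { .addr = 0x1000, .prepopulate = false };

	assert(toy_map_page(toy_tdp_page_fault, 0x1000) == 0);
	assert(toy_map_page(toy_shadow_page_fault, 0x1000) == -EINVAL);
	assert(toy_shadow_page_fault(&normal) == 0);	/* ordinary faults unaffected */
}
```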

Re: [PATCH v2 05/10] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory

Posted by Isaku Yamahata 1 year, 9 months ago

On Wed, Apr 17, 2024 at 03:04:08PM +0800,
Chao Gao <chao.gao@intel.com> wrote:

> On Wed, Apr 10, 2024 at 03:07:31PM -0700, isaku.yamahata@intel.com wrote:
> >+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> >+		     u8 *level)
> >+{
> >+	int r;
> >+
> >+	/* Restrict to TDP page fault. */
> 
> Need to explain why (just as you do in the changelog).

Sure.


> >+	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> 
> The page fault handlers (i.e., vcpu->arch.mmu->page_fault()) will be called
> eventually anyway. Why not let the page fault handlers reject the request and
> get rid of this ad-hoc check? We would just need to plumb a flag into the
> handlers indicating that this is a pre-population request. I think this way
> is clearer.
> 
> What do you think?

__kvm_mmu_do_page_fault() doesn't check whether the MMU mode is TDP.
If we don't want to compare the page_fault handler, the alternative would be
to check if (!vcpu->arch.mmu->direct).  Or we could require the caller to
guarantee that the MMU mode is TDP (direct or tdp_mmu).
-- 
Isaku Yamahata <isaku.yamahata@intel.com>
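
The two checks discussed in this reply can be put side by side in a toy
model.  This is a sketch only: struct toy_mmu and the toy_* functions are
hypothetical stand-ins, and it makes no claim about which MMU modes set
"direct" in the real kernel.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mmu;
typedef int (*toy_fault_fn)(struct toy_mmu *mmu);

/* Toy stand-in for struct kvm_mmu; only the two fields discussed here. */
struct toy_mmu {
	toy_fault_fn page_fault;
	bool direct;
};

static int toy_tdp_page_fault(struct toy_mmu *mmu)
{
	(void)mmu;
	return 0;
}

static int toy_shadow_page_fault(struct toy_mmu *mmu)
{
	(void)mmu;
	return 1;
}

/* Check used in the patch: compare the installed handler. */
static bool toy_is_tdp_by_handler(const struct toy_mmu *mmu)
{
	return mmu->page_fault == toy_tdp_page_fault;
}

/* Alternative named in the reply: test the 'direct' flag instead. */
static bool toy_is_tdp_by_flag(const struct toy_mmu *mmu)
{
	return mmu->direct;
}

/* Usage example: both predicates agree on these two toy contexts. */
static void toy_check_demo(void)
{
	struct toy_mmu tdp = { .page_fault = toy_tdp_page_fault, .direct = true };
	struct toy_mmu shadow = { .page_fault = toy_shadow_page_fault, .direct = false };

	assert(toy_is_tdp_by_handler(&tdp) && toy_is_tdp_by_flag(&tdp));
	assert(!toy_is_tdp_by_handler(&shadow) && !toy_is_tdp_by_flag(&shadow));
}
```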

Re: [PATCH v2 05/10] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory

Posted by Edgecombe, Rick P 1 year, 9 months ago

On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
>  
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> +                    u8 *level)
> +{
> +       int r;
> +
> +       /* Restrict to TDP page fault. */
> +       if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> +               return -EINVAL;
> +
> +       r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, false, NULL, level);

Why not prefetch = true? Doesn't it fit? It looks like the behavior will be to
not set the access bit.

> +       if (r < 0)
> +               return r;
> +
> +       switch (r) {
> +       case RET_PF_RETRY:
> +               return -EAGAIN;
> +
> +       case RET_PF_FIXED:
> +       case RET_PF_SPURIOUS:
> +               return 0;
> +
> +       case RET_PF_EMULATE:
> +               return -EINVAL;
> +
> +       case RET_PF_CONTINUE:
> +       case RET_PF_INVALID:
> +       default:
> +               WARN_ON_ONCE(r);
> +               return -EIO;
> +       }
> +}

Re: [PATCH v2 05/10] KVM: x86/mmu: Introduce kvm_tdp_map_page() to populate guest memory

Posted by Isaku Yamahata 1 year, 9 months ago

On Tue, Apr 16, 2024 at 02:46:17PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> >  
> > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> > +                    u8 *level)
> > +{
> > +       int r;
> > +
> > +       /* Restrict to TDP page fault. */
> > +       if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> > +               return -EINVAL;
> > +
> > +       r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, false, NULL, level);
> 
> Why not prefetch = true? Doesn't it fit? It looks like the behavior will be to
> not set the access bit.

Makes sense. Yes, the difference is whether the A/D bits are set or not.
-- 
Isaku Yamahata <isaku.yamahata@intel.com>
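
The behaviour described in this exchange can be illustrated with a toy
model.  It is a sketch only: the bit layout and toy_make_spte() are
hypothetical and do not reflect the real SPTE format or KVM's SPTE
construction code, and only the accessed bit mentioned in the question is
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bit layout (assumption), NOT the real EPT/NPT SPTE format. */
#define TOY_SPTE_PRESENT	(UINT64_C(1) << 0)
#define TOY_SPTE_ACCESSED	(UINT64_C(1) << 1)

/* Build a toy SPTE; a prefetched mapping is created without the accessed bit. */
static uint64_t toy_make_spte(uint64_t pfn, bool prefetch)
{
	uint64_t spte = (pfn << 12) | TOY_SPTE_PRESENT;

	/* A demand fault means the guest touched the page; a prefetch does not. */
	if (!prefetch)
		spte |= TOY_SPTE_ACCESSED;

	return spte;
}

/* Usage example: same pfn, different accessed state. */
static void toy_prefetch_demo(void)
{
	assert(toy_make_spte(0x1234, false) & TOY_SPTE_ACCESSED);
	assert(!(toy_make_spte(0x1234, true) & TOY_SPTE_ACCESSED));
	assert(toy_make_spte(0x1234, true) & TOY_SPTE_PRESENT);
}
```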