From: Isaku Yamahata <isaku.yamahata@intel.com>
Introduce a helper function to call the KVM fault handler. It allows a new
ioctl to invoke the KVM fault handler to populate mappings without exposing
RET_PF_* enums or other KVM MMU internal definitions, because RET_PF_* are
internal to the x86 KVM MMU. The implementation is restricted to
two-dimensional paging (TDP) for simplicity: shadow paging faults on a GVA
instead of an L1 GPA, which makes the API difficult to use.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
v2:
- Make the helper function two-dimensional paging specific. (David)
- Return error when vcpu is in guest mode. (David)
- Rename goal_level to level in kvm_tdp_mmu_map_page(). (Sean)
- Update return code conversion. Don't check pfn.
RET_PF_EMULATE => EINVAL, RET_PF_CONTINUE => EIO (Sean)
- Add WARN_ON_ONCE on RET_PF_CONTINUE and RET_PF_INVALID. (Sean)
- Drop unnecessary EXPORT_SYMBOL_GPL(). (Sean)
---
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e8b620a85627..51ff4f67e115 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -183,6 +183,9 @@ static inline void kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
__kvm_mmu_refresh_passthrough_bits(vcpu, mmu);
}
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+ u8 *level);
+
/*
* Check if a given access (described through the I/D, W/R and U/S bits of a
* page fault error code pfec) causes a permission fault with the given PTE
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91dd4c44b7d8..a34f4af44cbd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4687,6 +4687,38 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
return direct_page_fault(vcpu, fault);
}
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
+		     u8 *level)
+{
+	int r;
+
+	/* Restrict to TDP page fault. */
+	if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
+		return -EINVAL;
+
+	r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, false, NULL, level);
+	if (r < 0)
+		return r;
+
+	switch (r) {
+	case RET_PF_RETRY:
+		return -EAGAIN;
+
+	case RET_PF_FIXED:
+	case RET_PF_SPURIOUS:
+		return 0;
+
+	case RET_PF_EMULATE:
+		return -EINVAL;
+
+	case RET_PF_CONTINUE:
+	case RET_PF_INVALID:
+	default:
+		WARN_ON_ONCE(r);
+		return -EIO;
+	}
+}
+
static void nonpaging_init_context(struct kvm_mmu *context)
{
context->page_fault = nonpaging_page_fault;
--
2.43.2
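
The return code conversion in the patch above can be tried outside the kernel with a small mock. This is only a sketch: the mock_* names are hypothetical, and the numeric values given to the enum are an assumption for illustration, not taken from the patch. It mirrors the switch statement so that a caller sees only 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>

/* Mock of KVM's internal fault results; the numeric values are an assumption. */
enum mock_ret_pf {
	MOCK_RET_PF_CONTINUE = 0,
	MOCK_RET_PF_RETRY,
	MOCK_RET_PF_EMULATE,
	MOCK_RET_PF_INVALID,
	MOCK_RET_PF_FIXED,
	MOCK_RET_PF_SPURIOUS,
};

/*
 * Translate a fault handler result into an errno-style value, following the
 * switch statement in the patch. Negative inputs are treated as errnos and
 * passed through unchanged.
 */
static int mock_ret_pf_to_errno(int r)
{
	if (r < 0)
		return r;

	switch (r) {
	case MOCK_RET_PF_RETRY:
		return -EAGAIN;		/* caller may try again */
	case MOCK_RET_PF_FIXED:
	case MOCK_RET_PF_SPURIOUS:
		return 0;		/* nothing more to do */
	case MOCK_RET_PF_EMULATE:
		return -EINVAL;		/* fault would need emulation */
	case MOCK_RET_PF_CONTINUE:
	case MOCK_RET_PF_INVALID:
	default:
		return -EIO;		/* unexpected result */
	}
}
```

Under the numbering assumed here, MOCK_RET_PF_CONTINUE is 0, so a check of the form WARN_ON_ONCE(r) would not trigger for that case. Whether that applies to the real code depends on the actual enum values.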
On Wed, Apr 10, 2024 at 03:07:31PM -0700, isaku.yamahata@intel.com wrote:
>+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
>+ u8 *level)
>+{
>+ int r;
>+
>+ /* Restrict to TDP page fault. */
This needs to explain why (just as you do in the changelog).
>+ if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
The page fault handlers (i.e., vcpu->arch.mmu->page_fault()) end up being
called anyway. Why not let the handlers reject the request and get rid of
this ad-hoc check? We would just need to plumb a flag into the handlers
indicating that this is a pre-population request. I think that would be
clearer.
What do you think?
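
The suggestion above, letting each handler reject a pre-population request via a flag, could look roughly like the following user-space mock. Everything here is hypothetical: the is_prepopulate field, the mock_* types and both handlers are stand-ins invented for illustration and do not exist in KVM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct kvm_page_fault. */
struct mock_page_fault {
	uint64_t addr;
	bool is_prepopulate;	/* hypothetical flag, not part of the patch */
};

typedef int (*mock_fault_fn)(const struct mock_page_fault *fault);

/* A TDP-style handler accepts pre-population requests. */
static int mock_tdp_page_fault(const struct mock_page_fault *fault)
{
	(void)fault;
	return 0;
}

/* A shadow-paging-style handler rejects them itself. */
static int mock_shadow_page_fault(const struct mock_page_fault *fault)
{
	if (fault->is_prepopulate)
		return -EINVAL;
	return 0;
}

/* The caller no longer compares function pointers; it only sets the flag. */
static int mock_map_page(mock_fault_fn handler, uint64_t gpa)
{
	struct mock_page_fault fault = {
		.addr = gpa,
		.is_prepopulate = true,
	};

	return handler(&fault);
}
```

The trade-off in this sketch is that every handler has to know about the flag, whereas the check in the patch keeps that knowledge in one place.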
On Wed, Apr 17, 2024 at 03:04:08PM +0800,
Chao Gao <chao.gao@intel.com> wrote:
> On Wed, Apr 10, 2024 at 03:07:31PM -0700, isaku.yamahata@intel.com wrote:
> >+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> >+ u8 *level)
> >+{
> >+ int r;
> >+
> >+ /* Restrict to TDP page fault. */
>
> need to explain why. (just as you do in the changelog)
Sure.
> >+ if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
>
> page fault handlers (i.e., vcpu->arch.mmu->page_fault()) will be called
> finally. why not let page fault handlers reject the request to get rid of
> this ad-hoc check? We just need to plumb a flag indicating this is a
> pre-population request into the handlers. I think this way is clearer.
>
> What do you think?
__kvm_mmu_do_page_fault() doesn't check whether the MMU mode is TDP. If we
don't want to compare the page_fault handler, the alternative check would be
if (!vcpu->arch.mmu->direct). Otherwise, we would have to require the caller
to guarantee that the MMU mode is TDP (direct or tdp_mmu).
--
Isaku Yamahata <isaku.yamahata@intel.com>
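
The two checks discussed in this reply, comparing the installed handler versus testing the direct flag, can be put side by side in a mock. This is a sketch under stated assumptions: struct mock_mmu and the mock_* handlers are hypothetical stand-ins, and the third configuration (a different handler that is still marked direct) is an assumed case added only to show that the two checks need not agree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct mock_mmu;
typedef int (*mock_fault_fn)(struct mock_mmu *mmu);

/* Hypothetical stand-in for the two struct kvm_mmu fields under discussion. */
struct mock_mmu {
	mock_fault_fn page_fault;
	bool direct;
};

static int mock_tdp_page_fault(struct mock_mmu *mmu) { (void)mmu; return 0; }
static int mock_shadow_page_fault(struct mock_mmu *mmu) { (void)mmu; return 0; }
static int mock_other_direct_fault(struct mock_mmu *mmu) { (void)mmu; return 0; }

/* Check used in the patch: compare the installed handler. */
static int mock_check_by_handler(const struct mock_mmu *mmu)
{
	return mmu->page_fault == mock_tdp_page_fault ? 0 : -EINVAL;
}

/* Alternative from the reply: test the direct flag instead. */
static int mock_check_by_direct(const struct mock_mmu *mmu)
{
	return mmu->direct ? 0 : -EINVAL;
}

/* Three assumed configurations to compare the checks against. */
static const struct mock_mmu mock_tdp = { mock_tdp_page_fault, true };
static const struct mock_mmu mock_shadow = { mock_shadow_page_fault, false };
static const struct mock_mmu mock_other = { mock_other_direct_fault, true };
```

In this mock the handler comparison is the stricter of the two: it rejects mock_other, while the direct test accepts it.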
On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
>
> +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> + u8 *level)
> +{
> + int r;
> +
> + /* Restrict to TDP page fault. */
> + if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> + return -EINVAL;
> +
> + r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, false, NULL, level);
Why not prefetch = true? Doesn't it fit here? It looks like the resulting
behavior would be to not set the Accessed bit.
> + if (r < 0)
> + return r;
> +
> + switch (r) {
> + case RET_PF_RETRY:
> + return -EAGAIN;
> +
> + case RET_PF_FIXED:
> + case RET_PF_SPURIOUS:
> + return 0;
> +
> + case RET_PF_EMULATE:
> + return -EINVAL;
> +
> + case RET_PF_CONTINUE:
> + case RET_PF_INVALID:
> + default:
> + WARN_ON_ONCE(r);
> + return -EIO;
> + }
> +}
On Tue, Apr 16, 2024 at 02:46:17PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:
> On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> >
> > +int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code,
> > + u8 *level)
> > +{
> > + int r;
> > +
> > + /* Restrict to TDP page fault. */
> > + if (vcpu->arch.mmu->page_fault != kvm_tdp_page_fault)
> > + return -EINVAL;
> > +
> > + r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, false, NULL, level);
>
> Why not prefetch = true? Doesn't it fit? It looks like the behavior will be to
> not set the access bit.
Makes sense. Yes, the difference is whether the A/D bits are set or not.
--
Isaku Yamahata <isaku.yamahata@intel.com>
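
The effect of the prefetch argument discussed above can be illustrated with a toy page table entry builder. This is a minimal sketch, not KVM code: the bit positions, the 12-bit shift and mock_make_pte() are assumptions chosen for illustration, and only the Accessed bit is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for a toy page table entry. */
#define MOCK_PTE_PRESENT	(1ULL << 0)
#define MOCK_PTE_ACCESSED	(1ULL << 5)

/*
 * Build a toy PTE for the given page frame number. A prefetched mapping has
 * not been touched by the guest yet, so its Accessed bit is left clear.
 */
static uint64_t mock_make_pte(uint64_t pfn, bool prefetch)
{
	uint64_t pte = (pfn << 12) | MOCK_PTE_PRESENT;

	if (!prefetch)
		pte |= MOCK_PTE_ACCESSED;

	return pte;
}
```

With prefetch set, the entry is present but looks untouched, which is the behavior a pre-population path would arguably want.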