From: Isaku Yamahata <isaku.yamahata@intel.com>
Add GPA limit check to kvm_arch_vcpu_pre_fault_memory() with guest
maxphyaddr and kvm_mmu_max_gfn().
The KVM page fault handler decides which level of TDP to use, 4-level TDP
or 5-level TDP based on guest maxphyaddr (CPUID[0x80000008].EAX[7:0]), the
host maxphyaddr, and whether the host supports 5-level TDP or not. The
4-level TDP can map GPA up to 48 bits, and the 5-level TDP can map GPA up
to 52 bits. If guest maxphyaddr <= 48, KVM uses 4-level TDP even when the
host supports 5-level TDP.
If we pass a GPA beyond the TDP mappable limit to the TDP MMU fault handler
(concretely, a GPA above 48 bits with 4-level TDP), it will operate on the GPA
with the upper bits stripped, (GPA & ((1UL << 48) - 1)), not the specified GPA.
That is not the expected behavior: it wrongly maps the truncated GPA to the
page intended for the GPA with the upper bits set.
KVM_PRE_FAULT_MEMORY calls the x86 KVM page fault handler,
kvm_tdp_page_fault(), with a user-space-supplied GPA without any limit check,
so user space can trigger the WARN_ON_ONCE(). Check the GPA limit to fix it.
- For non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
When the host supports 5-level TDP, KVM decides to use 4-level TDP if
cpuid_maxphyaddr() <= 48. The cpuid_maxphyaddr() check prevents
KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.
- For TDX case:
We'd like to exclude the shared bit (or gfn_direct_mask in [1]) from the GPA
passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
Shared-EPT (direct or mirrored in [1]) without explicitly
setting/clearing the GPA (except when setting up the TDP iterator,
tdp_iter_refresh_sptep()). With other patches, we'd like to make
kvm_mmu_max_gfn() per-VM, 52 or 47 for TDX, independent of the guest
maxphyaddr.
Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
arch/x86/kvm/mmu/mmu.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e0e9963066f..6ee5af55cee1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
u64 end;
int r;
+ if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
+ return -E2BIG;
+ if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
+ return -E2BIG;
+
/*
* reload is efficient when called repeatedly, so we can do it on
* every iteration.
base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547
--
2.45.2
On Mon, Jul 15, 2024, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add GPA limit check to kvm_arch_vcpu_pre_fault_memory() with guest
> maxphyaddr and kvm_mmu_max_gfn().
>
> The KVM page fault handler decides which level of TDP to use, 4-level TDP
> or 5-level TDP based on guest maxphyaddr (CPUID[0x80000008].EAX[7:0]), the
> host maxphyaddr, and whether the host supports 5-level TDP or not. The
> 4-level TDP can map GPA up to 48 bits, and the 5-level TDP can map GPA up
> to 52 bits. If guest maxphyaddr <= 48, KVM uses 4-level TDP even when the
> host supports 5-level TDP.
>
> If we pass GPA > beyond the TDP mappable limit to the TDP MMU fault handler
> (concretely GPA > 48-bits with 4-level TDP), it will operate on GPA without
> upper bits, (GPA & ((1UL << 48) - 1)), not the specified GPA. It is not
> expected behavior. It wrongly maps GPA without upper bits with the page
> for GPA with upper bits.
>
> KVM_PRE_FAULT_MEMORY calls x86 KVM page fault handler, kvm_tdp_page_fault()
> with a user-space-supplied GPA without the limit check so that the user
> space can trigger WARN_ON_ONCE(). Check the GPA limit to fix it.
Which WARN?
> - For non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
> When the host supports 5-level TDP, KVM decides to use 4-level TDP if
> cpuid_maxphyaddr() <= 48. cpuid_maxphyaddr() check prevents
> KVM_PRE_FAULT_MEMORY from passing GFN beyond mappable GFN.
Hardening against cpuid_maxphyaddr() should be out of scope. We don't enforce
it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
is false and the GPA is supposed to be illegal. And trying to enforce it here is
a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent the
restriction.
> - For TDX case:
> We'd like to exclude shared bit (or gfn_direct_mask in [1]) from GPA
> passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
> Shared-EPT (direct or mirrored in [1]) without explicitly
> setting/clearing the GPA (except setting up the TDP iterator,
> tdp_iter_refresh_sptep()). We'd like to make kvm_mmu_max_gfn() per VM
> for TDX to be 52 or 47 independent of the guest maxphyaddr with other
> patches.
>
> Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4e0e9963066f..6ee5af55cee1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> u64 end;
> int r;
>
> + if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
> + return -E2BIG;
> + if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
> + return -E2BIG;
Related to my thoughts on making kvm_mmu_max_gfn() and rejecting aliased memslots,
I think we should add a common helper that's used by kvm_arch_prepare_memory_region()
and kvm_arch_vcpu_pre_fault_memory() to reject GPAs that are disallowed.
https://lore.kernel.org/all/ZpbKqG_ZhCWxl-Fc@google.com
> +
> /*
> * reload is efficient when called repeatedly, so we can do it on
> * every iteration.
>
> base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547
> --
> 2.45.2
>
On Tue, Jul 16, 2024 at 01:17:27PM -0700,
Sean Christopherson <seanjc@google.com> wrote:
> On Mon, Jul 15, 2024, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Add GPA limit check to kvm_arch_vcpu_pre_fault_memory() with guest
> > maxphyaddr and kvm_mmu_max_gfn().
> >
> > The KVM page fault handler decides which level of TDP to use, 4-level TDP
> > or 5-level TDP based on guest maxphyaddr (CPUID[0x80000008].EAX[7:0]), the
> > host maxphyaddr, and whether the host supports 5-level TDP or not. The
> > 4-level TDP can map GPA up to 48 bits, and the 5-level TDP can map GPA up
> > to 52 bits. If guest maxphyaddr <= 48, KVM uses 4-level TDP even when the
> > host supports 5-level TDP.
> >
> > If we pass GPA > beyond the TDP mappable limit to the TDP MMU fault handler
> > (concretely GPA > 48-bits with 4-level TDP), it will operate on GPA without
> > upper bits, (GPA & ((1UL << 48) - 1)), not the specified GPA. It is not
> > expected behavior. It wrongly maps GPA without upper bits with the page
> > for GPA with upper bits.
> >
> > KVM_PRE_FAULT_MEMORY calls x86 KVM page fault handler, kvm_tdp_page_fault()
> > with a user-space-supplied GPA without the limit check so that the user
> > space can trigger WARN_ON_ONCE(). Check the GPA limit to fix it.
>
> Which WARN?
Sorry, I confused it with my local changes for 4/5-level.
>
> > - For non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
> > When the host supports 5-level TDP, KVM decides to use 4-level TDP if
> > cpuid_maxphyaddr() <= 48. cpuid_maxphyaddr() check prevents
> > KVM_PRE_FAULT_MEMORY from passing GFN beyond mappable GFN.
>
> Hardening against cpuid_maxphyaddr() should be out of scope. We don't enforce
> it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
> is false and the GPA is supposed to be illegal. And trying to enforce it here is
> a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent the
> restriction.
Ok, I'll drop the maxphyaddr check.
> > - For TDX case:
> > We'd like to exclude shared bit (or gfn_direct_mask in [1]) from GPA
> > passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
> > Shared-EPT (direct or mirrored in [1]) without explicitly
> > setting/clearing the GPA (except setting up the TDP iterator,
> > tdp_iter_refresh_sptep()). We'd like to make kvm_mmu_max_gfn() per VM
> > for TDX to be 52 or 47 independent of the guest maxphyaddr with other
> > patches.
> >
> > Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 4e0e9963066f..6ee5af55cee1 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> > u64 end;
> > int r;
> >
> > + if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
> > + return -E2BIG;
> > + if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
> > + return -E2BIG;
>
>
> Related to my thoughts on making kvm_mmu_max_gfn() and rejecting aliased memslots,
> I think we should add a common helper that's used by kvm_arch_prepare_memory_region()
> and kvm_arch_vcpu_pre_fault_memory() to reject GPAs that are disallowed.
>
> https://lore.kernel.org/all/ZpbKqG_ZhCWxl-Fc@google.com
I'll look into it. I'll combine the two patches into a single patch series.
--
Isaku Yamahata <isaku.yamahata@intel.com>
On Tue, Jul 16, 2024 at 04:49:00PM -0700,
Isaku Yamahata <isaku.yamahata@intel.com> wrote:
> > > - For non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
> > > When the host supports 5-level TDP, KVM decides to use 4-level TDP if
> > > cpuid_maxphyaddr() <= 48. cpuid_maxphyaddr() check prevents
> > > KVM_PRE_FAULT_MEMORY from passing GFN beyond mappable GFN.
> >
> > Hardening against cpuid_maxphyaddr() should be out of scope. We don't enforce
> > it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
> > is false and the GPA is supposed to be illegal. And trying to enforce it here is
> > a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent the
> > restriction.
>
> Ok, I'll drop maxphys addr check.
Now Rick has added a patch to check aliased GFNs. This patch and the per-VM
mmu_max_gfn become unnecessary. Instead, I came up with an update to
pre_fault_memory_test to cover the no-memslot case.
https://lore.kernel.org/kvm/20240718211230.1492011-19-rick.p.edgecombe@intel.com/
For the non-x86 case, I'm not sure what error we can expect.
From d62fc5170b17788041d364e6a17f97f01be4130e Mon Sep 17 00:00:00 2001
Message-ID: <d62fc5170b17788041d364e6a17f97f01be4130e.1721345479.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Date: Wed, 29 May 2024 12:13:20 -0700
Subject: [PATCH] KVM: selftests: Update pre_fault_memory_test.c to test no
memslot case
Add test cases to pass GPA to get ENOENT where no memslot is assigned.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
This test passes on the kvm queue branch, and also with the KVM TDX branch.
---
.../selftests/kvm/pre_fault_memory_test.c | 37 ++++++++++++++-----
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 0350a8896a2f..8d057a0bc6fd 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -30,8 +30,8 @@ static void guest_code(uint64_t base_gpa)
GUEST_DONE();
}
-static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
- u64 left)
+static void __pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
+ u64 left, int *ret, int *save_errno)
{
struct kvm_pre_fault_memory range = {
.gpa = gpa,
@@ -39,21 +39,28 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
.flags = 0,
};
u64 prev;
- int ret, save_errno;
do {
prev = range.size;
- ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
- save_errno = errno;
- TEST_ASSERT((range.size < prev) ^ (ret < 0),
+ *ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
+ *save_errno = errno;
+ TEST_ASSERT((range.size < prev) ^ (*ret < 0),
"%sexpecting range.size to change on %s",
- ret < 0 ? "not " : "",
- ret < 0 ? "failure" : "success");
- } while (ret >= 0 ? range.size : save_errno == EINTR);
+ *ret < 0 ? "not " : "",
+ *ret < 0 ? "failure" : "success");
+ } while (*ret >= 0 ? range.size : *save_errno == EINTR);
TEST_ASSERT(range.size == left,
"Completed with %lld bytes left, expected %" PRId64,
range.size, left);
+}
+
+static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
+ u64 left)
+{
+ int ret, save_errno;
+
+ __pre_fault_memory(vcpu, gpa, size, left, &ret, &save_errno);
if (left == 0)
__TEST_ASSERT_VM_VCPU_IOCTL(!ret, "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
@@ -77,6 +84,7 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
uint64_t guest_test_phys_mem;
uint64_t guest_test_virt_mem;
uint64_t alignment, guest_page_size;
+ int ret, save_errno;
vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
@@ -101,6 +109,17 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
pre_fault_memory(vcpu, guest_test_phys_mem + SZ_2M, PAGE_SIZE * 2, PAGE_SIZE);
pre_fault_memory(vcpu, guest_test_phys_mem + TEST_SIZE, PAGE_SIZE, PAGE_SIZE);
+#ifdef __x86_64__
+	__pre_fault_memory(vcpu, guest_test_phys_mem - guest_page_size,
+ guest_page_size, guest_page_size, &ret, &save_errno);
+ __TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+ "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+ __pre_fault_memory(vcpu, (vm->max_gfn + 1) << vm->page_shift,
+ guest_page_size, guest_page_size, &ret, &save_errno);
+ __TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+ "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+#endif
+
vcpu_args_set(vcpu, 1, guest_test_virt_mem);
vcpu_run(vcpu);
base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547
--
2.45.2
--
Isaku Yamahata <isaku.yamahata@intel.com>