Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic
KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't
require a new capability, and so that developers aren't tempted to bundle
multiple flags into a single capability.
Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit
value, but that limitation can be easily circumvented by adding e.g.
KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more
than 32 flags.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/api.rst | 10 +++++++---
include/uapi/linux/kvm.h | 2 +-
tools/testing/selftests/kvm/guest_memfd_test.c | 13 ++++++-------
virt/kvm/kvm_main.c | 7 +++++--
4 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6ae24c5ca559..7ba92f2ced38 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6432,9 +6432,13 @@ most one mapping per page, i.e. binding multiple memory regions to a single
guest_memfd range is not allowed (any number of memory regions can be bound to
a single guest_memfd file, but the bound ranges must not overlap).
-When the capability KVM_CAP_GUEST_MEMFD_MMAP is supported, the 'flags' field
-supports GUEST_MEMFD_FLAG_MMAP. Setting this flag on guest_memfd creation
-enables mmap() and faulting of guest_memfd memory to host userspace.
+The capability KVM_CAP_GUEST_MEMFD_FLAGS enumerates the `flags` that can be
+specified via KVM_CREATE_GUEST_MEMFD. Currently defined flags:
+
+ ============================ ================================================
+ GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
+ descriptor.
+ ============================ ================================================
When the KVM MMU performs a PFN lookup to service a guest fault and the backing
guest_memfd has the GUEST_MEMFD_FLAG_MMAP set, then the fault will always be
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6efa98a57ec1..b1d52d0c56ec 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -962,7 +962,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_EL2_E2H0 241
#define KVM_CAP_RISCV_MP_STATE_RESET 242
#define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
-#define KVM_CAP_GUEST_MEMFD_MMAP 244
+#define KVM_CAP_GUEST_MEMFD_FLAGS 244
struct kvm_irq_routing_irqchip {
__u32 irqchip;
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index b3ca6737f304..3e58bd496104 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -262,19 +262,17 @@ static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
static void test_guest_memfd(unsigned long vm_type)
{
- uint64_t flags = 0;
struct kvm_vm *vm;
size_t total_size;
size_t page_size;
+ uint64_t flags;
int fd;
page_size = getpagesize();
total_size = page_size * 4;
vm = vm_create_barebones_type(vm_type);
-
- if (vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP))
- flags |= GUEST_MEMFD_FLAG_MMAP;
+ flags = vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS);
test_create_guest_memfd_multiple(vm);
test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
@@ -328,13 +326,14 @@ static void test_guest_memfd_guest(void)
size_t size;
int fd, i;
- if (!kvm_has_cap(KVM_CAP_GUEST_MEMFD_MMAP))
+ if (!kvm_check_cap(KVM_CAP_GUEST_MEMFD_FLAGS))
return;
vm = __vm_create_shape_with_one_vcpu(VM_SHAPE_DEFAULT, &vcpu, 1, guest_code);
- TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_MMAP),
- "Default VM type should always support guest_memfd mmap()");
+ TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_MMAP,
+ "Default VM type should support MMAP, supported flags = 0x%x",
+ vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS));
size = vm->page_size;
fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 226faeaa8e56..e3a268757621 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4928,8 +4928,11 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
#ifdef CONFIG_KVM_GUEST_MEMFD
case KVM_CAP_GUEST_MEMFD:
return 1;
- case KVM_CAP_GUEST_MEMFD_MMAP:
- return !kvm || kvm_arch_supports_gmem_mmap(kvm);
+ case KVM_CAP_GUEST_MEMFD_FLAGS:
+ if (!kvm || kvm_arch_supports_gmem_mmap(kvm))
+ return GUEST_MEMFD_FLAG_MMAP;
+
+ return 0;
#endif
default:
break;
--
2.51.0.618.g983fd99d29-goog
On 04.10.25 01:25, Sean Christopherson wrote:
> Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic
> KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't
> require a new capability, and so that developers aren't tempted to bundle
> multiple flags into a single capability.
>
> Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit
> value, but that limitation can be easily circumvented by adding e.g.
> KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more
> than 32 flags.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

--
Cheers

David / dhildenb
Sean Christopherson <seanjc@google.com> writes:

> Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic
> KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't
> require a new capability, and so that developers aren't tempted to bundle
> multiple flags into a single capability.
>
> Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit
> value, but that limitation can be easily circumvented by adding e.g.
> KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more
> than 32 flags.
>

I know you suggested that guest_memfd's HugeTLB sizes shouldn't be
squashed into the flags. Just using that as an example, would those
kinds of flags (since they're using the upper bits, above the lower 32
bits) be awkward to represent in this new model?

In this model, conditionally valid flags are always set, but userspace
won't be able to do a flags check against the returned 32-bit value. Or
do you think when this issue comes up, we'd put the flags in the upper
bits in KVM_CAP_GUEST_MEMFD_FLAGS2 and userspace would then check
against the OR-ed set of flags instead?

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/api.rst                 | 10 +++++++---
>  include/uapi/linux/kvm.h                       |  2 +-
>  tools/testing/selftests/kvm/guest_memfd_test.c | 13 ++++++-------
>  virt/kvm/kvm_main.c                            |  7 +++++--
>  4 files changed, 19 insertions(+), 13 deletions(-)
>
>
> [...snip...]
>
On Mon, Oct 06, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
>
> > Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic
> > KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't
> > require a new capability, and so that developers aren't tempted to bundle
> > multiple flags into a single capability.
> >
> > Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit
> > value, but that limitation can be easily circumvented by adding e.g.
> > KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more
> > than 32 flags.
>
> I know you suggested that guest_memfd's HugeTLB sizes shouldn't be
> squashed into the flags. Just using that as an example, would those
> kinds of flags (since they're using the upper bits, above the lower 32
> bits) be awkward to represent in this new model?
Are you asking specifically about flags that use bits 63:32? If so, no, I don't
see those as being awkward to deal with. Hopefully we kill off 32-bit KVM and it's
a complete non-issue, but even if we have to add KVM_CAP_GUEST_MEMFD_FLAGS2, I
don't see it being all that awkward for userspace to do:
  uint64_t supported_gmem_flags = kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) |
                                  ((uint64_t)kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS2) << 32);
We could even mimic what Intel did with 64-bit VMCS fields to handle 32-bit mode,
and explicitly name the second one KVM_CAP_GUEST_MEMFD_FLAGS_HI:
  uint64_t supported_gmem_flags = kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) |
                                  ((uint64_t)kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS_HI) << 32);
so that if KVM_CAP_GUEST_MEMFD_FLAGS_HI precedes 64-bit-only KVM, it could become
fully redundant, i.e. where someday this would hold true:
  kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) ==
  (kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) |
   ((uint64_t)kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS_HI) << 32))
> In this model, conditionally valid flags are always set,
I followed everything except this snippet.
> but userspace won't be able to do a flags check against the returned 32-bit
> value. Or do you think when this issue comes up, we'd put the flags in the
> upper bits in KVM_CAP_GUEST_MEMFD_FLAGS2 and userspace would then check
> against the OR-ed set of flags instead?
As above, enumerate support for flags 63:32 in a separate capability.
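
For illustration only, the two-capability scheme discussed above could be wrapped in a small userspace helper. This is a minimal sketch, not code from the patch: gmem_combine_flags() is a hypothetical name, its two arguments stand in for the results of querying KVM_CAP_GUEST_MEMFD_FLAGS and a second capability (KVM_CAP_GUEST_MEMFD_FLAGS2 / _HI, neither of which exists today), and the caller is assumed to have already handled ioctl errors. The one detail worth showing is the widening to 64 bits before the shift, since shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of KVM or its selftests): merge two 32-bit
 * capability results into a single 64-bit mask of supported guest_memfd flags.
 *
 * 'lo' stands in for the value reported for KVM_CAP_GUEST_MEMFD_FLAGS,
 * 'hi' for a possible future capability covering flag bits 63:32.
 */
static uint64_t gmem_combine_flags(uint32_t lo, uint32_t hi)
{
	/* Widen to 64 bits first; shifting a 32-bit value by 32 is undefined. */
	return (uint64_t)lo | ((uint64_t)hi << 32);
}
```

Taking uint32_t parameters also avoids sign extension if bit 31 of the low word were ever used.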
Sean Christopherson <seanjc@google.com> writes:

> On Mon, Oct 06, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>>
>> > Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic
>> > KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't
>> > require a new capability, and so that developers aren't tempted to bundle
>> > multiple flags into a single capability.
>> >
>> > Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit
>> > value, but that limitation can be easily circumvented by adding e.g.
>> > KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more
>> > than 32 flags.
>>
>> I know you suggested that guest_memfd's HugeTLB sizes shouldn't be
>> squashed into the flags. Just using that as an example, would those
>> kinds of flags (since they're using the upper bits, above the lower 32
>> bits) be awkward to represent in this new model?
>
> Are you asking specifically about flags that use bits 63:32? If so, no, I don't
> see those as being awkward to deal with. Hopefully we kill off 32-bit KVM and it's
> a complete non-issue, but even if we have to add KVM_CAP_GUEST_MEMFD_FLAGS2, I
> don't see it being all that awkward for userspace to do:
>
>   uint64_t supported_gmem_flags = kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) |
>                                   ((uint64_t)kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS2) << 32);
>
> We could even mimic what Intel did with 64-bit VMCS fields to handle 32-bit mode,
> and explicitly name the second one KVM_CAP_GUEST_MEMFD_FLAGS_HI:
>
>   uint64_t supported_gmem_flags = kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) |
>                                   ((uint64_t)kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS_HI) << 32);
>

Had the same thing in mind, I guess having a precedent (and seeing it in
code) makes it seem less awkward. Thanks!

> so that if KVM_CAP_GUEST_MEMFD_FLAGS_HI precedes 64-bit-only KVM, it could become
> fully redundant, i.e. where someday this would hold true:
>
>   kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) ==
>   (kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS) |
>    ((uint64_t)kvm_check_extension(KVM_CAP_GUEST_MEMFD_FLAGS_HI) << 32))
>
>> In this model, conditionally valid flags are always set,
>
> I followed everything except this snippet.
>

I meant "conditionally valid" as in if GUEST_MEMFD_FLAG_BAR was valid
only when GUEST_MEMFD_FLAG_FOO is set, then with this model, when
KVM_CAP_GUEST_MEMFD_FLAGS is queried, would KVM return
GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_FOO | GUEST_MEMFD_FLAG_BAR,
where GUEST_MEMFD_FLAG_BAR is the conditionally valid flag?

>> but userspace won't be able to do a flags check against the returned 32-bit
>> value. Or do you think when this issue comes up, we'd put the flags in the
>> upper bits in KVM_CAP_GUEST_MEMFD_FLAGS2 and userspace would then check
>> against the OR-ed set of flags instead?
>
> As above, enumerate support for flags 63:32 in a separate capability.

Got it.
On Tue, Oct 07, 2025, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> >> In this model, conditionally valid flags are always set,
> >
> > I followed everything except this snippet.
>
> I meant "conditionally valid" as in if GUEST_MEMFD_FLAG_BAR was valid
> only when GUEST_MEMFD_FLAG_FOO is set, then with this model, when
> KVM_CAP_GUEST_MEMFD_FLAGS is queried, would KVM return
> GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_FOO | GUEST_MEMFD_FLAG_BAR,
> where GUEST_MEMFD_FLAG_BAR is the conditionally valid flag?

Oh, conditional on other flags (or lack thereof). Yes, the capability would
simply enumerate all supported flags, it would not try to communicate which
combinations of flags are valid.
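
To make the consequence for userspace concrete, here is a rough sketch of the check the enumerated mask allows. gmem_flags_supported() is a hypothetical helper, not an existing KVM or selftest function. It only answers "is every requested bit in the supported set?". Because the capability does not describe which combinations are valid, passing this check is necessary but not sufficient, and the result of KVM_CREATE_GUEST_MEMFD itself still has to be checked.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if every bit in 'wanted' is present in
 * 'supported', where 'supported' is the mask enumerated by
 * KVM_CAP_GUEST_MEMFD_FLAGS.
 *
 * This is only a pre-filter. The capability lists supported flags, not
 * valid combinations, so guest_memfd creation can still fail afterwards.
 */
static bool gmem_flags_supported(uint64_t supported, uint64_t wanted)
{
	return (wanted & ~supported) == 0;
}
```

With the thread's made-up FOO/BAR flags, a request for BAR alone would pass this check even if BAR is only valid together with FOO.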