On Thu, Mar 26, 2026 at 03:24:17PM -0700, Ackerley Tng wrote:
> Introduce a "version 2" of KVM_SET_MEMORY_ATTRIBUTES to support returning
> information back to userspace.
Hi Ackerley,
Not trying to bikeshed below, but I'm working on getting related QEMU
patches cleaned up to post soon and was working through some of the new
uAPI bits, and plumbing some of these capabilities in seems a little
awkward in a couple places so wondering if we should revisit how some of
this API is defined...
>
> This new ioctl and structure will, in a later patch, be shared as a
> guest_memfd ioctl, where the padding in the new kvm_memory_attributes2
> structure will be for writing the response from the guest_memfd ioctl to
> userspace.
>
> A new ioctl is necessary for these reasons:
>
> 1. KVM_SET_MEMORY_ATTRIBUTES is currently a write-only ioctl and does not
> allow userspace to read fields. There's nothing in code (yet?) that
> validates this, but using _IOWR for consistency would be prudent.
>
> 2. KVM_SET_MEMORY_ATTRIBUTES, when used as a guest_memfd ioctl, will need
> an additional field to provide userspace with more error details.
>
> Alternatively, a completely new ioctl could be defined, unrelated to
> KVM_SET_MEMORY_ATTRIBUTES, but using the same ioctl number and struct for
> the vm and guest_memfd ioctls streamlines the interface for userspace. In
> addition, any memory attributes, implemented on the vm or guest_memfd
> ioctl, can be easily shared with the other.
>
> Add KVM_CAP_MEMORY_ATTRIBUTES2 to indicate that struct
> kvm_memory_attributes2 exists and can be used either with
> KVM_SET_MEMORY_ATTRIBUTES2 via the vm or guest_memfd ioctl.
The guest_memfd support for the KVM_SET_MEMORY_ATTRIBUTES2 ioctl isn't
added until patch #10, and to scan for it you sort of need to infer it
via KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES reporting non-zero (i.e.
KVM_MEMORY_ATTRIBUTE_PRIVATE), so it's confusing to state that
KVM_CAP_MEMORY_ATTRIBUTES2 means you can use the struct via a guest_memfd
ioctl.
I think the above is trying to simply say that the corresponding struct
exists, and remain agnostic about how it can be used. But if that were
the case, there would be no way to know when KVM_SET_MEMORY_ATTRIBUTES2 is
available in the first place, so in the case of KVM ioctls at least,
KVM_CAP_MEMORY_ATTRIBUTES2 is advertising both the struct and the ioctl,
whereas for guest_memfd it's only advertising the struct and not saying
anything about whether a similar gmem ioctl is available to use it.
Instead, maybe they should both have the same semantics:
KVM_CAP_MEMORY_ATTRIBUTES2: *SET_ATTRIBUTES* ioctl exists for KVM that utilizes
struct kvm_memory_attributes2
KVM_CAP_GUEST_MEMFD_MEMORY_ATTRIBUTES: *SET_ATTRIBUTES* ioctl exists for
guest_memfd that utilizes struct kvm_memory_attributes2
In which case you would leave out any mention of guest_memfd here as far as
the documentation does, and then in patch #10 you could modify it to be
something like:
4.145 KVM_SET_MEMORY_ATTRIBUTES2
---------------------------------
-:Capability: KVM_CAP_MEMORY_ATTRIBUTES2
+:Capability: KVM_CAP_MEMORY_ATTRIBUTES2, KVM_GUEST_MEMFD_CAP_MEMORY_ATTRIBUTES
-:Architectures: x86
+:Architectures: all
-:Type: vm ioctl
+:Type: vm, guest_memfd ioctl
:Parameters: struct kvm_memory_attributes2 (in/out)
:Returns: 0 on success, <0 on error
and *then* add in your mentions of how the usage/fields differ for
guest_memfd/KVM_GUEST_MEMFD_CAP_MEMORY_ATTRIBUTES case vs. KVM ioctls.
This avoids needing to issue 2 checks for the guest_memfd variant vs. 1
for KVM, but more importantly avoids subtle differences in how these
similarly-named capabilities are used/documented that might cause
unecessary confusion.
Thanks,
Mike
>
> Handle KVM_CAP_MEMORY_ATTRIBUTES2 and return the same supported attributes
> as would be returned for KVM_CAP_MEMORY_ATTRIBUTES - the supported
> attributes are the same for now, regardless of the CAP requested.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
> Documentation/virt/kvm/api.rst | 32 ++++++++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 12 ++++++++++++
> virt/kvm/kvm_main.c | 40 +++++++++++++++++++++++++++++++++++++---
> 3 files changed, 81 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 032516783e962..0b61e2579e1d8 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6359,6 +6359,8 @@ S390:
> Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set.
> Returns -EINVAL if called on a protected VM.
>
> +.. _KVM_SET_MEMORY_ATTRIBUTES:
> +
> 4.141 KVM_SET_MEMORY_ATTRIBUTES
> -------------------------------
>
> @@ -6551,6 +6553,36 @@ KVM_S390_KEYOP_SSKE
> Sets the storage key for the guest address ``guest_addr`` to the key
> specified in ``key``, returning the previous value in ``key``.
>
> +4.145 KVM_SET_MEMORY_ATTRIBUTES2
> +---------------------------------
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES2
> +:Architectures: x86
> +:Type: vm ioctl
> +:Parameters: struct kvm_memory_attributes2 (in/out)
> +:Returns: 0 on success, <0 on error
> +
> +KVM_SET_MEMORY_ATTRIBUTES2 is an extension to
> +KVM_SET_MEMORY_ATTRIBUTES that supports returning (writing) values to
> +userspace. The original (pre-extension) fields are shared with
> +KVM_SET_MEMORY_ATTRIBUTES identically.
> +
> +Attribute values are shared with KVM_SET_MEMORY_ATTRIBUTES.
> +
> +::
> +
> + struct kvm_memory_attributes2 {
> + __u64 address;
> + __u64 size;
> + __u64 attributes;
> + __u64 flags;
> + __u64 reserved[12];
> + };
> +
> + #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> +
> +See also: :ref: `KVM_SET_MEMORY_ATTRIBUTES`.
> +
> .. _kvm_run:
>
> 5. The kvm_run structure
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 80364d4dbebb0..16567d4a769e5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -989,6 +989,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_ARM_SEA_TO_USER 245
> #define KVM_CAP_S390_USER_OPEREXEC 246
> #define KVM_CAP_S390_KEYOP 247
> +#define KVM_CAP_MEMORY_ATTRIBUTES2 248
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> @@ -1637,6 +1638,17 @@ struct kvm_memory_attributes {
> __u64 flags;
> };
>
> +/* Available with KVM_CAP_MEMORY_ATTRIBUTES2 */
> +#define KVM_SET_MEMORY_ATTRIBUTES2 _IOWR(KVMIO, 0xd2, struct kvm_memory_attributes2)
> +
> +struct kvm_memory_attributes2 {
> + __u64 address;
> + __u64 size;
> + __u64 attributes;
> + __u64 flags;
> + __u64 reserved[12];
> +};
> +
> #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
>
> #define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 70b594dafc5cc..3c261904322f0 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2621,9 +2621,10 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> return r;
> }
> static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> - struct kvm_memory_attributes *attrs)
> + struct kvm_memory_attributes2 *attrs)
> {
> gfn_t start, end;
> + int i;
>
> /* flags is currently not used. */
> if (attrs->flags)
> @@ -2634,6 +2635,10 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
> return -EINVAL;
> if (!PAGE_ALIGNED(attrs->address) || !PAGE_ALIGNED(attrs->size))
> return -EINVAL;
> + for (i = 0; i < ARRAY_SIZE(attrs->reserved); i++) {
> + if (attrs->reserved[i])
> + return -EINVAL;
> + }
>
> start = attrs->address >> PAGE_SHIFT;
> end = (attrs->address + attrs->size) >> PAGE_SHIFT;
> @@ -4966,6 +4971,7 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> case KVM_CAP_DEVICE_CTRL:
> return 1;
> #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> + case KVM_CAP_MEMORY_ATTRIBUTES2:
> case KVM_CAP_MEMORY_ATTRIBUTES:
> if (!vm_memory_attributes)
> return 0;
> @@ -5191,6 +5197,14 @@ do { \
> sizeof_field(struct kvm_userspace_memory_region2, field)); \
> } while (0)
>
> +#define SANITY_CHECK_MEMORY_ATTRIBUTES_FIELD(field) \
> +do { \
> + BUILD_BUG_ON(offsetof(struct kvm_memory_attributes, field) != \
> + offsetof(struct kvm_memory_attributes2, field)); \
> + BUILD_BUG_ON(sizeof_field(struct kvm_memory_attributes, field) != \
> + sizeof_field(struct kvm_memory_attributes2, field)); \
> +} while (0)
> +
> static long kvm_vm_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg)
> {
> @@ -5373,15 +5387,35 @@ static long kvm_vm_ioctl(struct file *filp,
> }
> #endif /* CONFIG_HAVE_KVM_IRQ_ROUTING */
> #ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES
> + case KVM_SET_MEMORY_ATTRIBUTES2:
> case KVM_SET_MEMORY_ATTRIBUTES: {
> - struct kvm_memory_attributes attrs;
> + struct kvm_memory_attributes2 attrs;
> + unsigned long size;
> +
> + if (ioctl == KVM_SET_MEMORY_ATTRIBUTES) {
> + /*
> + * Fields beyond struct kvm_memory_attributes shouldn't
> + * be accessed, but avoid leaking kernel memory in case
> + * of a bug.
> + */
> + memset(&attrs, 0, sizeof(attrs));
> + size = sizeof(struct kvm_memory_attributes);
> + } else {
> + size = sizeof(struct kvm_memory_attributes2);
> + }
> +
> + /* Ensure the common parts of the two structs are identical. */
> + SANITY_CHECK_MEMORY_ATTRIBUTES_FIELD(address);
> + SANITY_CHECK_MEMORY_ATTRIBUTES_FIELD(size);
> + SANITY_CHECK_MEMORY_ATTRIBUTES_FIELD(attributes);
> + SANITY_CHECK_MEMORY_ATTRIBUTES_FIELD(flags);
>
> r = -ENOTTY;
> if (!vm_memory_attributes)
> goto out;
>
> r = -EFAULT;
> - if (copy_from_user(&attrs, argp, sizeof(attrs)))
> + if (copy_from_user(&attrs, argp, size))
> goto out;
>
> r = kvm_vm_ioctl_set_mem_attributes(kvm, &attrs);
>
> --
> 2.53.0.1018.g2bb0e51243-goog
>