From: Sean Christopherson <seanjc@google.com>
Date: Fri, 3 Oct 2025 16:25:55 -0700
Subject: [PATCH v2 02/13] KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not set
To: Paolo Bonzini, Sean Christopherson, Christian Borntraeger, Janosch Frank, Claudio Imbrenda
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Hildenbrand, Fuad Tabba, Ackerley Tng
Message-ID: <20251003232606.4070510-3-seanjc@google.com>
In-Reply-To: <20251003232606.4070510-1-seanjc@google.com>
References: <20251003232606.4070510-1-seanjc@google.com>
Reply-To: Sean Christopherson

Add a guest_memfd flag to allow userspace to state that the underlying
memory should be configured to be initialized as shared, and reject user
page faults if the guest_memfd instance's memory isn't shared.

Because KVM doesn't yet support in-place private<=>shared conversions,
all guest_memfd memory effectively follows the initial state.

Alternatively, KVM could deduce the initial state based on MMAP, which
for all intents and purposes is what KVM currently does.  However,
implicitly deriving the default state based on MMAP will result in a
messy ABI when support for in-place conversions is added.  For x86 CoCo
VMs, which don't yet support MMAP, memory is currently private by
default (otherwise the memory would be unusable).  If MMAP implies memory
is shared by default, then the default state for CoCo VMs will vary based
on MMAP, and from userspace's perspective, will change when in-place
conversion support is added.  I.e. to maintain guest<=>host ABI,
userspace would need to immediately convert all memory from
shared=>private, which is both ugly and inefficient.

The inefficiency could be avoided by adding a flag to state that memory
is _private_ by default, irrespective of MMAP, but that would lead to an
equally messy and hard to document ABI.  Bite the bullet and immediately
add a flag to control the default state so that the effective behavior is
explicit and straightforward.

Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files")
Cc: David Hildenbrand
Reviewed-by: Fuad Tabba
Tested-by: Fuad Tabba
Reviewed-by: Ackerley Tng
Signed-off-by: Sean Christopherson
Reviewed-by: David Hildenbrand
Tested-by: Ackerley Tng
---
 Documentation/virt/kvm/api.rst                 |  5 +++++
 include/uapi/linux/kvm.h                       |  3 ++-
 tools/testing/selftests/kvm/guest_memfd_test.c | 15 ++++++++++++---
 virt/kvm/guest_memfd.c                         |  6 +++++-
 virt/kvm/kvm_main.c                            |  3 ++-
 5 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7ba92f2ced38..754b662a453c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6438,6 +6438,11 @@ specified via KVM_CREATE_GUEST_MEMFD.  Currently defined flags:
 ============================ ================================================
 GUEST_MEMFD_FLAG_MMAP        Enable using mmap() on the guest_memfd file
                              descriptor.
+GUEST_MEMFD_FLAG_INIT_SHARED Make all memory in the file shared during
+                             KVM_CREATE_GUEST_MEMFD (memory files created
+                             without INIT_SHARED will be marked private).
+                             Shared memory can be faulted into host userspace
+                             page tables.  Private memory cannot.
 ============================ ================================================
 
 When the KVM MMU performs a PFN lookup to service a guest fault and the backing
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b1d52d0c56ec..52f6000ab020 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1599,7 +1599,8 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE	(1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
-#define GUEST_MEMFD_FLAG_MMAP	(1ULL << 0)
+#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
+#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 3e58bd496104..0de56ce3c4e2 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -239,8 +239,9 @@ static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
 	close(fd1);
 }
 
-static void test_guest_memfd_flags(struct kvm_vm *vm, uint64_t valid_flags)
+static void test_guest_memfd_flags(struct kvm_vm *vm)
 {
+	uint64_t valid_flags = vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS);
 	size_t page_size = getpagesize();
 	uint64_t flag;
 	int fd;
@@ -274,6 +275,10 @@ static void test_guest_memfd(unsigned long vm_type)
 	vm = vm_create_barebones_type(vm_type);
 	flags = vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS);
 
+	/* This test doesn't yet support testing mmap() on private memory. */
+	if (!(flags & GUEST_MEMFD_FLAG_INIT_SHARED))
+		flags &= ~GUEST_MEMFD_FLAG_MMAP;
+
 	test_create_guest_memfd_multiple(vm);
 	test_create_guest_memfd_invalid_sizes(vm, flags, page_size);
 
@@ -292,7 +297,7 @@ static void test_guest_memfd(unsigned long vm_type)
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
 
-	test_guest_memfd_flags(vm, flags);
+	test_guest_memfd_flags(vm);
 
 	close(fd);
 	kvm_vm_free(vm);
@@ -334,9 +339,13 @@ static void test_guest_memfd_guest(void)
 	TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_MMAP,
 		    "Default VM type should support MMAP, supported flags = 0x%x",
 		    vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS));
+	TEST_ASSERT(vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS) & GUEST_MEMFD_FLAG_INIT_SHARED,
+		    "Default VM type should support INIT_SHARED, supported flags = 0x%x",
+		    vm_check_cap(vm, KVM_CAP_GUEST_MEMFD_FLAGS));
 
 	size = vm->page_size;
-	fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP);
+	fd = vm_create_guest_memfd(vm, size, GUEST_MEMFD_FLAG_MMAP |
+					     GUEST_MEMFD_FLAG_INIT_SHARED);
 	vm_set_user_memory_region2(vm, slot, KVM_MEM_GUEST_MEMFD, gpa, size, NULL, fd, 0);
 
 	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 94bafd6c558c..cf3afba23a6b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -328,6 +328,9 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
 	if (((loff_t)vmf->pgoff << PAGE_SHIFT) >= i_size_read(inode))
 		return VM_FAULT_SIGBUS;
 
+	if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED))
+		return VM_FAULT_SIGBUS;
+
 	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
 	if (IS_ERR(folio)) {
 		int err = PTR_ERR(folio);
@@ -525,7 +528,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 valid_flags = 0;
 
 	if (kvm_arch_supports_gmem_mmap(kvm))
-		valid_flags |= GUEST_MEMFD_FLAG_MMAP;
+		valid_flags |= GUEST_MEMFD_FLAG_MMAP |
+			       GUEST_MEMFD_FLAG_INIT_SHARED;
 
 	if (flags & ~valid_flags)
 		return -EINVAL;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e3a268757621..5f644ca54af3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4930,7 +4930,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return 1;
 	case KVM_CAP_GUEST_MEMFD_FLAGS:
 		if (!kvm || kvm_arch_supports_gmem_mmap(kvm))
-			return GUEST_MEMFD_FLAG_MMAP;
+			return GUEST_MEMFD_FLAG_MMAP |
+			       GUEST_MEMFD_FLAG_INIT_SHARED;
 
 		return 0;
 #endif
-- 
2.51.0.618.g983fd99d29-goog
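
For context, a minimal userspace sketch of how the new flag pairs with MMAP.  This is not part of the patch and is purely illustrative: it assumes a kernel with this series applied, a VM fd whose KVM_CAP_GUEST_MEMFD_FLAGS check advertises both flags, and a made-up helper name map_shared_gmem(); error handling is reduced to returning NULL.

  #include <stddef.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/kvm.h>

  static void *map_shared_gmem(int vm_fd, uint64_t size)
  {
  	struct kvm_create_guest_memfd gmem = {
  		.size  = size,
  		.flags = GUEST_MEMFD_FLAG_MMAP | GUEST_MEMFD_FLAG_INIT_SHARED,
  	};
  	void *mem;
  	int gmem_fd;

  	/* KVM_CREATE_GUEST_MEMFD returns a new guest_memfd file descriptor. */
  	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
  	if (gmem_fd < 0)
  		return NULL;

  	/*
  	 * mmap() is permitted by GUEST_MEMFD_FLAG_MMAP; faulting the pages
  	 * into host page tables only succeeds because INIT_SHARED marked
  	 * the memory shared.  Without INIT_SHARED, touching the mapping
  	 * would SIGBUS.
  	 */
  	mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, gmem_fd, 0);
  	return mem == MAP_FAILED ? NULL : mem;
  }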