From nobody Sat Nov 30 10:50:17 2024 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 355181C1AB1 for ; Tue, 10 Sep 2024 23:45:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726011926; cv=none; b=BWkq1IZE7b1mAst4WvvDmMGikZhadi72zhqddEZ/u8iCDxcaKd4G/D1LYDHC+z7Lq9P0HglJYOFYvqgbdnktsRqPs10StwJcbZHDJTARJ+m5nRotBWss3ZUB21BtQcV0lEludzZoER4xjKORwvCuJ4x9SoD2wrWMdU1vw4j7jNg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726011926; c=relaxed/simple; bh=E5exUTcHox7ZsBbSugU4kB9L1tXTbIgmGnzZ+uWTAcI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=a2Y82VfNlCSqIk4yRNCTze3ERLFhn96RLPdUVKWpX+glmIRNNBLXzhOWxYjlgt2OpMfKub/YhI4Vyd9iLtPRKrRdms6+aoYN9DcDeOMhKmn2ar2TdA0b5Td6kkRB8nH0qksSYxo21OuJsmVle5N3FrXPzAzqvennEz8m3tGZUxU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=kkvXkgr/; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="kkvXkgr/" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-7d817394b39so2910054a12.1 for ; Tue, 10 Sep 2024 16:45:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1726011923; x=1726616723; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=FjI6cBjR9AW7KizYJRES3kHs2nxbgzqD9PMU12DXiXA=; b=kkvXkgr/VVSh+6ffqFLFkQrHLirqCJUIz2CLLG4LBIreXlfxbKEyBtXPDgc57rKgeJ 41smivW0T1pTZGVhcxkL/MoemqHPzyJP+Q/NAC5eIj54zcLigUFWnTq1eamHC9m+5EZT kSTA49px7qb4+TAWYKCLYv5sgKGE9/oUeZFZ/nTyCTOjiD6eK/GLOb8c6xopdvtWZuwM DJyxZaDKR79JWE2WdU7aJTKnt0a9tlwa9g2GxGDWmbwSTEQNencTMsxH1OZwlQ0oE4t0 ccvEXTglDIX+JwpOHGbD7+7cfqzLpNFsXhmDYj9C0OSgcsrkNKV5NVxRQnu+uj+bmQr6 eMYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726011923; x=1726616723; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=FjI6cBjR9AW7KizYJRES3kHs2nxbgzqD9PMU12DXiXA=; b=rVzii7rSILMSQ0U+MhDyRMPVZUGdvrEUGWIhh8WbplN1DW8xpYKYf2PADb5LzDU+cY hkRHFRHXoAZ0k5QP5zjKnejWBFAj0F3krqELF7gXGR47HxW2s6GKzfHP1QqCSGRNZy92 +HJ2tp1+XS6T2mrCi9VGS3HjcWXV2MbeJvtHt/hevrdxUyL2xH1qgd5GW/4FD/YuTWhQ UUFyHQ7Nnu7CWXZFhgPLINwyDGNiOyRwsL4r+iR5ChFtbzrUDwZr3QfDKy2JwOLrG1UD eBFwub+QsQg4vjgQfih6cMLT50SPxEo0zLFvK8xiek5gA8F+F2BbJuQwwkQwE1frrBjF 8r4A== X-Forwarded-Encrypted: i=1; AJvYcCV4g+e2AmCJGLzNL8A3G0+LUdEgr42gwynaWb52YfD/Gsm3nlKl19RW0v+B1OCzIv7xFqixEMGEmKISK68=@vger.kernel.org X-Gm-Message-State: AOJu0Yz0vdPfo3tIwH+23auYZPUSAGV2FQ3E6IAT3QFMUBZ+7NWXJGQR /Cdx6A1eUMdAstxlUwODfr6N+8rEgkSGnbDq7bJX8Rg4PSc0B9XwmFwEbMcstwPQYgAuaNAtpct 3+WXauIZCm2MSTfOvx1przA== X-Google-Smtp-Source: AGHT+IHSKJ4yx6H+R6vmjAajFKhOO5Lbm3TP4ppQgPnLaqzbpul3ZYzZk4Z4nMlq9x4ej3kbxbURUt5MTOfpTqRbXQ== X-Received: from ackerleytng-ctop.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:13f8]) (user=ackerleytng job=sendgmr) by 2002:a17:902:c405:b0:207:50e2:f54 with SMTP id d9443c01a7336-20750e20f6emr1226925ad.1.1726011923409; Tue, 10 Sep 2024 16:45:23 -0700 (PDT) Date: Tue, 10 Sep 2024 23:44:00 +0000 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog Message-ID: Subject: [RFC PATCH 29/39] KVM: Handle conversions in the SET_MEMORY_ATTRIBUTES ioctl From: Ackerley Tng To: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk, jgg@nvidia.com, peterx@redhat.com, david@redhat.com, rientjes@google.com, fvdl@google.com, jthoughton@google.com, seanjc@google.com, pbonzini@redhat.com, zhiquan1.li@intel.com, fan.du@intel.com, jun.miao@intel.com, isaku.yamahata@intel.com, muchun.song@linux.dev, mike.kravetz@oracle.com Cc: erdemaktas@google.com, vannapurve@google.com, ackerleytng@google.com, qperret@google.com, jhubbard@nvidia.com, willy@infradead.org, shuah@kernel.org, brauner@kernel.org, bfoster@redhat.com, kent.overstreet@linux.dev, pvorel@suse.cz, rppt@kernel.org, richard.weiyang@gmail.com, anup@brainfault.org, haibo1.xu@intel.com, ajones@ventanamicro.com, vkuznets@redhat.com, maciej.wieczor-retman@intel.com, pgonda@google.com, oliver.upton@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-fsdevel@kvack.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The key steps for a private to shared conversion are: 1. Unmap from guest page tables 2. Set pages associated with requested range in memslot to be faultable 3. Update kvm->mem_attr_array The key steps for a shared to private conversion are: 1. Check and disallow set_memory_attributes if any page in the range is still mapped or pinned, by a. Updating guest_memfd's faultability to prevent future faulting b. Returning -EINVAL if any pages are still pinned. 2. Update kvm->mem_attr_array Userspace VMM must ensure shared pages are not in use, since any faults racing with this call will get a SIGBUS. Co-developed-by: Ackerley Tng Signed-off-by: Ackerley Tng Co-developed-by: Vishal Annapurve Signed-off-by: Vishal Annapurve --- include/linux/kvm_host.h | 1 + virt/kvm/guest_memfd.c | 207 +++++++++++++++++++++++++++++++++++++++ virt/kvm/kvm_main.c | 15 +++ virt/kvm/kvm_mm.h | 9 ++ 4 files changed, 232 insertions(+) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 79a6b1a63027..10993cd33e34 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2476,6 +2476,7 @@ typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, = gfn_t gfn, kvm_pfn_t pfn, =20 long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long = npages, kvm_gmem_populate_cb post_populate, void *opaque); + #endif =20 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 1d4dfe0660ad..110c4bbb004b 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -1592,4 +1592,211 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start= _gfn, void __user *src, long return ret && !i ? ret : i; } EXPORT_SYMBOL_GPL(kvm_gmem_populate); + +/** + * Returns true if pages in range [@start, @end) in inode @inode have no + * userspace mappings. + */ +static bool kvm_gmem_no_mappings_range(struct inode *inode, pgoff_t start,= pgoff_t end) +{ + pgoff_t index; + bool checked_indices_unmapped; + + filemap_invalidate_lock_shared(inode->i_mapping); + + /* TODO: replace iteration with filemap_get_folios() for efficiency. */ + checked_indices_unmapped =3D true; + for (index =3D start; checked_indices_unmapped && index < end;) { + struct folio *folio; + + /* Don't use kvm_gmem_get_folio to avoid allocating */ + folio =3D filemap_lock_folio(inode->i_mapping, index); + if (IS_ERR(folio)) { + ++index; + continue; + } + + if (folio_mapped(folio) || folio_maybe_dma_pinned(folio)) + checked_indices_unmapped =3D false; + else + index =3D folio_next_index(folio); + + folio_unlock(folio); + folio_put(folio); + } + + filemap_invalidate_unlock_shared(inode->i_mapping); + return checked_indices_unmapped; +} + +/** + * Returns true if pages in range [@start, @end) in memslot @slot have no + * userspace mappings. + */ +static bool kvm_gmem_no_mappings_slot(struct kvm_memory_slot *slot, + gfn_t start, gfn_t end) +{ + pgoff_t offset_start; + pgoff_t offset_end; + struct file *file; + bool ret; + + offset_start =3D start - slot->base_gfn + slot->gmem.pgoff; + offset_end =3D end - slot->base_gfn + slot->gmem.pgoff; + + file =3D kvm_gmem_get_file(slot); + if (!file) + return false; + + ret =3D kvm_gmem_no_mappings_range(file_inode(file), offset_start, offset= _end); + + fput(file); + + return ret; +} + +/** + * Returns true if pages in range [@start, @end) have no host userspace ma= ppings. + */ +static bool kvm_gmem_no_mappings(struct kvm *kvm, gfn_t start, gfn_t end) +{ + int i; + + lockdep_assert_held(&kvm->slots_lock); + + for (i =3D 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { + struct kvm_memslot_iter iter; + struct kvm_memslots *slots; + + slots =3D __kvm_memslots(kvm, i); + kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) { + struct kvm_memory_slot *slot; + gfn_t gfn_start; + gfn_t gfn_end; + + slot =3D iter.slot; + gfn_start =3D max(start, slot->base_gfn); + gfn_end =3D min(end, slot->base_gfn + slot->npages); + + if (iter.slot->flags & KVM_MEM_GUEST_MEMFD && + !kvm_gmem_no_mappings_slot(iter.slot, gfn_start, gfn_end)) + return false; + } + } + + return true; +} + +/** + * Set faultability of given range of gfns [@start, @end) in memslot @slot= to + * @faultable. + */ +static void kvm_gmem_set_faultable_slot(struct kvm_memory_slot *slot, gfn_= t start, + gfn_t end, bool faultable) +{ + pgoff_t start_offset; + pgoff_t end_offset; + struct file *file; + + file =3D kvm_gmem_get_file(slot); + if (!file) + return; + + start_offset =3D start - slot->base_gfn + slot->gmem.pgoff; + end_offset =3D end - slot->base_gfn + slot->gmem.pgoff; + + WARN_ON(kvm_gmem_set_faultable(file_inode(file), start_offset, end_offset, + faultable)); + + fput(file); +} + +/** + * Set faultability of given range of gfns [@start, @end) in memslot @slot= to + * @faultable. + */ +static void kvm_gmem_set_faultable_vm(struct kvm *kvm, gfn_t start, gfn_t = end, + bool faultable) +{ + int i; + + lockdep_assert_held(&kvm->slots_lock); + + for (i =3D 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) { + struct kvm_memslot_iter iter; + struct kvm_memslots *slots; + + slots =3D __kvm_memslots(kvm, i); + kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) { + struct kvm_memory_slot *slot; + gfn_t gfn_start; + gfn_t gfn_end; + + slot =3D iter.slot; + gfn_start =3D max(start, slot->base_gfn); + gfn_end =3D min(end, slot->base_gfn + slot->npages); + + if (iter.slot->flags & KVM_MEM_GUEST_MEMFD) { + kvm_gmem_set_faultable_slot(slot, gfn_start, + gfn_end, faultable); + } + } + } +} + +/** + * Returns true if guest_memfd permits setting range [@start, @end) to PRI= VATE. + * + * If memory is faulted in to host userspace and a request was made to set= the + * memory to PRIVATE, the faulted in pages must not be pinned for the requ= est to + * be permitted. + */ +static int kvm_gmem_should_set_attributes_private(struct kvm *kvm, gfn_t s= tart, + gfn_t end) +{ + kvm_gmem_set_faultable_vm(kvm, start, end, false); + + if (kvm_gmem_no_mappings(kvm, start, end)) + return 0; + + kvm_gmem_set_faultable_vm(kvm, start, end, true); + return -EINVAL; +} + +/** + * Returns true if guest_memfd permits setting range [@start, @end) to SHA= RED. + * + * Because this allows pages to be faulted in to userspace, this must only= be + * called after the pages have been invalidated from guest page tables. + */ +static int kvm_gmem_should_set_attributes_shared(struct kvm *kvm, gfn_t st= art, + gfn_t end) +{ + /* Always okay to set shared, hence set range faultable here. */ + kvm_gmem_set_faultable_vm(kvm, start, end, true); + + return 0; +} + +/** + * Returns 0 if guest_memfd permits setting attributes @attrs for range [@= start, + * @end) or negative error otherwise. + * + * If memory is faulted in to host userspace and a request was made to set= the + * memory to PRIVATE, the faulted in pages must not be pinned for the requ= est to + * be permitted. + * + * Because this may allow pages to be faulted in to userspace when request= ed to + * set attributes to shared, this must only be called after the pages have= been + * invalidated from guest page tables. + */ +int kvm_gmem_should_set_attributes(struct kvm *kvm, gfn_t start, gfn_t end, + unsigned long attrs) +{ + if (attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE) + return kvm_gmem_should_set_attributes_private(kvm, start, end); + else + return kvm_gmem_should_set_attributes_shared(kvm, start, end); +} + #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 92901656a0d4..1a7bbcc31b7e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2524,6 +2524,13 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm= , gfn_t start, gfn_t end, .on_lock =3D kvm_mmu_invalidate_end, .may_block =3D true, }; + struct kvm_mmu_notifier_range error_set_range =3D { + .start =3D start, + .end =3D end, + .handler =3D (void *)kvm_null_fn, + .on_lock =3D kvm_mmu_invalidate_end, + .may_block =3D true, + }; unsigned long i; void *entry; int r =3D 0; @@ -2548,6 +2555,10 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm= , gfn_t start, gfn_t end, =20 kvm_handle_gfn_range(kvm, &pre_set_range); =20 + r =3D kvm_gmem_should_set_attributes(kvm, start, end, attributes); + if (r) + goto err; + for (i =3D start; i < end; i++) { r =3D xa_err(xa_store(&kvm->mem_attr_array, i, entry, GFP_KERNEL_ACCOUNT)); @@ -2560,6 +2571,10 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm= , gfn_t start, gfn_t end, mutex_unlock(&kvm->slots_lock); =20 return r; + +err: + kvm_handle_gfn_range(kvm, &error_set_range); + goto out_unlock; } static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm, struct kvm_memory_attributes *attrs) diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h index 715f19669d01..d8ff2b380d0e 100644 --- a/virt/kvm/kvm_mm.h +++ b/virt/kvm/kvm_mm.h @@ -41,6 +41,8 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_gu= est_memfd *args); int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot, unsigned int fd, loff_t offset); void kvm_gmem_unbind(struct kvm_memory_slot *slot); +int kvm_gmem_should_set_attributes(struct kvm *kvm, gfn_t start, gfn_t end, + unsigned long attrs); #else static inline void kvm_gmem_init(struct module *module) { @@ -59,6 +61,13 @@ static inline void kvm_gmem_unbind(struct kvm_memory_slo= t *slot) { WARN_ON_ONCE(1); } + +static inline int kvm_gmem_should_set_attributes(struct kvm *kvm, gfn_t st= art, + gfn_t end, unsigned long attrs) +{ + return 0; +} + #endif /* CONFIG_KVM_PRIVATE_MEM */ =20 #endif /* __KVM_MM_H__ */ --=20 2.46.0.598.g6f2099f65c-goog