From nobody Thu Oct 2 14:27:41 2025
From: "Kalyazin, Nikita"
To: akpm@linux-foundation.org, david@redhat.com, pbonzini@redhat.com,
	seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org
CC: peterx@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	willy@infradead.org, vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, jack@suse.cz, linux-mm@kvack.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	jthoughton@google.com, tabba@google.com, vannapurve@google.com,
	"Roy, Patrick", "Thomson, Jack", "Manwaring, Derek", "Cali, Marco",
	"Kalyazin, Nikita"
Subject: [RFC PATCH v6 1/2] mm: guestmem: introduce guestmem library
Date: Mon, 15 Sep 2025 16:18:27 +0000
Message-ID: <20250915161815.40729-2-kalyazin@amazon.com>
References: <20250915161815.40729-1-kalyazin@amazon.com>
In-Reply-To: <20250915161815.40729-1-kalyazin@amazon.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Nikita Kalyazin

Move the MM-generic parts of guest_memfd from KVM to MM. This allows
other hypervisors to use the guestmem code and enables a UserfaultFD
implementation for guest_memfd [1]. Previously this was not possible
because KVM (and with it the guest_memfd code) may be built as a module.

Based on a patch by Elliot Berman [2].
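For illustration, a hypervisor other than KVM would plug into the library
roughly as follows. This is a minimal sketch based on the guestmem_ops /
guestmem_attach_mapping() API introduced by this patch; the foo_* names and
the per-instance context are hypothetical:

  #include <linux/fs.h>
  #include <linux/guestmem.h>
  #include <linux/list.h>

  struct foo_gmem {
  	struct list_head entry;	/* linked into mapping->i_private_list */
  	/* hypervisor-private state for this guest_memfd instance */
  };

  /* Mandatory: called for every attached instance on truncation/poison. */
  static void foo_invalidate_begin(struct list_head *entry, pgoff_t start,
  				 pgoff_t end)
  {
  	struct foo_gmem *gmem = container_of(entry, struct foo_gmem, entry);

  	/* tear down stage-2 mappings covering [start, end) for this guest */
  }

  static bool foo_supports_mmap(struct inode *inode)
  {
  	return true;	/* or derive from per-inode flags */
  }

  /* ->release_folio and ->invalidate_end are optional and omitted here. */
  static const struct guestmem_ops foo_guestmem_ops = {
  	.invalidate_begin	= foo_invalidate_begin,
  	.supports_mmap		= foo_supports_mmap,
  };

  static int foo_gmem_bind(struct inode *inode, struct foo_gmem *gmem)
  {
  	return guestmem_attach_mapping(inode->i_mapping, &foo_guestmem_ops,
  				       &gmem->entry);
  }

  static void foo_gmem_unbind(struct inode *inode, struct foo_gmem *gmem)
  {
  	guestmem_detach_mapping(inode->i_mapping, &gmem->entry);
  }

KVM's conversion below follows the same pattern via kvm_guestmem_ops.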
[1] https://lore.kernel.org/kvm/20250404154352.23078-1-kalyazin@amazon.com [2] https://lore.kernel.org/kvm/20241122-guestmem-library-v5-2-450e92951a15= @quicinc.com Signed-off-by: Nikita Kalyazin --- MAINTAINERS | 2 + include/linux/guestmem.h | 46 +++++ mm/Kconfig | 3 + mm/Makefile | 1 + mm/guestmem.c | 380 +++++++++++++++++++++++++++++++++++++++ virt/kvm/Kconfig | 1 + virt/kvm/guest_memfd.c | 303 ++++--------------------------- 7 files changed, 465 insertions(+), 271 deletions(-) create mode 100644 include/linux/guestmem.h create mode 100644 mm/guestmem.c diff --git a/MAINTAINERS b/MAINTAINERS index fed6cd812d79..c468c4847ffd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15956,6 +15956,7 @@ W: http://www.linux-mm.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm T: quilt git://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new F: mm/ +F: mm/guestmem.c F: tools/mm/ =20 MEMORY MANAGEMENT - CORE @@ -15973,6 +15974,7 @@ W: http://www.linux-mm.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm F: include/linux/gfp.h F: include/linux/gfp_types.h +F: include/linux/guestmem.h F: include/linux/highmem.h F: include/linux/memory.h F: include/linux/mm.h diff --git a/include/linux/guestmem.h b/include/linux/guestmem.h new file mode 100644 index 000000000000..2a173261d32b --- /dev/null +++ b/include/linux/guestmem.h @@ -0,0 +1,46 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_GUESTMEM_H +#define _LINUX_GUESTMEM_H + +#include + +struct address_space; +struct list_head; +struct inode; + +/** + * struct guestmem_ops - Hypervisor-specific maintenance operations + * @release_folio - Try to bring the folio back to fully owned by Linux + * for instance: about to free the folio [optional] + * @invalidate_begin - start invalidating mappings between start and end o= ffsets + * @invalidate_end - paired with ->invalidate_begin() [optional] + * @supports_mmap - return true if the inode supports mmap [optional] + */ +struct guestmem_ops { + bool (*release_folio)(struct address_space *mapping, + struct folio *folio); + void (*invalidate_begin)(struct list_head *entry, pgoff_t start, + pgoff_t end); + void (*invalidate_end)(struct list_head *entry, pgoff_t start, + pgoff_t end); + bool (*supports_mmap)(struct inode *inode); +}; + +int guestmem_attach_mapping(struct address_space *mapping, + const struct guestmem_ops *const ops, + struct list_head *data); +void guestmem_detach_mapping(struct address_space *mapping, + struct list_head *data); + +struct folio *guestmem_grab_folio(struct address_space *mapping, pgoff_t i= ndex); + +int guestmem_punch_hole(struct address_space *mapping, loff_t offset, + loff_t len); +int guestmem_allocate(struct address_space *mapping, loff_t offset, loff_t= len); + +bool guestmem_test_no_direct_map(struct inode *inode); +void guestmem_mark_prepared(struct folio *folio); +int guestmem_mmap(struct file *file, struct vm_area_struct *vma); +bool guestmem_vma_is_guestmem(struct vm_area_struct *vma); + +#endif diff --git a/mm/Kconfig b/mm/Kconfig index e443fe8cd6cf..a3705099601f 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1254,6 +1254,9 @@ config SECRETMEM memory areas visible only in the context of the owning process and not mapped to other processes and other kernel page tables. 
=20 +config GUESTMEM + bool + config ANON_VMA_NAME bool "Anonymous VMA name support" depends on PROC_FS && ADVISE_SYSCALLS && MMU diff --git a/mm/Makefile b/mm/Makefile index ef54aa615d9d..c92892acd819 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -138,6 +138,7 @@ obj-$(CONFIG_PERCPU_STATS) +=3D percpu-stats.o obj-$(CONFIG_ZONE_DEVICE) +=3D memremap.o obj-$(CONFIG_HMM_MIRROR) +=3D hmm.o obj-$(CONFIG_MEMFD_CREATE) +=3D memfd.o +obj-$(CONFIG_GUESTMEM) +=3D guestmem.o obj-$(CONFIG_MAPPING_DIRTY_HELPERS) +=3D mapping_dirty_helpers.o obj-$(CONFIG_PTDUMP) +=3D ptdump.o obj-$(CONFIG_PAGE_REPORTING) +=3D page_reporting.o diff --git a/mm/guestmem.c b/mm/guestmem.c new file mode 100644 index 000000000000..110087aff7e8 --- /dev/null +++ b/mm/guestmem.c @@ -0,0 +1,380 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include + +struct guestmem { + const struct guestmem_ops *ops; +}; + +static inline bool __guestmem_release_folio(struct address_space *mapping, + struct folio *folio) +{ + struct guestmem *gmem =3D mapping->i_private_data; + + if (gmem->ops->release_folio) { + if (!gmem->ops->release_folio(mapping, folio)) + return false; + } + + return true; +} + +static inline void +__guestmem_invalidate_begin(struct address_space *const mapping, pgoff_t s= tart, + pgoff_t end) +{ + struct guestmem *gmem =3D mapping->i_private_data; + struct list_head *entry; + + list_for_each(entry, &mapping->i_private_list) + gmem->ops->invalidate_begin(entry, start, end); +} + +static inline void +__guestmem_invalidate_end(struct address_space *const mapping, pgoff_t sta= rt, + pgoff_t end) +{ + struct guestmem *gmem =3D mapping->i_private_data; + struct list_head *entry; + + if (gmem->ops->invalidate_end) { + list_for_each(entry, &mapping->i_private_list) + gmem->ops->invalidate_end(entry, start, end); + } +} + +static int guestmem_write_begin(const struct kiocb *kiocb, + struct address_space *mapping, + loff_t pos, unsigned int len, + struct folio **foliop, + void **fsdata) +{ + struct file *file =3D kiocb->ki_filp; + pgoff_t index =3D pos >> PAGE_SHIFT; + struct folio *folio; + + if (!PAGE_ALIGNED(pos) || len !=3D PAGE_SIZE) + return -EINVAL; + + if (pos + len > i_size_read(file_inode(file))) + return -EINVAL; + + folio =3D guestmem_grab_folio(file_inode(file)->i_mapping, index); + if (IS_ERR(folio)) + return -EFAULT; + + if (WARN_ON_ONCE(folio_test_large(folio))) { + folio_unlock(folio); + folio_put(folio); + return -EFAULT; + } + + if (folio_test_uptodate(folio)) { + folio_unlock(folio); + folio_put(folio); + return -ENOSPC; + } + + *foliop =3D folio; + return 0; +} + +static int guestmem_write_end(const struct kiocb *kiocb, + struct address_space *mapping, + loff_t pos, unsigned int len, unsigned int copied, + struct folio *folio, void *fsdata) +{ + if (copied) { + if (copied < len) { + unsigned int from =3D pos & (PAGE_SIZE - 1); + + folio_zero_range(folio, from + copied, len - copied); + } + guestmem_mark_prepared(folio); + } + + folio_unlock(folio); + folio_put(folio); + + return copied; +} + +static void guestmem_free_folio(struct address_space *mapping, + struct folio *folio) +{ + WARN_ON_ONCE(!__guestmem_release_folio(mapping, folio)); +} + +static int guestmem_error_folio(struct address_space *mapping, + struct folio *folio) +{ + pgoff_t start, end; + + filemap_invalidate_lock_shared(mapping); + + start =3D folio->index; + end =3D start + folio_nr_pages(folio); + + __guestmem_invalidate_begin(mapping, start, end); + + /* + * Do not truncate the range, what 
action is taken in response to the + * error is userspace's decision (assuming the architecture supports + * gracefully handling memory errors). If/when the guest attempts to + * access a poisoned page, kvm_gmem_get_pfn() will return -EHWPOISON, + * at which point KVM can either terminate the VM or propagate the + * error to userspace. + */ + + __guestmem_invalidate_end(mapping, start, end); + + filemap_invalidate_unlock_shared(mapping); + return MF_FAILED; +} + +static int guestmem_migrate_folio(struct address_space *mapping, + struct folio *dst, struct folio *src, + enum migrate_mode mode) +{ + WARN_ON_ONCE(1); + return -EINVAL; +} + +static const struct address_space_operations guestmem_aops =3D { + .dirty_folio =3D noop_dirty_folio, + .write_begin =3D guestmem_write_begin, + .write_end =3D guestmem_write_end, + .free_folio =3D guestmem_free_folio, + .error_remove_folio =3D guestmem_error_folio, + .migrate_folio =3D guestmem_migrate_folio, +}; + +int guestmem_attach_mapping(struct address_space *mapping, + const struct guestmem_ops *const ops, + struct list_head *data) +{ + struct guestmem *gmem; + + if (mapping->a_ops =3D=3D &guestmem_aops) { + gmem =3D mapping->i_private_data; + if (gmem->ops !=3D ops) + return -EINVAL; + + goto add; + } + + gmem =3D kzalloc(sizeof(*gmem), GFP_KERNEL); + if (!gmem) + return -ENOMEM; + + gmem->ops =3D ops; + + mapping->a_ops =3D &guestmem_aops; + mapping->i_private_data =3D gmem; + + mapping_set_gfp_mask(mapping, GFP_HIGHUSER); + mapping_set_inaccessible(mapping); + /* Unmovable mappings are supposed to be marked unevictable as well. */ + WARN_ON_ONCE(!mapping_unevictable(mapping)); + +add: + list_add(data, &mapping->i_private_list); + return 0; +} +EXPORT_SYMBOL_GPL(guestmem_attach_mapping); + +void guestmem_detach_mapping(struct address_space *mapping, + struct list_head *data) +{ + list_del(data); + + if (list_empty(&mapping->i_private_list)) { + /** + * Ensures we call ->free_folio() for any allocated folios. + * Any folios allocated after this point are assumed not to be + * accessed by the guest, so we don't need to worry about + * guestmem ops not being called on them. + */ + truncate_inode_pages(mapping, 0); + + kfree(mapping->i_private_data); + mapping->i_private_data =3D NULL; + mapping->a_ops =3D &empty_aops; + } +} +EXPORT_SYMBOL_GPL(guestmem_detach_mapping); + +struct folio *guestmem_grab_folio(struct address_space *mapping, pgoff_t i= ndex) +{ + /* TODO: Support huge pages. */ + return filemap_grab_folio(mapping, index); +} +EXPORT_SYMBOL_GPL(guestmem_grab_folio); + +int guestmem_punch_hole(struct address_space *mapping, loff_t offset, + loff_t len) +{ + pgoff_t start =3D offset >> PAGE_SHIFT; + pgoff_t end =3D (offset + len) >> PAGE_SHIFT; + + filemap_invalidate_lock(mapping); + __guestmem_invalidate_begin(mapping, start, end); + + truncate_inode_pages_range(mapping, offset, offset + len - 1); + + __guestmem_invalidate_end(mapping, start, end); + filemap_invalidate_unlock(mapping); + + return 0; +} +EXPORT_SYMBOL_GPL(guestmem_punch_hole); + +int guestmem_allocate(struct address_space *mapping, loff_t offset, loff_t= len) +{ + pgoff_t start, index, end; + int r; + + /* Dedicated guest is immutable by default. 
*/ + if (offset + len > i_size_read(mapping->host)) + return -EINVAL; + + filemap_invalidate_lock_shared(mapping); + + start =3D offset >> PAGE_SHIFT; + end =3D (offset + len) >> PAGE_SHIFT; + + r =3D 0; + for (index =3D start; index < end; ) { + struct folio *folio; + + if (signal_pending(current)) { + r =3D -EINTR; + break; + } + + folio =3D guestmem_grab_folio(mapping, index); + if (IS_ERR(folio)) { + r =3D PTR_ERR(folio); + break; + } + + index =3D folio_next_index(folio); + + folio_unlock(folio); + folio_put(folio); + + /* 64-bit only, wrapping the index should be impossible. */ + if (WARN_ON_ONCE(!index)) + break; + + cond_resched(); + } + + filemap_invalidate_unlock_shared(mapping); + + return r; +} +EXPORT_SYMBOL_GPL(guestmem_allocate); + +bool guestmem_test_no_direct_map(struct inode *inode) +{ + return mapping_no_direct_map(inode->i_mapping); +} +EXPORT_SYMBOL_GPL(guestmem_test_no_direct_map); + +void guestmem_mark_prepared(struct folio *folio) +{ + struct inode *inode =3D folio_inode(folio); + + if (guestmem_test_no_direct_map(inode)) + set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio)= , false); + + folio_mark_uptodate(folio); +} +EXPORT_SYMBOL_GPL(guestmem_mark_prepared); + +static vm_fault_t guestmem_fault_user_mapping(struct vm_fault *vmf) +{ + struct inode *inode =3D file_inode(vmf->vma->vm_file); + struct folio *folio; + vm_fault_t ret =3D VM_FAULT_LOCKED; + + if (((loff_t)vmf->pgoff << PAGE_SHIFT) >=3D i_size_read(inode)) + return VM_FAULT_SIGBUS; + + folio =3D guestmem_grab_folio(inode->i_mapping, vmf->pgoff); + if (IS_ERR(folio)) { + int err =3D PTR_ERR(folio); + + if (err =3D=3D -EAGAIN) + return VM_FAULT_RETRY; + + return vmf_error(err); + } + + if (WARN_ON_ONCE(folio_test_large(folio))) { + ret =3D VM_FAULT_SIGBUS; + goto out_folio; + } + + if (!folio_test_uptodate(folio)) { + clear_highpage(folio_page(folio, 0)); + guestmem_mark_prepared(folio); + } + + if (userfaultfd_minor(vmf->vma)) { + folio_unlock(folio); + return handle_userfault(vmf, VM_UFFD_MINOR); + } + + vmf->page =3D folio_file_page(folio, vmf->pgoff); + +out_folio: + if (ret !=3D VM_FAULT_LOCKED) { + folio_unlock(folio); + folio_put(folio); + } + + return ret; +} + +static const struct vm_operations_struct guestmem_vm_ops =3D { + .fault =3D guestmem_fault_user_mapping, +}; + +int guestmem_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct address_space *mapping =3D file_inode(file)->i_mapping; + struct guestmem *gmem =3D mapping->i_private_data; + + if (!gmem->ops->supports_mmap || !gmem->ops->supports_mmap(file_inode(fil= e))) + return -ENODEV; + + if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=3D + (VM_SHARED | VM_MAYSHARE)) { + return -EINVAL; + } + + vma->vm_ops =3D &guestmem_vm_ops; + + return 0; +} +EXPORT_SYMBOL_GPL(guestmem_mmap); + +bool guestmem_vma_is_guestmem(struct vm_area_struct *vma) +{ + struct inode *inode; + + if (!vma->vm_file) + return false; + + inode =3D file_inode(vma->vm_file); + if (!inode || !inode->i_mapping || !inode->i_mapping->i_private_data) + return false; + + return inode->i_mapping->a_ops =3D=3D &guestmem_aops; +} diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig index 1b7d5be0b6c4..41e26ad33c1b 100644 --- a/virt/kvm/Kconfig +++ b/virt/kvm/Kconfig @@ -114,6 +114,7 @@ config KVM_GENERIC_MEMORY_ATTRIBUTES =20 config KVM_GUEST_MEMFD select XARRAY_MULTI + select GUESTMEM bool =20 config HAVE_KVM_ARCH_GMEM_PREPARE diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 6989362c056c..15ab13bf6d40 100644 --- 
a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 #include #include +#include #include #include #include @@ -43,26 +44,6 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, str= uct kvm_memory_slot *slo return 0; } =20 -static bool kvm_gmem_test_no_direct_map(struct inode *inode) -{ - return ((unsigned long) inode->i_private) & GUEST_MEMFD_FLAG_NO_DIRECT_MA= P; -} - -static inline int kvm_gmem_mark_prepared(struct folio *folio) -{ - struct inode *inode =3D folio_inode(folio); - int r =3D 0; - - if (kvm_gmem_test_no_direct_map(inode)) - r =3D set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(= folio), - false); - - if (!r) - folio_mark_uptodate(folio); - - return r; -} - /* * Process @folio, which contains @gfn, so that the guest can use it. * The folio must be locked and the gfn must be contained in @slot. @@ -98,7 +79,7 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct= kvm_memory_slot *slot, index =3D ALIGN_DOWN(index, 1 << folio_order(folio)); r =3D __kvm_gmem_prepare_folio(kvm, slot, index, folio); if (!r) - r =3D kvm_gmem_mark_prepared(folio); + guestmem_mark_prepared(folio); =20 return r; } @@ -114,8 +95,7 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struc= t kvm_memory_slot *slot, */ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index) { - /* TODO: Support huge pages. */ - return filemap_grab_folio(inode->i_mapping, index); + return guestmem_grab_folio(inode->i_mapping, index); } =20 static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start, @@ -167,79 +147,6 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *g= mem, pgoff_t start, } } =20 -static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t= len) -{ - struct list_head *gmem_list =3D &inode->i_mapping->i_private_list; - pgoff_t start =3D offset >> PAGE_SHIFT; - pgoff_t end =3D (offset + len) >> PAGE_SHIFT; - struct kvm_gmem *gmem; - - /* - * Bindings must be stable across invalidation to ensure the start+end - * are balanced. - */ - filemap_invalidate_lock(inode->i_mapping); - - list_for_each_entry(gmem, gmem_list, entry) - kvm_gmem_invalidate_begin(gmem, start, end); - - truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1); - - list_for_each_entry(gmem, gmem_list, entry) - kvm_gmem_invalidate_end(gmem, start, end); - - filemap_invalidate_unlock(inode->i_mapping); - - return 0; -} - -static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t l= en) -{ - struct address_space *mapping =3D inode->i_mapping; - pgoff_t start, index, end; - int r; - - /* Dedicated guest is immutable by default. */ - if (offset + len > i_size_read(inode)) - return -EINVAL; - - filemap_invalidate_lock_shared(mapping); - - start =3D offset >> PAGE_SHIFT; - end =3D (offset + len) >> PAGE_SHIFT; - - r =3D 0; - for (index =3D start; index < end; ) { - struct folio *folio; - - if (signal_pending(current)) { - r =3D -EINTR; - break; - } - - folio =3D kvm_gmem_get_folio(inode, index); - if (IS_ERR(folio)) { - r =3D PTR_ERR(folio); - break; - } - - index =3D folio_next_index(folio); - - folio_unlock(folio); - folio_put(folio); - - /* 64-bit only, wrapping the index should be impossible. 
*/ - if (WARN_ON_ONCE(!index)) - break; - - cond_resched(); - } - - filemap_invalidate_unlock_shared(mapping); - - return r; -} - static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { @@ -255,9 +162,9 @@ static long kvm_gmem_fallocate(struct file *file, int m= ode, loff_t offset, return -EINVAL; =20 if (mode & FALLOC_FL_PUNCH_HOLE) - ret =3D kvm_gmem_punch_hole(file_inode(file), offset, len); + ret =3D guestmem_punch_hole(file_inode(file)->i_mapping, offset, len); else - ret =3D kvm_gmem_allocate(file_inode(file), offset, len); + ret =3D guestmem_allocate(file_inode(file)->i_mapping, offset, len); =20 if (!ret) file_modified(file); @@ -299,7 +206,7 @@ static int kvm_gmem_release(struct inode *inode, struct= file *file) kvm_gmem_invalidate_begin(gmem, 0, -1ul); kvm_gmem_invalidate_end(gmem, 0, -1ul); =20 - list_del(&gmem->entry); + guestmem_detach_mapping(inode->i_mapping, &gmem->entry); =20 filemap_invalidate_unlock(inode->i_mapping); =20 @@ -335,74 +242,8 @@ static bool kvm_gmem_supports_mmap(struct inode *inode) return flags & GUEST_MEMFD_FLAG_MMAP; } =20 -static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf) -{ - struct inode *inode =3D file_inode(vmf->vma->vm_file); - struct folio *folio; - vm_fault_t ret =3D VM_FAULT_LOCKED; - - if (((loff_t)vmf->pgoff << PAGE_SHIFT) >=3D i_size_read(inode)) - return VM_FAULT_SIGBUS; - - folio =3D kvm_gmem_get_folio(inode, vmf->pgoff); - if (IS_ERR(folio)) { - int err =3D PTR_ERR(folio); - - if (err =3D=3D -EAGAIN) - return VM_FAULT_RETRY; - - return vmf_error(err); - } - - if (WARN_ON_ONCE(folio_test_large(folio))) { - ret =3D VM_FAULT_SIGBUS; - goto out_folio; - } - - if (!folio_test_uptodate(folio)) { - int err =3D 0; - - clear_highpage(folio_page(folio, 0)); - err =3D kvm_gmem_mark_prepared(folio); - - if (err) { - ret =3D vmf_error(err); - goto out_folio; - } - } - - vmf->page =3D folio_file_page(folio, vmf->pgoff); - -out_folio: - if (ret !=3D VM_FAULT_LOCKED) { - folio_unlock(folio); - folio_put(folio); - } - - return ret; -} - -static const struct vm_operations_struct kvm_gmem_vm_ops =3D { - .fault =3D kvm_gmem_fault_user_mapping, -}; - -static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma) -{ - if (!kvm_gmem_supports_mmap(file_inode(file))) - return -ENODEV; - - if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=3D - (VM_SHARED | VM_MAYSHARE)) { - return -EINVAL; - } - - vma->vm_ops =3D &kvm_gmem_vm_ops; - - return 0; -} - static struct file_operations kvm_gmem_fops =3D { - .mmap =3D kvm_gmem_mmap, + .mmap =3D guestmem_mmap, .llseek =3D default_llseek, .write_iter =3D generic_perform_write, .open =3D generic_file_open, @@ -415,104 +256,24 @@ void kvm_gmem_init(struct module *module) kvm_gmem_fops.owner =3D module; } =20 -static int kvm_kmem_gmem_write_begin(const struct kiocb *kiocb, - struct address_space *mapping, - loff_t pos, unsigned int len, - struct folio **foliop, - void **fsdata) -{ - struct file *file =3D kiocb->ki_filp; - pgoff_t index =3D pos >> PAGE_SHIFT; - struct folio *folio; - - if (!PAGE_ALIGNED(pos) || len !=3D PAGE_SIZE) - return -EINVAL; - - if (pos + len > i_size_read(file_inode(file))) - return -EINVAL; - - folio =3D kvm_gmem_get_folio(file_inode(file), index); - if (IS_ERR(folio)) - return -EFAULT; - - if (WARN_ON_ONCE(folio_test_large(folio))) { - folio_unlock(folio); - folio_put(folio); - return -EFAULT; - } - - if (folio_test_uptodate(folio)) { - folio_unlock(folio); - folio_put(folio); - return -ENOSPC; - } - - *foliop =3D folio; - return 0; -} - 
-static int kvm_kmem_gmem_write_end(const struct kiocb *kiocb, - struct address_space *mapping, - loff_t pos, unsigned int len, - unsigned int copied, - struct folio *folio, void *fsdata) +static void kvm_guestmem_invalidate_begin(struct list_head *entry, pgoff_t= start, + pgoff_t end) { - if (copied) { - if (copied < len) { - unsigned int from =3D pos & (PAGE_SIZE - 1); - - folio_zero_range(folio, from + copied, len - copied); - } - kvm_gmem_mark_prepared(folio); - } - - folio_unlock(folio); - folio_put(folio); - - return copied; -} + struct kvm_gmem *gmem =3D container_of(entry, struct kvm_gmem, entry); =20 -static int kvm_gmem_migrate_folio(struct address_space *mapping, - struct folio *dst, struct folio *src, - enum migrate_mode mode) -{ - WARN_ON_ONCE(1); - return -EINVAL; + kvm_gmem_invalidate_begin(gmem, start, end); } =20 -static int kvm_gmem_error_folio(struct address_space *mapping, struct foli= o *folio) +static void kvm_guestmem_invalidate_end(struct list_head *entry, pgoff_t s= tart, + pgoff_t end) { - struct list_head *gmem_list =3D &mapping->i_private_list; - struct kvm_gmem *gmem; - pgoff_t start, end; - - filemap_invalidate_lock_shared(mapping); - - start =3D folio->index; - end =3D start + folio_nr_pages(folio); - - list_for_each_entry(gmem, gmem_list, entry) - kvm_gmem_invalidate_begin(gmem, start, end); + struct kvm_gmem *gmem =3D container_of(entry, struct kvm_gmem, entry); =20 - /* - * Do not truncate the range, what action is taken in response to the - * error is userspace's decision (assuming the architecture supports - * gracefully handling memory errors). If/when the guest attempts to - * access a poisoned page, kvm_gmem_get_pfn() will return -EHWPOISON, - * at which point KVM can either terminate the VM or propagate the - * error to userspace. - */ - - list_for_each_entry(gmem, gmem_list, entry) - kvm_gmem_invalidate_end(gmem, start, end); - - filemap_invalidate_unlock_shared(mapping); - - return MF_DELAYED; + kvm_gmem_invalidate_end(gmem, start, end); } =20 -static void kvm_gmem_free_folio(struct address_space *mapping, - struct folio *folio) +static bool kvm_gmem_release_folio(struct address_space *mapping, + struct folio *folio) { struct page *page =3D folio_page(folio, 0); kvm_pfn_t pfn =3D page_to_pfn(page); @@ -525,19 +286,19 @@ static void kvm_gmem_free_folio(struct address_space = *mapping, * happened in set_direct_map_invalid_noflush() in kvm_gmem_mark_prepared= (). * Thus set_direct_map_valid_noflush() here only updates prot bits. 
*/ - if (kvm_gmem_test_no_direct_map(mapping->host)) + if (guestmem_test_no_direct_map(mapping->host)) set_direct_map_valid_noflush(page, folio_nr_pages(folio), true); =20 kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order)); + + return true; } =20 -static const struct address_space_operations kvm_gmem_aops =3D { - .dirty_folio =3D noop_dirty_folio, - .write_begin =3D kvm_kmem_gmem_write_begin, - .write_end =3D kvm_kmem_gmem_write_end, - .migrate_folio =3D kvm_gmem_migrate_folio, - .error_remove_folio =3D kvm_gmem_error_folio, - .free_folio =3D kvm_gmem_free_folio, +static const struct guestmem_ops kvm_guestmem_ops =3D { + .invalidate_begin =3D kvm_guestmem_invalidate_begin, + .invalidate_end =3D kvm_guestmem_invalidate_end, + .release_folio =3D kvm_gmem_release_folio, + .supports_mmap =3D kvm_gmem_supports_mmap, }; =20 static int kvm_gmem_setattr(struct mnt_idmap *idmap, struct dentry *dentry, @@ -587,13 +348,12 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t = size, u64 flags) =20 inode->i_private =3D (void *)(unsigned long)flags; inode->i_op =3D &kvm_gmem_iops; - inode->i_mapping->a_ops =3D &kvm_gmem_aops; inode->i_mode |=3D S_IFREG; inode->i_size =3D size; - mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); - mapping_set_inaccessible(inode->i_mapping); - /* Unmovable mappings are supposed to be marked unevictable as well. */ - WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping)); + err =3D guestmem_attach_mapping(inode->i_mapping, &kvm_guestmem_ops, + &gmem->entry); + if (err) + goto err_putfile; =20 if (flags & GUEST_MEMFD_FLAG_NO_DIRECT_MAP) mapping_set_no_direct_map(inode->i_mapping); @@ -601,11 +361,12 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t = size, u64 flags) kvm_get_kvm(kvm); gmem->kvm =3D kvm; xa_init(&gmem->bindings); - list_add(&gmem->entry, &inode->i_mapping->i_private_list); =20 fd_install(fd, file); return fd; =20 +err_putfile: + fput(file); err_gmem: kfree(gmem); err_fd: @@ -869,7 +630,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn= , void __user *src, long p =3D src ? 
src + i * PAGE_SIZE : NULL; ret = post_populate(kvm, gfn, pfn, p, max_order, opaque); if (!ret) - ret = kvm_gmem_mark_prepared(folio); + guestmem_mark_prepared(folio);  put_folio_and_exit: folio_put(folio);
-- 
2.50.1
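(Illustrative aside, not part of the patch: with the write path carried over
above, userspace can populate guest_memfd pages with plain pwrite() calls. The
sketch below assumes the fd comes from the hypervisor, e.g. KVM's
KVM_CREATE_GUEST_MEMFD ioctl, and that writes are page-aligned and exactly one
page long, as guestmem_write_begin() requires; an already-populated page is
reported as ENOSPC.)

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* Write one page of content into a guest_memfd at a page-aligned offset. */
  static int populate_page(int gmem_fd, off_t offset, const void *src,
  			 size_t page_size)
  {
  	ssize_t n = pwrite(gmem_fd, src, page_size, offset);

  	if (n == (ssize_t)page_size)
  		return 0;
  	if (n < 0 && errno == ENOSPC)
  		return 1;	/* page was already populated (uptodate) */

  	fprintf(stderr, "pwrite: %s\n",
  		n < 0 ? strerror(errno) : "short write");
  	return -1;
  }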
From nobody Thu Oct 2 14:27:41 2025
From: "Kalyazin, Nikita"
To: akpm@linux-foundation.org, david@redhat.com, pbonzini@redhat.com,
	seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org
CC: peterx@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	willy@infradead.org, vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, jack@suse.cz, linux-mm@kvack.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	jthoughton@google.com, tabba@google.com, vannapurve@google.com,
	"Roy, Patrick", "Thomson, Jack", "Manwaring, Derek", "Cali, Marco",
	"Kalyazin, Nikita"
Subject: [RFC PATCH v6 2/2] userfaultfd: add minor mode for guestmem
Date: Mon, 15 Sep 2025 16:18:39 +0000
Message-ID: <20250915161815.40729-3-kalyazin@amazon.com>
References: <20250915161815.40729-1-kalyazin@amazon.com>
In-Reply-To: <20250915161815.40729-1-kalyazin@amazon.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Nikita Kalyazin

UserfaultFD support in guestmem enables use cases such as restoring a
guest_memfd-backed VM from a memory snapshot in Firecracker [1], where
an external process supplies the content of the guest memory, as well
as live migration of guest_memfd-backed VMs.

[1] https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md

Signed-off-by: Nikita Kalyazin
---
 Documentation/admin-guide/mm/userfaultfd.rst |  4 +++-
 fs/userfaultfd.c                             |  3 ++-
 include/linux/userfaultfd_k.h                |  8 +++++---
 include/uapi/linux/userfaultfd.h             |  8 +++++++-
 mm/userfaultfd.c                             | 14 +++++++++++---
 5 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index e5cc8848dcb3..ca8c5954ffdb 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -111,7 +111,9 @@ events, except page fault notifications, may be generated:
 - ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
   ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
   areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating
-  support for shmem virtual memory areas.
+  support for shmem virtual memory areas. ``UFFD_FEATURE_MINOR_GUESTMEM``
+  is the analogous feature indicating support for guestmem-backed memory
+  areas.
 
 - ``UFFD_FEATURE_MOVE`` indicates that the kernel supports moving an
   existing page contents from userspace.
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 54c6cc7fe9c6..e4e80f1072a6 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1978,7 +1978,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ct= x, uffdio_api.features =3D UFFD_API_FEATURES; #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR uffdio_api.features &=3D - ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); + ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM | + UFFD_FEATURE_MINOR_GUESTMEM); #endif #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP uffdio_api.features &=3D ~UFFD_FEATURE_PAGEFAULT_FLAG_WP; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index c0e716aec26a..37bd4e71b611 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -14,6 +14,7 @@ #include /* linux/include/uapi/linux/userfaultfd.h */ =20 #include +#include #include #include #include @@ -218,7 +219,8 @@ static inline bool vma_can_userfault(struct vm_area_str= uct *vma, return false; =20 if ((vm_flags & VM_UFFD_MINOR) && - (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma))) + (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma) && + !guestmem_vma_is_guestmem(vma))) return false; =20 /* @@ -238,9 +240,9 @@ static inline bool vma_can_userfault(struct vm_area_str= uct *vma, return false; #endif =20 - /* By default, allow any of anon|shmem|hugetlb */ + /* By default, allow any of anon|shmem|hugetlb|guestmem */ return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) || - vma_is_shmem(vma); + vma_is_shmem(vma) || guestmem_vma_is_guestmem(vma); } =20 static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct = *vma) diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaul= tfd.h index 2841e4ea8f2c..0fe9fbd29772 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -42,7 +42,8 @@ UFFD_FEATURE_WP_UNPOPULATED | \ UFFD_FEATURE_POISON | \ UFFD_FEATURE_WP_ASYNC | \ - UFFD_FEATURE_MOVE) + UFFD_FEATURE_MOVE | \ + UFFD_FEATURE_MINOR_GUESTMEM) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -230,6 +231,10 @@ struct uffdio_api { * * UFFD_FEATURE_MOVE indicates that the kernel supports moving an * existing page contents from userspace. + * + * UFFD_FEATURE_MINOR_GUESTMEM indicates the same support as + * UFFD_FEATURE_MINOR_HUGETLBFS, but for guestmem-backed pages + * instead. 
*/ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -248,6 +253,7 @@ struct uffdio_api { #define UFFD_FEATURE_POISON (1<<14) #define UFFD_FEATURE_WP_ASYNC (1<<15) #define UFFD_FEATURE_MOVE (1<<16) +#define UFFD_FEATURE_MINOR_GUESTMEM (1<<17) __u64 features; =20 __u64 ioctls; diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 45e6290e2e8b..304e5d7dbb70 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -388,7 +388,14 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd, struct page *page; int ret; =20 - ret =3D shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC); + if (guestmem_vma_is_guestmem(dst_vma)) { + ret =3D 0; + folio =3D guestmem_grab_folio(inode->i_mapping, pgoff); + if (IS_ERR(folio)) + ret =3D PTR_ERR(folio); + } else { + ret =3D shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC); + } /* Our caller expects us to return -EFAULT if we failed to find folio */ if (ret =3D=3D -ENOENT) ret =3D -EFAULT; @@ -766,9 +773,10 @@ static __always_inline ssize_t mfill_atomic(struct use= rfaultfd_ctx *ctx, return mfill_atomic_hugetlb(ctx, dst_vma, dst_start, src_start, len, flags); =20 - if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) + if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma) + && !guestmem_vma_is_guestmem(dst_vma)) goto out_unlock; - if (!vma_is_shmem(dst_vma) && + if (!vma_is_shmem(dst_vma) && !guestmem_vma_is_guestmem(dst_vma) && uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE)) goto out_unlock; =20 --=20 2.50.1
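
For reference, the intended userspace flow for a guestmem minor fault might
look roughly like the sketch below (illustrative only, not part of the patch;
error handling omitted). It assumes the faulting address lies in a VMA backed
by an mmap()ed guest_memfd, that UFFD_FEATURE_MINOR_GUESTMEM was negotiated
via UFFDIO_API, and that the page content is placed into the guest_memfd
itself (e.g. via pwrite(), as enabled by patch 1) before the fault is resolved
with UFFDIO_CONTINUE:

  #include <stddef.h>
  #include <sys/ioctl.h>
  #include <linux/userfaultfd.h>

  /* Register a guestmem-backed range for minor-fault notifications. */
  static int register_minor(int uffd, void *addr, size_t len)
  {
  	struct uffdio_register reg = {
  		.range = { .start = (unsigned long)addr, .len = len },
  		.mode  = UFFDIO_REGISTER_MODE_MINOR,
  	};

  	return ioctl(uffd, UFFDIO_REGISTER, &reg);
  }

  /* Resolve one minor fault once the page cache already holds the content. */
  static int resolve_minor_fault(int uffd, unsigned long fault_addr,
  			       size_t page_size)
  {
  	struct uffdio_continue cont = {
  		.range = { .start = fault_addr & ~(page_size - 1),
  			   .len   = page_size },
  		.mode  = 0,
  	};

  	return ioctl(uffd, UFFDIO_CONTINUE, &cont);
  }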