[PATCH v3 00/15] mm, kvm: allow uffd support in guest_memfd

Mike Rapoport posted 15 patches 6 days, 22 hours ago
There is a newer version of this series
[PATCH v3 00/15] mm, kvm: allow uffd support in guest_memfd
Posted by Mike Rapoport 6 days, 22 hours ago
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Hi,

These patches enable support for userfaultfd in guest_memfd.

As groundwork, I refactored userfaultfd handling of PTE-based memory types
(anonymous and shmem) and converted them to use vm_uffd_ops for allocating a
folio or getting an existing folio from the page cache. shmem also implements
callbacks that add a folio to the page cache after the data passed in
UFFDIO_COPY has been copied, and that remove the folio from the page cache if
the page table update fails.
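
The UFFDIO_COPY flow described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: every mock_* name, the
toy page cache, the pte_update_fails switch and the -EAGAIN return are
hypothetical stand-ins, not the kernel's types or the series' actual code.
Only the ordering (allocate, copy, add to page cache, undo on page table
failure) is taken from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
#define MOCK_PAGE_SIZE   64
#define MOCK_CACHE_SLOTS 8

struct mock_folio {
	char data[MOCK_PAGE_SIZE];
	bool locked;
};

struct mock_mapping {
	struct mock_folio *cache[MOCK_CACHE_SLOTS];	/* toy page cache */
};

/* Loosely modeled on the callbacks the cover letter describes. */
struct mock_uffd_ops {
	struct mock_folio *(*alloc_folio)(void);
	int (*filemap_add)(struct mock_folio *folio, struct mock_mapping *m,
			   size_t pgoff);
	void (*filemap_remove)(struct mock_folio *folio,
			       struct mock_mapping *m, size_t pgoff);
};

static struct mock_folio *mock_alloc_folio(void)
{
	return calloc(1, sizeof(struct mock_folio));
}

static int mock_filemap_add(struct mock_folio *folio, struct mock_mapping *m,
			    size_t pgoff)
{
	if (pgoff >= MOCK_CACHE_SLOTS)
		return -EINVAL;
	if (m->cache[pgoff])
		return -EEXIST;
	folio->locked = true;		/* the folio enters the cache locked */
	m->cache[pgoff] = folio;
	return 0;
}

static void mock_filemap_remove(struct mock_folio *folio,
				struct mock_mapping *m, size_t pgoff)
{
	assert(m->cache[pgoff] == folio);
	m->cache[pgoff] = NULL;
}

const struct mock_uffd_ops mock_ops = {
	.alloc_folio	= mock_alloc_folio,
	.filemap_add	= mock_filemap_add,
	.filemap_remove	= mock_filemap_remove,
};

/* pte_update_fails simulates a failed page table update (assumption). */
int mock_uffdio_copy(const struct mock_uffd_ops *ops, struct mock_mapping *m,
		     size_t pgoff, const char *src, bool pte_update_fails)
{
	struct mock_folio *folio = ops->alloc_folio();
	int ret;

	if (!folio)
		return -ENOMEM;

	/* Copy the caller's data before the folio becomes visible. */
	memcpy(folio->data, src, MOCK_PAGE_SIZE);

	ret = ops->filemap_add(folio, m, pgoff);
	if (ret)
		goto out_free;

	if (pte_update_fails) {
		/* Undo the page cache insertion, as the text describes. */
		ops->filemap_remove(folio, m, pgoff);
		ret = -EAGAIN;		/* error value chosen for the sketch */
		goto out_free;
	}

	folio->locked = false;
	return 0;

out_free:
	free(folio);
	return ret;
}
```

Calling mock_uffdio_copy(&mock_ops, &mapping, 3, src, false) on an empty
mapping leaves the folio in slot 3; passing true for the last argument leaves
the slot empty again, which is the rollback the remove callback exists for.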

In order for guest_memfd to notify userspace about page faults, there are new
VM_FAULT_UFFD_MINOR and VM_FAULT_UFFD_MISSING codes that a ->fault() handler
can return to inform the page fault handler that it needs to call
handle_userfault() to complete the fault.
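
The v2 changelog further down says the decision moved into a __do_userfault()
helper called from __do_fault(), rather than being returned by ->fault()
handlers. The helper's body is not shown in this thread, so the following is
only a userspace sketch of one plausible reading, based on the usual
userfaultfd meaning of "missing" (no folio in the page cache) and "minor"
(folio present but not yet mapped). All demo_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcome of the pre-check; not kernel return codes. */
enum demo_fault_action {
	DEMO_CALL_FAULT_HANDLER,	/* fall through to ->fault() */
	DEMO_UFFD_MISSING,		/* notify userspace: nothing cached */
	DEMO_UFFD_MINOR,		/* notify userspace: cached, unmapped */
};

/* Hypothetical stand-in for the VMA's userfaultfd registration modes. */
struct demo_vma {
	bool uffd_missing;
	bool uffd_minor;
};

/*
 * folio_in_cache stands in for the result of a non-allocating page cache
 * lookup (the role get_folio_noalloc() plays in the series).
 */
enum demo_fault_action demo_do_userfault(const struct demo_vma *vma,
					 bool folio_in_cache)
{
	if (!folio_in_cache && vma->uffd_missing)
		return DEMO_UFFD_MISSING;
	if (folio_in_cache && vma->uffd_minor)
		return DEMO_UFFD_MINOR;
	return DEMO_CALL_FAULT_HANDLER;
}
```

A VMA registered for neither mode always falls through to the regular fault
handler, so the pre-check costs nothing for memory that does not use uffd.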

Nikita helped plumb these new goodies into guest_memfd and provided basic
tests to verify that guest_memfd works with userfaultfd.
Handling of UFFDIO_MISSING in guest_memfd requires the ability to remove a
folio from the page cache; the best way I could find was exporting
filemap_remove_folio() to KVM.

I deliberately left hugetlb out, at least for the most part.
hugetlb handles acquisition of the VMA and, more importantly, establishment of
the parent page table entry differently than PTE-based memory types. This is a
different abstraction level than what vm_uffd_ops provides, and people objected
to exposing such low-level APIs as part of VMA operations.

Also, refactoring of hugetlb is not needed to enable uffd in guest_memfd, and I
prefer to delay it until the dust settles after the changes in this set.

v3 changes:
* add fixes from Harry and Andrei
* fix handling of WP-only mode for WP_ASYNC contexts in vma_can_userfault()
* address David's comments about mfill_get_pmd() and rename it to
  mfill_establish_pmd()
* add VM_WARN()s for unsupported operations (James)
* update comments using James' suggestions

v2: https://lore.kernel.org/all/20260306171815.3160826-1-rppt@kernel.org
* instead of returning uffd-specific values from ->fault() handlers add
  __do_userfault() helper to resolve user faults in __do_fault()
* address comments from Peter
* rebased on v7.0-rc1

RFC: https://lore.kernel.org/all/20260127192936.1250096-1-rppt@kernel.org

Mike Rapoport (Microsoft) (11):
  userfaultfd: introduce mfill_copy_folio_locked() helper
  userfaultfd: introduce struct mfill_state
  userfaultfd: introduce mfill_establish_pmd() helper
  userfaultfd: introduce mfill_get_vma() and mfill_put_vma()
  userfaultfd: retry copying with locks dropped in
    mfill_atomic_pte_copy()
  userfaultfd: move vma_can_userfault out of line
  userfaultfd: introduce vm_uffd_ops
  shmem, userfaultfd: use a VMA callback to handle UFFDIO_CONTINUE
  userfaultfd: introduce vm_uffd_ops->alloc_folio()
  shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops
  userfaultfd: mfill_atomic(): remove retry logic

Nikita Kalyazin (3):
  KVM: guest_memfd: implement userfaultfd operations
  KVM: selftests: test userfaultfd minor for guest_memfd
  KVM: selftests: test userfaultfd missing for guest_memfd

Peter Xu (1):
  mm: generalize handling of userfaults in __do_fault()

 include/linux/mm.h                            |   5 +
 include/linux/shmem_fs.h                      |  14 -
 include/linux/userfaultfd_k.h                 |  73 +-
 mm/filemap.c                                  |   1 +
 mm/hugetlb.c                                  |  15 +
 mm/memory.c                                   |  43 ++
 mm/shmem.c                                    | 188 ++---
 mm/userfaultfd.c                              | 694 ++++++++++--------
 .../testing/selftests/kvm/guest_memfd_test.c  | 191 +++++
 virt/kvm/guest_memfd.c                        |  84 ++-
 10 files changed, 860 insertions(+), 448 deletions(-)


base-commit: c369299895a591d96745d6492d4888259b004a9e
--
2.53.0
Re: [PATCH v3 00/15] mm, kvm: allow uffd support in guest_memfd
Posted by Andrew Morton 6 days, 13 hours ago
On Mon, 30 Mar 2026 13:11:01 +0300 Mike Rapoport <rppt@kernel.org> wrote:

> These patches enable support for userfaultfd in guest_memfd.

Thanks, I've updated mm.git's mm-unstable branch to this version.  I
added a little note-to-self to keep tabs on willy's [07/15] comment.

The series seems to be converging nicely.  Several of the patches
aren't showing R-b/A-b at this time.

I've moved this series even further down-queue, so it's definitely in
the second-week-of-merge-window batch.  So I'll be looking to move it
into mm-stable around Monday of that week (Apr 27?).  Four weeks for
testing, review and little touchups.

> v3 changes:
> * add fixes from Harry and Andrei
> * fix handling of WP-only mode for WP_ASYNC contexts in vma_can_userfault()
> * address David's comments about mfill_get_pmd() and rename it to
>   mfill_establish_pmd()
> * add VM_WARN()s for unsupported operations (James)
> * update comments using James' suggestions

Here's how v3 altered mm.git:


 include/linux/userfaultfd_k.h |    6 +++---
 mm/memory.c                   |    2 +-
 mm/userfaultfd.c              |   12 ++++++++----
 3 files changed, 12 insertions(+), 8 deletions(-)

--- a/include/linux/userfaultfd_k.h~b
+++ a/include/linux/userfaultfd_k.h
@@ -96,14 +96,14 @@ struct vm_uffd_ops {
 	struct folio *(*get_folio_noalloc)(struct inode *inode, pgoff_t pgoff);
 	/*
 	 * Called during resolution of UFFDIO_COPY request.
-	 * Should allocate and return a folio or NULL if allocation
-	 * fails.
+	 * Should allocate and return a folio or NULL if allocation fails.
 	 */
 	struct folio *(*alloc_folio)(struct vm_area_struct *vma,
 				     unsigned long addr);
 	/*
 	 * Called during resolution of UFFDIO_COPY request.
-	 * Should lock the folio and add it to VMA's page cache.
+	 * Should only be called with a folio returned by alloc_folio() above.
+	 * The folio will be set to locked.
 	 * Returns 0 on success, error code on failure.
 	 */
 	int (*filemap_add)(struct folio *folio, struct vm_area_struct *vma,
--- a/mm/memory.c~b
+++ a/mm/memory.c
@@ -5493,7 +5493,7 @@ static vm_fault_t __do_fault(struct vm_f
 	}
 
 	/*
-	 * If this is an userfaultfd trap, process it in advance before
+	 * If this is a userfault trap, process it in advance before
 	 * triggering the genuine fault handler.
 	 */
 	ret = __do_userfault(vmf);
--- a/mm/userfaultfd.c~b
+++ a/mm/userfaultfd.c
@@ -502,7 +502,7 @@ static int __mfill_atomic_pte(struct mfi
 	} else if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) {
 		clear_user_highpage(&folio->page, state->dst_addr);
 	} else {
-		VM_WARN_ONCE(1, "unknown UFFDIO operation");
+		VM_WARN_ONCE(1, "Unknown UFFDIO operation, flags: %x", flags);
 	}
 
 	/*
@@ -612,8 +612,10 @@ static int mfill_atomic_pte_continue(str
 	struct page *page;
 	int ret;
 
-	if (!ops)
+	if (!ops) {
+		VM_WARN_ONCE(1, "UFFDIO_CONTINUE for unsupported VMA");
 		return -EOPNOTSUPP;
+	}
 
 	folio = ops->get_folio_noalloc(inode, pgoff);
 	/* Our caller expects us to return -EFAULT if we failed to find folio */
@@ -864,6 +866,7 @@ static __always_inline ssize_t mfill_ato
 	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE))
 		return mfill_atomic_pte_zeropage(state);
 
+	VM_WARN_ONCE(1, "Unknown UFFDIO operation, flags: %x", flags);
 	return -EOPNOTSUPP;
 }
 
@@ -2044,8 +2047,9 @@ bool vma_can_userfault(struct vm_area_st
 		return false;
 
 	/*
-	 * File backed memory with PTE level mappigns must implement
-	 * ops->get_folio_noalloc()
+	 * File backed VMAs (except HugeTLB) must implement
+	 * ops->get_folio_noalloc() because it's required by __do_userfault()
+	 * in page fault handling.
 	 */
 	if (!vma_is_anonymous(vma) && !is_vm_hugetlb_page(vma) &&
 	    !ops->get_folio_noalloc)
_
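
The error handling in the mfill_atomic_pte_continue() hunk above can be
modeled in userspace as follows. This is a sketch, not the kernel function:
the cont_* names and the toy lookup are hypothetical, and fprintf() merely
stands in for VM_WARN_ONCE(). What it keeps from the diff is the ordering: a
missing ops table yields -EOPNOTSUPP, and a failed non-allocating lookup
yields -EFAULT.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not kernel types. */
struct cont_folio {
	int id;
};

struct cont_uffd_ops {
	struct cont_folio *(*get_folio_noalloc)(size_t pgoff);
};

static struct cont_folio cont_cached = { .id = 1 };

/* Toy lookup: only page offset 0 is populated. Never allocates. */
static struct cont_folio *cont_get_folio_noalloc(size_t pgoff)
{
	return pgoff == 0 ? &cont_cached : NULL;
}

const struct cont_uffd_ops cont_ops = {
	.get_folio_noalloc = cont_get_folio_noalloc,
};

int cont_uffdio_continue(const struct cont_uffd_ops *ops, size_t pgoff)
{
	struct cont_folio *folio;

	if (!ops) {
		/* Stands in for the VM_WARN_ONCE() added in v3. */
		fprintf(stderr, "UFFDIO_CONTINUE for unsupported VMA\n");
		return -EOPNOTSUPP;
	}

	folio = ops->get_folio_noalloc(pgoff);
	/* The caller expects -EFAULT when no folio is found. */
	if (!folio)
		return -EFAULT;

	/* The real code would go on to map the folio; the sketch stops here. */
	return 0;
}
```

Passing NULL for ops models a VMA whose memory type provides no uffd
operations, which is the case the new warning is meant to flag.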