From: Mike Rapoport
To: Andrew Morton
Cc: Andrea Arcangeli, Andrei Vagin, Axel Rasmussen, Baolin Wang, David Hildenbrand, Harry Yoo, Hugh Dickins, James Houghton, "Liam R. Howlett", "Lorenzo Stoakes (Oracle)", "Matthew Wilcox (Oracle)", Michal Hocko, Mike Rapoport, Muchun Song, Nikita Kalyazin, Oscar Salvador, Paolo Bonzini, Peter Xu, Sean Christopherson, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v3 13/15] KVM: guest_memfd: implement userfaultfd operations
Date: Mon, 30 Mar 2026 13:11:14 +0300
Message-ID: <20260330101116.1117699-14-rppt@kernel.org>
In-Reply-To: <20260330101116.1117699-1-rppt@kernel.org>
References: <20260330101116.1117699-1-rppt@kernel.org>

From: Nikita Kalyazin

userfaultfd notifications about page faults are used for live migration
and snapshotting of VMs.

MISSING mode enables post-copy live migration, and MINOR mode enables an
optimization of post-copy live migration for VMs backed by shared
hugetlbfs or tmpfs mappings, as described in detail in commit
7677f7fd8be7 ("userfaultfd: add minor fault registration mode").

To use the same mechanisms for VMs whose memory is backed by guest_memfd,
guest_memfd needs to support userfaultfd operations.

Add an implementation of vm_uffd_ops to guest_memfd. userfaultfd is only
allowed for guest_memfd files created with GUEST_MEMFD_FLAG_INIT_SHARED,
which ensures the memory can be mapped to userspace. Export
filemap_remove_folio() to the kvm module so that a folio added to the
page cache by a userfaultfd operation can be removed again.
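As an illustration of the userspace side, here is a minimal sketch of how a VMM might register an mmap()ed guest_memfd range with userfaultfd in MISSING or MINOR mode. It uses only the existing userfaultfd UAPI (the userfaultfd syscall, UFFDIO_API and UFFDIO_REGISTER). The helper names fill_uffd_register() and uffd_register_gmem() are hypothetical, and the sketch assumes the guest_memfd was already created with GUEST_MEMFD_FLAG_INIT_SHARED and mapped; creating it and serving fault events are left out.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe [addr, addr + len) for UFFDIO_REGISTER.
 * MISSING: faults on pages absent from the page cache are reported and
 *          are typically resolved with UFFDIO_COPY (post-copy).
 * MINOR:   faults on pages present in the page cache but not yet mapped
 *          are reported and are resolved with UFFDIO_CONTINUE.
 */
void fill_uffd_register(struct uffdio_register *reg, uint64_t addr,
			uint64_t len, bool minor)
{
	memset(reg, 0, sizeof(*reg));
	reg->range.start = addr;
	reg->range.len = len;
	reg->mode = minor ? UFFDIO_REGISTER_MODE_MINOR
			  : UFFDIO_REGISTER_MODE_MISSING;
}

/*
 * Hypothetical helper. Assumption: addr/len cover an mmap()ed guest_memfd
 * created with GUEST_MEMFD_FLAG_INIT_SHARED; without that flag this patch
 * makes ->can_userfault() return false, so registration should fail.
 * Returns the userfaultfd on success, -1 on any error.
 */
int uffd_register_gmem(void *addr, size_t len, bool minor)
{
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	struct uffdio_register reg;
	int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (uffd < 0)
		return -1;

	/* The API handshake must precede any other ioctl on the uffd. */
	if (ioctl(uffd, UFFDIO_API, &api) < 0)
		goto err;

	fill_uffd_register(&reg, (uint64_t)(uintptr_t)addr, len, minor);
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		goto err;

	return uffd;
err:
	close(uffd);
	return -1;
}
```

A post-copy VMM would call uffd_register_gmem(mem, size, false) and then read fault events from the returned descriptor; MINOR mode only makes sense once the page contents are already present in the guest_memfd page cache.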
Signed-off-by: Nikita Kalyazin
Co-developed-by: Mike Rapoport (Microsoft)
Signed-off-by: Mike Rapoport (Microsoft)
---
 mm/filemap.c           |  1 +
 virt/kvm/guest_memfd.c | 84 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 406cef06b684..a91582293118 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -262,6 +262,7 @@ void filemap_remove_folio(struct folio *folio)
 
 	filemap_free_folio(mapping, folio);
 }
+EXPORT_SYMBOL_FOR_MODULES(filemap_remove_folio, "kvm");
 
 /*
  * page_cache_delete_batch - delete several folios from page cache
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 017d84a7adf3..46582feeed75 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #include "kvm_mm.h"
 
@@ -107,6 +108,12 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
 	return __kvm_gmem_prepare_folio(kvm, slot, index, folio);
 }
 
+static struct folio *kvm_gmem_get_folio_noalloc(struct inode *inode, pgoff_t pgoff)
+{
+	return __filemap_get_folio(inode->i_mapping, pgoff,
+				   FGP_LOCK | FGP_ACCESSED, 0);
+}
+
 /*
  * Returns a locked folio on success. The caller is responsible for
  * setting the up-to-date flag before the memory is mapped into the guest.
@@ -126,8 +133,7 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	 * Fast-path: See if folio is already present in mapping to avoid
 	 * policy_lookup.
 	 */
-	folio = __filemap_get_folio(inode->i_mapping, index,
-				    FGP_LOCK | FGP_ACCESSED, 0);
+	folio = kvm_gmem_get_folio_noalloc(inode, index);
 	if (!IS_ERR(folio))
 		return folio;
 
@@ -457,12 +463,86 @@ static struct mempolicy *kvm_gmem_get_policy(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_NUMA */
 
+#ifdef CONFIG_USERFAULTFD
+static bool kvm_gmem_can_userfault(struct vm_area_struct *vma, vm_flags_t vm_flags)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+
+	/*
+	 * Only support userfaultfd for guest_memfd with INIT_SHARED flag.
+	 * This ensures the memory can be mapped to userspace.
+	 */
+	if (!(GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED))
+		return false;
+
+	return true;
+}
+
+static struct folio *kvm_gmem_folio_alloc(struct vm_area_struct *vma,
+					  unsigned long addr)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+	pgoff_t pgoff = linear_page_index(vma, addr);
+	struct mempolicy *mpol;
+	struct folio *folio;
+	gfp_t gfp;
+
+	if (unlikely(pgoff >= (i_size_read(inode) >> PAGE_SHIFT)))
+		return NULL;
+
+	gfp = mapping_gfp_mask(inode->i_mapping);
+	mpol = mpol_shared_policy_lookup(&GMEM_I(inode)->policy, pgoff);
+	mpol = mpol ?: get_task_policy(current);
+	folio = filemap_alloc_folio(gfp, 0, mpol);
+	mpol_cond_put(mpol);
+
+	return folio;
+}
+
+static int kvm_gmem_filemap_add(struct folio *folio,
+				struct vm_area_struct *vma,
+				unsigned long addr)
+{
+	struct inode *inode = file_inode(vma->vm_file);
+	struct address_space *mapping = inode->i_mapping;
+	pgoff_t pgoff = linear_page_index(vma, addr);
+	int err;
+
+	__folio_set_locked(folio);
+	err = filemap_add_folio(mapping, folio, pgoff, GFP_KERNEL);
+	if (err) {
+		folio_unlock(folio);
+		return err;
+	}
+
+	return 0;
+}
+
+static void kvm_gmem_filemap_remove(struct folio *folio,
+				    struct vm_area_struct *vma)
+{
+	filemap_remove_folio(folio);
+	folio_unlock(folio);
+}
+
+static const struct vm_uffd_ops kvm_gmem_uffd_ops = {
+	.can_userfault = kvm_gmem_can_userfault,
+	.get_folio_noalloc = kvm_gmem_get_folio_noalloc,
+	.alloc_folio = kvm_gmem_folio_alloc,
+	.filemap_add = kvm_gmem_filemap_add,
+	.filemap_remove = kvm_gmem_filemap_remove,
+};
+#endif /* CONFIG_USERFAULTFD */
+
 static const struct vm_operations_struct kvm_gmem_vm_ops = {
 	.fault = kvm_gmem_fault_user_mapping,
 #ifdef CONFIG_NUMA
 	.get_policy = kvm_gmem_get_policy,
 	.set_policy = kvm_gmem_set_policy,
 #endif
+#ifdef CONFIG_USERFAULTFD
+	.uffd_ops = &kvm_gmem_uffd_ops,
+#endif
 };
 
 static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
-- 
2.53.0
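Both kvm_gmem_folio_alloc() and kvm_gmem_filemap_add() translate the faulting user address into a guest_memfd page index with linear_page_index(), and kvm_gmem_folio_alloc() refuses to allocate past i_size. The user-space sketch below reproduces only that arithmetic so it can be checked in isolation. It is not kernel code: struct sketch_vma and the sketch_* functions are hypothetical stand-ins, and 4 KiB pages (PAGE_SHIFT == 12) are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct sketch_vma {
	uint64_t vm_start;	/* first user address of the mapping */
	uint64_t vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Same arithmetic as linear_page_index() for a non-hugetlb VMA: the page
 * offset of addr within the mapping plus the mapping's file offset.
 */
uint64_t sketch_linear_page_index(const struct sketch_vma *vma, uint64_t addr)
{
	return ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}

/*
 * Mirrors the guard at the top of kvm_gmem_folio_alloc(): only indices
 * below i_size, counted in whole pages, may get a new folio.
 */
bool sketch_may_alloc(uint64_t pgoff, uint64_t i_size)
{
	return pgoff < (i_size >> SKETCH_PAGE_SHIFT);
}
```

For example, with vm_start = 0x7f0000000000 and vm_pgoff = 2, a fault anywhere in [vm_start + 0x3000, vm_start + 0x4000) maps to page index 5; for a 64 KiB guest_memfd (16 pages), any index of 16 or more makes kvm_gmem_folio_alloc() return NULL.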