From: Mike Rapoport
To: Andrew Morton
Cc: David Carlier, David Hildenbrand, Heechan Kang, "Liam R. Howlett",
 Lorenzo Stoakes, Michael Bommarito, Mike Rapoport, Peter Xu,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH RESEND] userfaultfd: snapshot VMA state across UFFDIO_COPY retry
Date: Tue, 19 May 2026 08:25:16 +0300
Message-ID: <20260519052516.3315196-1-rppt@kernel.org>
X-Mailer: git-send-email 2.53.0
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Mike Rapoport (Microsoft)"

mfill_copy_folio_retry() drops the VMA lock for copy_from_user() and
reacquires it afterwards. The destination VMA can be replaced during
that window.

The existing check compares vma_uffd_ops() before and after the retry,
but if a shmem VMA with MAP_SHARED is replaced with a shmem VMA with
MAP_PRIVATE (or vice versa) the replacement goes undetected.

The change from MAP_PRIVATE to MAP_SHARED will treat the folio allocated
with shmem_alloc_folio() as anonymous and this will cause BUG() when
mfill_atomic_install_pte() will try to folio_add_new_anon_rmap().

The change from MAP_SHARED to MAP_PRIVATE allows injection of folios
into the page cache of the original VMA.

Introduce helpers for more comprehensive comparison of VMA state:
- vma_snapshot_get() to save the relevant VMA state into a struct
  vma_snapshot (original uffd_ops, actual uffd_ops, relevant VMA flags,
  vm_file and pgoff) before dropping the lock
- vma_snapshot_changed() to compare the saved state with the state of
  the VMA acquired after retaking the locks
- vma_snapshot_put() to release vm_file pinning.

Use DEFINE_FREE() cleanup to wrap vma_snapshot_put() to avoid
complicating error handling paths in mfill_copy_folio_retry().

Add vma_uffd_copy_ops() to avoid code duplication when original ops of
shmem VMA with MAP_PRIVATE are replaced with anon_uffd_ops.

Fixes: 292411fda25b ("mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry()")
Fixes: 6ab703034f14 ("userfaultfd: mfill_atomic(): remove retry logic")
Tested-by: Heechan Kang
Suggested-by: Peter Xu
Co-developed-by: David Carlier
Signed-off-by: David Carlier
Co-developed-by: Michael Bommarito
Signed-off-by: Michael Bommarito
Signed-off-by: Mike Rapoport (Microsoft)
---
 mm/userfaultfd.c | 99 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 79 insertions(+), 20 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 180bad42fc79..b70b84776a79 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -14,6 +14,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include "internal.h"
@@ -69,6 +71,24 @@ static const struct vm_uffd_ops *vma_uffd_ops(struct vm_area_struct *vma)
 	return vma->vm_ops ? vma->vm_ops->uffd_ops : NULL;
 }
 
+static const struct vm_uffd_ops *vma_uffd_copy_ops(struct vm_area_struct *vma)
+{
+	const struct vm_uffd_ops *ops = vma_uffd_ops(vma);
+
+	if (!ops)
+		return NULL;
+
+	/*
+	 * UFFDIO_COPY fills MAP_PRIVATE file-backed mappings as anonymous
+	 * memory. This is an effective ops override, so retry validation must
+	 * compare the override result, not just vma->vm_ops->uffd_ops.
+	 */
+	if (!(vma->vm_flags & VM_SHARED))
+		return &anon_uffd_ops;
+
+	return ops;
+}
+
 static __always_inline
 bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
 {
@@ -443,14 +463,70 @@ static int mfill_copy_folio_locked(struct folio *folio, unsigned long src_addr)
 	return ret;
 }
 
+#define VMA_SNAPSHOT_FLAGS append_vma_flags(__VMA_UFFD_FLAGS, VMA_SHARED_BIT)
+
+struct vma_snapshot {
+	const struct vm_uffd_ops *copy_ops;
+	const struct vm_uffd_ops *ops;
+	struct file *file;
+	vma_flags_t flags;
+	pgoff_t pgoff;
+};
+
+static void vma_snapshot_get(struct vma_snapshot *s, struct vm_area_struct *vma)
+{
+	s->flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
+	s->copy_ops = vma_uffd_copy_ops(vma);
+	s->ops = vma_uffd_ops(vma);
+	s->pgoff = vma->vm_pgoff;
+
+	if (vma->vm_file)
+		s->file = get_file(vma->vm_file);
+}
+
+static bool vma_snapshot_changed(struct vma_snapshot *s,
+				 struct vm_area_struct *vma)
+{
+	vma_flags_t flags = vma_flags_and_mask(&vma->flags, VMA_SNAPSHOT_FLAGS);
+
+	if (!vma_flags_same_pair(&s->flags, &flags))
+		return true;
+
+	/* VMA type or effective uffd_ops changed while the lock was dropped */
+	if (s->ops != vma_uffd_ops(vma) || s->copy_ops != vma_uffd_copy_ops(vma))
+		return true;
+
+	/* VMA was anonymous before; changed only if it no longer is */
+	if (!s->file)
+		return !vma_is_anonymous(vma);
+
+	/* VMA was file backed, but inode or offset has changed */
+	if (!vma->vm_file || vma->vm_file->f_inode != s->file->f_inode ||
+	    vma->vm_pgoff != s->pgoff)
+		return true;
+
+	return false;
+}
+
+static void vma_snapshot_put(struct vma_snapshot *s)
+{
+	if (s->file)
+		fput(s->file);
+}
+
+DEFINE_FREE(snapshot_put, struct vma_snapshot *, if (_T) vma_snapshot_put(_T));
+
 static int mfill_copy_folio_retry(struct mfill_state *state,
 				  struct folio *folio)
 {
-	const struct vm_uffd_ops *orig_ops = vma_uffd_ops(state->vma);
+	struct vma_snapshot s = { 0 };
+	struct vma_snapshot *p __free(snapshot_put) = &s;
 	unsigned long src_addr = state->src_addr;
 	void *kaddr;
 	int err;
 
+	vma_snapshot_get(&s, state->vma);
+
 	/* retry copying with mm_lock dropped */
 	mfill_put_vma(state);
 
@@ -467,12 +543,7 @@ static int mfill_copy_folio_retry(struct mfill_state *state,
 	if (err)
 		return err;
 
-	/*
-	 * The VMA type may have changed while the lock was dropped
-	 * (e.g. replaced with a hugetlb mapping), making the caller's
-	 * ops pointer stale.
-	 */
-	if (vma_uffd_ops(state->vma) != orig_ops)
+	if (vma_snapshot_changed(&s, state->vma))
 		return -EAGAIN;
 
 	err = mfill_establish_pmd(state);
@@ -545,19 +616,7 @@ static int __mfill_atomic_pte(struct mfill_state *state,
 
 static int mfill_atomic_pte_copy(struct mfill_state *state)
 {
-	const struct vm_uffd_ops *ops = vma_uffd_ops(state->vma);
-
-	/*
-	 * The normal page fault path for a MAP_PRIVATE mapping in a
-	 * file-backed VMA will invoke the fault, fill the hole in the file and
-	 * COW it right away. The result generates plain anonymous memory.
-	 * So when we are asked to fill a hole in a MAP_PRIVATE mapping, we'll
-	 * generate anonymous memory directly without actually filling the
-	 * hole. For the MAP_PRIVATE case the robustness check only happens in
-	 * the pagetable (to verify it's still none) and not in the page cache.
-	 */
-	if (!(state->vma->vm_flags & VM_SHARED))
-		ops = &anon_uffd_ops;
+	const struct vm_uffd_ops *ops = vma_uffd_copy_ops(state->vma);
 
 	return __mfill_atomic_pte(state, ops);
 }

base-commit: 444fc9435e57157fcf30fc99aee44997f3458641
-- 
2.53.0
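
Illustrative note, not part of the patch: the snapshot-and-compare idea
can be modelled in plain userspace C. The sketch below is a toy under
stated assumptions. Every toy_* name, the integer refcount, and the
single TOY_SHARED flag are hypothetical stand-ins invented for this
example; they are not kernel types or kernel APIs. It only tries to show
why comparing the raw ops pointer alone can miss a MAP_SHARED to
MAP_PRIVATE replacement, while comparing the masked flags and the
effective copy ops catches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real definitions. */
struct toy_file {
	int refcount;		/* models the pin taken by get_file()/fput() */
	int inode;		/* models vm_file->f_inode identity */
};

struct toy_vma {
	const void *ops;	/* models vma->vm_ops->uffd_ops */
	unsigned long flags;	/* models the VMA flags */
	struct toy_file *file;	/* NULL models an anonymous mapping */
	unsigned long pgoff;
};

#define TOY_SHARED		0x1UL
#define TOY_SNAPSHOT_MASK	TOY_SHARED	/* flags the snapshot cares about */

/* Identity tokens for two kinds of ops; only their addresses matter. */
static const int toy_anon_ops = 0;
static const int toy_shmem_ops = 1;

/* Effective ops for a copy: private file mappings are filled as anonymous. */
static const void *toy_copy_ops(const struct toy_vma *vma)
{
	if (!vma->ops)
		return NULL;
	if (!(vma->flags & TOY_SHARED))
		return &toy_anon_ops;
	return vma->ops;
}

struct toy_snapshot {
	const void *copy_ops;
	const void *ops;
	struct toy_file *file;
	unsigned long flags;
	unsigned long pgoff;
};

/* Record the state that must stay stable while the lock is dropped. */
static void toy_snapshot_get(struct toy_snapshot *s, const struct toy_vma *vma)
{
	s->flags = vma->flags & TOY_SNAPSHOT_MASK;
	s->copy_ops = toy_copy_ops(vma);
	s->ops = vma->ops;
	s->pgoff = vma->pgoff;
	s->file = vma->file;
	if (s->file)
		s->file->refcount++;	/* pin the file for the comparison */
}

/* Return true if the VMA found after relocking differs from the snapshot. */
static bool toy_snapshot_changed(const struct toy_snapshot *s,
				 const struct toy_vma *vma)
{
	if (s->flags != (vma->flags & TOY_SNAPSHOT_MASK))
		return true;
	if (s->ops != vma->ops || s->copy_ops != toy_copy_ops(vma))
		return true;
	if (!s->file)
		return vma->file != NULL;
	if (!vma->file || vma->file->inode != s->file->inode ||
	    vma->pgoff != s->pgoff)
		return true;
	return false;
}

/* Drop the pin taken by toy_snapshot_get(). */
static void toy_snapshot_put(struct toy_snapshot *s)
{
	if (s->file) {
		s->file->refcount--;
		s->file = NULL;
	}
}
```

In this toy model, a caller would take the snapshot before dropping its
lock, look the mapping up again afterwards, and treat a true return from
toy_snapshot_changed() the way the patch treats -EAGAIN.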