mm/userfaultfd.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
From: Barry Song <v-songbaohua@oppo.com>
As David pointed out, what truly matters for mremap and userfaultfd
move operations is the soft dirty bit. The current comment and
implementation—which always sets the dirty bit for present PTEs
and fails to set the soft dirty bit for swap PTEs—are incorrect.
This patch updates the behavior to correctly set the soft dirty bit
for both present and swap PTEs in accordance with mremap.
Reported-by: David Hildenbrand <david@redhat.com>
Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/
Cc: Peter Xu <peterx@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
mm/userfaultfd.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e8ce92dc105f..bc473ad21202 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1064,8 +1064,13 @@ static int move_present_pte(struct mm_struct *mm,
src_folio->index = linear_page_index(dst_vma, dst_addr);
orig_dst_pte = folio_mk_pte(src_folio, dst_vma->vm_page_prot);
- /* Follow mremap() behavior and treat the entry dirty after the move */
- orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
+ /* Set soft dirty bit so userspace can notice the pte was moved */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
+#endif
+ if (pte_dirty(orig_src_pte))
+ orig_dst_pte = pte_mkdirty(orig_dst_pte);
+ orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
out:
@@ -1100,6 +1105,9 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
+#endif
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
--
2.39.3 (Apple Git-146)
On Thu, May 08, 2025 at 09:07:35PM +1200, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > As David pointed out, what truly matters for mremap and userfaultfd > move operations is the soft dirty bit. The current comment and > implementation—which always sets the dirty bit for present PTEs > and fails to set the soft dirty bit for swap PTEs—are incorrect. > This patch updates the behavior to correctly set the soft dirty bit > for both present and swap PTEs in accordance with mremap. > > Reported-by: David Hildenbrand <david@redhat.com> > Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ > Cc: Peter Xu <peterx@redhat.com> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Lokesh Gidra <lokeshgidra@google.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: Peter Xu <peterx@redhat.com> -- Peter Xu
On Thu, May 8, 2025 at 8:23 AM Peter Xu <peterx@redhat.com> wrote: > > On Thu, May 08, 2025 at 09:07:35PM +1200, Barry Song wrote: > > From: Barry Song <v-songbaohua@oppo.com> > > > > As David pointed out, what truly matters for mremap and userfaultfd > > move operations is the soft dirty bit. The current comment and > > implementation—which always sets the dirty bit for present PTEs > > and fails to set the soft dirty bit for swap PTEs—are incorrect. > > This patch updates the behavior to correctly set the soft dirty bit > > for both present and swap PTEs in accordance with mremap. > > > > Reported-by: David Hildenbrand <david@redhat.com> > > Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ > > Cc: Peter Xu <peterx@redhat.com> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Cc: Lokesh Gidra <lokeshgidra@google.com> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> The function was introduced in 6.8. We should probably CC stable as well. > > -- > Peter Xu >
Thanks Barry for fixing this. On Thu, May 8, 2025 at 8:24 AM Peter Xu <peterx@redhat.com> wrote: > > On Thu, May 08, 2025 at 09:07:35PM +1200, Barry Song wrote: > > From: Barry Song <v-songbaohua@oppo.com> > > > > As David pointed out, what truly matters for mremap and userfaultfd > > move operations is the soft dirty bit. The current comment and > > implementation—which always sets the dirty bit for present PTEs > > and fails to set the soft dirty bit for swap PTEs—are incorrect. Can you please briefly describe the consequences of not setting the soft-dirty bit? I'm wondering if it needs to be backported as a fix? > > This patch updates the behavior to correctly set the soft dirty bit > > for both present and swap PTEs in accordance with mremap. > > > > Reported-by: David Hildenbrand <david@redhat.com> > > Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ > > Cc: Peter Xu <peterx@redhat.com> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Cc: Lokesh Gidra <lokeshgidra@google.com> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > Acked-by: Peter Xu <peterx@redhat.com> > > -- > Peter Xu >
On Fri, May 9, 2025 at 3:27 AM Lokesh Gidra <lokeshgidra@google.com> wrote: > > Thanks Barry for fixing this. > > On Thu, May 8, 2025 at 8:24 AM Peter Xu <peterx@redhat.com> wrote: > > > > On Thu, May 08, 2025 at 09:07:35PM +1200, Barry Song wrote: > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > As David pointed out, what truly matters for mremap and userfaultfd > > > move operations is the soft dirty bit. The current comment and > > > implementation—which always sets the dirty bit for present PTEs > > > and fails to set the soft dirty bit for swap PTEs—are incorrect. > > Can you please briefly describe the consequences of not setting the > soft-dirty bit? I'm wondering if it needs to be backported as a fix? As I understand it, this could break features like Checkpoint-Restore in Userspace (CRIU), which relies on tracking memory changes to create incremental dumps. While Android may not currently have a real-world use case for this, it would still be beneficial to backport the fix in a general way. > > > > This patch updates the behavior to correctly set the soft dirty bit > > > for both present and swap PTEs in accordance with mremap. > > > > > > Reported-by: David Hildenbrand <david@redhat.com> > > > Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ > > > Cc: Peter Xu <peterx@redhat.com> > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > Cc: Lokesh Gidra <lokeshgidra@google.com> > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > > Acked-by: Peter Xu <peterx@redhat.com> > > > > -- > > Peter Xu Thanks Barry
On Thu, May 8, 2025 at 2:36 PM Barry Song <21cnbao@gmail.com> wrote: > > On Fri, May 9, 2025 at 3:27 AM Lokesh Gidra <lokeshgidra@google.com> wrote: > > > > Thanks Barry for fixing this. > > > > On Thu, May 8, 2025 at 8:24 AM Peter Xu <peterx@redhat.com> wrote: > > > > > > On Thu, May 08, 2025 at 09:07:35PM +1200, Barry Song wrote: > > > > From: Barry Song <v-songbaohua@oppo.com> > > > > > > > > As David pointed out, what truly matters for mremap and userfaultfd > > > > move operations is the soft dirty bit. The current comment and > > > > implementation—which always sets the dirty bit for present PTEs > > > > and fails to set the soft dirty bit for swap PTEs—are incorrect. > > > > Can you please briefly describe the consequences of not setting the > > soft-dirty bit? I'm wondering if it needs to be backported as a fix? > > As I understand it, this could break features like Checkpoint-Restore > in Userspace (CRIU), which relies on tracking memory changes to create > incremental dumps. While Android may not currently have a real-world > use case for this, it would still be beneficial to backport the fix in > a general way. > Makes sense. Thanks for clarifying. > > > > > > This patch updates the behavior to correctly set the soft dirty bit > > > > for both present and swap PTEs in accordance with mremap. > > > > > > > > Reported-by: David Hildenbrand <david@redhat.com> > > > > Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ > > > > Cc: Peter Xu <peterx@redhat.com> > > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > > Cc: Lokesh Gidra <lokeshgidra@google.com> > > > > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > > > > > > Acked-by: Peter Xu <peterx@redhat.com> > > > > > > -- > > > Peter Xu > > Thanks > Barry
On 08.05.25 11:07, Barry Song wrote: > From: Barry Song <v-songbaohua@oppo.com> > > As David pointed out, what truly matters for mremap and userfaultfd > move operations is the soft dirty bit. The current comment and > implementation—which always sets the dirty bit for present PTEs > and fails to set the soft dirty bit for swap PTEs—are incorrect. > This patch updates the behavior to correctly set the soft dirty bit > for both present and swap PTEs in accordance with mremap. > > Reported-by: David Hildenbrand <david@redhat.com> > Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redhat.com/ > Cc: Peter Xu <peterx@redhat.com> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Lokesh Gidra <lokeshgidra@google.com> > Signed-off-by: Barry Song <v-songbaohua@oppo.com> > --- > mm/userfaultfd.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index e8ce92dc105f..bc473ad21202 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -1064,8 +1064,13 @@ static int move_present_pte(struct mm_struct *mm, > src_folio->index = linear_page_index(dst_vma, dst_addr); > > orig_dst_pte = folio_mk_pte(src_folio, dst_vma->vm_page_prot); > - /* Follow mremap() behavior and treat the entry dirty after the move */ > - orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma); > + /* Set soft dirty bit so userspace can notice the pte was moved */ > +#ifdef CONFIG_MEM_SOFT_DIRTY > + orig_dst_pte = pte_mksoft_dirty(orig_dst_pte); > +#endif > + if (pte_dirty(orig_src_pte)) > + orig_dst_pte = pte_mkdirty(orig_dst_pte); > + orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma); > > set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte); > out: > @@ -1100,6 +1105,9 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma, > } > > orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte); > +#ifdef CONFIG_MEM_SOFT_DIRTY > + orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte); > +#endif > set_pte_at(mm, dst_addr, dst_pte, orig_src_pte); > double_pt_unlock(dst_ptl, src_ptl); Yeah, I think that should be the right thing to do. -- Cheers, David / dhildenb
© 2016 - 2026 Red Hat, Inc.