From: Barry Song <v-songbaohua@oppo.com>
The refcount may be temporarily or long-term increased, but this does
not change the fundamental nature of the folio already being lazy-
freed. Therefore, we only reset 'swapbacked' when we are certain the
folio is dirty and not droppable.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
1 file changed, 22 insertions(+), 27 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 */
 				smp_rmb();

-				/*
-				 * The only page refs must be one from isolation
-				 * plus the rmap(s) (dropped by discard:).
-				 */
-				if (ref_count == 1 + map_count &&
-				    (!folio_test_dirty(folio) ||
-				     /*
-				      * Unlike MADV_FREE mappings, VM_DROPPABLE
-				      * ones can be dropped even if they've
-				      * been dirtied.
-				      */
-				     (vma->vm_flags & VM_DROPPABLE))) {
-					dec_mm_counter(mm, MM_ANONPAGES);
-					goto discard;
-				}
-
-				/*
-				 * If the folio was redirtied, it cannot be
-				 * discarded. Remap the page to page table.
-				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
-				/*
-				 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-				 * never get swap backed on failure to drop.
-				 */
-				if (!(vma->vm_flags & VM_DROPPABLE))
+				if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+					/*
+					 * redirtied either using the page table or a previously
+					 * obtained GUP reference.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
 					folio_set_swapbacked(folio);
-				goto walk_abort;
+					goto walk_abort;
+				} else if (ref_count != 1 + map_count) {
+					/*
+					 * Additional reference. Could be a GUP reference or any
+					 * speculative reference. GUP users must mark the folio
+					 * dirty if there was a modification. This folio cannot be
+					 * reclaimed right now either way, so act just like nothing
+					 * happened.
+					 * We'll come back here later and detect if the folio was
+					 * dirtied when the additional reference is gone.
+					 */
+					set_pte_at(mm, address, pvmw.pte, pteval);
+					goto walk_abort;
+				}
+				dec_mm_counter(mm, MM_ANONPAGES);
+				goto discard;
 			}

 			if (swap_duplicate(entry) < 0) {
--
2.39.3 (Apple Git-146)
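
The change in ordering is easier to see when the two versions of the check are written as pure functions. What follows is only a user-space sketch under stated assumptions, not kernel code: the enum and the names `lazyfree_old()`, `lazyfree_new()` and `lazyfree_example()` are hypothetical, `dirty` stands in for `folio_test_dirty(folio)`, and `droppable` stands in for `vma->vm_flags & VM_DROPPABLE`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome names for illustration; the kernel has no such enum. */
enum lazyfree_action {
	LAZYFREE_DISCARD,		/* dec_mm_counter() + goto discard */
	LAZYFREE_REMAP,			/* set_pte_at() + goto walk_abort */
	LAZYFREE_REMAP_SWAPBACKED,	/* as REMAP, plus folio_set_swapbacked() */
};

/*
 * Model of the removed logic: every failure to discard a non-droppable
 * folio also set swapbacked, even if the only problem was an extra ref.
 */
enum lazyfree_action lazyfree_old(bool dirty, bool droppable,
				  int ref_count, int map_count)
{
	if (ref_count == 1 + map_count && (!dirty || droppable))
		return LAZYFREE_DISCARD;
	return droppable ? LAZYFREE_REMAP : LAZYFREE_REMAP_SWAPBACKED;
}

/*
 * Model of the added logic: swapbacked is set only for a dirty,
 * non-droppable folio; an unexpected refcount just defers the decision.
 */
enum lazyfree_action lazyfree_new(bool dirty, bool droppable,
				  int ref_count, int map_count)
{
	if (dirty && !droppable)
		return LAZYFREE_REMAP_SWAPBACKED;
	if (ref_count != 1 + map_count)
		return LAZYFREE_REMAP;
	return LAZYFREE_DISCARD;
}

/* Usage: a clean lazy-free folio, mapped once, holding one extra ref. */
void lazyfree_example(void)
{
	assert(lazyfree_old(false, false, 3, 1) == LAZYFREE_REMAP_SWAPBACKED);
	assert(lazyfree_new(false, false, 3, 1) == LAZYFREE_REMAP);
}
```

In this model the two versions differ only for a clean, non-droppable folio with an unexpected reference count: the old check marks it swap-backed, while the new one remaps it and leaves it lazy-freeable for a later pass.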
On 2025/1/6 11:17, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
>
> The refcount may be temporarily or long-term increased, but this does
> not change the fundamental nature of the folio already being lazy-
> freed. Therefore, we only reset 'swapbacked' when we are certain the
> folio is dirty and not droppable.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>

The changes look good to me. While we are at it, could you also change
the __discard_anon_folio_pmd_locked() to follow the same strategy for
lazy-freed PMD-sized folio?
On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 11:17, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > The refcount may be temporarily or long-term increased, but this does
> > not change the fundamental nature of the folio already being lazy-
> > freed. Therefore, we only reset 'swapbacked' when we are certain the
> > folio is dirty and not droppable.
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>
> The changes look good to me. While we are at it, could you also change
> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> lazy-freed PMD-sized folio?

it seems you mean __discard_anon_folio_pmd_locked() is lacking
folio_set_swapbacked(folio) for dirty pmd-mapped folios?
and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
handled properly?

Thanks
barry
On 2025/1/6 17:03, Barry Song wrote:
> On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/1/6 11:17, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> The refcount may be temporarily or long-term increased, but this does
>>> not change the fundamental nature of the folio already being lazy-
>>> freed. Therefore, we only reset 'swapbacked' when we are certain the
>>> folio is dirty and not droppable.
>>>
>>> Suggested-by: David Hildenbrand <david@redhat.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>
>> The changes look good to me. While we are at it, could you also change
>> the __discard_anon_folio_pmd_locked() to follow the same strategy for
>> lazy-freed PMD-sized folio?
>
> it seems you mean __discard_anon_folio_pmd_locked() is lacking
> folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> handled properly?

Right.
On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 2025/1/6 17:03, Barry Song wrote:
> > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 2025/1/6 11:17, Barry Song wrote:
> >>> From: Barry Song <v-songbaohua@oppo.com>
> >>>
> >>> The refcount may be temporarily or long-term increased, but this does
> >>> not change the fundamental nature of the folio already being lazy-
> >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> >>> folio is dirty and not droppable.
> >>>
> >>> Suggested-by: David Hildenbrand <david@redhat.com>
> >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> >>
> >> The changes look good to me. While we are at it, could you also change
> >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> >> lazy-freed PMD-sized folio?
> >
> > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > folio_set_swapbacked(folio) for dirty pmd-mapped folios?

Good catch!

Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
THPs in __discard_anon_folio_pmd_locked() - possibly to align with
previous behavior ;)

If a dirty PMD-mapped THP cannot be discarded, we just split it and
restart the page walk to process the PTE-mapped THP. After that, we
will only mark each folio within the THP as swap-backed individually.

It seems like we could cut the work by calling folio_set_swapbacked()
for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
the restart of the page walk after splitting the THP, IMHO ;)

Thanks,
Lance

> > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > handled properly?
>
> Right.
On Mon, Jan 6, 2025 at 10:39 PM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)

In correction to the earlier email: folio_set_swapbacked() is only called
in __discard_anon_folio_pmd_locked() when '!(vma->vm_flags & VM_DROPPABLE)'
is true, IIUC.

Thanks,
Lance

>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.
On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
>
> On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2025/1/6 17:03, Barry Song wrote:
> > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > <baolin.wang@linux.alibaba.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2025/1/6 11:17, Barry Song wrote:
> > >>> From: Barry Song <v-songbaohua@oppo.com>
> > >>>
> > >>> The refcount may be temporarily or long-term increased, but this does
> > >>> not change the fundamental nature of the folio already being lazy-
> > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > >>> folio is dirty and not droppable.
> > >>>
> > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > >>
> > >> The changes look good to me. While we are at it, could you also change
> > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > >> lazy-freed PMD-sized folio?
> > >
> > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
>
> Good catch!
>
> Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> previous behavior ;)
>
> If a dirty PMD-mapped THP cannot be discarded, we just split it and
> restart the page walk to process the PTE-mapped THP. After that, we
> will only mark each folio within the THP as swap-backed individually.
>
> It seems like we could cut the work by calling folio_set_swapbacked()
> for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> the restart of the page walk after splitting the THP, IMHO ;)
Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and
then iterates over each PTE. I’m not sure why it’s designed this way—could
there be a specific reason behind this approach?
However, it does appear to handle folio_set_swapbacked() correctly, as only
a dirty PMD will result in dirty PTEs being generated in
__split_huge_pmd_locked():
	} else {
		pte_t entry;

		entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
		if (write)
			entry = pte_mkwrite(entry, vma);

		if (!young)
			entry = pte_mkold(entry);

		/* NOTE: this may set soft-dirty too on some archs */
		if (dirty)
			entry = pte_mkdirty(entry);

		if (soft_dirty)
			entry = pte_mksoft_dirty(entry);

		if (uffd_wp)
			entry = pte_mkuffd_wp(entry);

		for (i = 0; i < HPAGE_PMD_NR; i++)
			VM_WARN_ON(!pte_none(ptep_get(pte + i)));

		set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
	}
>
> Thanks,
> Lance
>
>
> > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > handled properly?
>
>
> >
> > Right.
Thanks
Barry
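
The argument above, that only a dirty PMD produces dirty PTEs and so only then does the later PTE walk set swapbacked, can be traced with a small user-space model. This is a sketch under stated assumptions rather than the kernel's code: `struct model_pte`, `MODEL_PMD_NR`, `model_split_pmd()` and `model_thp_ends_swapbacked()` are hypothetical names, the folio is assumed to start clean (no dirtying through a GUP reference), and the refcount checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PMD_NR 512	/* assumption: 2 MiB THP with 4 KiB base pages */

/* Hypothetical stand-in for a PTE; only the bits this model needs. */
struct model_pte {
	bool present;
	bool dirty;
};

/*
 * Mirrors the dirty handling of the quoted __split_huge_pmd_locked()
 * branch: one entry is built from the PMD state and copied to every slot.
 */
void model_split_pmd(bool pmd_dirty, struct model_pte *ptes)
{
	struct model_pte entry = { .present = true, .dirty = false };

	assert(ptes != NULL);
	if (pmd_dirty)
		entry.dirty = true;	/* stands in for pte_mkdirty(entry) */
	for (size_t i = 0; i < MODEL_PMD_NR; i++)
		ptes[i] = entry;	/* stands in for set_ptes(...) */
}

/*
 * Split, then walk the PTEs as the restarted page walk would. Returns
 * true if the walk reaches the point where swapbacked would be set.
 */
bool model_thp_ends_swapbacked(bool pmd_dirty, bool droppable)
{
	struct model_pte ptes[MODEL_PMD_NR];
	bool folio_dirty = false;	/* assumption: folio starts clean */

	model_split_pmd(pmd_dirty, ptes);
	for (size_t i = 0; i < MODEL_PMD_NR; i++) {
		if (ptes[i].dirty)
			folio_dirty = true;	/* PTE dirty bit moves to the folio */
		if (folio_dirty && !droppable)
			return true;
	}
	return false;
}
```

Under these assumptions `model_thp_ends_swapbacked(true, false)` is the only combination that returns true, which matches the observation that the split path reaches the right result, at the cost of building and walking every PTE.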
On Tue, Jan 7, 2025 at 9:52 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Jan 7, 2025 at 3:40 AM Lance Yang <ioworker0@gmail.com> wrote:
> >
> > On Mon, Jan 6, 2025 at 5:34 PM Baolin Wang
> > <baolin.wang@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2025/1/6 17:03, Barry Song wrote:
> > > > On Mon, Jan 6, 2025 at 7:40 PM Baolin Wang
> > > > <baolin.wang@linux.alibaba.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >> On 2025/1/6 11:17, Barry Song wrote:
> > > >>> From: Barry Song <v-songbaohua@oppo.com>
> > > >>>
> > > >>> The refcount may be temporarily or long-term increased, but this does
> > > >>> not change the fundamental nature of the folio already being lazy-
> > > >>> freed. Therefore, we only reset 'swapbacked' when we are certain the
> > > >>> folio is dirty and not droppable.
> > > >>>
> > > >>> Suggested-by: David Hildenbrand <david@redhat.com>
> > > >>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > >>
> > > >> The changes look good to me. While we are at it, could you also change
> > > >> the __discard_anon_folio_pmd_locked() to follow the same strategy for
> > > >> lazy-freed PMD-sized folio?
> > > >
> > > > it seems you mean __discard_anon_folio_pmd_locked() is lacking
> > > > folio_set_swapbacked(folio) for dirty pmd-mapped folios?
> >
> > Good catch!
> >
> > Hmm... I don't recall why we don't call folio_set_swapbacked for dirty
> > THPs in __discard_anon_folio_pmd_locked() - possibly to align with
> > previous behavior ;)
> >
> > If a dirty PMD-mapped THP cannot be discarded, we just split it and
> > restart the page walk to process the PTE-mapped THP. After that, we
> > will only mark each folio within the THP as swap-backed individually.
> >
> > It seems like we could cut the work by calling folio_set_swapbacked()
> > for dirty THPs directly in __discard_anon_folio_pmd_locked(), skipping
> > the restart of the page walk after splitting the THP, IMHO ;)
>
> Yes, the existing code for PMD-mapped THPs seems quite inefficient. It splits
> the PMD-mapped THP into smaller folios, marks each split PTE as dirty, and
Apologies for the typo, I meant splitting a PMD-mapped THP into a PTE-mapped
THP.
> then iterates over each PTE. I’m not sure why it’s designed this way—could
> there be a specific reason behind this approach?
>
> However, it does appear to handle folio_set_swapbacked() correctly, as only
> a dirty PMD will result in dirty PTEs being generated in
> __split_huge_pmd_locked():
>
>	} else {
>		pte_t entry;
>
>		entry = mk_pte(page, READ_ONCE(vma->vm_page_prot));
>		if (write)
>			entry = pte_mkwrite(entry, vma);
>
>		if (!young)
>			entry = pte_mkold(entry);
>
>		/* NOTE: this may set soft-dirty too on some archs */
>		if (dirty)
>			entry = pte_mkdirty(entry);
>
>		if (soft_dirty)
>			entry = pte_mksoft_dirty(entry);
>
>		if (uffd_wp)
>			entry = pte_mkuffd_wp(entry);
>
>		for (i = 0; i < HPAGE_PMD_NR; i++)
>			VM_WARN_ON(!pte_none(ptep_get(pte + i)));
>
>		set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
>	}
>
>
>
> >
> > Thanks,
> > Lance
> >
> >
> > > > and it seems !(vma->vm_flags & VM_DROPPABLE) is also not
> > > > handled properly?
> >
> >
> > >
> > > Right.
>
> Thanks
> Barry