This applies to today's mm-hotfixes-unstable (only). In order to test
this, my earlier patch is a prerequisite: commit 255231c75dcd ("mm/gup:
stop leaking pinned pages in low memory conditions").

Changes since v1 [1]:

1) Completely different implementation: instead of changing the
allocator from kmalloc() to kvmalloc(), just avoid allocations entirely.

Note that David's original suggestion [2] included something that I've
left out for now, mostly because it's a pre-existing question and
deserves its own patch. But also, I don't understand it yet, either.

[1] https://lore.kernel.org/20241030030116.670307-1-jhubbard@nvidia.com
[2] https://lore.kernel.org/8d9dc103-47c5-4719-971a-31efb091432a@redhat.com

thanks,
John Hubbard

Cc: David Hildenbrand <david@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: linux-stable@vger.kernel.org

John Hubbard (1):
  mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases

 mm/gup.c | 116 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 77 insertions(+), 39 deletions(-)

base-commit: 0012ab094cad019afcd07bdfc28e2946537e715c
--
2.47.0
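For reference, the "avoid allocations entirely" approach surfaces in the
diff quoted further down in this thread as pofs->nr_entries and
pofs_get_folio(): collect_longterm_unpinnable_folios() presumably walks
the caller's existing pages or folios array directly instead of first
building a temporary folio array. Below is a minimal sketch of that kind
of accessor; the union layout and the has_folios field name are
assumptions, only pofs_get_folio() and nr_entries actually appear in this
thread.

/*
 * Hypothetical sketch only: the union layout and the has_folios field
 * name are assumptions; only pofs_get_folio() and nr_entries appear in
 * the diff quoted later in this thread.
 */
#include <linux/mm.h>

struct pages_or_folios {
	union {
		struct page **pages;
		struct folio **folios;
	};
	bool has_folios;
	unsigned long nr_entries;
};

static struct folio *pofs_get_folio(struct pages_or_folios *pofs,
				    unsigned long i)
{
	if (pofs->has_folios)
		return pofs->folios[i];
	return page_folio(pofs->pages[i]);
}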
On 05.11.24 04:29, John Hubbard wrote:
> This applies to today's mm-hotfixes-unstable (only). In order to test
> this, my earlier patch is a prerequisite: commit 255231c75dcd ("mm/gup:
> stop leaking pinned pages in low memory conditions").
>
> Changes since v1 [1]:
>
> 1) Completely different implementation: instead of changing the
> allocator from kmalloc() to kvmalloc(), just avoid allocations entirely.
>
> Note that David's original suggestion [2] included something that I've
> left out for now, mostly because it's a pre-existing question and
> deserves its own patch. But also, I don't understand it yet, either.

Yeah, I was only adding it because I stumbled over it. It might not be a
problem, because we simply "skip" if we find a folio that was already
isolated (possibly by us). What might happen is that we unnecessarily
drain the LRU.

__collapse_huge_page_isolate() scans the compound_pagelist list before
try-locking and isolating. But it also just "fails" instead of retrying
forever.

Imagine the page tables looking like the following (e.g., COW in a
MAP_PRIVATE file mapping that supports large folios):

                    ------ F0P2 was replaced by a new (small) folio
                    |
[ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]

F0P0: Folio 0, page 0

Assume we try pinning that range and end up in
collect_longterm_unpinnable_folios() with:

F0, F0, F1, F0

Assume F0 and F1 are not long-term pinnable.

i = 0: We isolate F0
i = 1: We see that it is the same F0 and skip
i = 2: We isolate F1
i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
       fail folio_isolate_lru()

So the drain in i=3 could be avoided by scanning the list, if we already
isolated that one. Working better than I originally thought.

--
Cheers,

David / dhildenb
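David's i = 0..3 walkthrough can be reproduced with a small userspace
model. This is only a toy, not kernel code: struct toy_folio and its
on_lru flag stand in for real folios and LRU membership, and the drains
counter stands in for lru_add_drain_all().

#include <stdbool.h>
#include <stdio.h>

struct toy_folio {
	const char *name;
	bool on_lru;		/* cleared once the folio is "isolated" */
};

int main(void)
{
	struct toy_folio f0 = { "F0", true }, f1 = { "F1", true };
	/* The page-table contents from the diagram: F0, F0, F1, F0 */
	struct toy_folio *entries[] = { &f0, &f0, &f1, &f0 };
	struct toy_folio *prev = NULL;
	bool drain_allow = true;
	int drains = 0;

	for (int i = 0; i < 4; i++) {
		struct toy_folio *folio = entries[i];

		if (folio == prev)	/* only catches adjacent repeats (i = 1) */
			continue;
		prev = folio;

		if (!folio->on_lru && drain_allow) {
			drains++;	/* models lru_add_drain_all() at i = 3 */
			drain_allow = false;
			printf("i = %d: drained LRU for %s (wasted: already isolated)\n",
			       i, folio->name);
			continue;	/* models folio_isolate_lru() failing */
		}

		folio->on_lru = false;	/* models successful isolation */
		printf("i = %d: isolated %s\n", i, folio->name);
	}

	printf("total drains: %d\n", drains);
	return 0;
}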
On 11/5/24 12:42 AM, David Hildenbrand wrote:
> On 05.11.24 04:29, John Hubbard wrote:
...
> Yeah, I was only adding it because I stumbled over it. It might not be
> a problem, because we simply "skip" if we find a folio that was already
> isolated (possibly by us). What might happen is that we unnecessarily
> drain the LRU.
>
> __collapse_huge_page_isolate() scans the compound_pagelist list before
> try-locking and isolating. But it also just "fails" instead of retrying
> forever.
>
> Imagine the page tables looking like the following (e.g., COW in a
> MAP_PRIVATE file mapping that supports large folios):
>
>                     ------ F0P2 was replaced by a new (small) folio
>                     |
> [ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]
>
> F0P0: Folio 0, page 0
>
> Assume we try pinning that range and end up in
> collect_longterm_unpinnable_folios() with:
>
> F0, F0, F1, F0
>
> Assume F0 and F1 are not long-term pinnable.
>
> i = 0: We isolate F0
> i = 1: We see that it is the same F0 and skip
> i = 2: We isolate F1
> i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
>        fail folio_isolate_lru()
>
> So the drain in i=3 could be avoided by scanning the list, if we
> already isolated that one. Working better than I originally thought.

Thanks for spelling out that case, I was having trouble visualizing it,
but now it's clear.

OK, so looking at this, I think it could be extended to more than just
"skip the drain". It seems like we should also avoid counting the folio
(the existing code seems wrong). So I think this approach would be
correct, does it seem accurate to you as well? Here:

diff --git a/mm/gup.c b/mm/gup.c
index ad0c8922dac3..ab8e706b52f0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2324,11 +2324,26 @@ static unsigned long collect_longterm_unpinnable_folios(
 	for (i = 0; i < pofs->nr_entries; i++) {
 		struct folio *folio = pofs_get_folio(pofs, i);
+		struct folio *tmp_folio;
+		bool already_collected = false;
 
+		/*
+		 * Two checks to see if this folio has already been collected.
+		 * The first check is quick, and the second check is thorough.
+		 */
 		if (folio == prev_folio)
 			continue;
 		prev_folio = folio;
 
+		list_for_each_entry(tmp_folio, movable_folio_list, lru) {
+			if (folio == tmp_folio) {
+				already_collected = true;
+				break;
+			}
+		}
+		if (already_collected)
+			continue;
+
 		if (folio_is_longterm_pinnable(folio))
 			continue;

I need to test this more thoroughly, though, with a directed gup test
(I'm not sure we have one yet).

thanks,
--
John Hubbard
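If the list scan sticks around, it could be factored into a small helper
inside mm/gup.c (which already has the needed list and folio
definitions). The helper name below is made up, not part of the patch:

/*
 * Hypothetical helper, not in the patch above: returns true if @folio is
 * already on @movable_folio_list, i.e. it was isolated by an earlier
 * iteration of the collection loop.
 */
static bool folio_already_collected(struct list_head *movable_folio_list,
				    struct folio *folio)
{
	struct folio *tmp_folio;

	list_for_each_entry(tmp_folio, movable_folio_list, lru) {
		if (tmp_folio == folio)
			return true;
	}
	return false;
}

The scan is linear in the length of movable_folio_list, but that list
only holds folios that were found not to be long-term pinnable and were
then isolated, so it should stay short in practice.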