[PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases

Posted by John Hubbard 2 weeks, 5 days ago
This applies to today's mm-hotfixes-unstable (only). In order to test
this, my earlier patch is a prerequisite: commit 255231c75dcd ("mm/gup:
stop leaking pinned pages in low memory conditions").

Changes since v1 [1]:

1) Completely different implementation: instead of changing the
allocator from kmalloc() to kvmalloc(), just avoid allocations entirely.
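
The idea, roughly (sketch only: the struct layout and helper below are
inferred from the pofs_get_folio() / pofs->nr_entries uses in the diff
later in this thread, and may not match the patch exactly), is to have
the unpin path walk the caller's original array in place via a small
descriptor, instead of kmalloc()ing a parallel folios array:

	/*
	 * Sketch: wrap the caller-provided array so that the unpin
	 * path can iterate it directly, with no extra allocation.
	 */
	struct pages_or_folios {
		union {
			struct page **pages;
			struct folio **folios;
			void **entries;
		};
		bool has_folios;
		long nr_entries;
	};

	/* Sketch: return the folio for entry i, whichever array we hold. */
	static struct folio *pofs_get_folio(struct pages_or_folios *pofs,
					    long i)
	{
		if (pofs->has_folios)
			return pofs->folios[i];
		return page_folio(pofs->pages[i]);
	}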

Note that David's original suggestion [2] included something that I've
left out for now, mostly because it's a pre-existing question that
deserves its own patch. But I also don't fully understand it yet.


[1] https://lore.kernel.org/20241030030116.670307-1-jhubbard@nvidia.com

[2] https://lore.kernel.org/8d9dc103-47c5-4719-971a-31efb091432a@redhat.com


thanks,
John Hubbard

Cc: David Hildenbrand <david@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: linux-stable@vger.kernel.org

John Hubbard (1):
  mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases

 mm/gup.c | 116 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 77 insertions(+), 39 deletions(-)


base-commit: 0012ab094cad019afcd07bdfc28e2946537e715c
-- 
2.47.0
Re: [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
Posted by David Hildenbrand 2 weeks, 5 days ago
On 05.11.24 04:29, John Hubbard wrote:
> This applies to today's mm-hotfixes-unstable (only). In order to test
> this, my earlier patch is a prerequisite: commit 255231c75dcd ("mm/gup:
> stop leaking pinned pages in low memory conditions").
> 
> Changes since v1 [1]:
> 
> 1) Completely different implementation: instead of changing the
> allocator from kmalloc() to kvmalloc(), just avoid allocations entirely.
> 
> Note that David's original suggestion [2] included something that I've
> left out for now, mostly because it's a pre-existing question and
> deserves its own patch. But also, I don't understand it yet, either.

Yeah, I was only adding it because I stumbled over it. It might not be a 
problem, because we simply "skip" if we find a folio that was already 
isolated (possibly by us). What might happen is that we unnecessarily 
drain the LRU.

__collapse_huge_page_isolate() scans the compound_pagelist list before
try-locking and isolating. But it also just "fails" instead of
retrying forever.
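
The pattern there is roughly this (paraphrased from mm/khugepaged.c;
the surrounding locking and checks are omitted):

	if (folio_test_large(folio)) {
		struct folio *f;

		/* Check if we dealt with this compound page already. */
		list_for_each_entry(f, compound_pagelist, lru) {
			if (folio == f)
				goto next;	/* skip, no retry */
		}
	}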

Imagine the page tables looking like the following (e.g., COW in a 
MAP_PRIVATE file mapping that supports large folios)

		      ------ F0P2 was replaced by a new (small) folio
		     |
[ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]

F0P0: Folio 0, page 0

Assume we try pinning that range and end up in 
collect_longterm_unpinnable_folios() with:

F0, F0, F1, F0


Assume F0 and F1 are not long-term pinnable.

i = 0: We isolate F0
i = 1: We see that it is the same F0 and skip
i = 2: We isolate F1
i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
        fail folio_isolate_lru()

So the drain in i=3 could be avoided by scanning the list to check
whether we already isolated that folio. The existing code works better
than I originally thought.
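
For reference, the relevant part of the loop looks roughly like this
(abridged from collect_longterm_unpinnable_folios() in mm/gup.c;
prev_folio, drain_allow and collected are declared earlier in the
function):

	for (i = 0; i < pofs->nr_entries; i++) {
		struct folio *folio = pofs_get_folio(pofs, i);

		/* The "quick" check: only catches adjacent duplicates. */
		if (folio == prev_folio)
			continue;
		prev_folio = folio;

		if (folio_is_longterm_pinnable(folio))
			continue;

		collected++;

		/*
		 * i = 3 in the example: F0 was isolated at i = 0, so it
		 * is no longer on the LRU, and the drain buys nothing.
		 */
		if (!folio_test_lru(folio) && drain_allow) {
			lru_add_drain_all();
			drain_allow = false;
		}

		/* Fails for the already-isolated F0; we just move on. */
		if (!folio_isolate_lru(folio))
			continue;

		list_add_tail(&folio->lru, movable_folio_list);
	}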

-- 
Cheers,

David / dhildenb
Re: [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
Posted by John Hubbard 2 weeks, 3 days ago
On 11/5/24 12:42 AM, David Hildenbrand wrote:
> On 05.11.24 04:29, John Hubbard wrote:
...
> Yeah, I was only adding it because I stumbled over it. It might not be a problem, because we simply "skip" if we find a folio that was already isolated (possibly by us). What might happen is that we unnecessarily drain the LRU.
> 
> __collapse_huge_page_isolate() scans the compound_pagelist list before try-locking and isolating. But it also just "fails" instead of retrying forever.
> 
> Imagine the page tables looking like the following (e.g., COW in a MAP_PRIVATE file mapping that supports large folios)
> 
>                ------ F0P2 was replaced by a new (small) folio
>               |
> [ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]
> 
> F0P0: Folio 0, page 0
> 
> Assume we try pinning that range and end up in collect_longterm_unpinnable_folios() with:
> 
> F0, F0, F1, F0
> 
> 
> Assume F0 and F1 are not long-term pinnable.
> 
> i = 0: We isolate F0
> i = 1: We see that it is the same F0 and skip
> i = 2: We isolate F1
> i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
>         fail folio_isolate_lru()
> 
> So the drain in i=3 could be avoided by scanning the list to check whether we already isolated that folio. The existing code works better than I originally thought.

Thanks for spelling out that case; I was having trouble visualizing it,
but now it's clear.

OK, so looking at this, I think it could be extended to more than just
"skip the drain". It seems like we should also avoid counting the folio
a second time (the existing code appears to count such a folio twice).

So I think this approach would be correct; does it seem accurate to
you as well? Here:

diff --git a/mm/gup.c b/mm/gup.c
index ad0c8922dac3..ab8e706b52f0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2324,11 +2324,27 @@ static unsigned long collect_longterm_unpinnable_folios(
 
 	for (i = 0; i < pofs->nr_entries; i++) {
 		struct folio *folio = pofs_get_folio(pofs, i);
+		struct folio *tmp_folio;
+		bool already_collected = false;
 
+		/*
+		 * Two checks to see if this folio has already been collected.
+		 * The first check is quick, and the second check is thorough.
+		 */
 		if (folio == prev_folio)
 			continue;
 		prev_folio = folio;
 
+		/* The thorough check: scan the already-isolated folios. */
+		list_for_each_entry(tmp_folio, movable_folio_list, lru) {
+			if (folio == tmp_folio) {
+				already_collected = true;
+				break;
+			}
+		}
+		if (already_collected)
+			continue;
+
 		if (folio_is_longterm_pinnable(folio))
 			continue;
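
As an aside, if the open-coded scan looks too heavy inside the loop, it
could be factored into a small helper. Hypothetical sketch only:
folio_already_collected() does not exist in mm/gup.c, the name is made
up here:

	static bool folio_already_collected(struct list_head *list,
					    struct folio *folio)
	{
		struct folio *tmp;

		list_for_each_entry(tmp, list, lru) {
			if (tmp == folio)
				return true;
		}
		return false;
	}

with the loop body reduced to:

	if (folio_already_collected(movable_folio_list, folio))
		continue;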



I need to test this more thoroughly, though, with a directed gup test (I'm not sure we
have one yet).

thanks,
-- 
John Hubbard