We presently skip regions with hugepages entirely when trying to do
contiguous page allocation. Instead, if hugepage migration is enabled,
consider regions with hugepages smaller than the target contiguous
allocation request as valid targets for allocation.
isolate_migrate_pages_block() already expects requests with hugepages
to originate from alloc_contig, and hugetlb code also does a migratable
check when isolating in folio_isolate_hugetlb().
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
mm/page_alloc.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
---
v3: changelog updates and tags, no code changes
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a6fe1e9b9594..3b0f47f1f144 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6849,8 +6849,19 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
if (PageReserved(page))
return false;
- if (PageHuge(page))
- return false;
+ if (PageHuge(page)) {
+ unsigned int order;
+
+ if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
+ return false;
+
+ /* Don't consider moving same size/larger pages */
+ page = compound_head(page);
+ order = compound_order(page);
+ if ((order >= MAX_FOLIO_ORDER) ||
+ (nr_pages <= (1 << order)))
+ return false;
+ }
}
return true;
}
--
2.51.1
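To make the size comparison concrete, here is a small user-space sketch of the check the new PageHuge() branch performs. This is not kernel code; DEMO_MAX_FOLIO_ORDER and the order values are illustrative assumptions for 4KB base pages.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative values only; the real limits come from the kernel configuration. */
#define DEMO_MAX_FOLIO_ORDER	18	/* assumed ceiling, stands in for MAX_FOLIO_ORDER */
#define ORDER_2MB		9	/* 2MB hugepage with 4KB base pages */
#define ORDER_1GB		18	/* 1GB hugepage with 4KB base pages */

/* Mirrors the new branch: true means the hugepage may be migrated out of the way. */
static bool hugepage_movable_for_request(unsigned long nr_pages, unsigned int order)
{
	/* Same-size or larger hugepages (or orders past the ceiling) are rejected. */
	if (order >= DEMO_MAX_FOLIO_ORDER || nr_pages <= (1UL << order))
		return false;
	return true;
}

int main(void)
{
	unsigned long req_1gb = 1UL << ORDER_1GB;	/* 262144 base pages */
	unsigned long req_2mb = 1UL << ORDER_2MB;	/* 512 base pages */

	/* A 2MB hugepage inside a 1GB request: movable, so the range stays valid. */
	printf("1GB request over 2MB hugepage: %d\n",
	       hugepage_movable_for_request(req_1gb, ORDER_2MB));
	/* A 1GB hugepage inside a 1GB request: same size, range is rejected. */
	printf("1GB request over 1GB hugepage: %d\n",
	       hugepage_movable_for_request(req_1gb, ORDER_1GB));
	/* A 2MB hugepage inside a 2MB request: same size, also rejected. */
	printf("2MB request over 2MB hugepage: %d\n",
	       hugepage_movable_for_request(req_2mb, ORDER_2MB));
	return 0;
}

Before the patch, any of these cases would have invalidated the range; with the patch, only the first combination is allowed through.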
On Fri, 21 Nov 2025 14:15:40 -0500 Gregory Price <gourry@gourry.net> wrote:
> We presently skip regions with hugepages entirely when trying to do
> contiguous page allocation. Instead, if hugepage migration is enabled,
> consider regions with hugepages smaller than the target contiguous
> allocation request as valid targets for allocation.
>
> isolate_migrate_pages_block() already expects requests with hugepages
> to originate from alloc_contig, and hugetlb code also does a migratable
> check when isolating in folio_isolate_hugetlb().
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: David Hildenbrand <david@redhat.com>
Hello folks, sorry for arriving late to the party; it seems like the patch
has gotten a lot of reviews already. I thought I would stop by to share
some simple testing that I've done: on a machine with 62GiB of memory, I
allocated a bunch of 2MB hugeTLB pages, then tried to allocate 1GB hugeTLB
pages on top to see whether that attempt would succeed.
To test this, I made a really simple setup:
1. Allocate 48GB worth of 2MB hugeTLB pages (24576)
2. Allocate 4 1G hugeTLB pages
Before this patch, I get 0 1G hugeTLB pages.
After this patch, I can get all 4 requested 1G hugeTLB pages!
I would share the script, but it really is just as simple as echoing 24576
and 4 to .../hugepages-{2048kB, 1048576kB}/nr_hugepages, respectively.
If you want to reproduce this at home, you might have to change how many
2MB pages to allocate to see this difference, depending on the size of your
machine.
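For reference, a minimal C sketch of that setup (assuming the standard
/sys/kernel/mm/hugepages/ paths and the same 24576 x 2MB count as above; it
needs root, and the 2MB count will need adjusting for your machine):

#include <stdio.h>
#include <stdlib.h>

/* Assumed standard sysfs locations for the 2MB and 1GB hugeTLB pools. */
#define PATH_2MB "/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
#define PATH_1GB "/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages"

static void set_nr_hugepages(const char *path, long nr)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(1);
	}
	fprintf(f, "%ld\n", nr);
	fclose(f);
}

int main(void)
{
	/* Step 1: 48GB worth of 2MB hugeTLB pages. */
	set_nr_hugepages(PATH_2MB, 24576);
	/* Step 2: request four 1GB hugeTLB pages on top. */
	set_nr_hugepages(PATH_1GB, 4);
	/* See how many 1GB pages the kernel actually managed to allocate. */
	return system("cat " PATH_1GB);
}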
With this, please feel free to add:
Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Have a great day, everyone!
Joshua
On Fri, 21 Nov 2025 14:15:40 -0500 Gregory Price <gourry@gourry.net> wrote:
> We presently skip regions with hugepages entirely when trying to do
> contiguous page allocation. Instead, if hugepage migration is enabled,
> consider regions with hugepages smaller than the target contiguous
> allocation request as valid targets for allocation.
Why? What benefit does this have to our users?
Some runtime testing results might be helpful?
> isolate_migrate_pages_block() already expects requests with hugepages
> to originate from alloc_contig, and hugetlb code also does a migratable
> check when isolating in folio_isolate_hugetlb().
>
> Suggested-by: David Hildenbrand <david@redhat.com>
A Link: here might be illuminating.
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6849,8 +6849,19 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> if (PageReserved(page))
> return false;
>
> - if (PageHuge(page))
> - return false;
> + if (PageHuge(page)) {
> + unsigned int order;
> +
> + if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
> + return false;
> +
> + /* Don't consider moving same size/larger pages */
Comment says "what" (which was fairly obvious). Please reveal "why".
> + page = compound_head(page);
> + order = compound_order(page);
> + if ((order >= MAX_FOLIO_ORDER) ||
> + (nr_pages <= (1 << order)))
> + return false;
> + }
> }
> return true;
> }
On Fri, Nov 21, 2025 at 11:31:38AM -0800, Andrew Morton wrote:
> On Fri, 21 Nov 2025 14:15:40 -0500 Gregory Price <gourry@gourry.net> wrote:
>
> > We presently skip regions with hugepages entirely when trying to do
> > contiguous page allocation. Instead, if hugepage migration is enabled,
> > consider regions with hugepages smaller than the target contiguous
> > allocation request as valid targets for allocation.
>
> Why? What benefit does this have to our users?
>
> Some runtime testing results might be helpful?
If multiple types of hugepages are in use, alloc_contig is less reliable,
in particular when 2MB and 1GB HugeTLB pages are present on the same system.
The same logic is actually present in isolate_migrate_pages_block(), which
is called further down the stack from alloc_contig, as pointed out by David -
but it's unreachable because this check filters those regions out first.
I allude to this in the second paragraph, but it is worth spelling out
explicitly. Will update.
>
> > isolate_migrate_pages_block() already expects requests with hugepages
> > to originate from alloc_contig, and hugetlb code also does a migratable
> > check when isolating in folio_isolate_hugetlb().
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
>
> A Link: here might be illuminating.
Ah, fair point
Link: https://lore.kernel.org/linux-mm/6fe3562d-49b2-4975-aa86-e139c535ad00@redhat.com/
"""
However, it also means that we won't try moving 2MB folios to free up a
1GB folio.
That could be supported by allowing for moving hugetlb folios when their
size is small enough to be served by the buddy, and the size we are
allocating is larger than the one of these folios.
"""
>
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6849,8 +6849,19 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> > if (PageReserved(page))
> > return false;
> >
> > - if (PageHuge(page))
> > - return false;
> > + if (PageHuge(page)) {
> > + unsigned int order;
> > +
> > + if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
> > + return false;
> > +
> > + /* Don't consider moving same size/larger pages */
>
> Comment says "what" (which was fairly obvious). Please reveal "why".
>
ack.
> > + page = compound_head(page);
> > + order = compound_order(page);
> > + if ((order >= MAX_FOLIO_ORDER) ||
> > + (nr_pages <= (1 << order)))
> > + return false;
> > + }
> > }
> > return true;
> > }
>