[RFC PATCH] mm, hugetlb: implement movable_gigantic_pages sysctl

Gregory Price posted 1 patch 2 months, 1 week ago
There is a newer version of this series
Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++++++++
include/linux/hugetlb.h                 |  3 ++-
mm/hugetlb.c                            | 11 +++++++++++
3 files changed, 30 insertions(+), 1 deletion(-)
[RFC PATCH] mm, hugetlb: implement movable_gigantic_pages sysctl
Posted by Gregory Price 2 months, 1 week ago
This reintroduces a concept removed by
commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl")

This sysctl provides some flexibility between multiple requirements which
are difficult to square without adding significantly more complexity.

1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
2) onlining memory in ZONE_MOVABLE to increase reliability of hugepage
   allocation.

When the user's intent for ZONE_MOVABLE is to allow more reliable huge
page allocation (as opposed to enabling hotplugability), disallowing 1GB
hugepages in this region is pointless.  So if hotplug is not
a requirement, we can loosen the restrictions to allow 1GB gigantic pages
in ZONE_MOVABLE.

Since 1GB can be difficult to migrate / has impacts on compaction /
defragmentation, we don't enable this by default.  However, since there
are scenarios where gigantic pages are migratable (hugetlb available in
multiple places), we should allow use of these on zone movable regions.

Note: Boot-time CMA is not possible for driver-managed hotplug memory,
as CMA requires the memory to be registered as SystemRAM at boot time.

Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Suggested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
---
 Documentation/admin-guide/sysctl/vm.rst | 17 +++++++++++++++++
 include/linux/hugetlb.h                 |  3 ++-
 mm/hugetlb.c                            | 11 +++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..89dcee3c3239 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
 - mmap_min_addr
 - mmap_rnd_bits
 - mmap_rnd_compat_bits
+- movable_gigantic_pages
 - nr_hugepages
 - nr_hugepages_mempolicy
 - nr_overcommit_hugepages
@@ -624,6 +625,22 @@ This value can be changed after boot using the
 /proc/sys/vm/mmap_rnd_compat_bits tunable
 
 
+movable_gigantic_pages
+======================
+
+This parameter controls whether gigantic pages may be allocated from
+ZONE_MOVABLE. If set to non-zero, gigantic hugepages can be allocated
+from ZONE_MOVABLE. ZONE_MOVABLE memory may be created via the kernel
+boot parameter `kernelcore` or via memory hotplug as discussed in
+Documentation/admin-guide/mm/memory-hotplug.rst.
+
+Support may depend on the specific architecture.
+
+Note that using ZONE_MOVABLE gigantic pages may make features like
+memory hotremove less reliable, as migrating gigantic pages is more
+difficult due to needing larger amounts of physically contiguous memory.
+
+
 nr_hugepages
 ============
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 526d27e88b3b..38870d21724a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -172,6 +172,7 @@ bool hugetlbfs_pagecache_present(struct hstate *h,
 
 struct address_space *hugetlb_folio_mapping_lock_write(struct folio *folio);
 
+extern int movable_gigantic_pages __read_mostly;
 extern int sysctl_hugetlb_shm_group;
 extern struct list_head huge_boot_pages[MAX_NUMNODES];
 
@@ -916,7 +917,7 @@ static inline bool hugepage_movable_supported(struct hstate *h)
 	if (!hugepage_migration_supported(h))
 		return false;
 
-	if (hstate_is_gigantic(h))
+	if (hstate_is_gigantic(h) && !movable_gigantic_pages)
 		return false;
 	return true;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b4c718..24dbd30d1b69 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -55,6 +55,8 @@
 #include "hugetlb_cma.h"
 #include <linux/page-isolation.h>
 
+int movable_gigantic_pages;
+
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
@@ -5195,6 +5197,15 @@ static const struct ctl_table hugetlb_table[] = {
 		.mode		= 0644,
 		.proc_handler	= hugetlb_overcommit_handler,
 	},
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+	{
+		.procname	= "movable_gigantic_pages",
+		.data		= &movable_gigantic_pages,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
 };
 
 static void __init hugetlb_sysctl_init(void)
-- 
2.51.0
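
As an illustration only (not part of the posted patch), the decision made by
hugepage_movable_supported() after this change can be modelled in userspace.
The struct and function names below are hypothetical stand-ins; the two
boolean fields simply replace the kernel's hugepage_migration_supported(h)
and hstate_is_gigantic(h) helpers.

```c
#include <stdbool.h>

/* Hypothetical userspace model; this is not the kernel's struct hstate. */
struct hstate_model {
	bool migration_supported; /* stands in for hugepage_migration_supported(h) */
	bool gigantic;            /* stands in for hstate_is_gigantic(h) */
};

/*
 * Mirrors the patched logic: a huge page size may be placed in
 * ZONE_MOVABLE only if it is migratable, and gigantic sizes additionally
 * require the movable_gigantic_pages sysctl to be non-zero.
 */
static bool model_hugepage_movable_supported(const struct hstate_model *h,
					     int movable_gigantic_pages)
{
	if (!h->migration_supported)
		return false;

	if (h->gigantic && !movable_gigantic_pages)
		return false;

	return true;
}
```

With the sysctl left at 0, a gigantic hstate is still rejected exactly as
before the patch; only a non-zero value changes the outcome, and only for
sizes that support migration in the first place.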
Re: [RFC PATCH] mm, hugetlb: implement movable_gigantic_pages sysctl
Posted by David Hildenbrand 1 month, 4 weeks ago
On 09.10.25 18:15, Gregory Price wrote:
> This reintroduces a concept removed by
> commit d6cb41cc44c6 ("mm, hugetlb: remove hugepages_treat_as_movable sysctl")
> 
> This sysctl provides some flexibility between multiple requirements which
> are difficult to square without adding significantly more complexity.
> 
> 1) onlining memory in ZONE_MOVABLE to maintain hotplug compatibility
> 2) onlining memory in ZONE_MOVABLE to increase reliability of hugepage
>     allocation.
> 
> When the user's intent for ZONE_MOVABLE is to allow more reliable huge
> page allocation (as opposed to enabling hotplugability), disallowing 1GB
> hugepages in this region is pointless.  So if hotplug is not
> a requirement, we can loosen the restrictions to allow 1GB gigantic pages
> in ZONE_MOVABLE.
> 
> Since 1GB can be difficult to migrate / has impacts on compaction /
> defragmentation, we don't enable this by default.  However, since there
> are scenarios where gigantic pages are migratable (hugetlb available in
> multiple places), we should allow use of these on zone movable regions.
> 
> Note: Boot-time CMA is not possible for driver-managed hotplug memory,
> as CMA requires the memory to be registered as SystemRAM at boot time.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Alexandru Moise <00moses.alexander00@gmail.com>
> Suggested-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Link: https://lore.kernel.org/all/20180201193132.Hk7vI_xaU%25akpm@linux-foundation.org/
> ---

I just remembered one thing, maybe Oscar knows what I mean:

At some point we discussed a possible issue when 
alloc_contig_range()/alloc_contig_pages() would try to allocate a 
gigantic folio and would stumble over movable gigantic folios (possibly 
triggering some recursion when trying to move that one? Not sure).

We wanted to avoid having one gigantic folio allocation try to move 
another gigantic folio allocation.

I think your patch would not change anything in that regard: when we 
scan for a suitable range in alloc_contig_pages_noprof() we call 
pfn_range_valid_contig() .

There, we simply give up whenever we spot any PageHuge(), preventing 
this issue.

However, it also means that we won't try moving 2MB folios to free up a 
1GB folio.

That could be supported by allowing for moving hugetlb folios when their 
size is small enough to be served by the buddy, and the size we are 
allocating is larger than the one of these folios.

-- 
Cheers

David / dhildenb
Re: [RFC PATCH] mm, hugetlb: implement movable_gigantic_pages sysctl
Posted by Gregory Price 1 month, 4 weeks ago
On Mon, Oct 20, 2025 at 04:17:06PM +0200, David Hildenbrand wrote:
> On 09.10.25 18:15, Gregory Price wrote:
> That could be supported by allowing for moving hugetlb folios when their
> size is small enough to be served by the buddy, and the size we are
> allocating is larger than the one of these folios.
> 

I think this is roughly what you'd be looking for?
~Gregory

---

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5549b32cdd31..5def2c53092e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6922,8 +6922,12 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
                if (PageReserved(page))
                        return false;

-               if (PageHuge(page))
-                       return false;
+               if (PageHuge(page)) {
+                       /* Don't consider moving same size/larger pages */
+                       if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) ||
+                           ((1UL << compound_order(page)) >= nr_pages))
+                               return false;
+               }
        }
        return true;
 }
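
For illustration, the size comparison in the hunk above can be sketched as a
standalone function. This is a userspace model, not kernel code: the function
name and the migration_enabled parameter are hypothetical, and the example
orders in the note below assume 4 KiB base pages (order 9 for 2MB, order 18
for 1GB).

```c
#include <stdbool.h>

/*
 * Hypothetical model of the proposed check: does a hugetlb page of the
 * given order make a candidate range unusable for a contiguous
 * allocation of nr_pages base pages?
 *
 * migration_enabled stands in for the kernel config option
 * CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION.
 */
static bool model_hugetlb_blocks_contig(unsigned int hugetlb_order,
					unsigned long nr_pages,
					bool migration_enabled)
{
	/* Without hugetlb migration, any hugetlb page blocks the range. */
	if (!migration_enabled)
		return true;

	/* Don't consider moving pages of the same size or larger. */
	return (1UL << hugetlb_order) >= nr_pages;
}
```

Under those assumptions, a 2MB page (order 9) does not block a 1GB request
(1UL << 18 pages), while an existing 1GB page does, which keeps one gigantic
allocation from trying to move another.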
Re: [RFC PATCH] mm, hugetlb: implement movable_gigantic_pages sysctl
Posted by Gregory Price 1 month, 4 weeks ago
On Mon, Oct 20, 2025 at 12:05:41PM -0400, Gregory Price wrote:
> On Mon, Oct 20, 2025 at 04:17:06PM +0200, David Hildenbrand wrote:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5549b32cdd31..5def2c53092e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6922,8 +6922,12 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>                 if (PageReserved(page))
>                         return false;
> 
> -               if (PageHuge(page))
> -                       return false;
> +               if (PageHuge(page)) {
> +                       /* Don't consider moving same size/larger pages */
> +                       if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) ||
> +                           ((1UL << compound_order(page)) >= nr_pages))
> +                               return false;
> +               }
>         }
>         return true;
>  }

Quick spot-check of the compaction code suggests this is handled
essentially the same way - and in fact compaction expects
alloc_contig to be sending compaction requests w/ hugepages.

So I'll go ahead and submit this separately for discussion.

~Gregory

---

->isolate_migratepages_range()
->isolate_migratepages_block()
{
...
	if (PageHuge(page)) {
		const unsigned int order = compound_order(page);
		/*
		 * skip hugetlbfs if we are not compacting for pages
		 * bigger than its order. THPs and other compound pages
		 * are handled below.
		 */
		 ...
		/* for alloc_contig case */
		if (locked) {
			unlock_page_lruvec_irqrestore(locked, flags);
			locked = NULL;
		}

	 }
...
	/*
	 * Regardless of being on LRU, compound pages such as THP
	 * (hugetlbfs is handled above) are not to be compacted unless
	 * we are attempting an allocation larger than the compound
	 * page size. We can potentially save a lot of iterations if we
	 * skip them at once. The check is racy, but we can consider
	 * only valid values and the only danger is skipping too much.
	 */
	if (PageCompound(page) && !cc->alloc_contig) {
		const unsigned int order = compound_order(page);

		/* Skip based on page order and compaction target order. */
		if (skip_isolation_on_order(order, cc->order)) {
			if (order <= MAX_PAGE_ORDER) {
				low_pfn += (1UL << order) - 1;
				nr_scanned += (1UL << order) - 1;
			}
			goto isolate_fail;
		}
	}
}
Re: [RFC PATCH] mm, hugetlb: implement movable_gigantic_pages sysctl
Posted by Gregory Price 1 month, 4 weeks ago
On Mon, Oct 20, 2025 at 04:17:06PM +0200, David Hildenbrand wrote:
> On 09.10.25 18:15, Gregory Price wrote:
> 
> However, it also means that we won't try moving 2MB folios to free up a 1GB
> folio.
>

This may actually explain some other behavior we've been seeing, re:
reliability of 1GB allocations.  Let me ask some folks to look at this.

Thanks David,
~Gregory