mm/page_alloc.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
We presently skip regions with hugepages entirely when trying to do
contiguous page allocation. Instead, if hugepage migration is enabled,
consider regions with hugepages smaller than the target contiguous
allocation request as valid targets for allocation.
Compaction `isolate_migrate_pages_block()` already expects requests
with hugepages to originate from alloc_contig, and hugetlb code also
does a migratable check when isolating in `folio_isolate_hugetlb()`.
We add the migration check here to avoid calling compaction on a
region if we know migration is not possible at all.
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..e0760eafe032 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7048,8 +7048,14 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
if (PageReserved(page))
return false;
- if (PageHuge(page))
- return false;
+ if (PageHuge(page)) {
+ struct folio *folio = page_folio(page);
+
+ /* Don't consider moving same size/larger pages */
+ if (!folio_test_hugetlb_migratable(folio) ||
+ (1 << folio_order(folio) >= nr_pages))
+ return false;
+ }
}
return true;
}
--
2.51.0
On 20.10.25 19:06, Gregory Price wrote:
> We presently skip regions with hugepages entirely when trying to do
> contiguous page allocation. Instead, if hugepage migration is enabled,
> consider regions with hugepages smaller than the target contiguous
> allocation request as valid targets for allocation.
>
> Compaction `isolate_migrate_pages_block()` already expects requests
> with hugepages to originate from alloc_contig, and hugetlb code also
> does a migratable check when isolating in `folio_isolate_hugetlb()`.
>
> We add the migration check here to avoid calling compaction on a
> region if we know migration is not possible at all.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
> mm/page_alloc.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 600d9e981c23..e0760eafe032 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7048,8 +7048,14 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> if (PageReserved(page))
> return false;
>
> - if (PageHuge(page))
> - return false;
> + if (PageHuge(page)) {
> + struct folio *folio = page_folio(page);
> +
> + /* Don't consider moving same size/larger pages */
> + if (!folio_test_hugetlb_migratable(folio) ||
> + (1 << folio_order(folio) >= nr_pages))
We have folio_nr_pages().
Do we really need the folio_hugetlb_migratable() check?
> + return false;
This code is completely racy. folio_nr_pages() should be fine AFAIKT (no
VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable().
If it becomes a problem we could do a snapshot_page() to take a snapshot
we can query.
--
Cheers
David / dhildenb
On Mon, Oct 20, 2025 at 07:24:04PM +0200, David Hildenbrand wrote: > On 20.10.25 19:06, Gregory Price wrote: > > Do we really need the folio_hugetlb_migratable() check? > This code is completely racy. My thought was it's better to check if any *one* folio in the bunch is non-migratable, it's better to never even call compaction in the first place. But you're right, this is racy. In one race, the compaction code will just fail if this bit gets set between now and the isolate call in folio_isolate_hugetlb() - resulting in searching the next block anyway. So that seemed ok? In the other race, the bit becomes un-set and we skip a block that might otherwise be valid. I can drop this check, it's just an optimistic optimization anyway. I should also probably check CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION here regardless, since we should skip compaction if migration isn't possible. >> folio_nr_pages() should be fine AFAIKT (no > VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable(). will change, and will check/change based on above thoughts. ~Gregory
On 20 Oct 2025, at 13:41, Gregory Price wrote: > On Mon, Oct 20, 2025 at 07:24:04PM +0200, David Hildenbrand wrote: >> On 20.10.25 19:06, Gregory Price wrote: >> >> Do we really need the folio_hugetlb_migratable() check? >> This code is completely racy. > > My thought was it's better to check if any *one* folio in the bunch is > non-migratable, it's better to never even call compaction in the first > place. But you're right, this is racy. > > In one race, the compaction code will just fail if this bit gets set > between now and the isolate call in folio_isolate_hugetlb() - resulting > in searching the next block anyway. So that seemed ok? > > In the other race, the bit becomes un-set and we skip a block that might > otherwise be valid. > > I can drop this check, it's just an optimistic optimization anyway. > > I should also probably check CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION here > regardless, since we should skip compaction if migration isn't possible. > >>> folio_nr_pages() should be fine AFAIKT (no >> VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable(). > > will change, and will check/change based on above thoughts. If it is racy, could folio_order() or folio_nr_pages() return a bogusly large and cause a wrong result? In isolate_migratepages_block(), compound_order(page) is used and checked against MAX_PAGE_ORDER to avoid a bogus page order. I wonder if we should use the same pattern here. Basically, what is the right way of checking a folio order without lock? Should we have a standardized helper function for that? -- Best Regards, Yan, Zi
On 20.10.25 21:15, Zi Yan wrote: > On 20 Oct 2025, at 13:41, Gregory Price wrote: > >> On Mon, Oct 20, 2025 at 07:24:04PM +0200, David Hildenbrand wrote: >>> On 20.10.25 19:06, Gregory Price wrote: >>> >>> Do we really need the folio_hugetlb_migratable() check? >>> This code is completely racy. >> >> My thought was it's better to check if any *one* folio in the bunch is >> non-migratable, it's better to never even call compaction in the first >> place. But you're right, this is racy. >> >> In one race, the compaction code will just fail if this bit gets set >> between now and the isolate call in folio_isolate_hugetlb() - resulting >> in searching the next block anyway. So that seemed ok? >> >> In the other race, the bit becomes un-set and we skip a block that might >> otherwise be valid. >> >> I can drop this check, it's just an optimistic optimization anyway. >> >> I should also probably check CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION here >> regardless, since we should skip compaction if migration isn't possible. >> >>>> folio_nr_pages() should be fine AFAIKT (no >>> VM_WARN_ON() etc), not sure about folio_test_hugetlb_migratable(). >> >> will change, and will check/change based on above thoughts. > > If it is racy, could folio_order() or folio_nr_pages() return a bogusly > large and cause a wrong result? > > In isolate_migratepages_block(), compound_order(page) is used and checked > against MAX_PAGE_ORDER to avoid a bogus page order. I wonder if we should > use the same pattern here. > > Basically, what is the right way of checking a folio order without lock? > Should we have a standardized helper function for that? As raised, snapshot_page() tries to stabilize the folio best it can. -- Cheers David / dhildenb
On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote: > > > > Basically, what is the right way of checking a folio order without lock? > > Should we have a standardized helper function for that? > > As raised, snapshot_page() tries to stabilize the folio best it can. is snapshot_page() even worth it if we're already racing on flag checks? i.e. there's already a race condition between pfn_range_valid_contig(range) -> compaction(range) on some bogus value the worst that happens is either compaction gets called when it can't compact, or compaction doesn't get called when it could compact - either way, compaction still handles it if called. We may just skip some blocks (which is still better than now). -- on compound_order - from mm.h: /* * compound_order() can be called without holding a reference, which means * that niceties like page_folio() don't work. These callers should be * prepared to handle wild return values. For example, PG_head may be * set before the order is initialised, or this may be a tail page. * See compaction.c for some good examples. */ Seems like the correct interface given the scenario. I'll poke around. ~Gregory
On 20.10.25 21:40, Gregory Price wrote: > On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote: >>> >>> Basically, what is the right way of checking a folio order without lock? >>> Should we have a standardized helper function for that? >> >> As raised, snapshot_page() tries to stabilize the folio best it can. > > is snapshot_page() even worth it if we're already racing on flag checks? I think it tries to handle what compound_order() cannot easily handle, as it will retry in case it detects an obvious race. > > i.e. there's already a race condition between > > pfn_range_valid_contig(range) -> compaction(range) Can you elaborate how compaction comes into play here? I'm missing the interaction. pfn_range_valid_contig() should be only called by alloc_contig_pages() and not out of compaction context? > > on some bogus value the worst that happens is either compaction gets > called when it can't compact, or compaction doesn't get called when it > could compact - either way, compaction still handles it if called. > > We may just skip some blocks (which is still better than now). > > -- > > on compound_order - from mm.h: > > /* > * compound_order() can be called without holding a reference, which means > * that niceties like page_folio() don't work. These callers should be > * prepared to handle wild return values. For example, PG_head may be > * set before the order is initialised, or this may be a tail page. > * See compaction.c for some good examples. > */ > > Seems like the correct interface given the scenario. I'll poke around. Yes, avoiding folios altogether is even better. As documented, it can result in crazy values due to races that must be handled (like compaction, yes). -- Cheers David / dhildenb
On Mon, Oct 20, 2025 at 09:46:21PM +0200, David Hildenbrand wrote:
> On 20.10.25 21:40, Gregory Price wrote:
> > On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote:
> > > >
> > > > Basically, what is the right way of checking a folio order without lock?
> > > > Should we have a standardized helper function for that?
> > >
> > > As raised, snapshot_page() tries to stabilize the folio best it can.
> >
> > is snapshot_page() even worth it if we're already racing on flag checks?
>
> I think it tries to handle what compound_order() cannot easily handle, as it
> will retry in case it detects an obvious race.
>
> >
> > i.e. there's already a race condition between
> >
> > pfn_range_valid_contig(range) -> compaction(range)
>
> Can you elaborate how compaction comes into play here? I'm missing the
> interaction.
>
> pfn_range_valid_contig() should be only called by alloc_contig_pages() and
> not out of compaction context?
>
I've been digging through the code a bit, so a quick shot from my notes
alloc_contig_pages_noprof
if (pfn_range_valid_contig(range)) <- check validity
__alloc_contig_pages(range)
alloc_contig_range_noprof(range)
start_isolate_page_range(range) <- isolate
__alloc_contig_migrate_range(range)
isolate_migratepages_range(range) <- compact
Seems like all the checks done in pfn_range_valid_contig() already race
with everything after it anyway since references aren't held? Any of
those pages could be freed (get bogus values), but i suppose not
allocated (given the zone lock is held)?
> > Seems like the correct interface given the scenario. I'll poke around.
>
> Yes, avoiding folios altogether is even better. As documented, it can result
> in crazy values due to races that must be handled (like compaction, yes).
>
i'll make this swap then.
~Gregory
On 20.10.25 21:58, Gregory Price wrote: > On Mon, Oct 20, 2025 at 09:46:21PM +0200, David Hildenbrand wrote: >> On 20.10.25 21:40, Gregory Price wrote: >>> On Mon, Oct 20, 2025 at 09:18:36PM +0200, David Hildenbrand wrote: >>>>> >>>>> Basically, what is the right way of checking a folio order without lock? >>>>> Should we have a standardized helper function for that? >>>> >>>> As raised, snapshot_page() tries to stabilize the folio best it can. >>> >>> is snapshot_page() even worth it if we're already racing on flag checks? >> >> I think it tries to handle what compound_order() cannot easily handle, as it >> will retry in case it detects an obvious race. >> >>> >>> i.e. there's already a race condition between >>> >>> pfn_range_valid_contig(range) -> compaction(range) >> >> Can you elaborate how compaction comes into play here? I'm missing the >> interaction. >> >> pfn_range_valid_contig() should be only called by alloc_contig_pages() and >> not out of compaction context? >> > > I've been digging through the code a bit, so a quick shot from my notes > > alloc_contig_pages_noprof > if (pfn_range_valid_contig(range)) <- check validity > __alloc_contig_pages(range) > alloc_contig_range_noprof(range) > start_isolate_page_range(range) <- isolate > __alloc_contig_migrate_range(range) > isolate_migratepages_range(range) <- compact Oh, that's what you mean with "compact", it's just isolation+migration. > > Seems like all the checks done in pfn_range_valid_contig() already race > with everything after it anyway since references aren't held? Any of > those pages could be freed (get bogus values), but i suppose not > allocated (given the zone lock is held)? Yes, it's completely racy. I was primarily concerned about us calling functions that will VM_WARN_ON() etc due to the races; not that they would make us accept/jump over a range although we shouldn't. Of course, regarding the latter, we want to try as good as possible to avoid jumping over ranges that we can actually handle. -- Cheers David / dhildenb
On Mon, Oct 20, 2025 at 10:17:42PM +0200, David Hildenbrand wrote: > Yes, it's completely racy. > > I was primarily concerned about us calling functions that will VM_WARN_ON() > etc due to the races; not that they would make us accept/jump over a range > although we shouldn't. > > Of course, regarding the latter, we want to try as good as possible to avoid > jumping over ranges that we can actually handle. > I'll go ahead and add a snapshot_page. ~Gregory
On 20.10.25 22:27, Gregory Price wrote: > On Mon, Oct 20, 2025 at 10:17:42PM +0200, David Hildenbrand wrote: >> Yes, it's completely racy. >> >> I was primarily concerned about us calling functions that will VM_WARN_ON() >> etc due to the races; not that they would make us accept/jump over a range >> although we shouldn't. >> >> Of course, regarding the latter, we want to try as good as possible to avoid >> jumping over ranges that we can actually handle. >> > > I'll go ahead and add a snapshot_page. I'd say it's probably good enough initially to just use compound_head() and filter out crazy values. -- Cheers David / dhildenb
© 2016 - 2026 Red Hat, Inc.