[PATCH v2] mm: migrate: requeue destination folio on deferred split queue

Usama Arif posted 1 patch 4 weeks, 1 day ago
There is a newer version of this series
[PATCH v2] mm: migrate: requeue destination folio on deferred split queue
Posted by Usama Arif 4 weeks, 1 day ago
During folio migration, __folio_migrate_mapping() removes the source
folio from the deferred split queue, but the destination folio is never
re-queued.  This causes underutilized THPs to escape the shrinker after
NUMA migration, since they silently drop off the deferred split list.

Fix this by recording, before move_to_new_folio() unqueues the source
folio, whether it was on the deferred split queue and whether it was
partially mapped. After a successful migration, re-queue the destination
folio with the same partially mapped state if the source was queued.

By the time migrate_folio_move() runs, partially mapped folios without a
pin have already been split by migrate_pages_batch().  So only two cases
remain on the deferred list at this point:
  1. Partially mapped folios with a pin (split failed).
  2. Fully mapped but potentially underused folios.
The recorded partially_mapped state is forwarded to deferred_split_folio()
so that the destination folio is correctly re-queued in both cases.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: dafff3f4c850 ("mm: split underused THPs")
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
v1 -> v2:
- record whether source folio was on the deferred split queue before
  move_to_new_folio() (David)
- record partially mapped state and update commit message (Zi)
---
 mm/migrate.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index ece77ccb2ec0..61013d258eb4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	int rc;
 	int old_page_state = 0;
 	struct anon_vma *anon_vma = NULL;
+	bool src_deferred_split = false;
+	bool src_partially_mapped = false;
 	struct list_head *prev;
 
 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
@@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 		goto out_unlock_both;
 	}
 
+	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
+	    !data_race(list_empty(&src->_deferred_list))) {
+		src_deferred_split = true;
+		src_partially_mapped = folio_test_partially_mapped(src);
+	}
+
 	rc = move_to_new_folio(dst, src, mode);
 	if (rc)
 		goto out;
@@ -1393,6 +1401,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 	if (old_page_state & PAGE_WAS_MAPPED)
 		remove_migration_ptes(src, dst, 0);
 
+	/*
+	 * Requeue the destination folio on the deferred split queue if
+	 * the source was on the queue.  The source is unqueued in
+	 * __folio_migrate_mapping(), so we recorded the state from
+	 * before move_to_new_folio().
+	 */
+	if (src_deferred_split)
+		deferred_split_folio(dst, src_partially_mapped);
+
 out_unlock_both:
 	folio_unlock(dst);
 	folio_set_owner_migrate_reason(dst, reason);
-- 
2.47.3
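The record-then-requeue ordering described in the changelog can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_folio`, `model_move()`, `model_deferred_split()`, `demo()` and the list helpers are hypothetical stand-ins for the folio, the unqueue done in __folio_migrate_mapping(), deferred_split_folio() and `list_head`. The model also assumes that unqueueing clears the partially mapped flag, which is why both pieces of state are read before the move.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical folio: only the state this sketch needs. */
struct model_folio {
	struct list_node deferred;	/* stands in for _deferred_list */
	bool partially_mapped;		/* stands in for the partially mapped flag */
};

/* Hypothetical deferred split queue head. */
static struct list_node split_queue;

/* Queue a folio (once), optionally marking it partially mapped. */
static void model_deferred_split(struct model_folio *f, bool partially_mapped)
{
	if (partially_mapped)
		f->partially_mapped = true;
	if (list_is_empty(&f->deferred))
		list_add_tail_node(&f->deferred, &split_queue);
}

/* Models the move step: the source is unqueued and (assumption) its
 * partially mapped flag is cleared; nothing is done for dst. */
static void model_move(struct model_folio *dst, struct model_folio *src)
{
	(void)dst;
	if (!list_is_empty(&src->deferred))
		list_del_init_node(&src->deferred);
	src->partially_mapped = false;
}

/* Migrates src to dst. With apply_fix, state is recorded before the move
 * and dst is requeued afterwards. Returns bit 0 = dst queued,
 * bit 1 = dst partially mapped. */
static int demo(bool apply_fix, bool src_queued, bool src_partial)
{
	struct model_folio src = { .partially_mapped = src_partial };
	struct model_folio dst = { .partially_mapped = false };
	bool was_queued, was_partial;
	int result;

	list_init(&split_queue);
	list_init(&src.deferred);
	list_init(&dst.deferred);
	if (src_queued)
		model_deferred_split(&src, src_partial);

	/* Record before the move, mirroring the order used in the patch. */
	was_queued = !list_is_empty(&src.deferred);
	was_partial = was_queued && src.partially_mapped;

	model_move(&dst, &src);
	assert(list_is_empty(&src.deferred));

	if (apply_fix && was_queued)
		model_deferred_split(&dst, was_partial);

	result = (!list_is_empty(&dst.deferred) ? 1 : 0) |
		 (dst.partially_mapped ? 2 : 0);
	/* Unlink the stack-allocated folio before returning. */
	if (!list_is_empty(&dst.deferred))
		list_del_init_node(&dst.deferred);
	return result;
}
```

In this model `demo(true, true, true)` returns 3 (destination queued and marked partially mapped), while `demo(false, true, true)` returns 0: the destination silently drops off the queue, which is the behaviour the changelog describes.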
Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
Posted by Wei Yang 4 weeks ago
On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>[...]
>
>diff --git a/mm/migrate.c b/mm/migrate.c
>index ece77ccb2ec0..61013d258eb4 100644
>--- a/mm/migrate.c
>+++ b/mm/migrate.c
>@@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 	int rc;
> 	int old_page_state = 0;
> 	struct anon_vma *anon_vma = NULL;
>+	bool src_deferred_split = false;
>+	bool src_partially_mapped = false;
> 	struct list_head *prev;
> 
> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>@@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 		goto out_unlock_both;
> 	}
> 
>+	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>+	    !data_race(list_empty(&src->_deferred_list))) {

We usually check order > 1 before accessing _deferred_list, because it
lives in subpage 2.

I am not sure why we don't do it here. Am I missing something?

>+		src_deferred_split = true;
>+		src_partially_mapped = folio_test_partially_mapped(src);
>+	}
>+
> 	rc = move_to_new_folio(dst, src, mode);
> 	if (rc)
> 		goto out;
>@@ -1393,6 +1401,15 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
> 	if (old_page_state & PAGE_WAS_MAPPED)
> 		remove_migration_ptes(src, dst, 0);
> 
>+	/*
>+	 * Requeue the destination folio on the deferred split queue if
>+	 * the source was on the queue.  The source is unqueued in
>+	 * __folio_migrate_mapping(), so we recorded the state from
>+	 * before move_to_new_folio().
>+	 */
>+	if (src_deferred_split)
>+		deferred_split_folio(dst, src_partially_mapped);
>+
> out_unlock_both:
> 	folio_unlock(dst);
> 	folio_set_owner_migrate_reason(dst, reason);
>-- 
>2.47.3
>

-- 
Wei Yang
Help you, Help me
Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
Posted by David Hildenbrand (Arm) 3 weeks, 6 days ago
On 3/12/26 04:18, Wei Yang wrote:
> On Tue, Mar 10, 2026 at 03:54:19AM -0700, Usama Arif wrote:
>> [...]
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ece77ccb2ec0..61013d258eb4 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>> 	int rc;
>> 	int old_page_state = 0;
>> 	struct anon_vma *anon_vma = NULL;
>> +	bool src_deferred_split = false;
>> +	bool src_partially_mapped = false;
>> 	struct list_head *prev;
>>
>> 	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>> 		goto out_unlock_both;
>> 	}
>>
>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
>> +	    !data_race(list_empty(&src->_deferred_list))) {
> 
> We usually check order > 1 before accessing _deferred_list, because it
> lives in subpage 2.
> 
> I am not sure why we don't do it here. Am I missing something?

Valid point! Non-anon folios could trigger that.

-- 
Cheers,

David
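Wei's point can be illustrated with a hypothetical model in which a folio of order N owns exactly 1 << N page descriptors and the deferred-list storage sits in descriptor 2 ("subpage 2"). Because the patch already requires folio_test_large(), the remaining gap is order-1 folios, which David's reply suggests non-anon folios can hit. In kernel terms the guard would presumably be a check along the lines of `folio_order(src) > 1`; `struct model_page`, `struct model_folio`, `model_on_deferred_list()` and `demo_check()` below are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Index of the descriptor holding the deferred-list storage,
 * taken from the review comment ("subpage 2"). */
#define DEFERRED_LIST_PAGE 2u

/* Hypothetical page descriptor: one flag standing in for a
 * non-empty _deferred_list. */
struct model_page {
	bool deferred_queued;
};

/* Hypothetical folio: order N owns exactly (1 << N) descriptors. */
struct model_folio {
	unsigned int order;
	struct model_page *pages;
};

/* Guarded check: order-0 and order-1 folios have fewer than three
 * descriptors, so return before touching pages[2]. */
static bool model_on_deferred_list(const struct model_folio *f)
{
	if (f->order <= 1)
		return false;
	return f->pages[DEFERRED_LIST_PAGE].deferred_queued;
}

/* Builds a folio of the given order with exactly (1 << order)
 * descriptors, so an unguarded pages[2] read on a small folio would
 * be out of bounds (and visible to tools such as AddressSanitizer). */
static bool demo_check(unsigned int order, bool queued)
{
	size_t n = (size_t)1 << order;
	struct model_folio f = {
		.order = order,
		.pages = calloc(n, sizeof(struct model_page)),
	};
	bool ret;

	assert(f.pages != NULL);
	if (n > DEFERRED_LIST_PAGE)
		f.pages[DEFERRED_LIST_PAGE].deferred_queued = queued;
	ret = model_on_deferred_list(&f);
	free(f.pages);
	return ret;
}
```

Dropping the `order <= 1` test in this model turns `demo_check(1, ...)` into a read past the end of a two-element allocation, which is the class of bug the review comment is pointing at.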
Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
Posted by David Hildenbrand (Arm) 4 weeks ago
On 3/10/26 11:54, Usama Arif wrote:
> [...]
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index ece77ccb2ec0..61013d258eb4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  	int rc;
>  	int old_page_state = 0;
>  	struct anon_vma *anon_vma = NULL;
> +	bool src_deferred_split = false;
> +	bool src_partially_mapped = false;
>  	struct list_head *prev;
>  
>  	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>  		goto out_unlock_both;
>  	}
>  
> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&

I don't think the folio_test_large_rmappable() check is required. Other
folios we migrate here would always have _deferred_list initialized but
unused.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David
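David's argument is that dropping folio_test_large_rmappable() cannot change the result, because a folio that is not large-rmappable still has an initialized (self-linked, therefore empty) _deferred_list. The sketch below checks that claim exhaustively in a hypothetical boolean model. It does not model the order-1 layout issue Wei raised, and `struct model_folio`, `pred_v2()`, `pred_simplified()` and `predicates_agree()` are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_node(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Hypothetical folio model; field names are illustrative only. */
struct model_folio {
	bool large;
	bool large_rmappable;
	struct list_node deferred;
};

/* Shape of the condition as posted in v2. */
static bool pred_v2(const struct model_folio *f)
{
	return f->large && f->large_rmappable && !list_is_empty(&f->deferred);
}

/* Same condition with the rmappable test dropped, per the review. */
static bool pred_simplified(const struct model_folio *f)
{
	return f->large && !list_is_empty(&f->deferred);
}

/* Tries every (large, rmappable, queued) combination and reports
 * whether the two predicates always agree. */
static bool predicates_agree(bool enforce_invariant)
{
	for (int mask = 0; mask < 8; mask++) {
		struct list_node queue;
		struct model_folio f = {
			.large = mask & 1,
			.large_rmappable = mask & 2,
		};
		bool queued = mask & 4;

		/* Invariant from the review: only large, rmappable folios
		 * ever get onto the deferred split queue. */
		if (enforce_invariant && queued &&
		    !(f.large && f.large_rmappable))
			continue;

		list_init(&queue);
		list_init(&f.deferred);
		if (queued)
			list_add_node(&f.deferred, &queue);
		assert(list_is_empty(&f.deferred) == !queued);

		if (pred_v2(&f) != pred_simplified(&f))
			return false;
	}
	return true;
}
```

`predicates_agree(true)` returns true and `predicates_agree(false)` returns false: in this model the simplification rests entirely on the invariant that a non-rmappable folio's list is initialized but never used.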
Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
Posted by Usama Arif 4 weeks ago

On 11/03/2026 12:23, David Hildenbrand (Arm) wrote:
> On 3/10/26 11:54, Usama Arif wrote:
>> [...]
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index ece77ccb2ec0..61013d258eb4 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1360,6 +1360,8 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  	int rc;
>>  	int old_page_state = 0;
>>  	struct anon_vma *anon_vma = NULL;
>> +	bool src_deferred_split = false;
>> +	bool src_partially_mapped = false;
>>  	struct list_head *prev;
>>  
>>  	__migrate_folio_extract(dst, &old_page_state, &anon_vma);
>> @@ -1373,6 +1375,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
>>  		goto out_unlock_both;
>>  	}
>>  
>> +	if (folio_test_large(src) && folio_test_large_rmappable(src) &&
> 
> I don't think the folio_test_large_rmappable() check is required. Other
> folios we migrate here would always have _deferred_list initialized but
> unused.
> 
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> 


I have been auditing the THP shrinker code with respect to NUMA migration, and I think we need
another fix for this. I have sent it here: https://lore.kernel.org/all/20260311132342.3193160-1-usama.arif@linux.dev/
Re: [PATCH v2] mm: migrate: requeue destination folio on deferred split queue
Posted by Zi Yan 4 weeks, 1 day ago
On 10 Mar 2026, at 6:54, Usama Arif wrote:

> [...]
>
LGTM.

Acked-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi