Device memory migration has two call sites that split huge PMDs:

migrate_vma_split_unmapped_folio():
Called from migrate_vma_pages() when migrating a PMD-mapped THP to a
destination that doesn't support compound pages. It splits the PMD
then splits the folio via folio_split_unmapped().

If the PMD split fails, folio_split_unmapped() would operate on an
unsplit folio with inconsistent page table state. Propagate -ENOMEM
to skip this page's migration. This is safe, as a folio_split_unmapped()
failure would be propagated in the same way.

migrate_vma_insert_page():
Called from migrate_vma_pages() when inserting a page into a VMA
during migration back from device memory. If a huge zero PMD exists
at the target address, it must be split before PTE insertion.

If the split fails, the subsequent pte_alloc() and set_pte_at() would
operate on a PMD slot still occupied by the huge zero entry. Use
goto abort, consistent with other allocation failures in this function.
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
mm/migrate_device.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 78c7acf024615..bc53e06fd9735 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -909,7 +909,13 @@ static int migrate_vma_split_unmapped_folio(struct migrate_vma *migrate,
int ret = 0;
folio_get(folio);
- split_huge_pmd_address(migrate->vma, addr, true);
+ /*
+ * If PMD split fails, folio_split_unmapped would operate on an
+ * unsplit folio with inconsistent page table state.
+ */
+ ret = split_huge_pmd_address(migrate->vma, addr, true);
+ if (ret)
+ return ret;
ret = folio_split_unmapped(folio, 0);
if (ret)
return ret;
@@ -1005,7 +1011,13 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (pmd_trans_huge(*pmdp)) {
if (!is_huge_zero_pmd(*pmdp))
goto abort;
- split_huge_pmd(vma, pmdp, addr);
+ /*
+ * If split fails, the huge zero PMD remains and
+ * pte_alloc/PTE insertion that follows would be
+ * incorrect.
+ */
+ if (split_huge_pmd(vma, pmdp, addr))
+ goto abort;
} else if (pmd_leaf(*pmdp))
goto abort;
}
--
2.47.3
On Thu, Feb 26, 2026 at 4:34 AM Usama Arif <usama.arif@linux.dev> wrote:
> [...]
>
>         folio_get(folio);
Should we be concerned about this folio_get? Are we incrementing a
reference that was already held if we back out of the split?
-- Nico
On 02/03/2026 21:20, Nico Pache wrote:
> On Thu, Feb 26, 2026 at 4:34 AM Usama Arif <usama.arif@linux.dev> wrote:
>> [...]
>>
>>         folio_get(folio);
> Should we be concerned about this folio_get? Are we incrementing a
> reference that was already held if we back out of the split?
>
> -- Nico
Hi Nico,

Thanks for pointing this out. It spun out into an entire investigation for me [1].

Similar to [1], I inserted trace prints [2] and created a new __split_huge_pmd2()
that always returns -ENOMEM. Without a folio_put on error [3], we get a refcount of 2:
hmm-tests-129 [000] .l... 1.485514: __migrate_device_finalize: FINALIZE[0]: src=ffb48827440e8000 dst=ffb48827440e8000 src==dst=1 refcount_src=2 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
hmm-tests-129 [000] .l... 1.485517: __migrate_device_finalize: FINALIZE[0]: src=ffb48827440e8000 refcount=3 mapcount=1 AFTER remove_migration_ptes
hmm-tests-129 [000] .l... 1.485518: __migrate_device_finalize: FINALIZE[0]: src=ffb48827440e8000 refcount=2 AFTER folio_put(src)
With a folio_put on error [4], we get a refcount of 1:
hmm-tests-129 [001] ..... 1.492216: __migrate_device_finalize: FINALIZE[0]: src=fff7b8be840f0000 dst=fff7b8be840f0000 src==dst=1 refcount_src=1 mapcount_src=0 order_src=9 migrate=0 BEFORE remove_migration_ptes
hmm-tests-129 [001] ..... 1.492219: __migrate_device_finalize: FINALIZE[0]: src=fff7b8be840f0000 refcount=2 mapcount=1 AFTER remove_migration_ptes
hmm-tests-129 [001] ..... 1.492220: __migrate_device_finalize: FINALIZE[0]: src=fff7b8be840f0000 refcount=1 AFTER folio_put(src)
So we need a folio_put when split_huge_pmd_address() fails, but NOT when
folio_split_unmapped() fails.
[1] https://lore.kernel.org/all/332c9e16-46c3-4e1c-898e-2cb0a87ba1fc@linux.dev/
[2] https://gist.github.com/uarif1/6abe4bedb85814e9be8d48a4fe742b41
[3] https://gist.github.com/uarif1/f718af2113bc1a33484674b61b9dafcc
[4] https://gist.github.com/uarif1/03c42f2549eaf2bc555e8b03e07a63c8
On Thu, Mar 5, 2026 at 9:55 AM Usama Arif <usama.arif@linux.dev> wrote:
> [...]
> Thanks for pointing this out. It spun out into an entire investigation for me [1].

Hey Usama,

I'm sorry my question led you down a rabbit hole, but I'm glad you did
the proper investigation and found the correct answer :) Thanks for
looking into it and for clearing that up via a comment!

Cheers,
-- Nico
On 09/03/2026 18:09, Nico Pache wrote:
> [...]
> I'm sorry my question led you down a rabbit hole, but I'm glad you did
> the proper investigation and found the correct answer :)

Thanks for the review! There is a need for a folio_put in this patch,
specifically for split_huge_pmd_address(), which I wouldn't have figured
out without your comment, so really appreciate it!

Usama
On 02/03/2026 21:20, Nico Pache wrote:
> [...]
> Should we be concerned about this folio_get? Are we incrementing a
> reference that was already held if we back out of the split?

Good catch! I think this bug existed even before this patch: if
folio_split_unmapped() fails, the reference is still there. Let me send
an independent fix for this.