Extend migrate_vma_collect_pmd() to handle partially mapped large
folios that require splitting before migration can proceed.
During the PTE walk in the collection phase, if a large folio is only
partially mapped in the migration range, it must be split to ensure that
the resulting smaller folios can be migrated correctly.
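
For illustration, one way a partially mapped large folio can arise is a
partial unmap of a THP-backed range. The hypothetical userspace sketch
below (illustration only, not part of this patch) leaves the tail of a
large folio mapped by PTEs, which is the case the collection walk must
now handle:

	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t sz = 2UL << 20;	/* one PMD-sized (2MB) region */
		char *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		/* ask the kernel to back the range with a large folio */
		madvise(p, sz, MADV_HUGEPAGE);
		memset(p, 0, sz);	/* fault the whole range in */
		/*
		 * Partial unmap: the second half of the large folio
		 * remains mapped by PTEs, so a later migrate_vma walk
		 * finds a large folio that must be split first.
		 */
		munmap(p, sz / 2);
		return 0;
	}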
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
---
mm/migrate_device.c | 95 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index e05e14d6eacd..e58c3f9d01c8 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -54,6 +54,54 @@ static int migrate_vma_collect_hole(unsigned long start,
 	return 0;
 }
 
+/**
+ * migrate_vma_split_folio() - helper to split a large (THP or mTHP) folio
+ *
+ * @folio: the folio to split
+ * @fault_page: struct page associated with the fault, if any
+ *
+ * Return: 0 on success, a negative error code on failure.
+ */
+static int migrate_vma_split_folio(struct folio *folio,
+				   struct page *fault_page)
+{
+	int ret;
+	struct folio *fault_folio = fault_page ? page_folio(fault_page) : NULL;
+	struct folio *new_fault_folio = NULL;
+
+	if (folio != fault_folio) {
+		folio_get(folio);
+		folio_lock(folio);
+	}
+
+	ret = split_folio(folio);
+	if (ret) {
+		if (folio != fault_folio) {
+			folio_unlock(folio);
+			folio_put(folio);
+		}
+		return ret;
+	}
+
+	new_fault_folio = fault_page ? page_folio(fault_page) : NULL;
+
+	/*
+	 * Ensure the lock is held on the correct
+	 * folio after the split
+	 */
+	if (!new_fault_folio) {
+		folio_unlock(folio);
+		folio_put(folio);
+	} else if (folio != new_fault_folio) {
+		folio_get(new_fault_folio);
+		folio_lock(new_fault_folio);
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+	return 0;
+}
+
 static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				   unsigned long start,
 				   unsigned long end,
@@ -136,6 +184,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			 * page table entry. Other special swap entries are not
 			 * migratable, and we ignore regular swapped page.
 			 */
+			struct folio *folio;
+
 			entry = pte_to_swp_entry(pte);
 			if (!is_device_private_entry(entry))
 				goto next;
@@ -147,6 +197,29 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			    pgmap->owner != migrate->pgmap_owner)
 				goto next;
 
+			folio = page_folio(page);
+			if (folio_test_large(folio)) {
+				int ret;
+
+				/*
+				 * One reason for finding a pmd present with a
+				 * large folio for the pte is a partial unmap.
+				 * Split the folio now for the migration to be
+				 * handled correctly.
+				 */
+				pte_unmap_unlock(ptep, ptl);
+				ret = migrate_vma_split_folio(folio,
+							      migrate->fault_page);
+
+				if (ret) {
+					ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+					goto next;
+				}
+
+				addr = start;
+				goto again;
+			}
+
 			mpfn = migrate_pfn(page_to_pfn(page)) |
 					MIGRATE_PFN_MIGRATE;
 			if (is_writable_device_private_entry(entry))
@@ -171,6 +244,28 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				    pgmap->owner != migrate->pgmap_owner)
 					goto next;
 			}
+			folio = page_folio(page);
+			if (folio_test_large(folio)) {
+				int ret;
+
+				/*
+				 * One reason for finding a pmd present with a
+				 * large folio for the pte is a partial unmap.
+				 * Split the folio now for the migration to be
+				 * handled correctly.
+				 */
+				pte_unmap_unlock(ptep, ptl);
+				ret = migrate_vma_split_folio(folio,
+							      migrate->fault_page);
+
+				if (ret) {
+					ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
+					goto next;
+				}
+
+				addr = start;
+				goto again;
+			}
 			mpfn = migrate_pfn(pfn) | MIGRATE_PFN_MIGRATE;
 			mpfn |= pte_write(pte) ? MIGRATE_PFN_WRITE : 0;
 		}
--
2.50.1
Hi Balbir,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-nonmm-unstable]
[also build test WARNING on sj/damon/next linus/master v6.17-rc4 next-20250904]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Balbir-Singh/mm-zone_device-support-large-zone-device-private-folios/20250903-092241
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-nonmm-unstable
patch link:    https://lore.kernel.org/r/20250903011900.3657435-6-balbirs%40nvidia.com
patch subject: [v4 05/15] mm/migrate_device: handle partially mapped folios during collection
config: arm64-randconfig-001-20250904 (https://download.01.org/0day-ci/archive/20250904/202509041732.UTJsw3LB-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250904/202509041732.UTJsw3LB-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509041732.UTJsw3LB-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: mm/migrate_device.c:66 function parameter 'folio' not described in 'migrate_vma_split_folio'
>> Warning: mm/migrate_device.c:66 function parameter 'fault_page' not described in 'migrate_vma_split_folio'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi,

On 9/3/25 04:18, Balbir Singh wrote:
> Extend migrate_vma_collect_pmd() to handle partially mapped large
> folios that require splitting before migration can proceed.
>
> During PTE walk in the collection phase, if a large folio is only
> partially mapped in the migration range, it must be split to ensure
> the folio is correctly migrated.

<snip>

> +			folio = page_folio(page);
> +			if (folio_test_large(folio)) {
> +				int ret;
> +
> +				/*
> +				 * The reason for finding pmd present with a
> +				 * large folio for the pte is partial unmaps.
> +				 * Split the folio now for the migration to be
> +				 * handled correctly
> +				 */

There are other reasons like vma splits for various reasons.

<snip>
On 9/3/25 14:40, Mika Penttilä wrote:
> Hi,
>
> On 9/3/25 04:18, Balbir Singh wrote:
>
>> Extend migrate_vma_collect_pmd() to handle partially mapped large
>> folios that require splitting before migration can proceed.
>>
>> During PTE walk in the collection phase, if a large folio is only
>> partially mapped in the migration range, it must be split to ensure
>> the folio is correctly migrated.

<snip>

>> +				/*
>> +				 * The reason for finding pmd present with a
>> +				 * large folio for the pte is partial unmaps.
>> +				 * Split the folio now for the migration to be
>> +				 * handled correctly
>> +				 */
>
> There are other reasons like vma splits for various reasons.
>

Yes, David had pointed that out as well. I meant to clean up the comment
(change "The" to "One"); I missed addressing it in the refactor, but it is
easy to do.

Thanks,
Balbir Singh
On 9/3/25 09:05, Balbir Singh wrote:
> On 9/3/25 14:40, Mika Penttilä wrote:
>> Hi,
>>
>> On 9/3/25 04:18, Balbir Singh wrote:
>>
>>> Extend migrate_vma_collect_pmd() to handle partially mapped large
>>> folios that require splitting before migration can proceed.
>>>
>>> During PTE walk in the collection phase, if a large folio is only
>>> partially mapped in the migration range, it must be split to ensure
>>> the folio is correctly migrated.
>
> <snip>
>
>>> +				/*
>>> +				 * The reason for finding pmd present with a
>>> +				 * large folio for the pte is partial unmaps.
>>> +				 * Split the folio now for the migration to be
>>> +				 * handled correctly
>>> +				 */
>>
>> There are other reasons like vma splits for various reasons.
>>
> Yes, David had pointed that out as well. I meant to clean up the comment
> (change "The" to "One"); I missed addressing it in the refactor, but it is
> easy to do.

And of course now you split all mTHPs as well, which is different from what
we do today (ignoring). Splitting might be the right thing to do, but maybe
worth mentioning.

> Thanks,
> Balbir Singh

--Mika