This patch series adds support for THP migration of zone device pages.
To do so, the patches implement support for folio zone device pages by
adding support for setting up larger order pages. Larger order pages
provide a speedup in throughput and latency.

In my local testing (using lib/test_hmm) and a throughput test, the
series shows a 350% improvement in data transfer throughput and a
500% improvement in latency.

These patches build on the earlier posts by Ralph Campbell [1].

Two new flags are added in vma_migration to select and mark compound pages.
migrate_vma_setup(), migrate_vma_pages() and migrate_vma_finalize()
support migration of these pages when MIGRATE_VMA_SELECT_COMPOUND
is passed in as an argument.

The series also adds zone device awareness to (m)THP pages, along
with fault handling of large zone device private pages. The page vma walk
and the rmap code are also zone device aware. Support has also been
added for folios that might need to be split in the middle of migration
(when the src and dst do not agree on MIGRATE_PFN_COMPOUND); this occurs
when the src side of the migration can migrate large pages but the
destination has not been able to allocate large pages. The code uses
folio_split() when migrating THP pages; this path is taken when
MIGRATE_VMA_SELECT_COMPOUND is not passed as an argument to
migrate_vma_setup().

The test infrastructure lib/test_hmm.c has been enhanced to support THP
migration. A new ioctl to emulate failure of large page allocations has
been added to test the folio split code path. hmm-tests.c has new test
cases for huge page migration and to test the folio split path. A new
throughput test has been added as well.

The nouveau dmem code has been enhanced to use the new THP migration
capability.

mTHP support:

The patches hard code HPAGE_PMD_NR in a few places, but the code has
been kept generic to support various order sizes. With additional
refactoring of the code, support for different order sizes should be
possible.
The future plan is to post enhancements to support mTHP with a rough
design as follows:

1. Add the notion of allowable thp orders to the HMM based test driver
2. For non PMD based THP paths in migrate_device.c, check to see if
   a suitable order is found and supported by the driver
3. Iterate across orders to check the highest supported order for migration
4. Migrate and finalize

The mTHP patches can be built on top of this series; the key design
elements that need to be worked out are infrastructure and driver support
for multiple ordered pages and their migration.

HMM support for large folios:

Francois Dugast posted patches adding support for HMM handling of large
folios [4]; the proposed changes can build on top of this series to
provide support for HMM fault handling.

References:
[1] https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20250306044239.3874247-3-balbirs@nvidia.com/T/
[3] https://lore.kernel.org/lkml/20250703233511.2028395-1-balbirs@nvidia.com/
[4] https://lore.kernel.org/lkml/20250722193445.1588348-1-francois.dugast@intel.com/

These patches are built on top of mm/mm-stable.

Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>

Changelog v2 [3]:
- Several review comments from David Hildenbrand were addressed; Mika,
  Zi and Matthew also provided helpful review comments
- In paths where it makes sense, a new helper
  is_pmd_device_private_entry() is used
- anon_exclusive handling of zone device private pages in
  split_huge_pmd_locked() has been fixed
- Patches that introduced helpers have been folded into where they
  are used
- Zone device handling in mm/huge_memory.c has benefited from the code
  and testing of Matthew Brost; he helped find bugs related to
  copy_huge_pmd() and partial unmapping of folios
- Zone device THP PMD support via page_vma_mapped_walk() is restricted
  to try_to_migrate_one()
- There is a new dedicated helper to split large zone device folios

Changelog v1 [2]:
- Support for handling fault_folio and using trylock in the fault path
- A new test case has been added to measure the throughput improvement
- General refactoring of code to keep up with the changes in mm
- New split folio callback when the entire split is complete/done. The
  callback is used to know when the head order needs to be reset.
Testing:
- Testing was done with ZONE_DEVICE private pages on an x86 VM

Balbir Singh (11):
  mm/zone_device: support large zone device private folios
  mm/thp: zone_device awareness in THP handling code
  mm/migrate_device: THP migration of zone device pages
  mm/memory/fault: add support for zone device THP fault handling
  lib/test_hmm: test cases and support for zone device private THP
  mm/memremap: add folio_split support
  mm/thp: add split during migration support
  lib/test_hmm: add test case for split pages
  selftests/mm/hmm-tests: new tests for zone device THP migration
  gpu/drm/nouveau: add THP migration support
  selftests/mm/hmm-tests: new throughput tests including THP

 drivers/gpu/drm/nouveau/nouveau_dmem.c | 246 +++++++---
 drivers/gpu/drm/nouveau/nouveau_svm.c  |   6 +-
 drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
 include/linux/huge_mm.h                |  19 +-
 include/linux/memremap.h               |  51 ++-
 include/linux/migrate.h                |   2 +
 include/linux/mm.h                     |   1 +
 include/linux/rmap.h                   |   2 +
 include/linux/swapops.h                |  17 +
 lib/test_hmm.c                         | 432 ++++++++++++++----
 lib/test_hmm_uapi.h                    |   3 +
 mm/huge_memory.c                       | 358 ++++++++++++---
 mm/memory.c                            |   6 +-
 mm/memremap.c                          |  48 +-
 mm/migrate_device.c                    | 517 ++++++++++++++++++---
 mm/page_vma_mapped.c                   |  13 +-
 mm/pgtable-generic.c                   |   6 +
 mm/rmap.c                              |  22 +-
 tools/testing/selftests/mm/hmm-tests.c | 607 ++++++++++++++++++++++++-
 19 files changed, 2040 insertions(+), 319 deletions(-)

--
2.50.1
On Wed, Jul 30, 2025 at 07:21:28PM +1000, Balbir Singh wrote:
> This patch series adds support for THP migration of zone device pages.
> To do so, the patches implement support for folio zone device pages
> by adding support for setting up larger order pages. Larger order
> pages provide a speedup in throughput and latency.
>
> In my local testing (using lib/test_hmm) and a throughput test, the
> series shows a 350% improvement in data transfer throughput and a
> 500% improvement in latency
>
> These patches build on the earlier posts by Ralph Campbell [1]
>
> Two new flags are added in vma_migration to select and mark compound pages.
> migrate_vma_setup(), migrate_vma_pages() and migrate_vma_finalize()
> support migration of these pages when MIGRATE_VMA_SELECT_COMPOUND
> is passed in as arguments.
>
> The series also adds zone device awareness to (m)THP pages along
> with fault handling of large zone device private pages. page vma walk
> and the rmap code is also zone device aware. Support has also been
> added for folios that might need to be split in the middle
> of migration (when the src and dst do not agree on
> MIGRATE_PFN_COMPOUND), that occurs when src side of the migration can
> migrate large pages, but the destination has not been able to allocate
> large pages. The code supported and used folio_split() when migrating
> THP pages, this is used when MIGRATE_VMA_SELECT_COMPOUND is not passed
> as an argument to migrate_vma_setup().
>
> The test infrastructure lib/test_hmm.c has been enhanced to support THP
> migration. A new ioctl to emulate failure of large page allocations has
> been added to test the folio split code path. hmm-tests.c has new test
> cases for huge page migration and to test the folio split path. A new
> throughput test has been added as well.
>
> The nouveau dmem code has been enhanced to use the new THP migration
> capability.
>
> mTHP support:
>
> The patches hard code, HPAGE_PMD_NR in a few places, but the code has
> been kept generic to support various order sizes. With additional
> refactoring of the code support of different order sizes should be
> possible.
>
> The future plan is to post enhancements to support mTHP with a rough
> design as follows:
>
> 1. Add the notion of allowable thp orders to the HMM based test driver
> 2. For non PMD based THP paths in migrate_device.c, check to see if
>    a suitable order is found and supported by the driver
> 3. Iterate across orders to check the highest supported order for migration
> 4. Migrate and finalize
>
> The mTHP patches can be built on top of this series, the key design
> elements that need to be worked out are infrastructure and driver support
> for multiple ordered pages and their migration.
>
> HMM support for large folios:
>
> Francois Dugast posted patches support for HMM handling [4], the proposed
> changes can build on top of this series to provide support for HMM fault
> handling.
>
> References:
> [1] https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/
> [2] https://lore.kernel.org/linux-mm/20250306044239.3874247-3-balbirs@nvidia.com/T/
> [3] https://lore.kernel.org/lkml/20250703233511.2028395-1-balbirs@nvidia.com/
> [4] https://lore.kernel.org/lkml/20250722193445.1588348-1-francois.dugast@intel.com/
>
> These patches are built on top of mm/mm-stable
>
> Cc: Karol Herbst <kherbst@redhat.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Barry Song <baohua@kernel.org>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Jane Chu <jane.chu@oracle.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Donet Tom <donettom@linux.ibm.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Mika Penttilä <mpenttil@redhat.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Francois Dugast <francois.dugast@intel.com>
>
> Changelog v2 [3] :
> - Several review comments from David Hildenbrand were addressed, Mika,
>   Zi, Matthew also provided helpful review comments
> - In paths where it makes sense a new helper
>   is_pmd_device_private_entry() is used
> - anon_exclusive handling of zone device private pages in
>   split_huge_pmd_locked() has been fixed
> - Patches that introduced helpers have been folded into where they
>   are used
> - Zone device handling in mm/huge_memory.c has benefited from the code
>   and testing of Matthew Brost, he helped find bugs related to
>   copy_huge_pmd() and partial unmapping of folios.

I see a ton of discussion on this series, particularly patch 2.
It looks like you have landed on a different solution for partial
unmaps. I wanted to pull this series in for testing, but if this is
actively being refactored, it's likely best to hold off until the next
post, or test off a WIP branch if you have one.

Matt

> - Zone device THP PMD support via page_vma_mapped_walk() is restricted
>   to try_to_migrate_one()
> - There is a new dedicated helper to split large zone device folios
>
> Changelog v1 [2]:
> - Support for handling fault_folio and using trylock in the fault path
> - A new test case has been added to measure the throughput improvement
> - General refactoring of code to keep up with the changes in mm
> - New split folio callback when the entire split is complete/done. The
>   callback is used to know when the head order needs to be reset.
>
> Testing:
> - Testing was done with ZONE_DEVICE private pages on an x86 VM
>
> Balbir Singh (11):
>   mm/zone_device: support large zone device private folios
>   mm/thp: zone_device awareness in THP handling code
>   mm/migrate_device: THP migration of zone device pages
>   mm/memory/fault: add support for zone device THP fault handling
>   lib/test_hmm: test cases and support for zone device private THP
>   mm/memremap: add folio_split support
>   mm/thp: add split during migration support
>   lib/test_hmm: add test case for split pages
>   selftests/mm/hmm-tests: new tests for zone device THP migration
>   gpu/drm/nouveau: add THP migration support
>   selftests/mm/hmm-tests: new throughput tests including THP
>
>  drivers/gpu/drm/nouveau/nouveau_dmem.c | 246 +++++++---
>  drivers/gpu/drm/nouveau/nouveau_svm.c  |   6 +-
>  drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
>  include/linux/huge_mm.h                |  19 +-
>  include/linux/memremap.h               |  51 ++-
>  include/linux/migrate.h                |   2 +
>  include/linux/mm.h                     |   1 +
>  include/linux/rmap.h                   |   2 +
>  include/linux/swapops.h                |  17 +
>  lib/test_hmm.c                         | 432 ++++++++++++++----
>  lib/test_hmm_uapi.h                    |   3 +
>  mm/huge_memory.c                       | 358 ++++++++++++---
>  mm/memory.c                            |   6 +-
>  mm/memremap.c                          |  48 +-
>  mm/migrate_device.c                    | 517 ++++++++++++++++++---
>  mm/page_vma_mapped.c                   |  13 +-
>  mm/pgtable-generic.c                   |   6 +
>  mm/rmap.c                              |  22 +-
>  tools/testing/selftests/mm/hmm-tests.c | 607 ++++++++++++++++++++++++-
>  19 files changed, 2040 insertions(+), 319 deletions(-)
>
> --
> 2.50.1
>
On 30.07.25 11:21, Balbir Singh wrote:

BTW, I keep getting confused by the topic.

Isn't this essentially

"mm: support device-private THP"

and the support for migration is just a necessary requirement to
*enable* device private?

--
Cheers,

David / dhildenb
On 7/30/25 21:30, David Hildenbrand wrote:
> On 30.07.25 11:21, Balbir Singh wrote:
>
> BTW, I keep getting confused by the topic.
>
> Isn't this essentially
>
> "mm: support device-private THP"
>
> and the support for migration is just a necessary requirement to
> *enable* device private?
>

I agree, I can change the title, but the focus of the use case is to
support THP migration for improved latency and throughput. All of that
involves support of device-private THP

Balbir Singh
On 31.07.25 10:41, Balbir Singh wrote:
> On 7/30/25 21:30, David Hildenbrand wrote:
>> On 30.07.25 11:21, Balbir Singh wrote:
>>
>> BTW, I keep getting confused by the topic.
>>
>> Isn't this essentially
>>
>> "mm: support device-private THP"
>>
>> and the support for migration is just a necessary requirement to
>> *enable* device private?
>>
>
> I agree, I can change the title, but the focus of the use case is to
> support THP migration for improved latency and throughput. All of that
> involves support of device-private THP

Well, the subject as is makes one believe that THP support for
zone-device pages would already be there, and that you are adding
migration support.

That was the confusing part to me, because in the very first patch you
add ... THP support for (selected/private) zone device pages.

--
Cheers,

David / dhildenb
On Wed, Jul 30, 2025 at 01:30:13PM +0200, David Hildenbrand wrote:
> On 30.07.25 11:21, Balbir Singh wrote:
>
> BTW, I keep getting confused by the topic.
>
> Isn't this essentially
>
> "mm: support device-private THP"
>
> and the support for migration is just a necessary requirement to *enable*
> device private?

Yes, that's a good point. Migration is one component but there is also
fault handling, etc. so I think calling this "support device-private
THP" makes sense.

 - Alistair

> --
> Cheers,
>
> David / dhildenb
>