[v1 00/12] THP support for zone device page migration

Balbir Singh posted 12 patches 3 months ago
There is a newer version of this series
[v1 00/12] THP support for zone device page migration
Posted by Balbir Singh 3 months ago
This patch series adds support for THP migration of zone device pages.
To do so, the patches implement folio support for zone device pages
by adding support for setting up larger-order pages.

These patches build on the earlier posts by Ralph Campbell [1]

Two new flags are added to the migrate_vma API to select and mark compound
pages. migrate_vma_setup(), migrate_vma_pages() and migrate_vma_finalize()
support migration of these pages when MIGRATE_VMA_SELECT_COMPOUND
is passed in as an argument.

The series also adds zone device awareness to (m)THP pages along
with fault handling of large zone device private pages. The page vma walk
and rmap code are also made zone device aware. Support has also been
added for folios that might need to be split in the middle
of migration (when the src and dst do not agree on
MIGRATE_PFN_COMPOUND); this occurs when the src side of the migration can
migrate large pages, but the destination has not been able to allocate
large pages. The code supports and uses folio_split() when migrating
THP pages; this is used when MIGRATE_VMA_SELECT_COMPOUND is not passed
as an argument to migrate_vma_setup().

The test infrastructure lib/test_hmm.c has been enhanced to support THP
migration. A new ioctl to emulate failure of large page allocations has
been added to test the folio split code path. hmm-tests.c has new test
cases for huge page migration and to test the folio split path. A new
throughput test has been added as well.

The nouveau dmem code has been enhanced to use the new THP migration
capability.

Feedback from the RFC [2]:

It was advised that prep_compound_page() not be exposed just for the purposes
of testing (the lib/test_hmm.c test driver). Workarounds that copy and split
the folios did not work due to a lock-order dependency in the callback for
split folio.

mTHP support:

The patches hard-code HPAGE_PMD_NR in a few places, but the code has
been kept generic enough to support various order sizes. With additional
refactoring, support for different order sizes should be
possible.

The future plan is to post enhancements to support mTHP with a rough
design as follows:

1. Add the notion of allowable THP orders to the HMM-based test driver
2. For non-PMD-based THP paths in migrate_device.c, check whether
   a suitable order is found and supported by the driver
3. Iterate across orders to check the highest supported order for migration
4. Migrate and finalize

The mTHP patches can be built on top of this series; the key design elements
that need to be worked out are infrastructure and driver support for multiple
ordered pages and their migration.

References:
[1] https://lore.kernel.org/linux-mm/20201106005147.20113-1-rcampbell@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20250306044239.3874247-3-balbirs@nvidia.com/T/

These patches are built on top of mm-unstable

Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Donet Tom <donettom@linux.ibm.com>

Changelog v1:
- Changes from the RFC [2] include support for handling fault_folio and using
  trylock in the fault path
- A new test case has been added to measure the throughput improvement
- General refactoring of code to keep up with the changes in mm
- New split folio callback invoked when the entire split is complete. The
  callback is used to know when the head order needs to be reset.

Testing:
- Testing was done with ZONE_DEVICE private pages on an x86 VM
- Throughput showed up to 5x improvement with THP migration; system-to-device
  migration is slower due to the mirroring of data (see buffer->mirror)

Balbir Singh (12):
  mm/zone_device: support large zone device private folios
  mm/migrate_device: flags for selecting device private THP pages
  mm/thp: zone_device awareness in THP handling code
  mm/migrate_device: THP migration of zone device pages
  mm/memory/fault: Add support for zone device THP fault handling
  lib/test_hmm: test cases and support for zone device private THP
  mm/memremap: Add folio_split support
  mm/thp: add split during migration support
  lib/test_hmm: add test case for split pages
  selftests/mm/hmm-tests: new tests for zone device THP migration
  gpu/drm/nouveau: Add THP migration support
  selftests/mm/hmm-tests: New throughput tests including THP

 drivers/gpu/drm/nouveau/nouveau_dmem.c | 246 +++++---
 drivers/gpu/drm/nouveau/nouveau_svm.c  |   6 +-
 drivers/gpu/drm/nouveau/nouveau_svm.h  |   3 +-
 include/linux/huge_mm.h                |  18 +-
 include/linux/memremap.h               |  29 +-
 include/linux/migrate.h                |   2 +
 include/linux/mm.h                     |   1 +
 lib/test_hmm.c                         | 428 ++++++++++---
 lib/test_hmm_uapi.h                    |   3 +
 mm/huge_memory.c                       | 261 ++++++--
 mm/memory.c                            |   6 +-
 mm/memremap.c                          |  50 +-
 mm/migrate.c                           |   2 +
 mm/migrate_device.c                    | 488 ++++++++++++---
 mm/page_alloc.c                        |   1 +
 mm/page_vma_mapped.c                   |  10 +
 mm/pgtable-generic.c                   |   6 +
 mm/rmap.c                              |  19 +-
 tools/testing/selftests/mm/hmm-tests.c | 805 ++++++++++++++++++++++++-
 19 files changed, 2072 insertions(+), 312 deletions(-)

-- 
2.49.0

Re: [v1 00/12] THP support for zone device page migration
Posted by Zi Yan 3 months ago
On 3 Jul 2025, at 18:27, Balbir Singh wrote:

> [...]

I only got the cover letter. Did you forget to send the actual patches?

Thanks.

Best Regards,
Yan, Zi
Re: [v1 00/12] THP support for zone device page migration
Posted by Balbir Singh 3 months ago
On 7/4/25 09:00, Zi Yan wrote:
> On 3 Jul 2025, at 18:27, Balbir Singh wrote:
> 
>> [...]
> 
> I only got the cover letter. Did you forget to send the actual patches?
> 

A script of mine stripped the Cc list from the actual patches (my bad); the
rest went to linux-mm and linux-kernel. I can resend if needed.

Balbir