From: Mika Penttilä <mpenttil@redhat.com>

Currently, the way device page faulting and migration works is not
optimal if you want to do both fault handling and migration at once.

Being able to migrate non-present pages (or pages mapped with incorrect
permissions, e.g. COW) to the GPU requires doing either of the
following sequences:

1. hmm_range_fault() - fault in non-present pages with correct
   permissions, etc.
2. migrate_vma_*() - migrate the pages

Or:

1. migrate_vma_*() - migrate present pages
2. If non-present pages are detected by migrate_vma_*():
   a) call hmm_range_fault() to fault pages in
   b) call migrate_vma_*() again to migrate the now present pages

The problem with the first sequence is that you always have to do two
page walks, even though most of the time the pages are present or
zero-page mappings, so the common case takes a performance hit.

The second sequence is better for the common case, but far worse if
pages aren't present, because now you have to walk the page tables
three times (once to find that the page is not present, once so
hmm_range_fault() can find a non-present page to fault in, and once
again to set up the migration). It is also tricky to code correctly.
One page table walk can cost over 1000 CPU cycles on x86-64, which is
a significant hit.

We should be able to walk the page table once, faulting pages in as
required and replacing them with migration entries if requested.

Add a new flag to the HMM APIs, HMM_PFN_REQ_MIGRATE, which tells
hmm_range_fault() to also prepare for migration during fault handling.
Also, for the migrate_vma_setup() call paths, a flag, MIGRATE_VMA_FAULT,
is added to enable fault handling during migration.

Tested in an x86-64 VM with the HMM test device, passing the selftests.
Also tested rebased on the
"Remove device private pages from physical address space" series:
https://lore.kernel.org/linux-mm/20260107091823.68974-1-jniethe@nvidia.com/
plus a small patch to adjust, with no problems.

Changes from RFC:
- rebase on 6.19-rc5
- adjust for the device THP
- changes from feedback

Revisions:
- RFC https://lore.kernel.org/linux-mm/20250814072045.3637192-1-mpenttil@redhat.com/

Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Mika Penttilä <mpenttil@redhat.com>

Mika Penttilä (3):
  mm: unified hmm fault and migrate device pagewalk paths
  mm: add new testcase for the migrate on fault case
  mm/migrate_device.c: remove migrate_vma_collect_*() functions

 include/linux/hmm.h                    |  17 +-
 include/linux/migrate.h                |   6 +-
 lib/test_hmm.c                         | 100 +++-
 lib/test_hmm_uapi.h                    |  19 +-
 mm/hmm.c                               | 657 +++++++++++++++++++++++--
 mm/migrate_device.c                    | 589 +++------------------
 tools/testing/selftests/mm/hmm-tests.c |  54 ++
 7 files changed, 869 insertions(+), 573 deletions(-)

--
2.50.0
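To make the cost of sequence 1 concrete, the two-pass pattern looks
roughly like the sketch below. Only hmm_range_fault(),
migrate_vma_setup()/migrate_vma_pages()/migrate_vma_finalize() and the
existing flags are real APIs; the surrounding function, its parameters,
and the error handling are simplified placeholders, and the -EBUSY
retry / notifier revalidation loop is omitted:

```c
#include <linux/hmm.h>
#include <linux/migrate.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/slab.h>

/* Hypothetical helper: fault a range in, then migrate it (two walks). */
static int fault_then_migrate(struct mm_struct *mm,
			      struct vm_area_struct *vma,
			      struct mmu_interval_notifier *notifier,
			      unsigned long start, unsigned long end)
{
	unsigned long npages = (end - start) >> PAGE_SHIFT;
	unsigned long *pfns, *src, *dst;
	struct hmm_range range = {
		.notifier = notifier,
		.start = start,
		.end = end,
		.default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	struct migrate_vma args = {
		.vma = vma,
		.start = start,
		.end = end,
		.flags = MIGRATE_VMA_SELECT_SYSTEM,
	};
	int ret = -ENOMEM;

	pfns = kcalloc(npages, sizeof(*pfns), GFP_KERNEL);
	src = kcalloc(npages, sizeof(*src), GFP_KERNEL);
	dst = kcalloc(npages, sizeof(*dst), GFP_KERNEL);
	if (!pfns || !src || !dst)
		goto out;
	range.hmm_pfns = pfns;
	args.src = src;
	args.dst = dst;

	range.notifier_seq = mmu_interval_read_begin(notifier);
	mmap_read_lock(mm);

	/* Page-table walk #1: fault in non-present pages, writable. */
	ret = hmm_range_fault(&range);
	if (ret) {
		mmap_read_unlock(mm);
		goto out;	/* -EBUSY would mean: retry from read_begin */
	}

	/* Page-table walk #2: replace the PTEs with migration entries. */
	ret = migrate_vma_setup(&args);
	mmap_read_unlock(mm);
	if (ret)
		goto out;

	/*
	 * The driver would now allocate device pages into dst[], copy
	 * src[] -> dst[] with its DMA engine, and commit:
	 */
	migrate_vma_pages(&args);
	migrate_vma_finalize(&args);
out:
	kfree(pfns);
	kfree(src);
	kfree(dst);
	return ret;
}
```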
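With the series applied, the same result should need only one walk:
MIGRATE_VMA_FAULT asks migrate_vma_setup() to fault non-present pages
in during its own collect walk. A sketch of the intended call shape,
with the exact semantics defined by patch 1 of the series and
everything around the flags assumed from the example above:

```c
	struct migrate_vma args = {
		.vma = vma,
		.src = src,
		.dst = dst,
		.start = start,
		.end = end,
		/* New in this series: fault during the collect walk. */
		.flags = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_FAULT,
	};

	mmap_read_lock(mm);
	ret = migrate_vma_setup(&args);	/* single page-table walk */
	mmap_read_unlock(mm);
	if (!ret && args.cpages) {
		/* allocate + copy to device pages as before, then: */
		migrate_vma_pages(&args);
		migrate_vma_finalize(&args);
	}
```

On the hmm_range_fault() side, HMM_PFN_REQ_MIGRATE would presumably be
OR'ed into range.default_flags the same way, for callers that drive the
unified walk through the HMM API instead.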
On 1/14/26 20:19, mpenttil@redhat.com wrote:
> From: Mika Penttilä <mpenttil@redhat.com>
>
> Currently, the way device page faulting and migration works is not
> optimal if you want to do both fault handling and migration at once.
>
[...]

I see some kernel test robot failures, I assume there will be a new
version for review?

Balbir
Hi Balbir!

On 1/16/26 01:16, Balbir Singh wrote:
> On 1/14/26 20:19, mpenttil@redhat.com wrote:
[...]
>
> I see some kernel test robot failures, I assume there will be a new
> version for review?
>
> Balbir

Yes, I will fix those and send a new version. The test robot failures
are basically missing guards for !MIGRATE configs.

Thanks,
Mika