[PATCH v2 00/11] Remove device private pages from physical address space

Jordan Niethe posted 11 patches 1 month ago
There is a newer version of this series
[PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 1 month ago
Today, the first step in creating device private struct pages is to use
request_free_mem_region() to get a range of physical address space large
enough to represent the device's memory. This allocated physical address
range is then remapped as device private memory using memremap_pages().
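
For reference, driver setup today looks roughly like the following (a
simplified sketch, not lifted from any particular driver; pgmap,
device_mem_size and my_devmem_ops are placeholders):

        struct resource *res;
        void *addr;

        /* Reserve otherwise unused physical address space for the device memory */
        res = request_free_mem_region(&iomem_resource, device_mem_size,
                                      "device-private");
        if (IS_ERR(res))
                return PTR_ERR(res);

        pgmap->type = MEMORY_DEVICE_PRIVATE;
        pgmap->range.start = res->start;
        pgmap->range.end = res->end;
        pgmap->nr_range = 1;
        pgmap->ops = &my_devmem_ops;

        /* Create device private struct pages covering the reserved range */
        addr = memremap_pages(pgmap, numa_node_id());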

Needing allocation of physical address space has some problems:

  1) There may be insufficient physical address space to represent the
     device memory. KASLR reducing the physical address space and VM
     configurations with limited physical address space increase the
     likelihood of hitting this, especially as device memory sizes grow.
     This has been observed to prevent device private memory from being
     initialized.

  2) Attempting to add the device private pages to the linear map at
     addresses beyond the actual physical memory causes issues on
     architectures like aarch64 - meaning the feature does not work there [0].

This series changes device private memory so that it does not require
allocation of physical address space and these problems are avoided.
Instead of using the physical address space, we introduce a "device
private address space" and allocate from there.

A consequence of placing the device private pages outside of the
physical address space is that they no longer have a PFN. However, it is
still necessary to be able to look up a corresponding device private
page from a device private PTE entry, which means that we still require
some way to index into this device private address space. Instead of a
PFN, device private pages use an offset into this device private address
space to look up device private struct pages.

The problem that then needs to be addressed is how to avoid confusing
these device private offsets with PFNs. The inherently limited usage of
device private pages themselves is what makes this possible. A device
private page is only used for userspace mappings, so we do not need to
be concerned with it being used more broadly within the mm. This means
that the only way the core kernel looks up these pages is via the page
table, where the PTE already indicates that it refers to a device
private page via its swap type, e.g. SWP_DEVICE_WRITE. We can use this
information to determine whether the PTE contains a PFN, which should be
looked up in the page map, or a device private offset, which should be
looked up elsewhere.
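
To make this concrete, a sketch of how a lookup can tell the two cases
apart using the existing swap-entry helpers (pte and page are assumed
context; device_private_offset_to_page() is the new lookup named below,
and the exact call sites in the series may differ):

        swp_entry_t entry = pte_to_swp_entry(pte);

        if (is_device_private_entry(entry)) {
                /* The value encoded in the entry is a device private offset */
                page = device_private_offset_to_page(swp_offset(entry));
        } else if (is_migration_entry(entry)) {
                /* The value encoded in the entry is an ordinary PFN */
                page = pfn_swap_entry_to_page(entry);
        }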

The same applies when we are creating PTE entries for device private
pages - because they have their own swap type they already must be
handled separately, so it is a small step to convert them to carry a
device private offset now too.

The first part of the series updates callers where device private
offsets might now be encountered to track this extra state.

The last patch contains the bulk of the work, where we change how we
convert between device private pages and device private offsets and
introduce a new interface for allocating device private pages without
the need to reserve physical address space.
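
As a rough illustration only (the new interface is defined in the last
patch and its exact parameters are an assumption here), the setup shown
above becomes something like:

        /* No request_free_mem_region() and no physical range to pass in */
        pgmap->type = MEMORY_DEVICE_PRIVATE;
        pgmap->ops = &my_devmem_ops;    /* placeholder ops, as above */

        /* Assumed shape: allocate from the device private address space */
        err = memremap_device_private_pagemap(pgmap, nr_pages, numa_node_id());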

By removing the device private pages from the physical address space,
this series also opens up the possibility of moving away from tracking
device private memory using struct pages in the future. This is
desirable as on systems with large amounts of memory these device
private struct pages use a significant amount of memory and take a
significant amount of time to initialize.

*** Changes in v2 ***

The most significant change in v2 is addressing code paths that are
common between MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT devices.

This had been overlooked in previous revisions.

To do this we introduce a migrate_pfn_from_page() helper which will call
device_private_offset_to_page() and set the MIGRATE_PFN_DEVICE_PRIVATE
flag if required.

In places where we could have a device private offset
(MEMORY_DEVICE_PRIVATE) or a pfn (MEMORY_DEVICE_COHERENT) we update the
code to use an mpfn to disambiguate. This includes some users in the
drivers and migrate_device_{pfns,range}().
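
For reference, an mpfn already packs flag bits below the encoded value
(see include/linux/migrate.h), so a consumer can check the new flag to
know how to turn it back into a struct page. A decode-side sketch
(MIGRATE_PFN_DEVICE_PRIVATE and device_private_offset_to_page() are
added by this series; the existing helpers are unchanged):

        if (mpfn & MIGRATE_PFN_DEVICE_PRIVATE)
                /* Encoded value indexes the device private address space */
                page = device_private_offset_to_page(mpfn >> MIGRATE_PFN_SHIFT);
        else
                /* Encoded value is a normal PFN */
                page = migrate_pfn_to_page(mpfn);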

Seeking opinions on whether using mpfns like this is acceptable or
whether a new type would be preferred.

  - mm/migrate_device: Introduce migrate_pfn_from_page() helper
    - New to series

  - drm/amdkfd: Use migrate pfns internally
    - New to series

  - mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns
    - New to series

  - mm/migrate_device: Add migrate PFN flag to track device private pages
    - Update for migrate_pfn_from_page()
    - Rename to MIGRATE_PFN_DEVICE_PRIVATE
    - drm/amd: Check adev->gmc.xgmi.connected_to_cpu
    - lib/test_hmm.c: Check chunk->pagemap.type == MEMORY_DEVICE_PRIVATE

  - mm: Add helpers to create migration entries from struct pages
    - Add a flags param

  - mm: Add a new swap type for migration entries of device private pages
    - Add softleaf_is_migration_device_private_read()

  - mm: Add helpers to create device private entries from struct pages
    - Add a flags param

  - mm: Remove device private pages from the physical address space
    - Make sure last member of struct dev_pagemap remains DECLARE_FLEX_ARRAY(struct range, ranges);

Testing:
- selftests/mm/hmm-tests on an amd64 VM

* NOTE: I will need help in testing the driver changes *

Revisions:
- RFC: https://lore.kernel.org/all/20251128044146.80050-1-jniethe@nvidia.com/
- v1: https://lore.kernel.org/all/20251231043154.42931-1-jniethe@nvidia.com/

[0] https://lore.kernel.org/lkml/CAMj1kXFZ=4hLL1w6iCV5O5uVoVLHAJbc0rr40j24ObenAjXe9w@mail.gmail.com/

Jordan Niethe (11):
  mm/migrate_device: Introduce migrate_pfn_from_page() helper
  drm/amdkfd: Use migrate pfns internally
  mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns
  mm/migrate_device: Add migrate PFN flag to track device private pages
  mm/page_vma_mapped: Add flags to page_vma_mapped_walk::pfn to track
    device private pages
  mm: Add helpers to create migration entries from struct pages
  mm: Add a new swap type for migration entries of device private pages
  mm: Add helpers to create device private entries from struct pages
  mm/util: Add flag to track device private pages in page snapshots
  mm/hmm: Add flag to track device private pages
  mm: Remove device private pages from the physical address space

 Documentation/mm/hmm.rst                 |  11 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c       |  43 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  45 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
 drivers/gpu/drm/drm_pagemap.c            |  11 +-
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |  45 ++----
 drivers/gpu/drm/xe/xe_svm.c              |  37 ++---
 fs/proc/page.c                           |   6 +-
 include/drm/drm_pagemap.h                |   8 +-
 include/linux/hmm.h                      |   7 +-
 include/linux/leafops.h                  | 116 ++++++++++++--
 include/linux/memremap.h                 |  64 +++++++-
 include/linux/migrate.h                  |  23 ++-
 include/linux/mm.h                       |   9 +-
 include/linux/rmap.h                     |  33 +++-
 include/linux/swap.h                     |   8 +-
 include/linux/swapops.h                  | 136 ++++++++++++++++
 lib/test_hmm.c                           |  86 ++++++----
 mm/debug.c                               |   9 +-
 mm/hmm.c                                 |   5 +-
 mm/huge_memory.c                         |  43 ++---
 mm/hugetlb.c                             |  15 +-
 mm/memory.c                              |   5 +-
 mm/memremap.c                            | 193 ++++++++++++++++++-----
 mm/migrate.c                             |   6 +-
 mm/migrate_device.c                      |  76 +++++----
 mm/mm_init.c                             |   8 +-
 mm/mprotect.c                            |  10 +-
 mm/page_vma_mapped.c                     |  32 +++-
 mm/rmap.c                                |  59 ++++---
 mm/util.c                                |   8 +-
 mm/vmscan.c                              |   2 +-
 32 files changed, 822 insertions(+), 339 deletions(-)


base-commit: f8f9c1f4d0c7a64600e2ca312dec824a0bc2f1da
-- 
2.34.1
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Andrew Morton 1 month ago
On Wed,  7 Jan 2026 20:18:12 +1100 Jordan Niethe <jniethe@nvidia.com> wrote:

> Today, when creating these device private struct pages, the first step
> is to use request_free_mem_region() to get a range of physical address
> space large enough to represent the devices memory. This allocated
> physical address range is then remapped as device private memory using
> memremap_pages.

Welcome to Linux MM.  That's a heck of an opening salvo ;)

> Needing allocation of physical address space has some problems:
> 
>   1) There may be insufficient physical address space to represent the
>      device memory. KASLR reducing the physical address space and VM
>      configurations with limited physical address space increase the
>      likelihood of hitting this especially as device memory increases. This
>      has been observed to prevent device private from being initialized.  
> 
>   2) Attempting to add the device private pages to the linear map at
>      addresses beyond the actual physical memory causes issues on
>      architectures like aarch64  - meaning the feature does not work there [0].

Can you better help us understand the seriousness of these problems? 
How much are our users really hurting from this?

> Seeking opinions on using the mpfns like this or if a new type would be
> preferred.

Whose opinions?  IOW, can you suggest who you'd like to see review this
work?

> 
> * NOTE: I will need help in testing the driver changes *
> 

Again, please name names ;)  I'm not afraid to prod.


I'm reluctant to add this to mm.git's development/testing branches at
this time.  Your advice on when you think we're ready for that step
would be valuable, thanks.
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Alistair Popple 1 month ago
On 2026-01-08 at 07:06 +1100, Andrew Morton <akpm@linux-foundation.org> wrote...
> On Wed,  7 Jan 2026 20:18:12 +1100 Jordan Niethe <jniethe@nvidia.com> wrote:
> 
> > Today, when creating these device private struct pages, the first step
> > is to use request_free_mem_region() to get a range of physical address
> > space large enough to represent the devices memory. This allocated
> > physical address range is then remapped as device private memory using
> > memremap_pages.
> 
> Welcome to Linux MM.  That's a heck of an opening salvo ;)
> 
> > Needing allocation of physical address space has some problems:
> > 
> >   1) There may be insufficient physical address space to represent the
> >      device memory. KASLR reducing the physical address space and VM
> >      configurations with limited physical address space increase the
> >      likelihood of hitting this especially as device memory increases. This
> >      has been observed to prevent device private from being initialized.  
> > 
> >   2) Attempting to add the device private pages to the linear map at
> >      addresses beyond the actual physical memory causes issues on
> >      architectures like aarch64  - meaning the feature does not work there [0].
> 
> Can you better help us understand the seriousness of these problems? 
> How much are our users really hurting from this?

Hopefully the rest of the thread helps address this.

> > Seeking opinions on using the mpfns like this or if a new type would be
> > preferred.
> 
> Whose opinions?  IOW, can you suggest who you'd like to see review this
> work?

I was going to see if I could find Lorenzo on IRC as I think it would be good to
get his opinion on the softleaf changes. And probably Felix's (and my) opinion
for the mpfn changes (I don't think Intel currently uses DEVICE_COHERENT which
this bit has the biggest impact on).

> > 
> > * NOTE: I will need help in testing the driver changes *
> > 
> 
> Again, please name names ;)  I'm not afraid to prod.

As noted in the other thread Intel Xe and AMD GPU are the biggest. Matthew has
already offered to help test Intel (thanks!) and Felix saw the v1 posting so
hoping he can help with testing there.

> I'm reluctant to add this to mm.git's development/testing branches at
> this time.  Your advice on when you think we're ready for that step
> would be valuable, thanks.

Will leave the readiness call to Jordan, but we were hoping to get
this in for the v6.20 merge window if at all possible. I realise
we're probably running late given we generally like to let stuff
settle in development/testing branches for a while prior to the
merge window, but it did have an early round of review last year
(https://lore.kernel.org/linux-mm/20251128044146.80050-1-jniethe@nvidia.com/)
and I reviewed it internally and it looked very reasonable.

I will take a look at this latest version later today.

 - Alistair
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 1 month ago
Hi,

On 8/1/26 12:49, Alistair Popple wrote:
> On 2026-01-08 at 07:06 +1100, Andrew Morton <akpm@linux-foundation.org> wrote...
>> On Wed,  7 Jan 2026 20:18:12 +1100 Jordan Niethe <jniethe@nvidia.com> wrote:
>>
>>> Today, when creating these device private struct pages, the first step
>>> is to use request_free_mem_region() to get a range of physical address
>>> space large enough to represent the devices memory. This allocated
>>> physical address range is then remapped as device private memory using
>>> memremap_pages.
>>
>> Welcome to Linux MM.  That's a heck of an opening salvo ;)
>>
>>> Needing allocation of physical address space has some problems:
>>>
>>>    1) There may be insufficient physical address space to represent the
>>>       device memory. KASLR reducing the physical address space and VM
>>>       configurations with limited physical address space increase the
>>>       likelihood of hitting this especially as device memory increases. This
>>>       has been observed to prevent device private from being initialized.
>>>
>>>    2) Attempting to add the device private pages to the linear map at
>>>       addresses beyond the actual physical memory causes issues on
>>>       architectures like aarch64  - meaning the feature does not work there [0].
>>
>> Can you better help us understand the seriousness of these problems?
>> How much are our users really hurting from this?
> 
> Hopefully the rest of the thread helps address this.
> 
>>> Seeking opinions on using the mpfns like this or if a new type would be
>>> preferred.
>>
>> Whose opinions?  IOW, can you suggest who you'd like to see review this
>> work?
> 
> I was going to see if I could find Lorenzo on IRC as I think it would be good to
> get his opinion on the softleaf changes. And probably Felix's (and my) opinion
> for the mpfn changes (I don't think Intel currently uses DEVICE_COHERENT which
> this bit has the biggest impact on).

It also affects Intel's driver because the mpfn changes also touch
migrate_device_pfns(), which gets used there.

So also looking for Matthew's thoughts here as well as Felix's.

> 
>>>
>>> * NOTE: I will need help in testing the driver changes *
>>>
>>
>> Again, please name names ;)  I'm not afraid to prod.
> 
> As noted in the other thread Intel Xe and AMD GPU are the biggest. Matthew has
> already offered to help test Intel (thanks!) and Felix saw the v1 posting so
> hoping he can help with testing there.

Yes, I should also be able to run this through the intel-xe CI.
The other area that needs testing is the powerpc ultravisor.
(+cc) Madhavan Srinivasan - are you able to help here?

> 
>> I'm reluctant to add this to mm.git's development/testing branches at
>> this time.  Your advice on when you think we're ready for that step
>> would be valuable, thanks.
> 
> Will leave the readiness call to Jordan, but we were hoping to get
> this in for the v6.20 merge window if at all possible. I realise
> we're probably running late given we generally like to let stuff
> settle in development/testing branches for a while prior to the
> merge window, but it did have an early round of review last year
> (https://lore.kernel.org/linux-mm/20251128044146.80050-1-jniethe@nvidia.com/)
> and I reviewed it internally and it looked very reasonable.

Matt has kindly said that he is reviewing the patches, so I will wait
for his feedback.
I'd also like to get the results from the intel-xe CI first.

Andrew, I'll advise on including this in mm.git after these steps - but I don't
expect any major issues at this stage. The changes have been solid with the
hmm selftests and with updating our out-of-tree driver to use the new
interface.


Thanks,
Jordan.

> 
> I will take a look at this latest version later today.
> 
>   - Alistair
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by John Hubbard 1 month ago
On 1/7/26 12:06 PM, Andrew Morton wrote:
> On Wed,  7 Jan 2026 20:18:12 +1100 Jordan Niethe <jniethe@nvidia.com> wrote:
...
> Can you better help us understand the seriousness of these problems? 
> How much are our users really hurting from this?
> 

A lot! We have been involved in escalations from various customers
who have attempted to enable, say, KASLR and HMM at the same time.
They ran out of physical address space, often forcing them into an
awkward, ugly choice of one or the other.

This is a huge pain point and a barrier to HMM adoption.

thanks,
-- 
John Hubbard
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jason Gunthorpe 1 month ago
On Wed, Jan 07, 2026 at 12:06:08PM -0800, Andrew Morton wrote:

> >   2) Attempting to add the device private pages to the linear map at
> >      addresses beyond the actual physical memory causes issues on
> >      architectures like aarch64  - meaning the feature does not work there [0].
> 
> Can you better help us understand the seriousness of these problems? 
> How much are our users really hurting from this?

We think it is pretty serious, in the future HW support sense, as it
means real systems being built do not work :)

Also Willy and others were cheering this work on at LPC. I think the
possible followup to move DEVICE_PRIVATE from struct page and reduce
the memory allocation would be well celebrated.

The Intel Xe and AMD GPU teams are the two drivers most important to
be testing this as they consume the feature.

Jason
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Balbir Singh 1 month ago
On 1/8/26 06:54, Jason Gunthorpe wrote:
> On Wed, Jan 07, 2026 at 12:06:08PM -0800, Andrew Morton wrote:
> 
>>>   2) Attempting to add the device private pages to the linear map at
>>>      addresses beyond the actual physical memory causes issues on
>>>      architectures like aarch64  - meaning the feature does not work there [0].
>>
>> Can you better help us understand the seriousness of these problems? 
>> How much are our users really hurting from this?
> 
> We think it is pretty serious, in the future HW support sense, as it
> means real systems being built do not work :)
> 
> Also Willy and others were cheering this work on at LPC. I think the
> possible followup to move DEVICE_PRIVATE from struct page and reduce
> the memory allocation would be well celebrated.
> 
> The Intel Xe and AMD GPU teams are the two drivers most important to
> be testing this as they consume the feature.
> 

And the ultravisor usage in powerpc as well (book3s_hv_uvmem).

Balbir
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Alistair Popple 1 month ago
On 2026-01-08 at 08:02 +1100, Balbir Singh <balbirs@nvidia.com> wrote...
> On 1/8/26 06:54, Jason Gunthorpe wrote:
> > On Wed, Jan 07, 2026 at 12:06:08PM -0800, Andrew Morton wrote:
> > 
> >>>   2) Attempting to add the device private pages to the linear map at
> >>>      addresses beyond the actual physical memory causes issues on
> >>>      architectures like aarch64  - meaning the feature does not work there [0].
> >>
> >> Can you better help us understand the seriousness of these problems? 
> >> How much are our users really hurting from this?
> > 
> > We think it is pretty serious, in the future HW support sense, as it
> > means real systems being built do not work :)

There's actually existing HW that could benefit from this support - after all
there is nothing stopping someone plugging an Intel/AMD/NVIDIA GPU into an ARM
machine today :-)

So it would be nice if we could support this feature there, as lacking it
results in really sub-optimal performance compared with x86 when using the SVM
(shared virtual memory) feature, because data has to be remote mapped (i.e.
accessed over the PCIe link) rather than migrated to local GPU video memory.

Having the kernel steal physical address space has also caused problems on
x86 - we have encountered virtualised environments which, depending on the
specific firmware/BIOS, don't have enough free physical address space to
support device private pages and hence migration of memory to the GPU device,
again leading to sub-optimal performance.

> > Also Willy and others were cheering this work on at LPC. I think the
> > possible followup to move DEVICE_PRIVATE from struct page and reduce
> > the memory allocation would be well celebrated.

For reference the recording of my LPC presentation covering both this series and
the above is here - https://www.youtube.com/watch?v=CFe_c8-tEuM

The hope is that, in addition to enabling support for this more broadly across
other platforms/architectures, it will also enable further clean-ups to
reduce memory allocation overhead (I almost convinced myself we wouldn't need a
struct at all ... almost)

> > The Intel Xe and AMD GPU teams are the two drivers most important to
> > be testing this as they consume the feature.
> > 
> 
> And the ultravisor usage in powerpc as well (book3s_hv_uvmem).

As does Nouveau (which I've tested). But I agree AMD GPU and Intel Xe are the
most important drivers here. I would be surprised if anyone was actually using
the powerpc ultravisor, and I don't have access to a setup for this, so unless
some PPC folk can offer to help I wouldn't like to see testing there hold up
the series.

Especially as I believe most of the driver-side changes are relatively
straightforward.

 - Alistair

> Balbir
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Matthew Brost 1 month ago
On Wed, Jan 07, 2026 at 08:18:12PM +1100, Jordan Niethe wrote:
> Today, when creating these device private struct pages, the first step
> is to use request_free_mem_region() to get a range of physical address
> space large enough to represent the devices memory. This allocated
> physical address range is then remapped as device private memory using
> memremap_pages.
> 
> Needing allocation of physical address space has some problems:
> 
>   1) There may be insufficient physical address space to represent the
>      device memory. KASLR reducing the physical address space and VM
>      configurations with limited physical address space increase the
>      likelihood of hitting this especially as device memory increases. This
>      has been observed to prevent device private from being initialized.  
> 
>   2) Attempting to add the device private pages to the linear map at
>      addresses beyond the actual physical memory causes issues on
>      architectures like aarch64  - meaning the feature does not work there [0].
> 
> This series changes device private memory so that it does not require
> allocation of physical address space and these problems are avoided.
> Instead of using the physical address space, we introduce a "device
> private address space" and allocate from there.
> 
> A consequence of placing the device private pages outside of the
> physical address space is that they no longer have a PFN. However, it is
> still necessary to be able to look up a corresponding device private
> page from a device private PTE entry, which means that we still require
> some way to index into this device private address space. Instead of a
> PFN, device private pages use an offset into this device private address
> space to look up device private struct pages.
> 
> The problem that then needs to be addressed is how to avoid confusing
> these device private offsets with PFNs. It is the inherent limited usage
> of the device private pages themselves which make this possible. A
> device private page is only used for userspace mappings, we do not need
> to be concerned with them being used within the mm more broadly. This
> means that the only way that the core kernel looks up these pages is via
> the page table, where their PTE already indicates if they refer to a
> device private page via their swap type, e.g.  SWP_DEVICE_WRITE. We can
> use this information to determine if the PTE contains a PFN which should
> be looked up in the page map, or a device private offset which should be
> looked up elsewhere.
> 
> This applies when we are creating PTE entries for device private pages -
> because they have their own type there are already must be handled
> separately, so it is a small step to convert them to a device private
> PFN now too.
> 
> The first part of the series updates callers where device private
> offsets might now be encountered to track this extra state.
> 
> The last patch contains the bulk of the work where we change how we
> convert between device private pages to device private offsets and then
> use a new interface for allocating device private pages without the need
> for reserving physical address space.
> 
> By removing the device private pages from the physical address space,
> this series also opens up the possibility to moving away from tracking
> device private memory using struct pages in the future. This is
> desirable as on systems with large amounts of memory these device
> private struct pages use a signifiant amount of memory and take a
> significant amount of time to initialize.
> 
> *** Changes in v2 ***
> 
> The most significant change in v2 is addressing code paths that are
> common between MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT devices.
> 
> This had been overlooked in previous revisions.
> 
> To do this we introduce a migrate_pfn_from_page() helper which will call
> device_private_offset_to_page() and set the MIGRATE_PFN_DEVICE_PRIVATE
> flag if required.
> 
> In places where we could have a device private offset
> (MEMORY_DEVICE_PRIVATE) or a pfn (MEMORY_DEVICE_COHERENT) we update to
> use an mpfn to disambiguate.  This includes some users in the drivers
> and migrate_device_{pfns,range}().
> 
> Seeking opinions on using the mpfns like this or if a new type would be
> preferred.
> 
>   - mm/migrate_device: Introduce migrate_pfn_from_page() helper
>     - New to series
> 
>   - drm/amdkfd: Use migrate pfns internally
>     - New to series
> 
>   - mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns
>     - New to series
> 
>   - mm/migrate_device: Add migrate PFN flag to track device private pages
>     - Update for migrate_pfn_from_page()
>     - Rename to MIGRATE_PFN_DEVICE_PRIVATE
>     - drm/amd: Check adev->gmc.xgmi.connected_to_cpu
>     - lib/test_hmm.c: Check chunk->pagemap.type == MEMORY_DEVICE_PRIVATE
> 
>   - mm: Add helpers to create migration entries from struct pages
>     - Add a flags param
> 
>   - mm: Add a new swap type for migration entries of device private pages
>     - Add softleaf_is_migration_device_private_read()
> 
>   - mm: Add helpers to create device private entries from struct pages
>     - Add a flags param
> 
>   - mm: Remove device private pages from the physical address space
>     - Make sure last member of struct dev_pagemap remains DECLARE_FLEX_ARRAY(struct range, ranges);
> 
> Testing:
> - selftests/mm/hmm-tests on an amd64 VM
> 
> * NOTE: I will need help in testing the driver changes *
> 

Thanks for the series. For some reason Intel's CI couldn't apply this
series to drm-tip to get results [1]. I'll manually apply this and run all
our SVM tests and get back to you on results + review the changes here. For
future reference, if you want to use our CI system the series must apply
to drm-tip; feel free to rebase this series and just send it to the intel-xe
list if you want CI results.

I was also wondering if Nvidia could help review one of our core MM patches
[2] which is gating the enabling of 2M device pages?

Matt

[1] https://patchwork.freedesktop.org/series/159738/
[2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1 

> Revisions:
> - RFC: https://lore.kernel.org/all/20251128044146.80050-1-jniethe@nvidia.com/
> - v1: https://lore.kernel.org/all/20251231043154.42931-1-jniethe@nvidia.com/
> 
> [0] https://lore.kernel.org/lkml/CAMj1kXFZ=4hLL1w6iCV5O5uVoVLHAJbc0rr40j24ObenAjXe9w@mail.gmail.com/
> 
> Jordan Niethe (11):
>   mm/migrate_device: Introduce migrate_pfn_from_page() helper
>   drm/amdkfd: Use migrate pfns internally
>   mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns
>   mm/migrate_device: Add migrate PFN flag to track device private pages
>   mm/page_vma_mapped: Add flags to page_vma_mapped_walk::pfn to track
>     device private pages
>   mm: Add helpers to create migration entries from struct pages
>   mm: Add a new swap type for migration entries of device private pages
>   mm: Add helpers to create device private entries from struct pages
>   mm/util: Add flag to track device private pages in page snapshots
>   mm/hmm: Add flag to track device private pages
>   mm: Remove device private pages from the physical address space
> 
>  Documentation/mm/hmm.rst                 |  11 +-
>  arch/powerpc/kvm/book3s_hv_uvmem.c       |  43 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  45 +++---
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
>  drivers/gpu/drm/drm_pagemap.c            |  11 +-
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |  45 ++----
>  drivers/gpu/drm/xe/xe_svm.c              |  37 ++---
>  fs/proc/page.c                           |   6 +-
>  include/drm/drm_pagemap.h                |   8 +-
>  include/linux/hmm.h                      |   7 +-
>  include/linux/leafops.h                  | 116 ++++++++++++--
>  include/linux/memremap.h                 |  64 +++++++-
>  include/linux/migrate.h                  |  23 ++-
>  include/linux/mm.h                       |   9 +-
>  include/linux/rmap.h                     |  33 +++-
>  include/linux/swap.h                     |   8 +-
>  include/linux/swapops.h                  | 136 ++++++++++++++++
>  lib/test_hmm.c                           |  86 ++++++----
>  mm/debug.c                               |   9 +-
>  mm/hmm.c                                 |   5 +-
>  mm/huge_memory.c                         |  43 ++---
>  mm/hugetlb.c                             |  15 +-
>  mm/memory.c                              |   5 +-
>  mm/memremap.c                            | 193 ++++++++++++++++++-----
>  mm/migrate.c                             |   6 +-
>  mm/migrate_device.c                      |  76 +++++----
>  mm/mm_init.c                             |   8 +-
>  mm/mprotect.c                            |  10 +-
>  mm/page_vma_mapped.c                     |  32 +++-
>  mm/rmap.c                                |  59 ++++---
>  mm/util.c                                |   8 +-
>  mm/vmscan.c                              |   2 +-
>  32 files changed, 822 insertions(+), 339 deletions(-)
> 
> 
> base-commit: f8f9c1f4d0c7a64600e2ca312dec824a0bc2f1da
> -- 
> 2.34.1
>
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 1 month ago
Hi,

On 8/1/26 05:36, Matthew Brost wrote:
> 
> Thanks for the series. For some reason Intel's CI couldn't apply this
> series to drm-tip to get results [1]. I'll manually apply this and run all
> our SVM tests and get back you on results + review the changes here. For
> future reference if you want to use our CI system, the series must apply
> to drm-tip, feel free to rebase this series and just send to intel-xe
> list if you want CI 

Thanks, I'll rebase on drm-tip and send to the intel-xe list.

Jordan.

> 
> I was also wondering if Nvidia could help review one our core MM patches
> [2] which is gating enabling 2M device pages too?
> 
> Matt
> 
> [1] https://patchwork.freedesktop.org/series/159738/
> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 1 month ago
Hi,

On 8/1/26 13:25, Jordan Niethe wrote:
> Hi,
> 
> On 8/1/26 05:36, Matthew Brost wrote:
>>
>> Thanks for the series. For some reason Intel's CI couldn't apply this
>> series to drm-tip to get results [1]. I'll manually apply this and run 
>> all
>> our SVM tests and get back you on results + review the changes here. For
>> future reference if you want to use our CI system, the series must apply
>> to drm-tip, feel free to rebase this series and just send to intel-xe
>> list if you want CI 
> 
> Thanks, I'll rebase on drm-tip and send to the intel-xe list.

For reference the rebase on drm-tip on the intel-xe list:

https://patchwork.freedesktop.org/series/159738/

Will watch the CI results.

Thanks,
Jordan.

> 
> Jordan.
> 
>>
>> I was also wondering if Nvidia could help review one our core MM patches
>> [2] which is gating enabling 2M device pages too?
>>
>> Matt
>>
>> [1] https://patchwork.freedesktop.org/series/159738/
>> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
> 
>
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 4 weeks, 1 day ago
Hi,

On 8/1/26 16:42, Jordan Niethe wrote:
> Hi,
> 
> On 8/1/26 13:25, Jordan Niethe wrote:
>> Hi,
>>
>> On 8/1/26 05:36, Matthew Brost wrote:
>>>
>>> Thanks for the series. For some reason Intel's CI couldn't apply this
>>> series to drm-tip to get results [1]. I'll manually apply this and 
>>> run all
>>> our SVM tests and get back you on results + review the changes here. For
>>> future reference if you want to use our CI system, the series must apply
>>> to drm-tip, feel free to rebase this series and just send to intel-xe
>>> list if you want CI 
>>
>> Thanks, I'll rebase on drm-tip and send to the intel-xe list.
> 
> For reference the rebase on drm-tip on the intel-xe list:
> 
> https://patchwork.freedesktop.org/series/159738/
> 
> Will watch the CI results.

The series causes some failures in the intel-xe tests:
https://patchwork.freedesktop.org/series/159738/#rev4

Working through the failures now.

Thanks,
Jordan.

> 
> Thanks,
> Jordan.
> 
>>
>> Jordan.
>>
>>>
>>> I was also wondering if Nvidia could help review one our core MM patches
>>> [2] which is gating enabling 2M device pages too?
>>>
>>> Matt
>>>
>>> [1] https://patchwork.freedesktop.org/series/159738/
>>> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
>>
>>
>
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Matthew Brost 4 weeks, 1 day ago
On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
> Hi,
> 
> On 8/1/26 16:42, Jordan Niethe wrote:
> > Hi,
> > 
> > On 8/1/26 13:25, Jordan Niethe wrote:
> > > Hi,
> > > 
> > > On 8/1/26 05:36, Matthew Brost wrote:
> > > > 
> > > > Thanks for the series. For some reason Intel's CI couldn't apply this
> > > > series to drm-tip to get results [1]. I'll manually apply this
> > > > and run all
> > > > our SVM tests and get back you on results + review the changes here. For
> > > > future reference if you want to use our CI system, the series must apply
> > > > to drm-tip, feel free to rebase this series and just send to intel-xe
> > > > list if you want CI
> > > 
> > > Thanks, I'll rebase on drm-tip and send to the intel-xe list.
> > 
> > For reference the rebase on drm-tip on the intel-xe list:
> > 
> > https://patchwork.freedesktop.org/series/159738/
> > 
> > Will watch the CI results.
> 
> The series causes some failures in the intel-xe tests:
> https://patchwork.freedesktop.org/series/159738/#rev4
> 
> Working through the failures now.
> 

Yea, I saw the failures. I haven't had time to look at the patches on my
end quite yet. Scrambling to get a few things into the 6.20/7.0 PR, so I may
not have bandwidth to look in depth until mid next week, but digging is
on my TODO list.

Matt 

> Thanks,
> Jordan.
> 
> > 
> > Thanks,
> > Jordan.
> > 
> > > 
> > > Jordan.
> > > 
> > > > 
> > > > I was also wondering if Nvidia could help review one our core MM patches
> > > > [2] which is gating enabling 2M device pages too?
> > > > 
> > > > Matt
> > > > 
> > > > [1] https://patchwork.freedesktop.org/series/159738/
> > > > [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
> > > 
> > > 
> > 
>
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 4 weeks, 1 day ago
Hi
On 9/1/26 11:31, Matthew Brost wrote:
> On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
>> Hi,
>>
>> On 8/1/26 16:42, Jordan Niethe wrote:
>>> Hi,
>>>
>>> On 8/1/26 13:25, Jordan Niethe wrote:
>>>> Hi,
>>>>
>>>> On 8/1/26 05:36, Matthew Brost wrote:
>>>>>
>>>>> Thanks for the series. For some reason Intel's CI couldn't apply this
>>>>> series to drm-tip to get results [1]. I'll manually apply this
>>>>> and run all
>>>>> our SVM tests and get back you on results + review the changes here. For
>>>>> future reference if you want to use our CI system, the series must apply
>>>>> to drm-tip, feel free to rebase this series and just send to intel-xe
>>>>> list if you want CI
>>>>
>>>> Thanks, I'll rebase on drm-tip and send to the intel-xe list.
>>>
>>> For reference the rebase on drm-tip on the intel-xe list:
>>>
>>> https://patchwork.freedesktop.org/series/159738/
>>>
>>> Will watch the CI results.
>>
>> The series causes some failures in the intel-xe tests:
>> https://patchwork.freedesktop.org/series/159738/#rev4
>>
>> Working through the failures now.
>>
> 
> Yea, I saw the failures. I haven't had time look at the patches on my
> end quite yet. Scrabling to get a few things in 6.20/7.0 PR, so I may
> not have bandwidth to look in depth until mid next week but digging is
> on my TODO list.

Sure, that's completely fine. The failures seem pretty directly related
to the series so I think I'll be able to make good progress.

For example
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@xe_evict@evict-beng-small.html

It looks like I missed that xe_pagemap_destroy_work() needs to be updated
to remove the call to devm_release_mem_region() now that we are no longer
reserving a mem region.

Thanks,
Jordan.

> 
> Matt
> 
>> Thanks,
>> Jordan.
>>
>>>
>>> Thanks,
>>> Jordan.
>>>
>>>>
>>>> Jordan.
>>>>
>>>>>
>>>>> I was also wondering if Nvidia could help review one our core MM patches
>>>>> [2] which is gating enabling 2M device pages too?
>>>>>
>>>>> Matt
>>>>>
>>>>> [1] https://patchwork.freedesktop.org/series/159738/
>>>>> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
>>>>
>>>>
>>>
>>
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Matthew Brost 4 weeks, 1 day ago
On Fri, Jan 09, 2026 at 12:27:50PM +1100, Jordan Niethe wrote:
> Hi
> On 9/1/26 11:31, Matthew Brost wrote:
> > On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
> > > Hi,
> > > 
> > > On 8/1/26 16:42, Jordan Niethe wrote:
> > > > Hi,
> > > > 
> > > > On 8/1/26 13:25, Jordan Niethe wrote:
> > > > > Hi,
> > > > > 
> > > > > On 8/1/26 05:36, Matthew Brost wrote:
> > > > > > 
> > > > > > Thanks for the series. For some reason Intel's CI couldn't apply this
> > > > > > series to drm-tip to get results [1]. I'll manually apply this
> > > > > > and run all
> > > > > > our SVM tests and get back you on results + review the changes here. For
> > > > > > future reference if you want to use our CI system, the series must apply
> > > > > > to drm-tip, feel free to rebase this series and just send to intel-xe
> > > > > > list if you want CI
> > > > > 
> > > > > Thanks, I'll rebase on drm-tip and send to the intel-xe list.
> > > > 
> > > > For reference the rebase on drm-tip on the intel-xe list:
> > > > 
> > > > https://patchwork.freedesktop.org/series/159738/
> > > > 
> > > > Will watch the CI results.
> > > 
> > > The series causes some failures in the intel-xe tests:
> > > https://patchwork.freedesktop.org/series/159738/#rev4
> > > 
> > > Working through the failures now.
> > > 
> > 
> > Yea, I saw the failures. I haven't had time look at the patches on my
> > end quite yet. Scrabling to get a few things in 6.20/7.0 PR, so I may
> > not have bandwidth to look in depth until mid next week but digging is
> > on my TODO list.
> 
> Sure, that's completely fine. The failures seem pretty directly related to
> the
> series so I think I'll be able to make good progress.
> 
> For example https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@xe_evict@evict-beng-small.html
> 
> It looks like I missed that xe_pagemap_destroy_work() needs to be updated to
> remove the call to devm_release_mem_region() now we are no longer reserving
> a mem
> region.

+1

So this is the one I’d be most concerned about [1].
xe_exec_system_allocator is our SVM test, which does almost all the
ridiculous things possible in user space to stress SVM. It’s blowing up
in the core MM—but the source of the bug could be anywhere (e.g., Xe
SVM, GPU SVM, migrate device layer, or core MM). I’ll try to help when I
have bandwidth.

Matt

[1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/shard-bmg-9/igt@xe_exec_system_allocator@threads-many-large-execqueues-free-nomemset.html

> 
> 
> Thanks,
> Jordan.
> 
> > 
> > Matt
> > 
> > > Thanks,
> > > Jordan.
> > > 
> > > > 
> > > > Thanks,
> > > > Jordan.
> > > > 
> > > > > 
> > > > > Jordan.
> > > > > 
> > > > > > 
> > > > > > I was also wondering if Nvidia could help review one our core MM patches
> > > > > > [2] which is gating enabling 2M device pages too?
> > > > > > 
> > > > > > Matt
> > > > > > 
> > > > > > [1] https://patchwork.freedesktop.org/series/159738/
> > > > > > [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
> > > > > 
> > > > > 
> > > > 
> > > 
> 
Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 3 weeks, 3 days ago
Hi,

On 9/1/26 17:22, Matthew Brost wrote:
> On Fri, Jan 09, 2026 at 12:27:50PM +1100, Jordan Niethe wrote:
>> Hi
>> On 9/1/26 11:31, Matthew Brost wrote:
>>> On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
>>>> Hi,
>>>>
>>>> On 8/1/26 16:42, Jordan Niethe wrote:
>>>>> Hi,
>>>>>
>>>>> On 8/1/26 13:25, Jordan Niethe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 8/1/26 05:36, Matthew Brost wrote:
>>>>>>>
>>>>>>> Thanks for the series. For some reason Intel's CI couldn't apply this
>>>>>>> series to drm-tip to get results [1]. I'll manually apply this
>>>>>>> and run all
>>>>>>> our SVM tests and get back you on results + review the changes here. For
>>>>>>> future reference if you want to use our CI system, the series must apply
>>>>>>> to drm-tip, feel free to rebase this series and just send to intel-xe
>>>>>>> list if you want CI
>>>>>>
>>>>>> Thanks, I'll rebase on drm-tip and send to the intel-xe list.
>>>>>
>>>>> For reference the rebase on drm-tip on the intel-xe list:
>>>>>
>>>>> https://patchwork.freedesktop.org/series/159738/
>>>>>
>>>>> Will watch the CI results.
>>>>
>>>> The series causes some failures in the intel-xe tests:
>>>> https://patchwork.freedesktop.org/series/159738/#rev4
>>>>
>>>> Working through the failures now.
>>>>
>>>
>>> Yea, I saw the failures. I haven't had time look at the patches on my
>>> end quite yet. Scrabling to get a few things in 6.20/7.0 PR, so I may
>>> not have bandwidth to look in depth until mid next week but digging is
>>> on my TODO list.
>>
>> Sure, that's completely fine. The failures seem pretty directly related to
>> the
>> series so I think I'll be able to make good progress.
>>
>> For example https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@xe_evict@evict-beng-small.html
>>
>> It looks like I missed that xe_pagemap_destroy_work() needs to be updated to
>> remove the call to devm_release_mem_region() now we are no longer reserving
>> a mem
>> region.
> 
> +1
> 
> So this is the one I’d be most concerned about [1].
> xe_exec_system_allocator is our SVM test, which does almost all the
> ridiculous things possible in user space to stress SVM. It’s blowing up
> in the core MM—but the source of the bug could be anywhere (e.g., Xe
> SVM, GPU SVM, migrate device layer, or core MM). I’ll try to help when I
> have bandwidth.
> 
> Matt
> 
> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/shard-bmg-9/igt@xe_exec_system_allocator@threads-many-large-execqueues-free-nomemset.html

A similar fault in lruvec_stat_mod_folio() can be repro'd if
memremap_device_private_pagemap() is called with NUMA_NO_NODE instead of
(say) numa_node_id() for the nid parameter.

The xe_svm driver uses devm_memremap_device_private_pagemap() which uses
dev_to_node() for the nid parameter. Suspect this is causing something
similar to happen.

When memremap_pages() calls pagemap_range() we have the following logic:

         if (nid < 0)
                 nid = numa_mem_id();

I think we might need to add this to memremap_device_private_pagemap()
to handle the NUMA_NO_NODE case. Still confirming.
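
Something like the following at the start of
memremap_device_private_pagemap() (a sketch only - an assumption about
where the check would go, the actual fix may end up looking different):

        /* Mirror pagemap_range(): fall back to a real node for NUMA_NO_NODE */
        if (nid == NUMA_NO_NODE)
                nid = numa_mem_id();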

Thanks,
Jordan.

> 
>>
>>
>> Thanks,
>> Jordan.
>>
>>>
>>> Matt
>>>
>>>> Thanks,
>>>> Jordan.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Jordan.
>>>>>
>>>>>>
>>>>>> Jordan.
>>>>>>
>>>>>>>
>>>>>>> I was also wondering if Nvidia could help review one our core MM patches
>>>>>>> [2] which is gating enabling 2M device pages too?
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>> [1] https://patchwork.freedesktop.org/series/159738/
>>>>>>> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
>>>>>>
>>>>>>
>>>>>
>>>>
>>

Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Jordan Niethe 2 weeks, 1 day ago
Hi,

On 14/1/26 16:41, Jordan Niethe wrote:
> Hi,
> 
> On 9/1/26 17:22, Matthew Brost wrote:
>> On Fri, Jan 09, 2026 at 12:27:50PM +1100, Jordan Niethe wrote:
>>> Hi
>>> On 9/1/26 11:31, Matthew Brost wrote:
>>>> On Fri, Jan 09, 2026 at 11:01:13AM +1100, Jordan Niethe wrote:
>>>>> Hi,
>>>>>
>>>>> On 8/1/26 16:42, Jordan Niethe wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 8/1/26 13:25, Jordan Niethe wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 8/1/26 05:36, Matthew Brost wrote:
>>>>>>>>
>>>>>>>> Thanks for the series. For some reason Intel's CI couldn't apply this
>>>>>>>> series to drm-tip to get results [1]. I'll manually apply this
>>>>>>>> and run all
>>>>>>>> our SVM tests and get back you on results + review the changes here. For
>>>>>>>> future reference if you want to use our CI system, the series must apply
>>>>>>>> to drm-tip, feel free to rebase this series and just send to intel-xe
>>>>>>>> list if you want CI
>>>>>>>
>>>>>>> Thanks, I'll rebase on drm-tip and send to the intel-xe list.
>>>>>>
>>>>>> For reference the rebase on drm-tip on the intel-xe list:
>>>>>>
>>>>>> https://patchwork.freedesktop.org/series/159738/
>>>>>>
>>>>>> Will watch the CI results.
>>>>>
>>>>> The series causes some failures in the intel-xe tests:
>>>>> https://patchwork.freedesktop.org/series/159738/#rev4
>>>>>
>>>>> Working through the failures now.
>>>>>
>>>>
>>>> Yea, I saw the failures. I haven't had time look at the patches on my
>>>> end quite yet. Scrabling to get a few things in 6.20/7.0 PR, so I may
>>>> not have bandwidth to look in depth until mid next week but digging is
>>>> on my TODO list.
>>>
>>> Sure, that's completely fine. The failures seem pretty directly related to
>>> the
>>> series so I think I'll be able to make good progress.
>>>
>>> For example https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/bat-bmg-2/igt@xe_evict@evict-beng-small.html
>>>
>>> It looks like I missed that xe_pagemap_destroy_work() needs to be updated to
>>> remove the call to devm_release_mem_region() now we are no longer reserving
>>> a mem
>>> region.
>>
>> +1
>>
>> So this is the one I’d be most concerned about [1].
>> xe_exec_system_allocator is our SVM test, which does almost all the
>> ridiculous things possible in user space to stress SVM. It’s blowing up
>> in the core MM—but the source of the bug could be anywhere (e.g., Xe
>> SVM, GPU SVM, migrate device layer, or core MM). I’ll try to help when I
>> have bandwidth.
>>
>> Matt
>>
>> [1] https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-159738v4/shard-bmg-9/igt@xe_exec_system_allocator@threads-many-large-execqueues-free-nomemset.html
> 
> A similar fault in lruvec_stat_mod_folio can be repro'd if
> memremap_device_private_pagemap() is called with NUMA_NO_NODE instead of (say)
> numa_node_id() for the nid parameter.
> 
> The xe_svm driver uses devm_memremap_device_private_pagemap() which uses
> dev_to_node() for the nid parameter. Suspect this is causing something similar
> to happen.
> 
> When memremap_pages() calls pagemap_range() we have the following logic:
> 
>          if (nid < 0)
>                  nid = numa_mem_id();
> 
> I think we might need to add this to memremap_device_private_pagemap() to handle
> the NUMA_NO_NODE case. Still confirming.

This was the problem, fixed in v3.

> 
> Thanks,
> Jordan.
> 
>>
>>>
>>>
>>> Thanks,
>>> Jordan.
>>>
>>>>
>>>> Matt
>>>>
>>>>> Thanks,
>>>>> Jordan.
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Jordan.
>>>>>>
>>>>>>>
>>>>>>> Jordan.
>>>>>>>
>>>>>>>>
>>>>>>>> I was also wondering if Nvidia could help review one our core MM patches
>>>>>>>> [2] which is gating enabling 2M device pages too?
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> [1] https://patchwork.freedesktop.org/series/159738/
>>>>>>>> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
> 

Re: [PATCH v2 00/11] Remove device private pages from physical address space
Posted by Zi Yan 1 month ago
On 7 Jan 2026, at 13:36, Matthew Brost wrote:

> On Wed, Jan 07, 2026 at 08:18:12PM +1100, Jordan Niethe wrote:
>> Today, when creating these device private struct pages, the first step
>> is to use request_free_mem_region() to get a range of physical address
>> space large enough to represent the devices memory. This allocated
>> physical address range is then remapped as device private memory using
>> memremap_pages.
>>
>> Needing allocation of physical address space has some problems:
>>
>>   1) There may be insufficient physical address space to represent the
>>      device memory. KASLR reducing the physical address space and VM
>>      configurations with limited physical address space increase the
>>      likelihood of hitting this especially as device memory increases. This
>>      has been observed to prevent device private from being initialized.
>>
>>   2) Attempting to add the device private pages to the linear map at
>>      addresses beyond the actual physical memory causes issues on
>>      architectures like aarch64  - meaning the feature does not work there [0].
>>
>> This series changes device private memory so that it does not require
>> allocation of physical address space and these problems are avoided.
>> Instead of using the physical address space, we introduce a "device
>> private address space" and allocate from there.
>>
>> A consequence of placing the device private pages outside of the
>> physical address space is that they no longer have a PFN. However, it is
>> still necessary to be able to look up a corresponding device private
>> page from a device private PTE entry, which means that we still require
>> some way to index into this device private address space. Instead of a
>> PFN, device private pages use an offset into this device private address
>> space to look up device private struct pages.
>>
>> The problem that then needs to be addressed is how to avoid confusing
>> these device private offsets with PFNs. It is the inherent limited usage
>> of the device private pages themselves which make this possible. A
>> device private page is only used for userspace mappings, we do not need
>> to be concerned with them being used within the mm more broadly. This
>> means that the only way that the core kernel looks up these pages is via
>> the page table, where their PTE already indicates if they refer to a
>> device private page via their swap type, e.g.  SWP_DEVICE_WRITE. We can
>> use this information to determine if the PTE contains a PFN which should
>> be looked up in the page map, or a device private offset which should be
>> looked up elsewhere.
>>
>> This applies when we are creating PTE entries for device private pages -
>> because they have their own type there are already must be handled
>> separately, so it is a small step to convert them to a device private
>> PFN now too.
>>
>> The first part of the series updates callers where device private
>> offsets might now be encountered to track this extra state.
>>
>> The last patch contains the bulk of the work where we change how we
>> convert between device private pages to device private offsets and then
>> use a new interface for allocating device private pages without the need
>> for reserving physical address space.
>>
>> By removing the device private pages from the physical address space,
>> this series also opens up the possibility to moving away from tracking
>> device private memory using struct pages in the future. This is
>> desirable as on systems with large amounts of memory these device
>> private struct pages use a signifiant amount of memory and take a
>> significant amount of time to initialize.
>>
>> *** Changes in v2 ***
>>
>> The most significant change in v2 is addressing code paths that are
>> common between MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_COHERENT devices.
>>
>> This had been overlooked in previous revisions.
>>
>> To do this we introduce a migrate_pfn_from_page() helper which will call
>> device_private_offset_to_page() and set the MIGRATE_PFN_DEVICE_PRIVATE
>> flag if required.
>>
>> In places where we could have a device private offset
>> (MEMORY_DEVICE_PRIVATE) or a pfn (MEMORY_DEVICE_COHERENT) we update to
>> use an mpfn to disambiguate.  This includes some users in the drivers
>> and migrate_device_{pfns,range}().
>>
>> Seeking opinions on using the mpfns like this or if a new type would be
>> preferred.
>>
>>   - mm/migrate_device: Introduce migrate_pfn_from_page() helper
>>     - New to series
>>
>>   - drm/amdkfd: Use migrate pfns internally
>>     - New to series
>>
>>   - mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns
>>     - New to series
>>
>>   - mm/migrate_device: Add migrate PFN flag to track device private pages
>>     - Update for migrate_pfn_from_page()
>>     - Rename to MIGRATE_PFN_DEVICE_PRIVATE
>>     - drm/amd: Check adev->gmc.xgmi.connected_to_cpu
>>     - lib/test_hmm.c: Check chunk->pagemap.type == MEMORY_DEVICE_PRIVATE
>>
>>   - mm: Add helpers to create migration entries from struct pages
>>     - Add a flags param
>>
>>   - mm: Add a new swap type for migration entries of device private pages
>>     - Add softleaf_is_migration_device_private_read()
>>
>>   - mm: Add helpers to create device private entries from struct pages
>>     - Add a flags param
>>
>>   - mm: Remove device private pages from the physical address space
>>     - Make sure last member of struct dev_pagemap remains DECLARE_FLEX_ARRAY(struct range, ranges);
>>
>> Testing:
>> - selftests/mm/hmm-tests on an amd64 VM
>>
>> * NOTE: I will need help in testing the driver changes *
>>
>
> Thanks for the series. For some reason Intel's CI couldn't apply this
> series to drm-tip to get results [1]. I'll manually apply this and run all
> our SVM tests and get back you on results + review the changes here. For
> future reference if you want to use our CI system, the series must apply
> to drm-tip, feel free to rebase this series and just send to intel-xe
> list if you want CI results.
>
> I was also wondering if Nvidia could help review one our core MM patches
> [2] which is gating enabling 2M device pages too?

I will take a look. But next time, do you mind Cc'ing MM maintainers and
reviewers based on the MAINTAINERS file? Otherwise, it is hard for people to
check every email from linux-mm.

Thanks.

>
> Matt
>
> [1] https://patchwork.freedesktop.org/series/159738/
> [2] https://patchwork.freedesktop.org/patch/694775/?series=159119&rev=1
>
>> Revisions:
>> - RFC: https://lore.kernel.org/all/20251128044146.80050-1-jniethe@nvidia.com/
>> - v1: https://lore.kernel.org/all/20251231043154.42931-1-jniethe@nvidia.com/
>>
>> [0] https://lore.kernel.org/lkml/CAMj1kXFZ=4hLL1w6iCV5O5uVoVLHAJbc0rr40j24ObenAjXe9w@mail.gmail.com/
>>
>> Jordan Niethe (11):
>>   mm/migrate_device: Introduce migrate_pfn_from_page() helper
>>   drm/amdkfd: Use migrate pfns internally
>>   mm/migrate_device: Make migrate_device_{pfns,range}() take mpfns
>>   mm/migrate_device: Add migrate PFN flag to track device private pages
>>   mm/page_vma_mapped: Add flags to page_vma_mapped_walk::pfn to track
>>     device private pages
>>   mm: Add helpers to create migration entries from struct pages
>>   mm: Add a new swap type for migration entries of device private pages
>>   mm: Add helpers to create device private entries from struct pages
>>   mm/util: Add flag to track device private pages in page snapshots
>>   mm/hmm: Add flag to track device private pages
>>   mm: Remove device private pages from the physical address space
>>
>>  Documentation/mm/hmm.rst                 |  11 +-
>>  arch/powerpc/kvm/book3s_hv_uvmem.c       |  43 ++---
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  45 +++---
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
>>  drivers/gpu/drm/drm_pagemap.c            |  11 +-
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |  45 ++----
>>  drivers/gpu/drm/xe/xe_svm.c              |  37 ++---
>>  fs/proc/page.c                           |   6 +-
>>  include/drm/drm_pagemap.h                |   8 +-
>>  include/linux/hmm.h                      |   7 +-
>>  include/linux/leafops.h                  | 116 ++++++++++++--
>>  include/linux/memremap.h                 |  64 +++++++-
>>  include/linux/migrate.h                  |  23 ++-
>>  include/linux/mm.h                       |   9 +-
>>  include/linux/rmap.h                     |  33 +++-
>>  include/linux/swap.h                     |   8 +-
>>  include/linux/swapops.h                  | 136 ++++++++++++++++
>>  lib/test_hmm.c                           |  86 ++++++----
>>  mm/debug.c                               |   9 +-
>>  mm/hmm.c                                 |   5 +-
>>  mm/huge_memory.c                         |  43 ++---
>>  mm/hugetlb.c                             |  15 +-
>>  mm/memory.c                              |   5 +-
>>  mm/memremap.c                            | 193 ++++++++++++++++++-----
>>  mm/migrate.c                             |   6 +-
>>  mm/migrate_device.c                      |  76 +++++----
>>  mm/mm_init.c                             |   8 +-
>>  mm/mprotect.c                            |  10 +-
>>  mm/page_vma_mapped.c                     |  32 +++-
>>  mm/rmap.c                                |  59 ++++---
>>  mm/util.c                                |   8 +-
>>  mm/vmscan.c                              |   2 +-
>>  32 files changed, 822 insertions(+), 339 deletions(-)
>>
>>
>> base-commit: f8f9c1f4d0c7a64600e2ca312dec824a0bc2f1da
>> -- 
>> 2.34.1
>>


Best Regards,
Yan, Zi