[RFC PATCH 0/6] Remove device private pages from physical address space
Posted by Jordan Niethe 3 days, 16 hours ago
Today, when creating device private struct pages, the first step is to
use request_free_mem_region() to get a range of physical address space
large enough to represent the device's memory. This allocated physical
address range is then remapped as device private memory using
memremap_pages().
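
For reference, the existing flow looks roughly like this (following
dmirror_allocate_chunk() in lib/test_hmm.c, where devmem is that driver's
per-chunk structure and DEVMEM_CHUNK_SIZE / dmirror_devmem_ops are that
driver's names; error handling trimmed):

    struct dev_pagemap *pgmap = &devmem->pagemap;
    struct resource *res;
    void *addr;

    /* Reserve a span of physical address space purely to index the pages. */
    res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
                                  "hmm_dmirror");
    if (IS_ERR(res))
        return false;

    pgmap->type = MEMORY_DEVICE_PRIVATE;
    pgmap->range.start = res->start;
    pgmap->range.end = res->end;
    pgmap->nr_range = 1;
    pgmap->ops = &dmirror_devmem_ops;

    /* Create device private struct pages backed by that physical range. */
    addr = memremap_pages(pgmap, numa_node_id());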

Needing allocation of physical address space has some problems:

  1) There may be insufficient physical address space to represent the
     device memory. KASLR reducing the physical address space and VM
     configurations with limited physical address space increase the
     likelihood of hitting this, especially as device memory grows. This
     has been observed to prevent device private memory from being
     initialized.

  2) Attempting to add the device private pages to the linear map at
     addresses beyond the actual physical memory causes issues on
     architectures like aarch64, meaning the feature does not work there [0].

This RFC changes device private memory so that it no longer requires
allocation of physical address space, avoiding these problems.
Instead of using the physical address space, we introduce a "device
private address space" and allocate from there.

A consequence of placing the device private pages outside of the
physical address space is that they no longer have a PFN. However, it is
still necessary to be able to look up a corresponding device private
page from a device private PTE entry, which means that we still require
some way to index into this device private address space. This leads to
the idea of a device private PFN. This is like a PFN but instead of
associating memory in the physical address space with a struct page, it
associates device memory in the device private address space with a
device private struct page.

The problem that then needs to be addressed is how to avoid confusing
these device private PFNs with regular PFNs. It is the inherently
limited usage of the device private pages themselves which makes this
possible. A device private page is only used for userspace mappings, so
we do not need to be concerned with it being used within the mm more
broadly. This means that the only way that the core kernel looks up
these pages is via the page table, where their PTE already indicates if
they refer to a device private page via their swap type, e.g.
SWP_DEVICE_WRITE. We can use this information to determine if the PTE
contains a normal PFN which should be looked up in the page map, or a
device private PFN which should be looked up elsewhere.
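
As a minimal sketch of that dispatch (using existing helpers only, not
code from this series):

    if (pte_present(pte)) {
        /* A real PFN: resolved through the memmap as usual. */
        page = vm_normal_page(vma, addr, pte);
    } else if (is_swap_pte(pte)) {
        swp_entry_t entry = pte_to_swp_entry(pte);

        if (is_device_private_entry(entry))
            /*
             * Already handled specially today. With this series the
             * value encoded here becomes a device private PFN and is
             * resolved against the device private address space
             * rather than the memmap.
             */
            page = pfn_swap_entry_to_page(entry);
    }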

This applies when we are creating PTE entries for device private pages:
because they have their own swap type they already must be handled
separately, so it is a small step to convert them to a device private
PFN now too.
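
Roughly how such an entry is built today (as in migrate_vma_insert_page();
under this series the value passed in would be a device private PFN rather
than the result of page_to_pfn()):

    swp_entry_t entry;
    pte_t pte;

    if (vma->vm_flags & VM_WRITE)
        entry = make_writable_device_private_entry(page_to_pfn(page));
    else
        entry = make_readable_device_private_entry(page_to_pfn(page));
    pte = swp_entry_to_pte(entry);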

The first part of the series updates callers where device private PFNs
might now be encountered to track this extra state.

The last patch contains the bulk of the work where we change how we
convert between device private pages and device private PFNs, and then use
a new interface for allocating device private pages without the need for
reserving physical address space.

For the purposes of the RFC, changes have been limited to test_hmm.c;
updates to the other drivers will be included in the next revision.

This would include updating existing users of memremap_pages() to use
memremap_device_private_pagemap() instead to allocate device private
pages. This also means they would no longer need to call
request_free_mem_region().  An equivalent of devm_memremap_pages() will
also be necessary.
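
The exact interface is defined in the last patch; purely as a hypothetical
illustration of the shape, the resource / physical range plumbing simply
disappears (nr_device_pages and the call signature below are made up for
illustration):

    /* Hypothetical sketch only - see the last patch for the real interface. */
    pgmap->type = MEMORY_DEVICE_PRIVATE;
    pgmap->ops = &dmirror_devmem_ops;
    err = memremap_device_private_pagemap(pgmap, nr_device_pages);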

Users of the migrate_vma() interface will also need to be updated to be
aware of these device private PFNs.
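
For context, a typical migrate_vma() consumer walks the src array with
today's helpers roughly as below (args being the driver's struct
migrate_vma); with the new flag from patch 2, a device private source must
be decoded as a device private PFN instead of a physical one:

    unsigned long i;

    for (i = 0; i < args->npages; i++) {
        struct page *spage;

        if (!(args->src[i] & MIGRATE_PFN_MIGRATE))
            continue;
        /* Today this decodes a physical PFN from the migrate PFN. */
        spage = migrate_pfn_to_page(args->src[i]);
        /* ... copy spage and fill in args->dst[i] ... */
    }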

By removing the device private pages from the physical address space,
this RFC also opens up the possibility of moving away from tracking
device private memory using struct pages in the future. This is
desirable as on systems with large amounts of memory these device
private struct pages use a significant amount of memory and take a
significant amount of time to initialize.
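
(For a rough sense of scale: at 64 bytes per struct page and 4KiB pages,
1TiB of device memory implies on the order of 16GiB of struct pages, i.e.
1TiB / 4KiB * 64B, all of which is allocated and initialized when the
pagemap is created.)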

Testing:
- selftests/mm/hmm-tests on an amd64 VM

[0] https://lore.kernel.org/lkml/CAMj1kXFZ=4hLL1w6iCV5O5uVoVLHAJbc0rr40j24ObenAjXe9w@mail.gmail.com/

Jordan Niethe (6):
  mm/hmm: Add flag to track device private PFNs
  mm/migrate_device: Add migrate PFN flag to track device private PFNs
  mm/page_vma_mapped: Add flags to page_vma_mapped_walk::pfn to track
    device private PFNs
  mm: Add a new swap type for migration entries with device private PFNs
  mm/util: Add flag to track device private PFNs in page snapshots
  mm: Remove device private pages from the physical address space

 Documentation/mm/hmm.rst |   9 +-
 fs/proc/page.c           |   6 +-
 include/linux/hmm.h      |   5 ++
 include/linux/memremap.h |  25 +++++-
 include/linux/migrate.h  |   5 ++
 include/linux/mm.h       |   9 +-
 include/linux/rmap.h     |  33 +++++++-
 include/linux/swap.h     |   8 +-
 include/linux/swapops.h  | 102 +++++++++++++++++++++--
 lib/test_hmm.c           |  66 ++++++++-------
 mm/debug.c               |   9 +-
 mm/hmm.c                 |   2 +-
 mm/memory.c              |   9 +-
 mm/memremap.c            | 174 +++++++++++++++++++++++++++++----------
 mm/migrate.c             |   6 +-
 mm/migrate_device.c      |  44 ++++++----
 mm/mm_init.c             |   8 +-
 mm/mprotect.c            |  21 +++--
 mm/page_vma_mapped.c     |  18 +++-
 mm/pagewalk.c            |   2 +-
 mm/rmap.c                |  68 ++++++++++-----
 mm/util.c                |   8 +-
 mm/vmscan.c              |   2 +-
 23 files changed, 485 insertions(+), 154 deletions(-)


base-commit: e1afacb68573c3cd0a3785c6b0508876cd3423bc
-- 
2.34.1
Re: [RFC PATCH 0/6] Remove device private pages from physical address space
Posted by David Hildenbrand (Red Hat) 3 days, 13 hours ago
On 11/28/25 05:41, Jordan Niethe wrote:
> Today, when creating these device private struct pages, the first step
> is to use request_free_mem_region() to get a range of physical address
> space large enough to represent the devices memory. This allocated
> physical address range is then remapped as device private memory using
> memremap_pages.

Just a note that as we are finishing the old release and are about to 
start the merge window (+ there is Thanksgiving), expect few replies to 
non-urgent stuff in the next weeks.

Having that said, the proposal is interesting. I recall that Alistair
and Jason recently discussed removing the need to deal with PFNs
completely for device-private.

Is that the result of these discussions?

-- 
Cheers

David
Re: [RFC PATCH 0/6] Remove device private pages from physical address space
Posted by Alistair Popple 21 hours ago
On 2025-11-28 at 18:40 +1100, "David Hildenbrand (Red Hat)" <david@kernel.org> wrote...
> On 11/28/25 05:41, Jordan Niethe wrote:
> > Today, when creating these device private struct pages, the first step
> > is to use request_free_mem_region() to get a range of physical address
> > space large enough to represent the devices memory. This allocated
> > physical address range is then remapped as device private memory using
> > memremap_pages.
> 
> Just a note that as we are finishing the old release and are about to start
> the merge window (+ there is Thanksgiving), expect few replies to non-urgent
> stuff in the next weeks.

Thanks David! Mostly we just wanted to at least get the RFC out prior to LPC so
I can talk about it there if needed.

> Having that said, the proposal is interesting. I recall that Alistair and
> Jason recently discussed removing the need of dealing with PFNs
> completely for device-private.
> 
> Is that the result of these discussions?

That is certainly something we would like to explore, but this idea mostly came
from a more immediate need to deal with the lack of support on AARCH64 where we
can't just steal random bits of the physical address space (which is reasonable
- the kernel doesn't really "own" the physical memory map after all), and also
the KASLR and VM issues which cause initialisation to fail.

Removing struct pages entirely for at least device private memory is also
something I'd like to explore with this.

Re: [RFC PATCH 0/6] Remove device private pages from physical address space
Posted by Mika Penttilä 3 days, 4 hours ago
Hi Jordan!

On 11/28/25 06:41, Jordan Niethe wrote:

> Today, when creating these device private struct pages, the first step
> is to use request_free_mem_region() to get a range of physical address
> space large enough to represent the devices memory. This allocated
> physical address range is then remapped as device private memory using
> memremap_pages.
>
I just did a quick read through, and liked how it turned out - nice work!

--Mika
Re: [RFC PATCH 0/6] Remove device private pages from physical address space
Posted by Matthew Wilcox 3 days, 5 hours ago
On Fri, Nov 28, 2025 at 03:41:40PM +1100, Jordan Niethe wrote:
> A consequence of placing the device private pages outside of the
> physical address space is that they no longer have a PFN. However, it is
> still necessary to be able to look up a corresponding device private
> page from a device private PTE entry, which means that we still require
> some way to index into this device private address space. This leads to
> the idea of a device private PFN. This is like a PFN but instead of

Don't call it a "device private PFN".  That's going to lead to
confusion.  Device private index?  Device memory index?

> By removing the device private pages from the physical address space,
> this RFC also opens up the possibility to moving away from tracking
> device private memory using struct pages in the future. This is
> desirable as on systems with large amounts of memory these device
> private struct pages use a signifiant amount of memory and take a
> significant amount of time to initialize.

I did tell Jerome he was making a huge mistake with his design, but
he forced it in anyway.
Re: [RFC PATCH 0/6] Remove device private pages from physical address space
Posted by Matthew Brost 3 days, 1 hour ago
On Fri, Nov 28, 2025 at 03:41:40PM +1100, Jordan Niethe wrote:
> Today, when creating these device private struct pages, the first step
> is to use request_free_mem_region() to get a range of physical address
> space large enough to represent the devices memory. This allocated
> physical address range is then remapped as device private memory using
> memremap_pages.
> 
> Needing allocation of physical address space has some problems:
> 
>   1) There may be insufficient physical address space to represent the
>      device memory. KASLR reducing the physical address space and VM
>      configurations with limited physical address space increase the
>      likelihood of hitting this especially as device memory increases. This
>      has been observed to prevent device private from being initialized.  
> 
>   2) Attempting to add the device private pages to the linear map at
>      addresses beyond the actual physical memory causes issues on
>      architectures like aarch64  - meaning the feature does not work there [0].
> 
> This RFC changes device private memory so that it does not require
> allocation of physical address space and these problems are avoided.
> Instead of using the physical address space, we introduce a "device
> private address space" and allocate from there.
> 
> A consequence of placing the device private pages outside of the
> physical address space is that they no longer have a PFN. However, it is
> still necessary to be able to look up a corresponding device private
> page from a device private PTE entry, which means that we still require
> some way to index into this device private address space. This leads to
> the idea of a device private PFN. This is like a PFN but instead of
> associating memory in the physical address space with a struct page, it
> associates device memory in the device private address space with a
> device private struct page.
> 
> The problem that then needs to be addressed is how to avoid confusing
> these device private PFNs with the regular PFNs. It is the inherent
> limited usage of the device private pages themselves which make this
> possible. A device private page is only used for userspace mappings, we
> do not need to be concerned with them being used within the mm more
> broadly. This means that the only way that the core kernel looks up
> these pages is via the page table, where their PTE already indicates if
> they refer to a device private page via their swap type, e.g.
> SWP_DEVICE_WRITE. We can use this information to determine if the PTE
> contains a normal PFN which should be looked up in the page map, or a
> device private PFN which should be looked up elsewhere.
> 
> This applies when we are creating PTE entries for device private pages -
> because they have their own type there are already must be handled
> separately, so it is a small step to convert them to a device private
> PFN now too.
> 
> The first part of the series updates callers where device private PFNs
> might now be encountered to track this extra state.
> 
> The last patch contains the bulk of the work where we change how we
> convert between device private pages to device private PFNs and then use
> a new interface for allocating device private pages without the need for
> reserving physical address space.
> 
> For the purposes of the RFC changes have been limited to test_hmm.c
> updates to the other drivers will be included in the next revision.
> 
> This would include updating existing users of memremap_pages() to use
> memremap_device_private_pagemap() instead to allocate device private
> pages. This also means they would no longer need to call
> request_free_mem_region().  An equivalent of devm_memremap_pages() will
> also be necessary.
> 
> Users of the migrate_vma() interface will also need to be updated to be
> aware these device private PFNs.
> 
> By removing the device private pages from the physical address space,
> this RFC also opens up the possibility to moving away from tracking
> device private memory using struct pages in the future. This is
> desirable as on systems with large amounts of memory these device
> private struct pages use a signifiant amount of memory and take a
> significant amount of time to initialize.

A couple things.

- I’m fairly certain that, briefly looking at this, it will break all
  upstream DRM drivers (AMDKFD, Nouveau, Xe / GPUSVM) that use device
  private pages. I looked into what I think conflicts with Xe / GPUSVM,
  and I believe the impact is fairly minor. I’m happy to help by pulling
  this code and fixing up our side.

- I’m fully on board with eventually moving to something that uses less
  memory than struct page, and I’m happy to coordinate on future changes.

- Before we start coordinating on this patch set, should we hold off until
  the 6.19 cycle, which includes 2M device pages from Balbir [1] (i.e.,
  rebase this series on top of 6.19 once it includes 2M pages)? I suspect
  that, given the scope of this series and Balbir’s, there will be some
  conflicts.

Matt

[1] https://patchwork.freedesktop.org/series/152798/

Re: [RFC PATCH 0/6] Remove device private pages from physical address space
Posted by Alistair Popple 21 hours ago
On 2025-11-29 at 06:22 +1100, Matthew Brost <matthew.brost@intel.com> wrote...
> On Fri, Nov 28, 2025 at 03:41:40PM +1100, Jordan Niethe wrote:
> > Today, when creating these device private struct pages, the first step
> > is to use request_free_mem_region() to get a range of physical address
> > space large enough to represent the devices memory. This allocated
> > physical address range is then remapped as device private memory using
> > memremap_pages.
> > 
> > Needing allocation of physical address space has some problems:
> > 
> >   1) There may be insufficient physical address space to represent the
> >      device memory. KASLR reducing the physical address space and VM
> >      configurations with limited physical address space increase the
> >      likelihood of hitting this especially as device memory increases. This
> >      has been observed to prevent device private from being initialized.  
> > 
> >   2) Attempting to add the device private pages to the linear map at
> >      addresses beyond the actual physical memory causes issues on
> >      architectures like aarch64  - meaning the feature does not work there [0].
> > 
> > This RFC changes device private memory so that it does not require
> > allocation of physical address space and these problems are avoided.
> > Instead of using the physical address space, we introduce a "device
> > private address space" and allocate from there.
> > 
> > A consequence of placing the device private pages outside of the
> > physical address space is that they no longer have a PFN. However, it is
> > still necessary to be able to look up a corresponding device private
> > page from a device private PTE entry, which means that we still require
> > some way to index into this device private address space. This leads to
> > the idea of a device private PFN. This is like a PFN but instead of
> > associating memory in the physical address space with a struct page, it
> > associates device memory in the device private address space with a
> > device private struct page.
> > 
> > The problem that then needs to be addressed is how to avoid confusing
> > these device private PFNs with the regular PFNs. It is the inherent
> > limited usage of the device private pages themselves which make this
> > possible. A device private page is only used for userspace mappings, we
> > do not need to be concerned with them being used within the mm more
> > broadly. This means that the only way that the core kernel looks up
> > these pages is via the page table, where their PTE already indicates if
> > they refer to a device private page via their swap type, e.g.
> > SWP_DEVICE_WRITE. We can use this information to determine if the PTE
> > contains a normal PFN which should be looked up in the page map, or a
> > device private PFN which should be looked up elsewhere.
> > 
> > This applies when we are creating PTE entries for device private pages -
> > because they have their own type there are already must be handled
> > separately, so it is a small step to convert them to a device private
> > PFN now too.
> > 
> > The first part of the series updates callers where device private PFNs
> > might now be encountered to track this extra state.
> > 
> > The last patch contains the bulk of the work where we change how we
> > convert between device private pages to device private PFNs and then use
> > a new interface for allocating device private pages without the need for
> > reserving physical address space.
> > 
> > For the purposes of the RFC changes have been limited to test_hmm.c
> > updates to the other drivers will be included in the next revision.
> > 
> > This would include updating existing users of memremap_pages() to use
> > memremap_device_private_pagemap() instead to allocate device private
> > pages. This also means they would no longer need to call
> > request_free_mem_region().  An equivalent of devm_memremap_pages() will
> > also be necessary.
> > 
> > Users of the migrate_vma() interface will also need to be updated to be
> > aware these device private PFNs.
> > 
> > By removing the device private pages from the physical address space,
> > this RFC also opens up the possibility to moving away from tracking
> > device private memory using struct pages in the future. This is
> > desirable as on systems with large amounts of memory these device
> > private struct pages use a signifiant amount of memory and take a
> > significant amount of time to initialize.
> 
> A couple things.
> 
> - I’m fairly certain that, briefly looking at this, it will break all
>   upstream DRM drivers (AMDKFD, Nouveau, Xe / GPUSVM) that use device
>   private pages. I looked into what I think conflicts with Xe / GPUSVM,
>   and I believe the impact is fairly minor. I’m happy to help by pulling
>   this code and fixing up our side.

It most certainly will :-) I think Jordan called that out above but we wanted
to get the design right before spending too much time updating drivers. That
said I don't think the driver changes should be extensive, but let us know if
you disagree.

> - I’m fully on board with eventually moving to something that uses less
>   memory than struct page, and I’m happy to coordinate on future changes.

Thanks!

> - Before we start coordinating on this patch set, should we hold off until
>   the 6.19 cycle, which includes 2M device pages from Balbir [1] (i.e.,
>   rebase this series on top of 6.19 once it includes 2M pages)? I suspect
>   that, given the scope of this series and Balbir’s, there will be some
>   conflicts.

Our aim here is to get some review of the design and the patches/implementation
for the 6.19 cycle but I agree that this will need to get rebased on top of
Balbir's series.

 - Alistair

> Matt
> 
> [1] https://patchwork.freedesktop.org/series/152798/
> 
Re: [RFC PATCH 0/6] Remove device private pages from physical address space
Posted by Matthew Brost 19 hours ago
On Mon, Dec 01, 2025 at 10:23:32AM +1100, Alistair Popple wrote:
> On 2025-11-29 at 06:22 +1100, Matthew Brost <matthew.brost@intel.com> wrote...
> > On Fri, Nov 28, 2025 at 03:41:40PM +1100, Jordan Niethe wrote:
> > > Today, when creating these device private struct pages, the first step
> > > is to use request_free_mem_region() to get a range of physical address
> > > space large enough to represent the devices memory. This allocated
> > > physical address range is then remapped as device private memory using
> > > memremap_pages.
> > > 
> > > Needing allocation of physical address space has some problems:
> > > 
> > >   1) There may be insufficient physical address space to represent the
> > >      device memory. KASLR reducing the physical address space and VM
> > >      configurations with limited physical address space increase the
> > >      likelihood of hitting this especially as device memory increases. This
> > >      has been observed to prevent device private from being initialized.  
> > > 
> > >   2) Attempting to add the device private pages to the linear map at
> > >      addresses beyond the actual physical memory causes issues on
> > >      architectures like aarch64  - meaning the feature does not work there [0].
> > > 
> > > This RFC changes device private memory so that it does not require
> > > allocation of physical address space and these problems are avoided.
> > > Instead of using the physical address space, we introduce a "device
> > > private address space" and allocate from there.
> > > 
> > > A consequence of placing the device private pages outside of the
> > > physical address space is that they no longer have a PFN. However, it is
> > > still necessary to be able to look up a corresponding device private
> > > page from a device private PTE entry, which means that we still require
> > > some way to index into this device private address space. This leads to
> > > the idea of a device private PFN. This is like a PFN but instead of
> > > associating memory in the physical address space with a struct page, it
> > > associates device memory in the device private address space with a
> > > device private struct page.
> > > 
> > > The problem that then needs to be addressed is how to avoid confusing
> > > these device private PFNs with the regular PFNs. It is the inherent
> > > limited usage of the device private pages themselves which make this
> > > possible. A device private page is only used for userspace mappings, we
> > > do not need to be concerned with them being used within the mm more
> > > broadly. This means that the only way that the core kernel looks up
> > > these pages is via the page table, where their PTE already indicates if
> > > they refer to a device private page via their swap type, e.g.
> > > SWP_DEVICE_WRITE. We can use this information to determine if the PTE
> > > contains a normal PFN which should be looked up in the page map, or a
> > > device private PFN which should be looked up elsewhere.
> > > 
> > > This applies when we are creating PTE entries for device private pages -
> > > because they have their own type there are already must be handled
> > > separately, so it is a small step to convert them to a device private
> > > PFN now too.
> > > 
> > > The first part of the series updates callers where device private PFNs
> > > might now be encountered to track this extra state.
> > > 
> > > The last patch contains the bulk of the work where we change how we
> > > convert between device private pages to device private PFNs and then use
> > > a new interface for allocating device private pages without the need for
> > > reserving physical address space.
> > > 
> > > For the purposes of the RFC changes have been limited to test_hmm.c
> > > updates to the other drivers will be included in the next revision.
> > > 
> > > This would include updating existing users of memremap_pages() to use
> > > memremap_device_private_pagemap() instead to allocate device private
> > > pages. This also means they would no longer need to call
> > > request_free_mem_region().  An equivalent of devm_memremap_pages() will
> > > also be necessary.
> > > 
> > > Users of the migrate_vma() interface will also need to be updated to be
> > > aware these device private PFNs.
> > > 
> > > By removing the device private pages from the physical address space,
> > > this RFC also opens up the possibility to moving away from tracking
> > > device private memory using struct pages in the future. This is
> > > desirable as on systems with large amounts of memory these device
> > > private struct pages use a signifiant amount of memory and take a
> > > significant amount of time to initialize.
> > 
> > A couple things.
> > 
> > - I’m fairly certain that, briefly looking at this, it will break all
> >   upstream DRM drivers (AMDKFD, Nouveau, Xe / GPUSVM) that use device
> >   private pages. I looked into what I think conflicts with Xe / GPUSVM,
> >   and I believe the impact is fairly minor. I’m happy to help by pulling
> >   this code and fixing up our side.
> 
> It most certainly will :-) I think Jordan called that out above but we wanted

I don't always read.

> to get the design right before spending too much time updating drivers. That
> said I don't think the driver changes should be extensive, but let us know if
> you disagree.

I did a quick look, and I believe it's pretty minor (e.g., pfn_to_page is used
in a few places for device pages which would need a refactor, etc.). Maybe a
bit more, we will find out, but I'm not too concerned.

> 
> > - I’m fully on board with eventually moving to something that uses less
> >   memory than struct page, and I’m happy to coordinate on future changes.
> 
> Thanks!
> 
> > - Before we start coordinating on this patch set, should we hold off until
> >   the 6.19 cycle, which includes 2M device pages from Balbir [1] (i.e.,
> >   rebase this series on top of 6.19 once it includes 2M pages)? I suspect
> >   that, given the scope of this series and Balbir’s, there will be some
> >   conflicts.
> 
> Our aim here is to get some review of the design and the patches/implementation
> for the 6.19 cycle but I agree that this will need to get rebased on top of
> Balbir's series.

+1. Will be on the lookout for the next post and pull it into the 6.19 DRM
tree and at least test out the Intel stuff + send fixes if needed.

I can enable both of you for Intel CI too; just include the intel-xe list on
the next post and it will get kicked off, and you can find the results on
patchwork.

Matt

> 
>  - Alistair
> 
> > Matt
> > 
> > [1] https://patchwork.freedesktop.org/series/152798/
> > 