[PATCH 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries

Lorenzo Stoakes posted 16 patches 3 months ago
There is a newer version of this series
[PATCH 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries
Posted by Lorenzo Stoakes 3 months ago
There's an established convention in the kernel that we treat leaf page
tables (so far at the PTE, PMD level) as containing 'swap entries' should
they be neither empty (i.e. p**_none() evaluating true) nor present
(i.e. p**_present() evaluating true).

However, at the same time we also have helper predicates - is_swap_pte(),
is_swap_pmd() - which are inconsistently used.

This is problematic, as it is logical to assume that should somebody wish
to operate upon a page table swap entry they should first check to see if
it is in fact one.

It also implies that perhaps, in future, we might introduce a non-present,
non-none page table entry that is not a swap entry.

This series resolves this issue by systematically eliminating all use of
the is_swap_pte() and is_swap_pmd() predicates so we retain only the
convention that should a leaf page table entry be neither none nor present
it is a swap entry.
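
As a rough illustration of that convention, here is a userspace toy, not
kernel code. Every toy_* name and the single 'present' bit are invented for
this sketch and do not correspond to any real architecture's layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model - NOT the kernel's pte_t or any real arch layout. */
typedef struct { uint64_t val; } toy_pte_t;

#define TOY_PTE_PRESENT 0x1ULL	/* hypothetical 'present' bit */

enum toy_pte_kind {
	TOY_KIND_NONE,		/* empty: toy_pte_none() is true */
	TOY_KIND_PRESENT,	/* mapped: toy_pte_present() is true */
	TOY_KIND_SWAP,		/* everything else, purely by convention */
};

static inline bool toy_pte_none(toy_pte_t pte)
{
	return pte.val == 0;
}

static inline bool toy_pte_present(toy_pte_t pte)
{
	return (pte.val & TOY_PTE_PRESENT) != 0;
}

static inline enum toy_pte_kind toy_pte_classify(toy_pte_t pte)
{
	if (toy_pte_none(pte))
		return TOY_KIND_NONE;
	if (toy_pte_present(pte))
		return TOY_KIND_PRESENT;
	/* Neither none nor present: no further "is it swap?" check. */
	return TOY_KIND_SWAP;
}

/* Usage example: one value of each kind. */
static inline void toy_classify_demo(void)
{
	assert(toy_pte_classify((toy_pte_t){ 0 }) == TOY_KIND_NONE);
	assert(toy_pte_classify((toy_pte_t){ 0x1001 }) == TOY_KIND_PRESENT);
	assert(toy_pte_classify((toy_pte_t){ 0x1000 }) == TOY_KIND_SWAP);
}
```

The point of the sketch is that the third case falls out of the first two
checks, so a separate swap predicate adds nothing under this convention.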

We also have the further issue that 'swap entry' is unfortunately a really
rather overloaded term and in fact refers to both entries for swap and for
other information such as migration entries, page table markers, and device
private entries.

We therefore have the rather 'unique' concept of a 'non-swap' swap entry.
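
To make the overloading concrete, a hedged toy sketch follows. It assumes a
made-up encoding in which a small 'type' field sits above an offset, with the
low type values standing for real swap devices and the higher ones reused for
other purposes. The toy_* names, the shift and the number of swap devices are
all assumptions for illustration, not the kernel's actual values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a swap entry; the layout is invented for this sketch. */
typedef struct { uint64_t val; } toy_swp_entry_t;

#define TOY_TYPE_SHIFT	58
#define TOY_OFFSET_MASK	((1ULL << TOY_TYPE_SHIFT) - 1)

enum {
	TOY_MAX_SWAPFILES = 4,	/* types 0..3: actual swap devices */
	TOY_TYPE_MIGRATION = TOY_MAX_SWAPFILES,
	TOY_TYPE_MARKER,
	TOY_TYPE_DEVICE_PRIVATE,
};

static inline toy_swp_entry_t toy_mk_entry(unsigned int type, uint64_t offset)
{
	toy_swp_entry_t entry;

	assert(type < 64);	/* must fit in the 6 bits above the shift */
	entry.val = ((uint64_t)type << TOY_TYPE_SHIFT) |
		    (offset & TOY_OFFSET_MASK);
	return entry;
}

static inline unsigned int toy_entry_type(toy_swp_entry_t entry)
{
	return (unsigned int)(entry.val >> TOY_TYPE_SHIFT);
}

/* The 'non-swap' swap entry: a swap entry that is not about swap at all. */
static inline bool toy_non_swap_entry(toy_swp_entry_t entry)
{
	return toy_entry_type(entry) >= TOY_MAX_SWAPFILES;
}

/* Usage example. */
static inline void toy_overload_demo(void)
{
	assert(!toy_non_swap_entry(toy_mk_entry(0, 123)));
	assert(toy_non_swap_entry(toy_mk_entry(TOY_TYPE_MIGRATION, 123)));
	assert(toy_non_swap_entry(toy_mk_entry(TOY_TYPE_MARKER, 1)));
}
```

Both kinds of value share one type in this model, which is exactly why the
name 'swap entry' ends up describing things that have nothing to do with swap.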

This series therefore introduces the concept of 'leaf entries' to eliminate
this confusion.

A leaf entry in this sense is any page table entry which is non-present,
and represented by the leaf_entry_t type.

This includes 'none' or empty entries, which are simply represented by a
zero leaf entry value.

In order to maintain compatibility as we transition the kernel to this new
type, we simply typedef swp_entry_t to leaf_entry_t.
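
A minimal sketch of why a plain typedef keeps existing callers building
during the transition. The toy_* names are hypothetical and the direction of
the alias is chosen arbitrarily for illustration; the only claim is the
general C property that a typedef introduces a second name for the same type:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; names and layout are illustrative only. */
typedef struct { unsigned long val; } toy_swp_entry_t;	/* existing name */
typedef toy_swp_entry_t toy_leaf_entry_t;		/* new name, same type */

/* A helper written against the new name. */
static inline bool toy_leafent_is_none(toy_leaf_entry_t entry)
{
	return entry.val == 0;
}

/* A caller still holding the old name compiles unchanged, with no cast. */
static inline bool toy_legacy_caller(toy_swp_entry_t old)
{
	return toy_leafent_is_none(old);
}

/* Usage example. */
static inline void toy_typedef_demo(void)
{
	assert(toy_legacy_caller((toy_swp_entry_t){ 0 }));
	assert(!toy_legacy_caller((toy_swp_entry_t){ 42 }));
}
```

Because both names denote one type, call sites can be converted one at a time
without a flag day.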

We introduce a number of predicates and helpers to interact with leaf
entries in include/linux/leafops.h which, as it includes swapops.h, can be
treated as a drop-in replacement for swapops.h wherever leaf entry helpers
are used.

Since leafent_from_[pte, pmd]() treats present entries as if they were
empty/none leaf entries, this allows for a great deal of simplification of
code throughout the code base, which this series makes extensive use of.
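
The kind of simplification meant here can be sketched in a userspace toy.
This is not the code in leafops.h: the toy_* names, the 'present' bit and the
type encoding are assumptions made up for this example. The only behaviour
taken from the description above is that a present (or none) entry converts
to an empty leaf entry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy types; the bit layout below is invented for this sketch. */
typedef struct { uint64_t val; } toy_pte_t;
typedef struct { uint64_t val; } toy_leaf_entry_t;

#define TOY_PTE_PRESENT		0x1ULL	/* hypothetical 'present' bit */
#define TOY_TYPE_SHIFT		58
#define TOY_TYPE_MIGRATION	5ULL	/* hypothetical type value */

static inline bool toy_pte_none(toy_pte_t pte)
{
	return pte.val == 0;
}

static inline bool toy_pte_present(toy_pte_t pte)
{
	return (pte.val & TOY_PTE_PRESENT) != 0;
}

/* 'Safe' conversion: present or none both yield the empty leaf entry. */
static inline toy_leaf_entry_t toy_leafent_from_pte(toy_pte_t pte)
{
	toy_leaf_entry_t entry = { 0 };

	if (!toy_pte_none(pte) && !toy_pte_present(pte))
		entry.val = pte.val;
	return entry;
}

static inline bool toy_leafent_is_migration(toy_leaf_entry_t entry)
{
	return (entry.val >> TOY_TYPE_SHIFT) == TOY_TYPE_MIGRATION;
}

/* Before: every caller has to guard before decoding. */
static inline bool toy_is_migration_guarded(toy_pte_t pte)
{
	if (toy_pte_none(pte) || toy_pte_present(pte))
		return false;
	return (pte.val >> TOY_TYPE_SHIFT) == TOY_TYPE_MIGRATION;
}

/* After: one unconditional conversion, then the predicate. */
static inline bool toy_is_migration(toy_pte_t pte)
{
	return toy_leafent_is_migration(toy_leafent_from_pte(pte));
}

/* Usage example: both forms agree on all three kinds of entry. */
static inline void toy_leafent_demo(void)
{
	toy_pte_t none = { 0 };
	toy_pte_t mig = { (TOY_TYPE_MIGRATION << TOY_TYPE_SHIFT) | 0x1000 };
	toy_pte_t present = { mig.val | TOY_PTE_PRESENT };

	assert(toy_is_migration(mig) && toy_is_migration_guarded(mig));
	assert(!toy_is_migration(present) && !toy_is_migration_guarded(present));
	assert(!toy_is_migration(none) && !toy_is_migration_guarded(none));
}
```

In this model the guard lives in exactly one place, so callers that only
care about one kind of entry can drop their own none/present checks.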

We additionally change from swap entry to leaf entry handling where it
makes sense to and eliminate functions from swapops.h where leaf entries
obviate the need for the functions.


non-RFC v1:
* As part of efforts to eliminate swp_entry_t usage, remove
  pte_none_mostly() and correct UFFD PTE marker handling.
* Introduce leaf_entry_t - credit to Gregory for naming, and to Jason for
  the concept of simply using a leafent_*() set of functions to interact
  with these entities.
* Replace pte_to_swp_entry_or_zero() with leafent_from_pte() and simply
  categorise pte_none() cases as an empty leaf entry, as per Jason.
* Eliminate get_pte_swap_entry() - as we can simply do this with
  leafent_from_pte() also, as discussed with Jason.
* Put pmd_trans_huge_lock() acquisition/release in pagemap_pmd_range()
  rather than pmd_trans_huge_lock_thp() as per Gregory.
* Eliminate pmd_to_swp_entry() and related and introduce leafent_from_pmd()
  to replace it and further propagate leaf entry usage.
* Remove the confusing and unnecessary is_hugetlb_entry_[migration,
  hwpoison]() functions.
* Replace is_pfn_swap_entry(), pfn_swap_entry_to_page(),
  is_writable_device_private_entry(), is_device_exclusive_entry(),
  is_migration_entry(), is_writable_migration_entry(),
  is_readable_migration_entry(), is_readable_exclusive_migration_entry()
  and pfn_swap_entry_folio() with leafent equivalents.
* Wrapped up the 'safe' behaviour discussed with Jason in
  leafent_from_[pte, pmd]() so these can be used unconditionally which
  simplifies things a lot.
* Further changes that are a consequence of the introduction of leaf
  entries.

RFC:
https://lore.kernel.org/all/cover.1761288179.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (16):
  mm: correctly handle UFFD PTE markers
  mm: introduce leaf entry type and use to simplify leaf entry logic
  mm: avoid unnecessary uses of is_swap_pte()
  mm: eliminate uses of is_swap_pte() when leafent_from_pte() suffices
  mm: use leaf entries in debug pgtable + remove is_swap_pte()
  fs/proc/task_mmu: refactor pagemap_pmd_range()
  mm: avoid unnecessary use of is_swap_pmd()
  mm/huge_memory: refactor copy_huge_pmd() non-present logic
  mm/huge_memory: refactor change_huge_pmd() non-present logic
  mm: replace pmd_to_swp_entry() with leafent_from_pmd()
  mm: introduce pmd_is_huge() and use where appropriate
  mm: remove remaining is_swap_pmd() users and is_swap_pmd()
  mm: remove non_swap_entry() and use leaf entry helpers instead
  mm: remove is_hugetlb_entry_[migration, hwpoisoned]()
  mm: eliminate further swapops predicates
  mm: replace remaining pte_to_swp_entry() with leafent_from_pte()

 MAINTAINERS                   |   1 +
 arch/s390/mm/gmap_helpers.c   |  18 +-
 arch/s390/mm/pgtable.c        |  12 +-
 fs/proc/task_mmu.c            | 294 +++++++++-------
 fs/userfaultfd.c              |  85 ++---
 include/asm-generic/hugetlb.h |   8 -
 include/linux/huge_mm.h       |  48 ++-
 include/linux/hugetlb.h       |   2 -
 include/linux/leafops.h       | 622 ++++++++++++++++++++++++++++++++++
 include/linux/migrate.h       |   3 +-
 include/linux/mm_inline.h     |   6 +-
 include/linux/swapops.h       | 273 +--------------
 include/linux/userfaultfd_k.h |  33 +-
 mm/damon/ops-common.c         |   6 +-
 mm/debug_vm_pgtable.c         |  86 +++--
 mm/filemap.c                  |   8 +-
 mm/hmm.c                      |  36 +-
 mm/huge_memory.c              | 263 +++++++-------
 mm/hugetlb.c                  | 165 ++++-----
 mm/internal.h                 |  20 +-
 mm/khugepaged.c               |  33 +-
 mm/ksm.c                      |   6 +-
 mm/madvise.c                  |  28 +-
 mm/memory-failure.c           |   8 +-
 mm/memory.c                   | 150 ++++----
 mm/mempolicy.c                |  25 +-
 mm/migrate.c                  |  45 +--
 mm/migrate_device.c           |  24 +-
 mm/mincore.c                  |  25 +-
 mm/mprotect.c                 |  59 ++--
 mm/mremap.c                   |  13 +-
 mm/page_table_check.c         |  33 +-
 mm/page_vma_mapped.c          |  65 ++--
 mm/pagewalk.c                 |  15 +-
 mm/rmap.c                     |  17 +-
 mm/shmem.c                    |   7 +-
 mm/swap_state.c               |  12 +-
 mm/swapfile.c                 |  14 +-
 mm/userfaultfd.c              |  53 +--
 39 files changed, 1537 insertions(+), 1084 deletions(-)
 create mode 100644 include/linux/leafops.h

--
2.51.0
Re: [PATCH 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries
Posted by Wei Yang 3 months ago
On Mon, Nov 03, 2025 at 12:31:41PM +0000, Lorenzo Stoakes wrote:
>There's an established convention in the kernel that we treat leaf page
>tables (so far at the PTE, PMD level) as containing 'swap entries' should
>they be neither empty (i.e. p**_none() evaluating true) nor present
>(i.e. p**_present() evaluating true).
>
>...
>

Hi, Lorenzo

Thanks for the effort on cleaning this up, which helps clear up my confusion
about checking swap entries.


-- 
Wei Yang
Help you, Help me
Re: [PATCH 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries
Posted by Lorenzo Stoakes 3 months ago
On Wed, Nov 05, 2025 at 02:41:40AM +0000, Wei Yang wrote:
> Hi, Lorenzo
>
> Thanks for the effort on cleaning this up, which helps clear up my confusion
> about checking swap entries.

Thank you :) much appreciated!

Hope it's useful. My initial aim was to address my own confusion and
frustration (stemming from a debate about use of the is_swap_pte() predicate
in a review), so I'm glad that, via review and also thinking 'hmm, we should
address this also' etc., this has developed into something that hopefully
makes life easier for everybody!

>
>
> --
> Wei Yang
> Help you, Help me

Cheers, Lorenzo
Re: [PATCH 00/16] mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries
Posted by Andrew Morton 3 months ago
On Mon,  3 Nov 2025 12:31:41 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> There's an established convention in the kernel that we treat leaf page
> tables (so far at the PTE, PMD level) as containing 'swap entries' should
> they be neither empty (i.e. p**_none() evaluating true) nor present
> (i.e. p**_present() evaluating true).
> 
> ...
>

All queued up, along with two -fix patches, thanks.

I disabled the customary email spray.