[RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion

Lorenzo Stoakes posted 12 patches 3 months, 2 weeks ago
arch/s390/mm/gmap_helpers.c   |   2 +-
arch/s390/mm/pgtable.c        |   2 +-
fs/proc/task_mmu.c            | 214 ++++++++++++++++++++--------------
include/linux/huge_mm.h       |  49 +++++---
include/linux/swapops.h       |  99 ++++++++++++++--
include/linux/userfaultfd_k.h |  16 +--
mm/debug_vm_pgtable.c         |  43 ++++---
mm/filemap.c                  |   2 +-
mm/hmm.c                      |   2 +-
mm/huge_memory.c              | 189 ++++++++++++++++--------------
mm/hugetlb.c                  |   6 +-
mm/internal.h                 |  12 +-
mm/khugepaged.c               |  29 ++---
mm/madvise.c                  |  14 +--
mm/memory.c                   |  62 +++++-----
mm/migrate.c                  |   2 +-
mm/mincore.c                  |   2 +-
mm/mprotect.c                 |  45 ++++---
mm/mremap.c                   |   9 +-
mm/page_table_check.c         |  25 ++--
mm/page_vma_mapped.c          |  30 +++--
mm/swap_state.c               |   5 +-
mm/swapfile.c                 |   3 +-
mm/userfaultfd.c              |   2 +-
24 files changed, 511 insertions(+), 353 deletions(-)
[RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Lorenzo Stoakes 3 months, 2 weeks ago
There's an established convention in the kernel that we treat leaf page
tables (so far at the PTE, PMD level) as containing 'swap entries' should
they be neither empty (i.e. p**_none() evaluating true) nor present
(i.e. p**_present() evaluating true).

However, at the same time we also have helper predicates - is_swap_pte(),
is_swap_pmd() - which are inconsistently used.

This is problematic, as it is logical to assume that should somebody wish
to operate upon a page table swap entry they should first check to see if
it is in fact one.

It also implies that perhaps, in future, we might introduce a non-present,
none page table entry that is not a swap entry.

This series resolves this issue by systematically eliminating all use of
the is_swap_pte() and is swap_pmd() predicates so we retain only the
convention that should a leaf page table entry be neither none nor present
it is a swap entry.

We also have the further issue that 'swap entry' is unfortunately a really
rather overloaded term and in fact refers to both entries for swap and for
other information such as migration entries, page table markers, and device
private entries.

We therefore have the rather 'unique' concept of a 'non-swap' swap entry.

This is deeply confusing, so this series goes further and eliminates the
non_swap_entry() predicate, replacing it with is_non_present_entry() - with
an eye to a new convention of referring to these non-swap 'swap entries' as
non-present.

It also introduces the is_swap_entry() predicate to explicitly and
logically refer to actual 'true' swap entries, improving code readibility,
avoiding the hideous convention of:

	if (!non_swap_entry(entry)) {
		...
	}

As part of these changes we also introduce a few other new predicates:

* pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
  to a swap entry if present, or an empty swap entry if none. This is
  useful as many swap entry conversions are simply checking for flags for
  which this suffices.

* get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
  entry (i.e. not a non-present entry), returning true if so, otherwise
  returns false. This simplifies a lot of logic that previously open-coded
  this.

* is_huge_pmd() - Determines if a PMD contains either a present transparent
  huge page entry or a huge non-present entry. This again simplifies a lot
  of logic that simply open-coded this.

REVIEWERS NOTE:

This series applies against mm-unstable as there are currently conflicts
with mm-new. Should the series receive community assent I will resolve
these at the point the RFC tag is removed.

I also intend to use this as a foundation for further work to add higher
order page table markers.

Lorenzo Stoakes (12):
  mm: introduce and use pte_to_swp_entry_or_zero()
  mm: avoid unnecessary uses of is_swap_pte()
  mm: introduce get_pte_swap_entry() and use it
  mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte()
  fs/proc/task_mmu: refactor pagemap_pmd_range()
  mm: avoid unnecessary use of is_swap_pmd()
  mm: introduce is_huge_pmd() and use where appropriate
  mm/huge_memory: refactor copy_huge_pmd() non-present logic
  mm/huge_memory: refactor change_huge_pmd() non-present logic
  mm: remove remaining is_swap_pmd() users and is_swap_pmd()
  mm: rename non_swap_entry() to is_non_present_entry()
  mm: provide is_swap_entry() and use it

 arch/s390/mm/gmap_helpers.c   |   2 +-
 arch/s390/mm/pgtable.c        |   2 +-
 fs/proc/task_mmu.c            | 214 ++++++++++++++++++++--------------
 include/linux/huge_mm.h       |  49 +++++---
 include/linux/swapops.h       |  99 ++++++++++++++--
 include/linux/userfaultfd_k.h |  16 +--
 mm/debug_vm_pgtable.c         |  43 ++++---
 mm/filemap.c                  |   2 +-
 mm/hmm.c                      |   2 +-
 mm/huge_memory.c              | 189 ++++++++++++++++--------------
 mm/hugetlb.c                  |   6 +-
 mm/internal.h                 |  12 +-
 mm/khugepaged.c               |  29 ++---
 mm/madvise.c                  |  14 +--
 mm/memory.c                   |  62 +++++-----
 mm/migrate.c                  |   2 +-
 mm/mincore.c                  |   2 +-
 mm/mprotect.c                 |  45 ++++---
 mm/mremap.c                   |   9 +-
 mm/page_table_check.c         |  25 ++--
 mm/page_vma_mapped.c          |  30 +++--
 mm/swap_state.c               |   5 +-
 mm/swapfile.c                 |   3 +-
 mm/userfaultfd.c              |   2 +-
 24 files changed, 511 insertions(+), 353 deletions(-)

--
2.51.0
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Jason Gunthorpe 3 months, 1 week ago
On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> There's an established convention in the kernel that we treat leaf page
> tables (so far at the PTE, PMD level) as containing 'swap entries' should
> they be neither empty (i.e. p**_none() evaluating true) nor present
> (i.e. p**_present() evaluating true).

I have to say I've never liked the none-vs-present naming either.

> This is deeply confusing, so this series goes further and eliminates the
> non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> an eye to a new convention of referring to these non-swap 'swap entries' as
> non-present.

I'm not keen on is_non_present_entry(), it seems confusing again.

It looks like we are stuck with swp_entry_t as the being the handle
for a non-present pte. Oh well, not a great name, but fine..

So we think of that swp_entry_t having multiple types: swap, migration,
device private, etc, etc

Then I'd think the general pattern should be to get a swp_entry_t:

    if (pte_present(pte))
        return;
    swpent = pte_to_swp_entry(pte);

And then evaluate the type:

    if (swpent_is_swap()) {
    }


If you keep the naming as "swp_entry" indicates the multi-type value,
then "swap" can mean a swp_entry which is used by the swap subsystem.

That suggests functions like this:

swpent_is_swap()
swpent_is_migration()
..

and your higher level helpers like:

/* True if the pte is a swpent_is_swap() */
static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
{
   if (pte_present(pte))
        return false;
   *swpent = pte_to_swp_entry(pte);
   return swpent_is_swap(*swpent);
}

I also think it will be more readable to keep all these things under a
swpent namespace instead of using unstructured english names.

> * pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
>   to a swap entry if present, or an empty swap entry if none. This is
>   useful as many swap entry conversions are simply checking for flags for
>   which this suffices.

I'd expect a safe function should be more like

   *swpent = pte_to_swp_entry_safe(pte);
   return swpent_is_swap(*swpent);

Where "safe" means that if the PTE is None or Present then
swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
always nothing.

> * get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
>   entry (i.e. not a non-present entry), returning true if so, otherwise
>   returns false. This simplifies a lot of logic that previously open-coded
>   this.

Like this is still a tortured function:

+static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
+{
+       if (pte_present(pte))
+               return false;
+       if (pte_none(pte))
+               return false;
+
+       *entryp = pte_to_swp_entry(pte);
+       if (non_swap_entry(*entryp))
+               return false;
+
+       return true;
+}
+

static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
{
   return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
}

Maybe it doesn't even need an inline at that point?

> * is_huge_pmd() - Determines if a PMD contains either a present transparent
>   huge page entry or a huge non-present entry. This again simplifies a lot
>   of logic that simply open-coded this.

is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
when any of these APIs accept swap entries without being explicit

Jason
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Gregory Price 3 months, 1 week ago
On Mon, Oct 27, 2025 at 01:09:23PM -0300, Jason Gunthorpe wrote:
> 
> I'm not keen on is_non_present_entry(), it seems confusing again.
> 

The confusion stems from `present` referring to the state of the hardware
PTE bits, instead of referring to the state of the entry.

But even if we're stuck with "non-present entry", it's still infinitely
more understandable (and teachable) than "non_swap_swap_entry".

So even if we never get to the point of replacing swp_entry_t, this is a
clear and obvious improvement.
~Gregory
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Lorenzo Stoakes 3 months, 1 week ago
(Note I never intended this to be an RFC, it was only because of
series-likely-to-be-dropped causing nasty conflicts this isn't an 'out
there' series rather a practical submission).

To preface, as I said elsewhere, I intend to do more on this, renaming
swp_entry_t to probably leaf_entry_t (thanks Gregory!)

The issue is no matter how I do this people will theorise different
approaches, I'm trying to practically find a way forward that works
iteratively.

There are 502 lines referencing swp_entry_t in the kernel.

Leaving _that_ to another series seems sensible to me.

On Mon, Oct 27, 2025 at 01:09:23PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> > There's an established convention in the kernel that we treat leaf page
> > tables (so far at the PTE, PMD level) as containing 'swap entries' should
> > they be neither empty (i.e. p**_none() evaluating true) nor present
> > (i.e. p**_present() evaluating true).
>
> I have to say I've never liked the none-vs-present naming either.

OK.

I think that's not something we can reasonably get away from practically at
this point.

I don't love 'none' as a thing. Empty would be better but you know
naming... hard.

>
> > This is deeply confusing, so this series goes further and eliminates the
> > non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> > an eye to a new convention of referring to these non-swap 'swap entries' as
> > non-present.
>
> I'm not keen on is_non_present_entry(), it seems confusing again.

What is confusing about it?

The idea is we don't mention swap. If something is non-present and not swap
then it's... a non-present entry.

The logic generally does need to differentiate between the two.

Otherwise we're back to referencing swap again...

>
> It looks like we are stuck with swp_entry_t as the being the handle
> for a non-present pte. Oh well, not a great name, but fine..

We're not stuck with it, I'm just doing things a step at a time. I fully
intend to rename it.

Perfect is the enemy of the good, and this series improves things a lot.

>
> So we think of that swp_entry_t having multiple types: swap, migration,
> device private, etc, etc

I mean do we use fields in the swap entry significantly differently in each
of these types?

I think any change like that would need to be a follow-up series because
that'd involve rewriting a lot of code...

If there's _good reason_ to and we can sensibly stick a like struct entry
in the union we can rename swp_type to leaf_type or something that could be
nice actually.

But maybe make opaque...

But only if there's enough variation in the 'shape' of the data in the swap
entry between different types.

I would need to go check... because if not, then there's no point really.

>
> Then I'd think the general pattern should be to get a swp_entry_t:
>
>     if (pte_present(pte))
>         return;
>     swpent = pte_to_swp_entry(pte);
>
> And then evaluate the type:
>
>     if (swpent_is_swap()) {
>     }

That just embeds the confusion re: swap entry in the function name, no
that's horrible?

We may as well keep non_swap_entry() if we do that right?

Again, I intend to rename swap_entry_t. I'm not doing everything at once
because it's simply not practical.

>
>
> If you keep the naming as "swp_entry" indicates the multi-type value,

Nope don't intend to keep.

> then "swap" can mean a swp_entry which is used by the swap subsystem.
>
> That suggests functions like this:
>
> swpent_is_swap()
> swpent_is_migration()
> ..

The _whole point_ of this series is to separate out the idea that you're
dealing with swap entries so I don't like swpent as a name obviously.

Once we do the rename, we can do something like leafent_is_swap() etc.

And agreed that's neater and more consistent. We'd want to rename all swap
predicates similarly.

We could also just pre-empt and prefix functions with leafent_is_swap() if
you prefer.

We could even do:

/* TODO: Rename swap_entry_t to leaf_entry_t */
typedef swap_entry_t leaf_entry_t;

And use the new type right away.

What do you think?

>
> and your higher level helpers like:
>
> /* True if the pte is a swpent_is_swap() */
> static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
> {
>    if (pte_present(pte))
>         return false;
>    *swpent = pte_to_swp_entry(pte);
>    return swpent_is_swap(*swpent);
> }

I already implement in the series a pte_to_swp_entry_or_zero() function
that goes one further - checks pte_present() for you, if pte_none() you
just get an empty swap entry, so this can be:

static inline bool get_pte_swap_entry(pte_t, swp_entry_t *entryp)
{
	*entryp = pte_to_swp_entry_or_zero(pte);
	return is_swap_entry(*entryp);
}

Which is a valid refactoring suggestion but the building blocks are
_already in the series_.

I mean that's valid feedback, that's much nicer, will refactor thanks.

>
> I also think it will be more readable to keep all these things under a
> swpent namespace instead of using unstructured english names.

Nope. Again, the whole point of the series is to avoid referencing
swap. swpent_xxx() is just eliminating the purpose of the series right?

Yes it sucks that the type name is what it is, but this is an iterative
process.

But as above, we could pre-empt future changes and prefix with a
leafent_*() prefix if that works for you?

>
> > * pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
> >   to a swap entry if present, or an empty swap entry if none. This is
> >   useful as many swap entry conversions are simply checking for flags for
> >   which this suffices.
>
> I'd expect a safe function should be more like
>
>    *swpent = pte_to_swp_entry_safe(pte);
>    return swpent_is_swap(*swpent);
>
> Where "safe" means that if the PTE is None or Present then
> swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
> always nothing.

Not sure it's really 'safe', the name is unfortunate, but you could read
this as 'always get a valid swap entry to operate on'...

I like to encode the fact we are treating pte_none() this way.

pte_to_leaf_entry() is possible?

Or

leaf_entry_t leafent_from_pte()...?

>
> > * get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
> >   entry (i.e. not a non-present entry), returning true if so, otherwise
> >   returns false. This simplifies a lot of logic that previously open-coded
> >   this.
>
> Like this is still a tortured function:

This is moot (also not the final version ofc as I get rid of
non_swap_entry()), I agree the approach you suggest is better as per above
so we'll go with that (though without the horrid embeded assignment bit).

>
> +static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> +{
> +       if (pte_present(pte))
> +               return false;
> +       if (pte_none(pte))
> +               return false;
> +
> +       *entryp = pte_to_swp_entry(pte);
> +       if (non_swap_entry(*entryp))
> +               return false;
> +
> +       return true;
> +}
> +
>
> static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> {
>    return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
> }

I absolutely hate that embedded assignment, but this is equivalent to what
I suggested above, so agreed this is a good suggestion broadly.

>
> Maybe it doesn't even need an inline at that point?

Don't understand what you mean by that. It's in a header file?

>
> > * is_huge_pmd() - Determines if a PMD contains either a present transparent
> >   huge page entry or a huge non-present entry. This again simplifies a lot
> >   of logic that simply open-coded this.
>
> is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
> when any of these APIs accept swap entries without being explicit

Again, I'm not going to reference swap in a series intended to eliminate
this, it defeats the purpose.

And the non-present (or whatever you want to call it) entry _is_ huge. So
it's just adding more confusion that way IMO.

>
> Jason

Thanks, Lorenzo
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Jason Gunthorpe 3 months, 1 week ago
On Mon, Oct 27, 2025 at 05:33:57PM +0000, Lorenzo Stoakes wrote:
> (Note I never intended this to be an RFC, it was only because of
> series-likely-to-be-dropped causing nasty conflicts this isn't an 'out
> there' series rather a practical submission).
> 
> To preface, as I said elsewhere, I intend to do more on this, renaming
> swp_entry_t to probably leaf_entry_t (thanks Gregory!)
> 
> The issue is no matter how I do this people will theorise different
> approaches, I'm trying to practically find a way forward that works
> iteratively.

It is why I suggested that swp_entry_t is the name we have (for this
series at least) and lean into it as the proper name for the abstract
idea of a multi-type'd value. Having a following series to rename
"swp_entry_t" to some "leaf entry" will resolve the poor naming.

But for now, "swp_entry_t" does not mean *swap* entry, it means "leaf
entry with a really bad type name".

And swpent_* is the namespace prefix for things dealing with
swp_entry_t.

If done consistently then the switch to leaf entry naming is just a
simple mass rename of swpent/leafent.

> > That suggests functions like this:
> >
> > swpent_is_swap()
> > swpent_is_migration()
> > ..
> 
> The _whole point_ of this series is to separate out the idea that you're
> dealing with swap entries so I don't like swpent as a name obviously.

As you say we can't fix everything at once, but if you do the above
and then rename the end state would be

leafent_is_swap()
leafent_is_migration()
 ..

And that seems like a good end state.

So pick the small steps, either lean into swpent in this series as the
place holder for leafent in the next..

Or this seems like a good idea too:

> We could also just pre-empt and prefix functions with leafent_is_swap() if
> you prefer.
> 
> We could even do:
> 
> /* TODO: Rename swap_entry_t to leaf_entry_t */
> typedef swap_entry_t leaf_entry_t;
>
> And use the new type right away.

Then the followup series is cleaning away swap_entry_t as a name.

> > /* True if the pte is a swpent_is_swap() */
> > static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
> > {
> >    if (pte_present(pte))
> >         return false;
> >    *swpent = pte_to_swp_entry(pte);
> >    return swpent_is_swap(*swpent);
> > }
> 
> I already implement in the series a pte_to_swp_entry_or_zero() function

I saw, but I don't think it is a great name.. It doesn't really give
"zero" it gives a swp_entry_t that doesn't pass any of the
swpent_is_XX() functions. ie a none type.

> that goes one further - checks pte_present() for you, if pte_none() you
> just get an empty swap entry, so this can be:

And I was hoping to see a path to get rid of the pte_none() stuff, or
at least on most arches. It is pretty pointless to check for pte_none
if the arch has a none-pte that already is 0..

So pte_none can be more like:
   swpent_is_none(pte_to_swp_entry(pte))

Where pte_to_swp_entry is just some bit maths with no conditionals.

> > I also think it will be more readable to keep all these things under a
> > swpent namespace instead of using unstructured english names.
> 
> Nope. Again, the whole point of the series is to avoid referencing
> swap. swpent_xxx() is just eliminating the purpose of the series right?
> 
> Yes it sucks that the type name is what it is, but this is an iterative
> process.

Sure, but don't add a bunch of new names with *no namespace*. As above
either accept swpent is a placeholder for leafent in the next series,
or do this:

> But as above, we could pre-empt future changes and prefix with a
> leafent_*() prefix if that works for you?

Which seems like a good idea to me.

> > I'd expect a safe function should be more like
> >
> >    *swpent = pte_to_swp_entry_safe(pte);
> >    return swpent_is_swap(*swpent);
> >
> > Where "safe" means that if the PTE is None or Present then
> > swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
> > always nothing.
> 
> Not sure it's really 'safe', the name is unfortunate, but you could read
> this as 'always get a valid swap entry to operate on'...

My suggestion was the leaf entry has a type {none, swap, migration, etc}

And this _safe version returns the none type'd leaf entry for a
present pte.

We move toward eliminating the idea of pte_none by saying a
non-present pte is always a leaf_entry and what we call a "none pte"
is a "none leaf entry"

> leaf_entry_t leafent_from_pte()...?

Probably this one?
> > static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> > {
> >    return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
> > }
> 
> I absolutely hate that embedded assignment, but this is equivalent to what
> I suggested above, so agreed this is a good suggestion broadly.
> 
> >
> > Maybe it doesn't even need an inline at that point?
> 
> Don't understand what you mean by that. It's in a header file?

I mean just write it like this in the callers:

  swp_entry_t leafent = pte_to_swp_entry_safe(pte);

  if (swpent_is_swap(leafent)) {
  }

It is basically the same # lines as the helper version.

> > > * is_huge_pmd() - Determines if a PMD contains either a present transparent
> > >   huge page entry or a huge non-present entry. This again simplifies a lot
> > >   of logic that simply open-coded this.
> >
> > is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
> > when any of these APIs accept swap entries without being explicit
> 
> Again, I'm not going to reference swap in a series intended to eliminate
> this, it defeats the purpose.
> 
> And the non-present (or whatever you want to call it) entry _is_ huge. So
> it's just adding more confusion that way IMO.

Then this:

  pmd_is_present_or_leafent(pmd)

Jason
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Lorenzo Stoakes 3 months, 1 week ago
On Tue, Oct 28, 2025 at 09:48:17AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 27, 2025 at 05:33:57PM +0000, Lorenzo Stoakes wrote:
> > (Note I never intended this to be an RFC, it was only because of
> > series-likely-to-be-dropped causing nasty conflicts this isn't an 'out
> > there' series rather a practical submission).
> >
> > To preface, as I said elsewhere, I intend to do more on this, renaming
> > swp_entry_t to probably leaf_entry_t (thanks Gregory!)
> >
> > The issue is no matter how I do this people will theorise different
> > approaches, I'm trying to practically find a way forward that works
> > iteratively.
>
> It is why I suggested that swp_entry_t is the name we have (for this
> series at least) and lean into it as the proper name for the abstract
> idea of a multi-type'd value. Having a following series to rename
> "swp_entry_t" to some "leaf entry" will resolve the poor naming.

This is addressed below.

> But for now, "swp_entry_t" does not mean *swap* entry, it means "leaf
> entry with a really bad type name".

Yes.

>
> And swpent_* is the namespace prefix for things dealing with
> swp_entry_t.
>
> If done consistently then the switch to leaf entry naming is just a
> simple mass rename of swpent/leafent.
>
> > > That suggests functions like this:
> > >
> > > swpent_is_swap()
> > > swpent_is_migration()
> > > ..
> >
> > The _whole point_ of this series is to separate out the idea that you're
> > dealing with swap entries so I don't like swpent as a name obviously.
>
> As you say we can't fix everything at once, but if you do the above
> and then rename the end state would be
>
> leafent_is_swap()
> leafent_is_migration()
>  ..
>
> And that seems like a good end state.

This is a two wrongs don't make a right situation.

I don't want to belabour this because we ultimately agree using
leafent_xxx() now is fine.

>
> So pick the small steps, either lean into swpent in this series as the
> place holder for leafent in the next..
>
> Or this seems like a good idea too:
>
> > We could also just pre-empt and prefix functions with leafent_is_swap() if
> > you prefer.

Good. I may even go so far as to say 'thank science we agree on that' ;)

Yes I'll do this.

> >
> > We could even do:
> >
> > /* TODO: Rename swap_entry_t to leaf_entry_t */
> > typedef swap_entry_t leaf_entry_t;

BTW typo, obv. meant swp_entry_t here...

> >
> > And use the new type right away.
>
> Then the followup series is cleaning away swap_entry_t as a name.

OK so you're good with the typedef? This would be quite nice actually as we
could then use leaf_entry_t in all the core leafent_xxx() logic ahead of
time and reduce confusion _there_ and effectively document that swp_entry_t
is just badly named.

This follow up series is one I very much intend to do, it's just going to
be a big churny one (hey my speciality anyway) but one which is best done
entirely mechanically I think.

>
> > > /* True if the pte is a swpent_is_swap() */
> > > static inline bool swpent_get_swap_pte(pte_t pte, swp_entry_t *entryp)
> > > {
> > >    if (pte_present(pte))
> > >         return false;
> > >    *swpent = pte_to_swp_entry(pte);
> > >    return swpent_is_swap(*swpent);
> > > }
> >
> > I already implement in the series a pte_to_swp_entry_or_zero() function
>
> I saw, but I don't think it is a great name.. It doesn't really give
> "zero" it gives a swp_entry_t that doesn't pass any of the
> swpent_is_XX() functions. ie a none type.

Naming is hard...

I mean really it wouldn't be all too awful to have pte_to_leafent() do this
now...

>
> > that goes one further - checks pte_present() for you, if pte_none() you
> > just get an empty swap entry, so this can be:
>
> And I was hoping to see a path to get rid of the pte_none() stuff, or
> at least on most arches. It is pretty pointless to check for pte_none
> if the arch has a none-pte that already is 0..
>
> So pte_none can be more like:
>    swpent_is_none(pte_to_swp_entry(pte))
>
> Where pte_to_swp_entry is just some bit maths with no conditionals.

*leafent

I mean I'm not so sure that's all that useful, you often want to skip over
things that are 'none' entries without doing this conversion.

We could use the concept of 'none is an empty leaf_entry_t' more thoroughly
internally in functions though.

I will see what I can do.

>
> > > I also think it will be more readable to keep all these things under a
> > > swpent namespace instead of using unstructured english names.
> >
> > Nope. Again, the whole point of the series is to avoid referencing
> > swap. swpent_xxx() is just eliminating the purpose of the series right?
> >
> > Yes it sucks that the type name is what it is, but this is an iterative
> > process.
>
> Sure, but don't add a bunch of new names with *no namespace*. As above
> either accept swpent is a placeholder for leafent in the next series,
> or do this:
>
> > But as above, we could pre-empt future changes and prefix with a
> > leafent_*() prefix if that works for you?
>
> Which seems like a good idea to me.

Yup. We agree on this.

>
> > > I'd expect a safe function should be more like
> > >
> > >    *swpent = pte_to_swp_entry_safe(pte);
> > >    return swpent_is_swap(*swpent);
> > >
> > > Where "safe" means that if the PTE is None or Present then
> > > swpent_is_XX() == false. Ie it returns a 0 swpent and 0 swpent is
> > > always nothing.
> >
> > Not sure it's really 'safe', the name is unfortunate, but you could read
> > this as 'always get a valid swap entry to operate on'...
>
> My suggestion was the leaf entry has a type {none, swap, migration, etc}
>
> And this _safe version returns the none type'd leaf entry for a
> present pte.

I mean that's already what's happening more or less with the ..._is_zero()
function (albeit needing a rename).

>
> We move toward eliminating the idea of pte_none by saying a
> non-present pte is always a leaf_entry and what we call a "none pte"
> is a "none leaf entry"

Well as discussed above.

>
> > leaf_entry_t leafent_from_pte()...?
>
> Probably this one?
> > > static inline bool get_pte_swap_entry(pte_t pte, swp_entry_t *entryp)
> > > {
> > >    return swpent_is_swap(*swpent = pte_to_swp_entry_safe(pte));
> > > }
> >
> > I absolutely hate that embedded assignment, but this is equivalent to what
> > I suggested above, so agreed this is a good suggestion broadly.
> >
> > >
> > > Maybe it doesn't even need an inline at that point?
> >
> > Don't understand what you mean by that. It's in a header file?
>
> I mean just write it like this in the callers:
>
>   swp_entry_t leafent = pte_to_swp_entry_safe(pte);
>
>   if (swpent_is_swap(leafent)) {
>   }
>
> It is basically the same # lines as the helper version.

Right, good point!

>
> > > > * is_huge_pmd() - Determines if a PMD contains either a present transparent
> > > >   huge page entry or a huge non-present entry. This again simplifies a lot
> > > >   of logic that simply open-coded this.
> > >
> > > is_huge_or_swpent_pmd() would be nicer, IMHO. I think it is surprising
> > > when any of these APIs accept swap entries without being explicit
> >
> > Again, I'm not going to reference swap in a series intended to eliminate
> > this, it defeats the purpose.
> >
> > And the non-present (or whatever you want to call it) entry _is_ huge. So
> > it's just adding more confusion that way IMO.
>
> Then this:
>
>   pmd_is_present_or_leafent(pmd)

A PMD can be present and contain an entry pointing at a PTE table so I'm
not sure that helps... naming is hard :)

Will think of alternatives on respin.

>
> Jason

Thanks, Lorenzo
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Jason Gunthorpe 3 months, 1 week ago
On Tue, Oct 28, 2025 at 06:20:54PM +0000, Lorenzo Stoakes wrote:
> > > And use the new type right away.
> >
> > Then the followup series is cleaning away swap_entry_t as a name.
> 
> OK so you're good with the typedef? This would be quite nice actually as we
> could then use leaf_entry_t in all the core leafent_xxx() logic ahead of
> time and reduce confusion _there_ and effectively document that swp_entry_t
> is just badly named.

Yeah, I think so, a commit message explaining it is temporary and a
future series will mechanically rename it away and this is
preparation.

> I mean I'm not so sure that's all that useful, you often want to skip over
> things that are 'none' entries without doing this conversion.

Maybe go directly from a pte to the leaf entry type for this check?

#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))

That's basically free on most arches..

> We could use the concept of 'none is an empty leaf_entry_t' more thoroughly
> internally in functions though.
> 
> I will see what I can do.

Sure, maybe something works out

Though if we want to keep them seperate then maybe pte_is_leafent() is
the right name for pte_none(). Reads so much better like this:

if (pte_is_leafent(pte)) {
    leafent_t leaf = leafent_from_pte(pte)

     if (leafent_is_swap(leaf)) {..}
}

> > Then this:
> >
> >   pmd_is_present_or_leafent(pmd)
> 
> A PMD can be present and contain an entry pointing at a PTE table so I'm
> not sure that helps... naming is hard :)

pmd_is_leaf_or_leafent()

In the PTE API we are calling present entries that are address, not
tables, leafs.

Jason
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Lorenzo Stoakes 3 months, 1 week ago
On Wed, Oct 29, 2025 at 11:10:48AM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 28, 2025 at 06:20:54PM +0000, Lorenzo Stoakes wrote:
> > > > And use the new type right away.
> > >
> > > Then the followup series is cleaning away swap_entry_t as a name.
> >
> > OK so you're good with the typedef? This would be quite nice actually as we
> > could then use leaf_entry_t in all the core leafent_xxx() logic ahead of
> > time and reduce confusion _there_ and effectively document that swp_entry_t
> > is just badly named.
>
> Yeah, I think so, a commit message explaining it is temporary and a
> future series will mechanically rename it away and this is
> preparation.
>
> > I mean I'm not so sure that's all that useful, you often want to skip over
> > things that are 'none' entries without doing this conversion.
>
> Maybe go directly from a pte to the leaf entry type for this check?
>
> #define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
>
> That's basically free on most arches..

That's nice, I guess we could throw in a pte_present() check there and just grab
the type out direct like that

>
> > We could use the concept of 'none is an empty leaf_entry_t' more thoroughly
> > internally in functions though.
> >
> > I will see what I can do.
>
> Sure, maybe something works out
>
> Though if we want to keep them seperate then maybe pte_is_leafent() is
> the right name for pte_none(). Reads so much better like this:
>
> if (pte_is_leafent(pte)) {

Ah so this would amount to !pte_is_present()

>     leafent_t leaf = leafent_from_pte(pte)
>
>      if (leafent_is_swap(leaf)) {..}

And yeah... that is nice you know... :)

> }

>
> > > Then this:
> > >
> > >   pmd_is_present_or_leafent(pmd)
> >
> > A PMD can be present and contain an entry pointing at a PTE table so I'm
> > not sure that helps... naming is hard :)
>
> pmd_is_leaf_or_leafent()
>
> In the PTE API we are calling present entries that are address, not
> tables, leafs.

Hmm I think pmd_is_present_or_leafent() is clearer actually on second
thoughts :)

Still feel a desire to shove a 'huge' in there though but then it's getting
wordy... :)

Let me play around...

>
> Jason

Cheers, Lorenzo
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Gregory Price 3 months, 1 week ago
On Wed, Oct 29, 2025 at 07:09:59PM +0000, Lorenzo Stoakes wrote:
> >
> > pmd_is_leaf_or_leafent()
> >
> > In the PTE API we are calling present entries that are address, not
> > tables, leafs.
> 
> Hmm I think pmd_is_present_or_leafent() is clearer actually on second
> thoughts :)
> 

apologies if misunderstanding, but I like short names :]

#define pmd_exists(entry) (pmd_is_present() || pmd_is_leafent())

If you care about what that entry is, you'll have to spell out these
checks in your code anyway, so no need to explode the naming to include
everything that might be there.

~Gregory
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Lorenzo Stoakes 3 months ago
On Wed, Oct 29, 2025 at 05:23:39PM -0400, Gregory Price wrote:
> On Wed, Oct 29, 2025 at 07:09:59PM +0000, Lorenzo Stoakes wrote:
> > >
> > > pmd_is_leaf_or_leafent()
> > >
> > > In the PTE API we are calling present entries that are address, not
> > > tables, leafs.
> >
> > Hmm I think pmd_is_present_or_leafent() is clearer actually on second
> > thoughts :)
> >
>
> apologies if misunderstanding, but I like short names :]
>
> #define pmd_exists(entry) (pmd_is_present() || pmd_is_leafent())
>
> If you care about what that entry is, you'll have to spell out these
> checks in your code anyway, so no need to explode the naming to include
> everything that might be there.

Ah actually one issue we have here is that huge_mm.h is _super_ early in
header import order so we can't use any leafent stuff here at all, which
sucks.

Will have to compromise a bit here unfortunately!

>
> ~Gregory

Cheers, Lorenzo
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Lorenzo Stoakes 3 months, 1 week ago
On Wed, Oct 29, 2025 at 05:23:39PM -0400, Gregory Price wrote:
> On Wed, Oct 29, 2025 at 07:09:59PM +0000, Lorenzo Stoakes wrote:
> > >
> > > pmd_is_leaf_or_leafent()
> > >
> > > In the PTE API we are calling present entries that are address, not
> > > tables, leafs.
> >
> > Hmm I think pmd_is_present_or_leafent() is clearer actually on second
> > thoughts :)
> >
>
> apologies if misunderstanding, but I like short names :]
>
> #define pmd_exists(entry) (pmd_is_present() || pmd_is_leafent())
>
> If you care about what that entry is, you'll have to spell out these
> checks in your code anyway, so no need to explode the naming to include
> everything that might be there.

Yeah that's a fair point, and I prefer short names also :)

This does read better thanks!

I should try for a non-RFC respin relatively soon now I've sent the other two
series I've been working on recently.

>
> ~Gregory

Cheers, Lorenzo
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Yosry Ahmed 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> There's an established convention in the kernel that we treat leaf page
> tables (so far at the PTE, PMD level) as containing 'swap entries' should
> they be neither empty (i.e. p**_none() evaluating true) nor present
> (i.e. p**_present() evaluating true).
> 
> However, at the same time we also have helper predicates - is_swap_pte(),
> is_swap_pmd() - which are inconsistently used.
> 
> This is problematic, as it is logical to assume that should somebody wish
> to operate upon a page table swap entry they should first check to see if
> it is in fact one.
> 
> It also implies that perhaps, in future, we might introduce a non-present,
> none page table entry that is not a swap entry.
> 
> This series resolves this issue by systematically eliminating all use of
> the is_swap_pte() and is swap_pmd() predicates so we retain only the
> convention that should a leaf page table entry be neither none nor present
> it is a swap entry.
> 
> We also have the further issue that 'swap entry' is unfortunately a really
> rather overloaded term and in fact refers to both entries for swap and for
> other information such as migration entries, page table markers, and device
> private entries.
> 
> We therefore have the rather 'unique' concept of a 'non-swap' swap entry.
> 
> This is deeply confusing, so this series goes further and eliminates the
> non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> an eye to a new convention of referring to these non-swap 'swap entries' as
> non-present.

I just wanted to say THANK YOU for doing this. It is indeed a very
annoying and confusing convention, and I wanted to do something about it
in the past but never got around to it..

> 
> It also introduces the is_swap_entry() predicate to explicitly and
> logically refer to actual 'true' swap entries, improving code readibility,
> avoiding the hideous convention of:
> 
> 	if (!non_swap_entry(entry)) {
> 		...
> 	}
> 
> As part of these changes we also introduce a few other new predicates:
> 
> * pte_to_swp_entry_or_zero() - allows for convenient conversion from a PTE
>   to a swap entry if present, or an empty swap entry if none. This is
>   useful as many swap entry conversions are simply checking for flags for
>   which this suffices.
> 
> * get_pte_swap_entry() - Retrieves a PTE swap entry if it truly is a swap
>   entry (i.e. not a non-present entry), returning true if so, otherwise
>   returns false. This simplifies a lot of logic that previously open-coded
>   this.
> 
> * is_huge_pmd() - Determines if a PMD contains either a present transparent
>   huge page entry or a huge non-present entry. This again simplifies a lot
>   of logic that simply open-coded this.
> 
> REVIEWERS NOTE:
> 
> This series applies against mm-unstable as there are currently conflicts
> with mm-new. Should the series receive community assent I will resolve
> these at the point the RFC tag is removed.
> 
> I also intend to use this as a foundation for further work to add higher
> order page table markers.
> 
> Lorenzo Stoakes (12):
>   mm: introduce and use pte_to_swp_entry_or_zero()
>   mm: avoid unnecessary uses of is_swap_pte()
>   mm: introduce get_pte_swap_entry() and use it
>   mm: use get_pte_swap_entry() in debug pgtable + remove is_swap_pte()
>   fs/proc/task_mmu: refactor pagemap_pmd_range()
>   mm: avoid unnecessary use of is_swap_pmd()
>   mm: introduce is_huge_pmd() and use where appropriate
>   mm/huge_memory: refactor copy_huge_pmd() non-present logic
>   mm/huge_memory: refactor change_huge_pmd() non-present logic
>   mm: remove remaining is_swap_pmd() users and is_swap_pmd()
>   mm: rename non_swap_entry() to is_non_present_entry()
>   mm: provide is_swap_entry() and use it
> 
>  arch/s390/mm/gmap_helpers.c   |   2 +-
>  arch/s390/mm/pgtable.c        |   2 +-
>  fs/proc/task_mmu.c            | 214 ++++++++++++++++++++--------------
>  include/linux/huge_mm.h       |  49 +++++---
>  include/linux/swapops.h       |  99 ++++++++++++++--
>  include/linux/userfaultfd_k.h |  16 +--
>  mm/debug_vm_pgtable.c         |  43 ++++---
>  mm/filemap.c                  |   2 +-
>  mm/hmm.c                      |   2 +-
>  mm/huge_memory.c              | 189 ++++++++++++++++--------------
>  mm/hugetlb.c                  |   6 +-
>  mm/internal.h                 |  12 +-
>  mm/khugepaged.c               |  29 ++---
>  mm/madvise.c                  |  14 +--
>  mm/memory.c                   |  62 +++++-----
>  mm/migrate.c                  |   2 +-
>  mm/mincore.c                  |   2 +-
>  mm/mprotect.c                 |  45 ++++---
>  mm/mremap.c                   |   9 +-
>  mm/page_table_check.c         |  25 ++--
>  mm/page_vma_mapped.c          |  30 +++--
>  mm/swap_state.c               |   5 +-
>  mm/swapfile.c                 |   3 +-
>  mm/userfaultfd.c              |   2 +-
>  24 files changed, 511 insertions(+), 353 deletions(-)
> 
> --
> 2.51.0
Re: [RFC PATCH 00/12] remove is_swap_[pte, pmd]() + non-swap confusion
Posted by Lorenzo Stoakes 3 months, 2 weeks ago
On Fri, Oct 24, 2025 at 08:05:33PM +0000, Yosry Ahmed wrote:
> On Fri, Oct 24, 2025 at 08:41:16AM +0100, Lorenzo Stoakes wrote:
> > There's an established convention in the kernel that we treat leaf page
> > tables (so far at the PTE, PMD level) as containing 'swap entries' should
> > they be neither empty (i.e. p**_none() evaluating true) nor present
> > (i.e. p**_present() evaluating true).
> >
> > However, at the same time we also have helper predicates - is_swap_pte(),
> > is_swap_pmd() - which are inconsistently used.
> >
> > This is problematic, as it is logical to assume that should somebody wish
> > to operate upon a page table swap entry they should first check to see if
> > it is in fact one.
> >
> > It also implies that perhaps, in future, we might introduce a non-present,
> > none page table entry that is not a swap entry.
> >
> > This series resolves this issue by systematically eliminating all use of
> > the is_swap_pte() and is swap_pmd() predicates so we retain only the
> > convention that should a leaf page table entry be neither none nor present
> > it is a swap entry.
> >
> > We also have the further issue that 'swap entry' is unfortunately a really
> > rather overloaded term and in fact refers to both entries for swap and for
> > other information such as migration entries, page table markers, and device
> > private entries.
> >
> > We therefore have the rather 'unique' concept of a 'non-swap' swap entry.
> >
> > This is deeply confusing, so this series goes further and eliminates the
> > non_swap_entry() predicate, replacing it with is_non_present_entry() - with
> > an eye to a new convention of referring to these non-swap 'swap entries' as
> > non-present.
>
> I just wanted to say THANK YOU for doing this. It is indeed a very
> annoying and confusing convention, and I wanted to do something about it
> in the past but never got around to it..

:) that's very kind of you to say thanks! I was motivated by pure irritation at
this situation after David and I discussed it extensively.

I was initially thinking we should consistently _use_ the predicate and was
arguing for it on review, but David pointed out the convention is quite the
opposite and it became apparent to me th the predicates were the issue.

Of course I have encountered this non-swap swap entry madness as all of us
in mm have, but fixing that ends up being a natural extension to resolving
the is_swap_xxx() stuff and a big relief to deal with also.

There's more work to be done but this addresses some of the more proximate
issues! :)

Cheers, Lorenzo