[PATCH v2 0/5] madvise cleanup

Lorenzo Stoakes posted 5 patches 3 months, 2 weeks ago
include/linux/huge_mm.h |   9 +-
mm/khugepaged.c         |   9 +-
mm/madvise.c            | 585 +++++++++++++++++++++-------------------
3 files changed, 313 insertions(+), 290 deletions(-)
[PATCH v2 0/5] madvise cleanup
Posted by Lorenzo Stoakes 3 months, 2 weeks ago
This is a series of patches that helps address a number of historic
problems in the madvise() implementation:

* Eliminate the visitor pattern and having the code which is implemented
  for both the anon_vma_name implementation and ordinary madvise()
  operations use the same madvise_vma_behavior() implementation.

* Thread state through the madvise_behavior state object - this object,
  very usefully introduced by SJ, is already used to transmit state through
  operations. This series extends this by having all madvise() operations
  use this, including anon_vma_name.

* Thread range, VMA state through madvise_behavior - This helps avoid a lot
  of the confusing code around range and VMA state and again keeps things
  consistent and with a single 'source of truth'.

* Addressing the very strange behaviour around the passed around struct
  vm_area_struct **prev pointer - all read-only users do absolutely nothing
  with the prev pointer. The only function that uses it is
  madvise_update_vma(), and in all cases prev is always reset to
  VMA.

  Fix this by no longer having aything but madvise_update_vma() reference
  prev, and having madvise_walk_vmas() update prev in each
  instance. Additionally make it clear that the meaningful change in vma
  state is when madvise_update_vma() potentially merges a VMA, so
  explicitly retrieve the VMA in this case.

* Update and clarify the madvise_walk_vmas() function - this is a source of
  a great deal of confusion, so simplify, stop using prev = NULL to signify
  that the mmap lock has been dropped (!) and make that explicit, and add
  some comments to explain what's going on.

v2:
* Propagated tags (thanks everyone!)
* Don't separate out __MADV_SET_ANON_VMA_NAME and __MADV_SET_CLEAR_VMA_NAME,
  just use __MADV_SET_ANON_VMA_NAME as per Zi.
* Eliminate is_anon_vma_name() as no longer necessary, addressing Zi's concern
  around naming another way :)
* Put mm_struct abstraction of try_vma_read_lock() into 2/5 from 3/5 as per Zi.
* Added comment about get/put anon_vma_name in madvise_vma_behavior() as per
  Vlastimil.
* Renamed have_anon_name to set_new_anon_name to make it clear why we make an
  exception to this get/put behaviour in madvise_vma_behavior().
* Reworded 1/4 commit message to make it clearer what's being done as per
  Vlastimil.
* Avoid comma-separated decls in struct madvise_behavior_range as per Zi and
  Vlastimil.
* Put fix for silly development bug (range->start comparison to end not
  range->end) in 3/5 rather than 4/5 so as to eliminate it altogether,
  having fixed it during development but having not put the fix in the
  correct place :) as per Vlastimil.
* Rename end to last_end in madvise_walk_vmas() and added a comment for
  clarity as per Vlastimil.
* Update madvise_walk_vmas() comment to no longer refer to a visitor
  function.
* Separated out prev, vma fields in struct madvise_behavior as per
  Vlastimil.
* Added assert on not holding VMA lock whenever mmap lock is dropped and
  abstracted to mark_mmap_lock_dropped() so we always assert when we do
  this, based on discussion with Vlastimil.
* Removed duplicate comment about weird -ENOMEM unmapped error behaviour.

v1:
https://lore.kernel.org/all/cover.1750363557.git.lorenzo.stoakes@oracle.com/

Lorenzo Stoakes (5):
  mm/madvise: remove the visitor pattern and thread anon_vma state
  mm/madvise: thread mm_struct through madvise_behavior
  mm/madvise: thread VMA range state through madvise_behavior
  mm/madvise: thread all madvise state through madv_behavior
  mm/madvise: eliminate very confusing manipulation of prev VMA

 include/linux/huge_mm.h |   9 +-
 mm/khugepaged.c         |   9 +-
 mm/madvise.c            | 585 +++++++++++++++++++++-------------------
 3 files changed, 313 insertions(+), 290 deletions(-)

--
2.49.0
Re: [PATCH v2 0/5] madvise cleanup
Posted by SeongJae Park 3 months, 2 weeks ago
On Fri, 20 Jun 2025 16:33:00 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> This is a series of patches that helps address a number of historic
> problems in the madvise() implementation:
> 
> * Eliminate the visitor pattern and having the code which is implemented
>   for both the anon_vma_name implementation and ordinary madvise()
>   operations use the same madvise_vma_behavior() implementation.
> 
> * Thread state through the madvise_behavior state object - this object,
>   very usefully introduced by SJ, is already used to transmit state through
>   operations. This series extends this by having all madvise() operations
>   use this, including anon_vma_name.
> 
> * Thread range, VMA state through madvise_behavior - This helps avoid a lot
>   of the confusing code around range and VMA state and again keeps things
>   consistent and with a single 'source of truth'.
> 
> * Addressing the very strange behaviour around the passed around struct
>   vm_area_struct **prev pointer - all read-only users do absolutely nothing
>   with the prev pointer. The only function that uses it is
>   madvise_update_vma(), and in all cases prev is always reset to
>   VMA.
> 
>   Fix this by no longer having aything but madvise_update_vma() reference
>   prev, and having madvise_walk_vmas() update prev in each
>   instance. Additionally make it clear that the meaningful change in vma
>   state is when madvise_update_vma() potentially merges a VMA, so
>   explicitly retrieve the VMA in this case.
> 
> * Update and clarify the madvise_walk_vmas() function - this is a source of
>   a great deal of confusion, so simplify, stop using prev = NULL to signify
>   that the mmap lock has been dropped (!) and make that explicit, and add
>   some comments to explain what's going on.
> 
> v2:
> * Propagated tags (thanks everyone!)
> * Don't separate out __MADV_SET_ANON_VMA_NAME and __MADV_SET_CLEAR_VMA_NAME,

FWIW.  If this cover letter is added to the first patch, like Andrew usually
does, as-is, checkpatch.pl may warn like below.

   WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)

Obviously no real problem and I don't really care.  I just found this since my
tool (hkml) runs checkpatch.pl after adding the cover letter to the first
patch, and hence this is just FWIW.


Thanks,
SJ

[...]
Re: [PATCH v2 0/5] madvise cleanup
Posted by Lorenzo Stoakes 3 months, 2 weeks ago
On Fri, Jun 20, 2025 at 10:21:08AM -0700, SeongJae Park wrote:
> On Fri, 20 Jun 2025 16:33:00 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > This is a series of patches that helps address a number of historic
> > problems in the madvise() implementation:
> >
> > * Eliminate the visitor pattern and having the code which is implemented
> >   for both the anon_vma_name implementation and ordinary madvise()
> >   operations use the same madvise_vma_behavior() implementation.
> >
> > * Thread state through the madvise_behavior state object - this object,
> >   very usefully introduced by SJ, is already used to transmit state through
> >   operations. This series extends this by having all madvise() operations
> >   use this, including anon_vma_name.
> >
> > * Thread range, VMA state through madvise_behavior - This helps avoid a lot
> >   of the confusing code around range and VMA state and again keeps things
> >   consistent and with a single 'source of truth'.
> >
> > * Addressing the very strange behaviour around the passed around struct
> >   vm_area_struct **prev pointer - all read-only users do absolutely nothing
> >   with the prev pointer. The only function that uses it is
> >   madvise_update_vma(), and in all cases prev is always reset to
> >   VMA.
> >
> >   Fix this by no longer having aything but madvise_update_vma() reference
> >   prev, and having madvise_walk_vmas() update prev in each
> >   instance. Additionally make it clear that the meaningful change in vma
> >   state is when madvise_update_vma() potentially merges a VMA, so
> >   explicitly retrieve the VMA in this case.
> >
> > * Update and clarify the madvise_walk_vmas() function - this is a source of
> >   a great deal of confusion, so simplify, stop using prev = NULL to signify
> >   that the mmap lock has been dropped (!) and make that explicit, and add
> >   some comments to explain what's going on.
> >
> > v2:
> > * Propagated tags (thanks everyone!)
> > * Don't separate out __MADV_SET_ANON_VMA_NAME and __MADV_SET_CLEAR_VMA_NAME,
>
> FWIW.  If this cover letter is added to the first patch, like Andrew usually
> does, as-is, checkpatch.pl may warn like below.
>
>    WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
>
> Obviously no real problem and I don't really care.  I just found this since my
> tool (hkml) runs checkpatch.pl after adding the cover letter to the first
> patch, and hence this is just FWIW.

Yeah, sorry, this is because I didn't 'compress' the v2 revision log, which
won't be reproduced in patches anyway so should be ok :>)

>
>
> Thanks,
> SJ
>
> [...]