include/linux/huge_mm.h | 9 +- mm/khugepaged.c | 9 +- mm/madvise.c | 585 +++++++++++++++++++++------------------- 3 files changed, 313 insertions(+), 290 deletions(-)
This is a series of patches that helps address a number of historic problems in the madvise() implementation: * Eliminate the visitor pattern and having the code which is implemented for both the anon_vma_name implementation and ordinary madvise() operations use the same madvise_vma_behavior() implementation. * Thread state through the madvise_behavior state object - this object, very usefully introduced by SJ, is already used to transmit state through operations. This series extends this by having all madvise() operations use this, including anon_vma_name. * Thread range, VMA state through madvise_behavior - This helps avoid a lot of the confusing code around range and VMA state and again keeps things consistent and with a single 'source of truth'. * Addressing the very strange behaviour around the passed around struct vm_area_struct **prev pointer - all read-only users do absolutely nothing with the prev pointer. The only function that uses it is madvise_update_vma(), and in all cases prev is always reset to VMA. Fix this by no longer having aything but madvise_update_vma() reference prev, and having madvise_walk_vmas() update prev in each instance. Additionally make it clear that the meaningful change in vma state is when madvise_update_vma() potentially merges a VMA, so explicitly retrieve the VMA in this case. * Update and clarify the madvise_walk_vmas() function - this is a source of a great deal of confusion, so simplify, stop using prev = NULL to signify that the mmap lock has been dropped (!) and make that explicit, and add some comments to explain what's going on. v2: * Propagated tags (thanks everyone!) * Don't separate out __MADV_SET_ANON_VMA_NAME and __MADV_SET_CLEAR_VMA_NAME, just use __MADV_SET_ANON_VMA_NAME as per Zi. * Eliminate is_anon_vma_name() as no longer necessary, addressing Zi's concern around naming another way :) * Put mm_struct abstraction of try_vma_read_lock() into 2/5 from 3/5 as per Zi. * Added comment about get/put anon_vma_name in madvise_vma_behavior() as per Vlastimil. * Renamed have_anon_name to set_new_anon_name to make it clear why we make an exception to this get/put behaviour in madvise_vma_behavior(). * Reworded 1/4 commit message to make it clearer what's being done as per Vlastimil. * Avoid comma-separated decls in struct madvise_behavior_range as per Zi and Vlastimil. * Put fix for silly development bug (range->start comparison to end not range->end) in 3/5 rather than 4/5 so as to eliminate it altogether, having fixed it during development but having not put the fix in the correct place :) as per Vlastimil. * Rename end to last_end in madvise_walk_vmas() and added a comment for clarity as per Vlastimil. * Update madvise_walk_vmas() comment to no longer refer to a visitor function. * Separated out prev, vma fields in struct madvise_behavior as per Vlastimil. * Added assert on not holding VMA lock whenever mmap lock is dropped and abstracted to mark_mmap_lock_dropped() so we always assert when we do this, based on discussion with Vlastimil. * Removed duplicate comment about weird -ENOMEM unmapped error behaviour. v1: https://lore.kernel.org/all/cover.1750363557.git.lorenzo.stoakes@oracle.com/ Lorenzo Stoakes (5): mm/madvise: remove the visitor pattern and thread anon_vma state mm/madvise: thread mm_struct through madvise_behavior mm/madvise: thread VMA range state through madvise_behavior mm/madvise: thread all madvise state through madv_behavior mm/madvise: eliminate very confusing manipulation of prev VMA include/linux/huge_mm.h | 9 +- mm/khugepaged.c | 9 +- mm/madvise.c | 585 +++++++++++++++++++++------------------- 3 files changed, 313 insertions(+), 290 deletions(-) -- 2.49.0
On Fri, 20 Jun 2025 16:33:00 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > This is a series of patches that helps address a number of historic > problems in the madvise() implementation: > > * Eliminate the visitor pattern and having the code which is implemented > for both the anon_vma_name implementation and ordinary madvise() > operations use the same madvise_vma_behavior() implementation. > > * Thread state through the madvise_behavior state object - this object, > very usefully introduced by SJ, is already used to transmit state through > operations. This series extends this by having all madvise() operations > use this, including anon_vma_name. > > * Thread range, VMA state through madvise_behavior - This helps avoid a lot > of the confusing code around range and VMA state and again keeps things > consistent and with a single 'source of truth'. > > * Addressing the very strange behaviour around the passed around struct > vm_area_struct **prev pointer - all read-only users do absolutely nothing > with the prev pointer. The only function that uses it is > madvise_update_vma(), and in all cases prev is always reset to > VMA. > > Fix this by no longer having aything but madvise_update_vma() reference > prev, and having madvise_walk_vmas() update prev in each > instance. Additionally make it clear that the meaningful change in vma > state is when madvise_update_vma() potentially merges a VMA, so > explicitly retrieve the VMA in this case. > > * Update and clarify the madvise_walk_vmas() function - this is a source of > a great deal of confusion, so simplify, stop using prev = NULL to signify > that the mmap lock has been dropped (!) and make that explicit, and add > some comments to explain what's going on. > > v2: > * Propagated tags (thanks everyone!) > * Don't separate out __MADV_SET_ANON_VMA_NAME and __MADV_SET_CLEAR_VMA_NAME, FWIW. If this cover letter is added to the first patch, like Andrew usually does, as-is, checkpatch.pl may warn like below. WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?) Obviously no real problem and I don't really care. I just found this since my tool (hkml) runs checkpatch.pl after adding the cover letter to the first patch, and hence this is just FWIW. Thanks, SJ [...]
On Fri, Jun 20, 2025 at 10:21:08AM -0700, SeongJae Park wrote: > On Fri, 20 Jun 2025 16:33:00 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > > This is a series of patches that helps address a number of historic > > problems in the madvise() implementation: > > > > * Eliminate the visitor pattern and having the code which is implemented > > for both the anon_vma_name implementation and ordinary madvise() > > operations use the same madvise_vma_behavior() implementation. > > > > * Thread state through the madvise_behavior state object - this object, > > very usefully introduced by SJ, is already used to transmit state through > > operations. This series extends this by having all madvise() operations > > use this, including anon_vma_name. > > > > * Thread range, VMA state through madvise_behavior - This helps avoid a lot > > of the confusing code around range and VMA state and again keeps things > > consistent and with a single 'source of truth'. > > > > * Addressing the very strange behaviour around the passed around struct > > vm_area_struct **prev pointer - all read-only users do absolutely nothing > > with the prev pointer. The only function that uses it is > > madvise_update_vma(), and in all cases prev is always reset to > > VMA. > > > > Fix this by no longer having aything but madvise_update_vma() reference > > prev, and having madvise_walk_vmas() update prev in each > > instance. Additionally make it clear that the meaningful change in vma > > state is when madvise_update_vma() potentially merges a VMA, so > > explicitly retrieve the VMA in this case. > > > > * Update and clarify the madvise_walk_vmas() function - this is a source of > > a great deal of confusion, so simplify, stop using prev = NULL to signify > > that the mmap lock has been dropped (!) and make that explicit, and add > > some comments to explain what's going on. > > > > v2: > > * Propagated tags (thanks everyone!) > > * Don't separate out __MADV_SET_ANON_VMA_NAME and __MADV_SET_CLEAR_VMA_NAME, > > FWIW. If this cover letter is added to the first patch, like Andrew usually > does, as-is, checkpatch.pl may warn like below. > > WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?) > > Obviously no real problem and I don't really care. I just found this since my > tool (hkml) runs checkpatch.pl after adding the cover letter to the first > patch, and hence this is just FWIW. Yeah, sorry, this is because I didn't 'compress' the v2 revision log, which won't be reproduced in patches anyway so should be ok :>) > > > Thanks, > SJ > > [...]
© 2016 - 2025 Red Hat, Inc.