[PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs

Lorenzo Stoakes posted 10 patches 3 months ago
There is a newer version of this series
[PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
Posted by Lorenzo Stoakes 3 months ago
Historically we've made it a uAPI requirement that mremap() may only
operate on a single VMA at a time.

For instances where VMAs need to be resized, this makes sense, as it
becomes very difficult to determine what a user actually wants should they
indicate a desire to expand or shrink the size of multiple VMAs (truncate?
Adjust sizes individually? Some other strategy?).

However, in instances where a user is moving VMAs, it is restrictive to
disallow this.

This is especially the case for anonymous mappings, where a remapped VMA
may or may not be mergeable depending on whether VMAs have been faulted in,
owing to anon_vma assignment and folio index alignment with vma->vm_pgoff.

This can often have a surprising effect: a moved region is faulted, then
moved back, and the user fails to observe a merge with otherwise
compatible, adjacent VMAs.

This change allows such cases to work without the user having to be
cognizant of whether a prior mremap() move or other VMA operations have
resulted in VMA fragmentation.

In order to do this, this series performs a large amount of refactoring,
most pertinently grouping sanity checks together, separating those that
check input parameters from those relating to VMAs.

We also simplify the processing performed after the mmap lock is dropped
for uffd and mlock()'d VMAs.

With this done, we can then fairly straightforwardly implement this
functionality.

This works exclusively for mremap() invocations which specify
MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the
notification of the userland fault handler would require us to drop the
mmap lock.

The input and output address ranges must not overlap. We carefully
account for moves which would result in VMA merges or would otherwise
result in VMA iterator invalidation.

Lorenzo Stoakes (10):
  mm/mremap: perform some simple cleanups
  mm/mremap: refactor initial parameter sanity checks
  mm/mremap: put VMA check and prep logic into helper function
  mm/mremap: cleanup post-processing stage of mremap
  mm/mremap: use an explicit uffd failure path for mremap
  mm/mremap: check remap conditions earlier
  mm/mremap: move remap_is_valid() into check_prep_vma()
  mm/mremap: clean up mlock populate behaviour
  mm/mremap: permit mremap() move of multiple VMAs
  tools/testing/selftests: extend mremap_test to test multi-VMA mremap

 fs/userfaultfd.c                         |  15 +-
 include/linux/userfaultfd_k.h            |   1 +
 mm/mremap.c                              | 502 ++++++++++++++---------
 tools/testing/selftests/mm/mremap_test.c | 145 ++++++-
 4 files changed, 462 insertions(+), 201 deletions(-)

--
2.50.0
Re: [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
Posted by Hugh Dickins 3 months ago
On Mon, 7 Jul 2025, Lorenzo Stoakes wrote:

> Historically we've made it a uAPI requirement that mremap() may only
> operate on a single VMA at a time.
> 
> For instances where VMAs need to be resized, this makes sense, as it
> becomes very difficult to determine what a user actually wants should they
> indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> Adjust sizes individually? Some other strategy?).
> 
> However, in instances where a user is moving VMAs, it is restrictive to
> disallow this.
> 
> This is especially the case for anonymous mappings, where a remapped VMA
> may or may not be mergeable depending on whether VMAs have been faulted in,
> owing to anon_vma assignment and folio index alignment with vma->vm_pgoff.
> 
> This can often have a surprising effect: a moved region is faulted, then
> moved back, and the user fails to observe a merge with otherwise
> compatible, adjacent VMAs.
> 
> This change allows such cases to work without the user having to be
> cognizant of whether a prior mremap() move or other VMA operations have
> resulted in VMA fragmentation.
> 
> In order to do this, this series performs a large amount of refactoring,
> most pertinently grouping sanity checks together, separating those that
> check input parameters from those relating to VMAs.
> 
> We also simplify the processing performed after the mmap lock is dropped
> for uffd and mlock()'d VMAs.
> 
> With this done, we can then fairly straightforwardly implement this
> functionality.
> 
> This works exclusively for mremap() invocations which specify
> MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the
> notification of the userland fault handler would require us to drop the
> mmap lock.
> 
> The input and output address ranges must not overlap. We carefully
> account for moves which would result in VMA merges or would otherwise
> result in VMA iterator invalidation.

Applause!

No way shall I review this, but each time I've seen an mremap series
from Lorenzo go by, I've wanted to say "but wouldn't it be better to...";
but it felt too impertinent to prod you in a direction I'd never dare
take myself (and quite likely that you had already tried, but found it
fundamentally impossible).

Thank you, yes, this is a very welcome step forward.

Hugh
Re: [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
Posted by Lorenzo Stoakes 3 months ago
On Sun, Jul 06, 2025 at 11:12:35PM -0700, Hugh Dickins wrote:
> Applause!
>
> No way shall I review this, but each time I've seen an mremap series
> from Lorenzo go by, I've wanted to say "but wouldn't it be better to...";
> but it felt too impertinent to prod you in a direction I'd never dare
> take myself (and quite likely that you had already tried, but found it
> fundamentally impossible).
>
> Thank you, yes, this is a very welcome step forward.

Thank you, that's very kind of you! :) And please, by all means do feel free
to prod or to give your thoughts and opinions on things; they're very
welcome and appreciated!

With respect to this series, I think it really underlines what a difference
refactoring can make to being able to have code do something new. Prior to
my last refactoring series and the refactoring bits here, I just don't think
it would have been possible.

WRT the relocate anon series, I thought it'd be worth talking a bit about
why it didn't work out, in case you/others might find it interesting:

Indeed, while I'd like us to more efficiently process VMAs in the anon_vma
case, it turns out there are simply too many moving parts for it to be
feasible at this time. I reached the point of dealing with many, many edge
cases addressing the points David raised about folios in the swap cache and
migration entries (which might also fail to migrate), having gone to great
lengths to avoid having an unreliable undo path.

I'd even invented a new means of 'hiding' anon_vmas from the rmap walker,
and did split folio work up front, and and and :)

But then there came a point where I'd unavoidably have to do a folio split
mid-way through the operation, and GUP-fast could race and increment a
refcount that'd break that and... it was just obvious this approach wasn't
workable, and was far too fragile.

It's important to accept when one reaches such a point, but it wasn't a
waste: a) there's a lot that can be reused and applied later, b) I learned
a great deal, and c) it helped further my research in this area.

I think overall efforts in this direction will require a more ambitious
rework of the anon_vma stuff, something I intend to do :) but it'll all be
done incrementally, with a great deal of care, and obviously working with
the community throughout.

>
> Hugh

Cheers, Lorenzo
Re: [PATCH 00/10] mm/mremap: permit mremap() move of multiple VMAs
Posted by Lorenzo Stoakes 3 months ago
+cc linux-api, FYI. Apologies, I intended to cc from the start; it was simply
an oversight. All future respins will cc.

This series changes mremap() semantics (I will update the manpage
accordingly of course).

Cheers, Lorenzo

On Mon, Jul 07, 2025 at 06:27:43AM +0100, Lorenzo Stoakes wrote:
> Historically we've made it a uAPI requirement that mremap() may only
> operate on a single VMA at a time.
>
> For instances where VMAs need to be resized, this makes sense, as it
> becomes very difficult to determine what a user actually wants should they
> indicate a desire to expand or shrink the size of multiple VMAs (truncate?
> Adjust sizes individually? Some other strategy?).
>
> However, in instances where a user is moving VMAs, it is restrictive to
> disallow this.
>
> This is especially the case for anonymous mappings, where a remapped VMA
> may or may not be mergeable depending on whether VMAs have been faulted in,
> owing to anon_vma assignment and folio index alignment with vma->vm_pgoff.
>
> This can often have a surprising effect: a moved region is faulted, then
> moved back, and the user fails to observe a merge with otherwise
> compatible, adjacent VMAs.
>
> This change allows such cases to work without the user having to be
> cognizant of whether a prior mremap() move or other VMA operations have
> resulted in VMA fragmentation.
>
> In order to do this, this series performs a large amount of refactoring,
> most pertinently grouping sanity checks together, separating those that
> check input parameters from those relating to VMAs.
>
> We also simplify the processing performed after the mmap lock is dropped
> for uffd and mlock()'d VMAs.
>
> With this done, we can then fairly straightforwardly implement this
> functionality.
>
> This works exclusively for mremap() invocations which specify
> MREMAP_FIXED. It is not compatible with VMAs which use userfaultfd, as the
> notification of the userland fault handler would require us to drop the
> mmap lock.
>
> The input and output address ranges must not overlap. We carefully
> account for moves which would result in VMA merges or would otherwise
> result in VMA iterator invalidation.
>
> Lorenzo Stoakes (10):
>   mm/mremap: perform some simple cleanups
>   mm/mremap: refactor initial parameter sanity checks
>   mm/mremap: put VMA check and prep logic into helper function
>   mm/mremap: cleanup post-processing stage of mremap
>   mm/mremap: use an explicit uffd failure path for mremap
>   mm/mremap: check remap conditions earlier
>   mm/mremap: move remap_is_valid() into check_prep_vma()
>   mm/mremap: clean up mlock populate behaviour
>   mm/mremap: permit mremap() move of multiple VMAs
>   tools/testing/selftests: extend mremap_test to test multi-VMA mremap
>
>  fs/userfaultfd.c                         |  15 +-
>  include/linux/userfaultfd_k.h            |   1 +
>  mm/mremap.c                              | 502 ++++++++++++++---------
>  tools/testing/selftests/mm/mremap_test.c | 145 ++++++-
>  4 files changed, 462 insertions(+), 201 deletions(-)
>
> --
> 2.50.0