[PATCH 0/7] mm: Optimize mseal checks

Pedro Falcato posted 7 patches 1 year, 4 months ago
There is a newer version of this series
mm/internal.h | 30 ++++++++++++++++------
mm/madvise.c  | 13 +++-------
mm/mmap.c     | 36 ++++++++++-----------------
mm/mprotect.c | 12 +++------
mm/mremap.c   | 33 ++++++------------------
mm/mseal.c    | 69 +++++++++++----------------------------------------
6 files changed, 63 insertions(+), 130 deletions(-)
[PATCH 0/7] mm: Optimize mseal checks
Posted by Pedro Falcato 1 year, 4 months ago
Optimize mseal checks by removing the separate can_modify_mm() step, and
just doing checks on the individual vmas, when various operations are
themselves iterating through the tree. This provides a nice speedup.

While I was at it, I found that is_madv_discard() was completely bogus.

Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should
be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other
architecture seems to use arch_unmap.

Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the
top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel),
for some reason (mremap regression?).

will-it-scale mmap1_process[1] -t 1 results:

commit 3450fe2b574b4345e4296ccae395149e1a357fee:

min:277605 max:277605 total:277605
min:281784 max:281784 total:281784
min:277238 max:277238 total:277238
min:281761 max:281761 total:281761
min:274279 max:274279 total:274279
min:254854 max:254854 total:254854
measurement
min:269143 max:269143 total:269143
min:270454 max:270454 total:270454
min:243523 max:243523 total:243523
min:251148 max:251148 total:251148
min:209669 max:209669 total:209669
min:190426 max:190426 total:190426
min:231219 max:231219 total:231219
min:275364 max:275364 total:275364
min:266540 max:266540 total:266540
min:242572 max:242572 total:242572
min:284469 max:284469 total:284469
min:278882 max:278882 total:278882
min:283269 max:283269 total:283269
min:281204 max:281204 total:281204

After this patch set:

min:280580 max:280580 total:280580
min:290514 max:290514 total:290514
min:291006 max:291006 total:291006
min:290352 max:290352 total:290352
min:294582 max:294582 total:294582
min:293075 max:293075 total:293075
measurement
min:295613 max:295613 total:295613
min:294070 max:294070 total:294070
min:293193 max:293193 total:293193
min:291631 max:291631 total:291631
min:295278 max:295278 total:295278
min:293782 max:293782 total:293782
min:290361 max:290361 total:290361
min:294517 max:294517 total:294517
min:293750 max:293750 total:293750
min:293572 max:293572 total:293572
min:295239 max:295239 total:295239
min:292932 max:292932 total:292932
min:293319 max:293319 total:293319
min:294954 max:294954 total:294954

This was a Completely Unscientific test but seems to show there were around 5-10% gains on ops per second.

[1]: mmap1_process does mmap and munmap in a loop. I didn't bother testing multithreading cases.
[2]: https://lore.kernel.org/all/87o766iehy.fsf@mail.lhotse/
Link: https://lore.kernel.org/all/202408041602.caa0372-oliver.sang@intel.com/

Pedro Falcato (7):
  mm: Move can_modify_vma to mm/internal.h
  mm/munmap: Replace can_modify_mm with can_modify_vma
  mm/mprotect: Replace can_modify_mm with can_modify_vma
  mm/mremap: Replace can_modify_mm with can_modify_vma
  mseal: Fix is_madv_discard()
  mseal: Replace can_modify_mm_madv with a vma variant
  mm: Remove can_modify_mm()

 mm/internal.h | 30 ++++++++++++++++------
 mm/madvise.c  | 13 +++-------
 mm/mmap.c     | 36 ++++++++++-----------------
 mm/mprotect.c | 12 +++------
 mm/mremap.c   | 33 ++++++------------------
 mm/mseal.c    | 69 +++++++++++----------------------------------------
 6 files changed, 63 insertions(+), 130 deletions(-)

-- 
2.46.0
Re: [PATCH 0/7] mm: Optimize mseal checks
Posted by Jeff Xu 1 year, 4 months ago
On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> Optimize mseal checks by removing the separate can_modify_mm() step, and
> just doing checks on the individual vmas, when various operations are
> themselves iterating through the tree. This provides a nice speedup.
>
> While I was at it, I found that is_madv_discard() was completely bogus.
>
Thanks for catching this!
Is it possible to separate this fix out from this series and send it
separately and merge first ?

> Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should
> be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other
> architecture seems to use arch_unmap.
>
> Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the
> top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel),
> for some reason (mremap regression?).
>
I just sync to Linus's main and I was able to run the test (except two
pkeys related test are skipped because I m on VM)

> will-it-scale mmap1_process[1] -t 1 results:
>
> commit 3450fe2b574b4345e4296ccae395149e1a357fee:
>
> min:277605 max:277605 total:277605
> min:281784 max:281784 total:281784
> min:277238 max:277238 total:277238
> min:281761 max:281761 total:281761
> min:274279 max:274279 total:274279
> min:254854 max:254854 total:254854
> measurement
> min:269143 max:269143 total:269143
> min:270454 max:270454 total:270454
> min:243523 max:243523 total:243523
> min:251148 max:251148 total:251148
> min:209669 max:209669 total:209669
> min:190426 max:190426 total:190426
> min:231219 max:231219 total:231219
> min:275364 max:275364 total:275364
> min:266540 max:266540 total:266540
> min:242572 max:242572 total:242572
> min:284469 max:284469 total:284469
> min:278882 max:278882 total:278882
> min:283269 max:283269 total:283269
> min:281204 max:281204 total:281204
>
> After this patch set:
>
> min:280580 max:280580 total:280580
> min:290514 max:290514 total:290514
> min:291006 max:291006 total:291006
> min:290352 max:290352 total:290352
> min:294582 max:294582 total:294582
> min:293075 max:293075 total:293075
> measurement
> min:295613 max:295613 total:295613
> min:294070 max:294070 total:294070
> min:293193 max:293193 total:293193
> min:291631 max:291631 total:291631
> min:295278 max:295278 total:295278
> min:293782 max:293782 total:293782
> min:290361 max:290361 total:290361
> min:294517 max:294517 total:294517
> min:293750 max:293750 total:293750
> min:293572 max:293572 total:293572
> min:295239 max:295239 total:295239
> min:292932 max:292932 total:292932
> min:293319 max:293319 total:293319
> min:294954 max:294954 total:294954
>
> This was a Completely Unscientific test but seems to show there were around 5-10% gains on ops per second.
>
> [1]: mmap1_process does mmap and munmap in a loop. I didn't bother testing multithreading cases.
> [2]: https://lore.kernel.org/all/87o766iehy.fsf@mail.lhotse/
> Link: https://lore.kernel.org/all/202408041602.caa0372-oliver.sang@intel.com/
>
> Pedro Falcato (7):
>   mm: Move can_modify_vma to mm/internal.h
>   mm/munmap: Replace can_modify_mm with can_modify_vma
>   mm/mprotect: Replace can_modify_mm with can_modify_vma
>   mm/mremap: Replace can_modify_mm with can_modify_vma
>   mseal: Fix is_madv_discard()
>   mseal: Replace can_modify_mm_madv with a vma variant
>   mm: Remove can_modify_mm()
>
>  mm/internal.h | 30 ++++++++++++++++------
>  mm/madvise.c  | 13 +++-------
>  mm/mmap.c     | 36 ++++++++++-----------------
>  mm/mprotect.c | 12 +++------
>  mm/mremap.c   | 33 ++++++------------------
>  mm/mseal.c    | 69 +++++++++++----------------------------------------
>  6 files changed, 63 insertions(+), 130 deletions(-)
>
> --
> 2.46.0
>
Re: [PATCH 0/7] mm: Optimize mseal checks
Posted by Pedro Falcato 1 year, 4 months ago
On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote:
>
> On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > Optimize mseal checks by removing the separate can_modify_mm() step, and
> > just doing checks on the individual vmas, when various operations are
> > themselves iterating through the tree. This provides a nice speedup.
> >
> > While I was at it, I found that is_madv_discard() was completely bogus.
> >
> Thanks for catching this!
> Is it possible to separate this fix out from this series and send it
> separately and merge first ?

Sure. This series is definitely too risky to catch this release, so
sending it out as a fix (tomorrow, it's late here) sounds ok.

>
> > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should
> > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other
> > architecture seems to use arch_unmap.
> >
> > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the
> > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel),
> > for some reason (mremap regression?).
> >
> I just sync to Linus's main and I was able to run the test (except two
> pkeys related test are skipped because I m on VM)

Okay. Fun bug.

I was really confused as to why no one could repro this except me :)

It looks like recently[1] glibc started consuming the new_address
variadic argument when MREMAP_DONTUNMAP. As to the why,
MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not
documented in the man page, and strace also doesn't know this).
However, this trips up some checks that were always fine before
(because glibc always passed NULL, and musl still does):

if (offset_in_page(new_addr))
if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len)
if (addr + old_len > new_addr && new_addr + new_len > addr)

^^ These all look at the address without looking at MREMAP_FIXED, and
return -EINVAL if they fail.

So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some
Reason (why are you testing mremap in mseal_test.c??), it trips up
offset_in_page(new_addr) in mremap_to, and we crash and burn.

As to why no one else could repro this: I guess you're not running a
glibc new enough ;)

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=6c40cb0e9f893d49dc7caee580a055de53562206

-- 
Pedro
Re: [PATCH 0/7] mm: Optimize mseal checks
Posted by Jeff Xu 1 year, 4 months ago
On Tue, Aug 6, 2024 at 5:49 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote:
> >
> > On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > >
> > > Optimize mseal checks by removing the separate can_modify_mm() step, and
> > > just doing checks on the individual vmas, when various operations are
> > > themselves iterating through the tree. This provides a nice speedup.
> > >
> > > While I was at it, I found that is_madv_discard() was completely bogus.
> > >
> > Thanks for catching this!
> > Is it possible to separate this fix out from this series and send it
> > separately and merge first ?
>
> Sure. This series is definitely too risky to catch this release, so
> sending it out as a fix (tomorrow, it's late here) sounds ok.
>
Do you mind if I send out a fix ? (I will also include a test case to
cover that )

> >
> > > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should
> > > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other
> > > architecture seems to use arch_unmap.
> > >
> > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the
> > > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel),
> > > for some reason (mremap regression?).
> > >
> > I just sync to Linus's main and I was able to run the test (except two
> > pkeys related test are skipped because I m on VM)
>
> Okay. Fun bug.
>
> I was really confused as to why no one could repro this except me :)
>
> It looks like recently[1] glibc started consuming the new_address
> variadic argument when MREMAP_DONTUNMAP. As to the why,
> MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not
> documented in the man page, and strace also doesn't know this).
> However, this trips up some checks that were always fine before
> (because glibc always passed NULL, and musl still does):
>
> if (offset_in_page(new_addr))
> if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len)
> if (addr + old_len > new_addr && new_addr + new_len > addr)
>
> ^^ These all look at the address without looking at MREMAP_FIXED, and
> return -EINVAL if they fail.
>
> So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some
> Reason (why are you testing mremap in mseal_test.c??), it trips up
> offset_in_page(new_addr) in mremap_to, and we crash and burn.
>
> As to why no one else could repro this: I guess you're not running a
> glibc new enough ;)
>
That makes sense, mystery resolved.

I added sys_ functions for mmap/munmap/mprotect, etc, so that the test
does not depend on libc, but I didn't do that for mremap, I think the
fix will be to add sys_mremap as well.


> [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=6c40cb0e9f893d49dc7caee580a055de53562206
>
> --
> Pedro
Re: [PATCH 0/7] mm: Optimize mseal checks
Posted by Pedro Falcato 1 year, 4 months ago
On Wed, Aug 7, 2024 at 2:40 AM Jeff Xu <jeffxu@google.com> wrote:
>
> On Tue, Aug 6, 2024 at 5:49 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> >
> > On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote:
> > >
> > > On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > > >
> > > > Optimize mseal checks by removing the separate can_modify_mm() step, and
> > > > just doing checks on the individual vmas, when various operations are
> > > > themselves iterating through the tree. This provides a nice speedup.
> > > >
> > > > While I was at it, I found that is_madv_discard() was completely bogus.
> > > >
> > > Thanks for catching this!
> > > Is it possible to separate this fix out from this series and send it
> > > separately and merge first ?
> >
> > Sure. This series is definitely too risky to catch this release, so
> > sending it out as a fix (tomorrow, it's late here) sounds ok.
> >
> Do you mind if I send out a fix ? (I will also include a test case to
> cover that )

No need, I'll handle it before the end of the day.

>
> > >
> > > > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should
> > > > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other
> > > > architecture seems to use arch_unmap.
> > > >
> > > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the
> > > > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel),
> > > > for some reason (mremap regression?).
> > > >
> > > I just sync to Linus's main and I was able to run the test (except two
> > > pkeys related test are skipped because I m on VM)
> >
> > Okay. Fun bug.
> >
> > I was really confused as to why no one could repro this except me :)
> >
> > It looks like recently[1] glibc started consuming the new_address
> > variadic argument when MREMAP_DONTUNMAP. As to the why,
> > MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not
> > documented in the man page, and strace also doesn't know this).
> > However, this trips up some checks that were always fine before
> > (because glibc always passed NULL, and musl still does):
> >
> > if (offset_in_page(new_addr))
> > if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len)
> > if (addr + old_len > new_addr && new_addr + new_len > addr)
> >
> > ^^ These all look at the address without looking at MREMAP_FIXED, and
> > return -EINVAL if they fail.
> >
> > So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some
> > Reason (why are you testing mremap in mseal_test.c??), it trips up
> > offset_in_page(new_addr) in mremap_to, and we crash and burn.
> >
> > As to why no one else could repro this: I guess you're not running a
> > glibc new enough ;)
> >
> That makes sense, mystery resolved.
>
> I added sys_ functions for mmap/munmap/mprotect, etc, so that the test
> does not depend on libc, but I didn't do that for mremap, I think the
> fix will be to add sys_mremap as well.

I disagree, I don't understand why you're doing this test. And even if
you are rightfully doing the test, the test is wrong (and
mremap_dontunmap.c tests agree, and always pass 0 as new_address).
The manpage needs to be updated to reflect this, and this test either
needs the 0xdeadbeef removed, or the whole thing.

Adding a sys_mremap wrapper is inconsequential here, because you'll
need to decide whether to pick up new_address from the flags argument
and, if you do, it'll fail with the same error, but for everyone.

-- 
Pedro
Re: [PATCH 0/7] mm: Optimize mseal checks
Posted by Jeff Xu 1 year, 4 months ago
On Wed, Aug 7, 2024 at 5:57 AM Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 2:40 AM Jeff Xu <jeffxu@google.com> wrote:
> >
> > On Tue, Aug 6, 2024 at 5:49 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > >
> > > On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote:
> > > >
> > > > On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> > > > >
> > > > > Optimize mseal checks by removing the separate can_modify_mm() step, and
> > > > > just doing checks on the individual vmas, when various operations are
> > > > > themselves iterating through the tree. This provides a nice speedup.
> > > > >
> > > > > While I was at it, I found that is_madv_discard() was completely bogus.
> > > > >
> > > > Thanks for catching this!
> > > > Is it possible to separate this fix out from this series and send it
> > > > separately and merge first ?
> > >
> > > Sure. This series is definitely too risky to catch this release, so
> > > sending it out as a fix (tomorrow, it's late here) sounds ok.
> > >
> > Do you mind if I send out a fix ? (I will also include a test case to
> > cover that )
>
> No need, I'll handle it before the end of the day.
>
> >
> > > >
> > > > > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should
> > > > > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other
> > > > > architecture seems to use arch_unmap.
> > > > >
> > > > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the
> > > > > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel),
> > > > > for some reason (mremap regression?).
> > > > >
> > > > I just sync to Linus's main and I was able to run the test (except two
> > > > pkeys related test are skipped because I m on VM)
> > >
> > > Okay. Fun bug.
> > >
> > > I was really confused as to why no one could repro this except me :)
> > >
> > > It looks like recently[1] glibc started consuming the new_address
> > > variadic argument when MREMAP_DONTUNMAP. As to the why,
> > > MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not
> > > documented in the man page, and strace also doesn't know this).
> > > However, this trips up some checks that were always fine before
> > > (because glibc always passed NULL, and musl still does):
> > >
> > > if (offset_in_page(new_addr))
> > > if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len)
> > > if (addr + old_len > new_addr && new_addr + new_len > addr)
> > >
> > > ^^ These all look at the address without looking at MREMAP_FIXED, and
> > > return -EINVAL if they fail.
> > >
> > > So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some
> > > Reason (why are you testing mremap in mseal_test.c??), it trips up
> > > offset_in_page(new_addr) in mremap_to, and we crash and burn.
> > >
> > > As to why no one else could repro this: I guess you're not running a
> > > glibc new enough ;)
> > >
> > That makes sense, mystery resolved.
> >
> > I added sys_ functions for mmap/munmap/mprotect, etc, so that the test
> > does not depend on libc, but I didn't do that for mremap, I think the
> > fix will be to add sys_mremap as well.
>
> I disagree, I don't understand why you're doing this test.
The test is for testing the return error code should be EPERM in case
of sealing.

> And even if
> you are rightfully doing the test, the test is wrong (and
> mremap_dontunmap.c tests agree, and always pass 0 as new_address).
the new_address don't need to be 0.
E.g.
mremap(ptr, size, size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP, 0xdead0000);
Will success and relocate memory to 0xdead0000


> The manpage needs to be updated to reflect this, and this test either
> needs the 0xdeadbeef removed, or the whole thing.
>
> Adding a sys_mremap wrapper is inconsequential here, because you'll
> need to decide whether to pick up new_address from the flags argument
> and, if you do, it'll fail with the same error, but for everyone.
>
I will send a patch and you can try on your sys (since I don't have
the new glibc installed)

> --
> Pedro