mm/internal.h | 30 ++++++++++++++++------ mm/madvise.c | 13 +++------- mm/mmap.c | 36 ++++++++++----------------- mm/mprotect.c | 12 +++------ mm/mremap.c | 33 ++++++------------------ mm/mseal.c | 69 +++++++++++---------------------------------------- 6 files changed, 63 insertions(+), 130 deletions(-)
Optimize mseal checks by removing the separate can_modify_mm() step, and just doing checks on the individual vmas, when various operations are themselves iterating through the tree. This provides a nice speedup. While I was at it, I found that is_madv_discard() was completely bogus. Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other architecture seems to use arch_unmap. Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel), for some reason (mremap regression?). will-it-scale mmap1_process[1] -t 1 results: commit 3450fe2b574b4345e4296ccae395149e1a357fee: min:277605 max:277605 total:277605 min:281784 max:281784 total:281784 min:277238 max:277238 total:277238 min:281761 max:281761 total:281761 min:274279 max:274279 total:274279 min:254854 max:254854 total:254854 measurement min:269143 max:269143 total:269143 min:270454 max:270454 total:270454 min:243523 max:243523 total:243523 min:251148 max:251148 total:251148 min:209669 max:209669 total:209669 min:190426 max:190426 total:190426 min:231219 max:231219 total:231219 min:275364 max:275364 total:275364 min:266540 max:266540 total:266540 min:242572 max:242572 total:242572 min:284469 max:284469 total:284469 min:278882 max:278882 total:278882 min:283269 max:283269 total:283269 min:281204 max:281204 total:281204 After this patch set: min:280580 max:280580 total:280580 min:290514 max:290514 total:290514 min:291006 max:291006 total:291006 min:290352 max:290352 total:290352 min:294582 max:294582 total:294582 min:293075 max:293075 total:293075 measurement min:295613 max:295613 total:295613 min:294070 max:294070 total:294070 min:293193 max:293193 total:293193 min:291631 max:291631 total:291631 min:295278 max:295278 total:295278 min:293782 max:293782 total:293782 min:290361 max:290361 total:290361 min:294517 max:294517 total:294517 min:293750 max:293750 total:293750 min:293572 max:293572 total:293572 min:295239 max:295239 total:295239 min:292932 max:292932 total:292932 min:293319 max:293319 total:293319 min:294954 max:294954 total:294954 This was a Completely Unscientific test but seems to show there were around 5-10% gains on ops per second. [1]: mmap1_process does mmap and munmap in a loop. I didn't bother testing multithreading cases. [2]: https://lore.kernel.org/all/87o766iehy.fsf@mail.lhotse/ Link: https://lore.kernel.org/all/202408041602.caa0372-oliver.sang@intel.com/ Pedro Falcato (7): mm: Move can_modify_vma to mm/internal.h mm/munmap: Replace can_modify_mm with can_modify_vma mm/mprotect: Replace can_modify_mm with can_modify_vma mm/mremap: Replace can_modify_mm with can_modify_vma mseal: Fix is_madv_discard() mseal: Replace can_modify_mm_madv with a vma variant mm: Remove can_modify_mm() mm/internal.h | 30 ++++++++++++++++------ mm/madvise.c | 13 +++------- mm/mmap.c | 36 ++++++++++----------------- mm/mprotect.c | 12 +++------ mm/mremap.c | 33 ++++++------------------ mm/mseal.c | 69 +++++++++++---------------------------------------- 6 files changed, 63 insertions(+), 130 deletions(-) -- 2.46.0
On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > Optimize mseal checks by removing the separate can_modify_mm() step, and > just doing checks on the individual vmas, when various operations are > themselves iterating through the tree. This provides a nice speedup. > > While I was at it, I found that is_madv_discard() was completely bogus. > Thanks for catching this! Is it possible to separate this fix out from this series and send it separately and merge first ? > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other > architecture seems to use arch_unmap. > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel), > for some reason (mremap regression?). > I just sync to Linus's main and I was able to run the test (except two pkeys related test are skipped because I m on VM) > will-it-scale mmap1_process[1] -t 1 results: > > commit 3450fe2b574b4345e4296ccae395149e1a357fee: > > min:277605 max:277605 total:277605 > min:281784 max:281784 total:281784 > min:277238 max:277238 total:277238 > min:281761 max:281761 total:281761 > min:274279 max:274279 total:274279 > min:254854 max:254854 total:254854 > measurement > min:269143 max:269143 total:269143 > min:270454 max:270454 total:270454 > min:243523 max:243523 total:243523 > min:251148 max:251148 total:251148 > min:209669 max:209669 total:209669 > min:190426 max:190426 total:190426 > min:231219 max:231219 total:231219 > min:275364 max:275364 total:275364 > min:266540 max:266540 total:266540 > min:242572 max:242572 total:242572 > min:284469 max:284469 total:284469 > min:278882 max:278882 total:278882 > min:283269 max:283269 total:283269 > min:281204 max:281204 total:281204 > > After this patch set: > > min:280580 max:280580 total:280580 > min:290514 max:290514 total:290514 > min:291006 max:291006 total:291006 > min:290352 max:290352 total:290352 > min:294582 max:294582 total:294582 > min:293075 max:293075 total:293075 > measurement > min:295613 max:295613 total:295613 > min:294070 max:294070 total:294070 > min:293193 max:293193 total:293193 > min:291631 max:291631 total:291631 > min:295278 max:295278 total:295278 > min:293782 max:293782 total:293782 > min:290361 max:290361 total:290361 > min:294517 max:294517 total:294517 > min:293750 max:293750 total:293750 > min:293572 max:293572 total:293572 > min:295239 max:295239 total:295239 > min:292932 max:292932 total:292932 > min:293319 max:293319 total:293319 > min:294954 max:294954 total:294954 > > This was a Completely Unscientific test but seems to show there were around 5-10% gains on ops per second. > > [1]: mmap1_process does mmap and munmap in a loop. I didn't bother testing multithreading cases. > [2]: https://lore.kernel.org/all/87o766iehy.fsf@mail.lhotse/ > Link: https://lore.kernel.org/all/202408041602.caa0372-oliver.sang@intel.com/ > > Pedro Falcato (7): > mm: Move can_modify_vma to mm/internal.h > mm/munmap: Replace can_modify_mm with can_modify_vma > mm/mprotect: Replace can_modify_mm with can_modify_vma > mm/mremap: Replace can_modify_mm with can_modify_vma > mseal: Fix is_madv_discard() > mseal: Replace can_modify_mm_madv with a vma variant > mm: Remove can_modify_mm() > > mm/internal.h | 30 ++++++++++++++++------ > mm/madvise.c | 13 +++------- > mm/mmap.c | 36 ++++++++++----------------- > mm/mprotect.c | 12 +++------ > mm/mremap.c | 33 ++++++------------------ > mm/mseal.c | 69 +++++++++++---------------------------------------- > 6 files changed, 63 insertions(+), 130 deletions(-) > > -- > 2.46.0 >
On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote: > > On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > > > Optimize mseal checks by removing the separate can_modify_mm() step, and > > just doing checks on the individual vmas, when various operations are > > themselves iterating through the tree. This provides a nice speedup. > > > > While I was at it, I found that is_madv_discard() was completely bogus. > > > Thanks for catching this! > Is it possible to separate this fix out from this series and send it > separately and merge first ? Sure. This series is definitely too risky to catch this release, so sending it out as a fix (tomorrow, it's late here) sounds ok. > > > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should > > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other > > architecture seems to use arch_unmap. > > > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the > > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel), > > for some reason (mremap regression?). > > > I just sync to Linus's main and I was able to run the test (except two > pkeys related test are skipped because I m on VM) Okay. Fun bug. I was really confused as to why no one could repro this except me :) It looks like recently[1] glibc started consuming the new_address variadic argument when MREMAP_DONTUNMAP. As to the why, MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not documented in the man page, and strace also doesn't know this). However, this trips up some checks that were always fine before (because glibc always passed NULL, and musl still does): if (offset_in_page(new_addr)) if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len) if (addr + old_len > new_addr && new_addr + new_len > addr) ^^ These all look at the address without looking at MREMAP_FIXED, and return -EINVAL if they fail. So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some Reason (why are you testing mremap in mseal_test.c??), it trips up offset_in_page(new_addr) in mremap_to, and we crash and burn. As to why no one else could repro this: I guess you're not running a glibc new enough ;) [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=6c40cb0e9f893d49dc7caee580a055de53562206 -- Pedro
On Tue, Aug 6, 2024 at 5:49 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote: > > > > On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > > > > > Optimize mseal checks by removing the separate can_modify_mm() step, and > > > just doing checks on the individual vmas, when various operations are > > > themselves iterating through the tree. This provides a nice speedup. > > > > > > While I was at it, I found that is_madv_discard() was completely bogus. > > > > > Thanks for catching this! > > Is it possible to separate this fix out from this series and send it > > separately and merge first ? > > Sure. This series is definitely too risky to catch this release, so > sending it out as a fix (tomorrow, it's late here) sounds ok. > Do you mind if I send out a fix ? (I will also include a test case to cover that ) > > > > > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should > > > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other > > > architecture seems to use arch_unmap. > > > > > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the > > > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel), > > > for some reason (mremap regression?). > > > > > I just sync to Linus's main and I was able to run the test (except two > > pkeys related test are skipped because I m on VM) > > Okay. Fun bug. > > I was really confused as to why no one could repro this except me :) > > It looks like recently[1] glibc started consuming the new_address > variadic argument when MREMAP_DONTUNMAP. As to the why, > MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not > documented in the man page, and strace also doesn't know this). > However, this trips up some checks that were always fine before > (because glibc always passed NULL, and musl still does): > > if (offset_in_page(new_addr)) > if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len) > if (addr + old_len > new_addr && new_addr + new_len > addr) > > ^^ These all look at the address without looking at MREMAP_FIXED, and > return -EINVAL if they fail. > > So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some > Reason (why are you testing mremap in mseal_test.c??), it trips up > offset_in_page(new_addr) in mremap_to, and we crash and burn. > > As to why no one else could repro this: I guess you're not running a > glibc new enough ;) > That makes sense, mystery resolved. I added sys_ functions for mmap/munmap/mprotect, etc, so that the test does not depend on libc, but I didn't do that for mremap, I think the fix will be to add sys_mremap as well. > [1] https://sourceware.org/git/?p=glibc.git;a=commit;h=6c40cb0e9f893d49dc7caee580a055de53562206 > > -- > Pedro
On Wed, Aug 7, 2024 at 2:40 AM Jeff Xu <jeffxu@google.com> wrote: > > On Tue, Aug 6, 2024 at 5:49 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > > > On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote: > > > > > > On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > > > > > > > Optimize mseal checks by removing the separate can_modify_mm() step, and > > > > just doing checks on the individual vmas, when various operations are > > > > themselves iterating through the tree. This provides a nice speedup. > > > > > > > > While I was at it, I found that is_madv_discard() was completely bogus. > > > > > > > Thanks for catching this! > > > Is it possible to separate this fix out from this series and send it > > > separately and merge first ? > > > > Sure. This series is definitely too risky to catch this release, so > > sending it out as a fix (tomorrow, it's late here) sounds ok. > > > Do you mind if I send out a fix ? (I will also include a test case to > cover that ) No need, I'll handle it before the end of the day. > > > > > > > > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should > > > > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other > > > > architecture seems to use arch_unmap. > > > > > > > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the > > > > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel), > > > > for some reason (mremap regression?). > > > > > > > I just sync to Linus's main and I was able to run the test (except two > > > pkeys related test are skipped because I m on VM) > > > > Okay. Fun bug. > > > > I was really confused as to why no one could repro this except me :) > > > > It looks like recently[1] glibc started consuming the new_address > > variadic argument when MREMAP_DONTUNMAP. As to the why, > > MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not > > documented in the man page, and strace also doesn't know this). > > However, this trips up some checks that were always fine before > > (because glibc always passed NULL, and musl still does): > > > > if (offset_in_page(new_addr)) > > if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len) > > if (addr + old_len > new_addr && new_addr + new_len > addr) > > > > ^^ These all look at the address without looking at MREMAP_FIXED, and > > return -EINVAL if they fail. > > > > So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some > > Reason (why are you testing mremap in mseal_test.c??), it trips up > > offset_in_page(new_addr) in mremap_to, and we crash and burn. > > > > As to why no one else could repro this: I guess you're not running a > > glibc new enough ;) > > > That makes sense, mystery resolved. > > I added sys_ functions for mmap/munmap/mprotect, etc, so that the test > does not depend on libc, but I didn't do that for mremap, I think the > fix will be to add sys_mremap as well. I disagree, I don't understand why you're doing this test. And even if you are rightfully doing the test, the test is wrong (and mremap_dontunmap.c tests agree, and always pass 0 as new_address). The manpage needs to be updated to reflect this, and this test either needs the 0xdeadbeef removed, or the whole thing. Adding a sys_mremap wrapper is inconsequential here, because you'll need to decide whether to pick up new_address from the flags argument and, if you do, it'll fail with the same error, but for everyone. -- Pedro
On Wed, Aug 7, 2024 at 5:57 AM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > On Wed, Aug 7, 2024 at 2:40 AM Jeff Xu <jeffxu@google.com> wrote: > > > > On Tue, Aug 6, 2024 at 5:49 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > > > > > On Tue, Aug 6, 2024 at 11:25 PM Jeff Xu <jeffxu@google.com> wrote: > > > > > > > > On Tue, Aug 6, 2024 at 2:28 PM Pedro Falcato <pedro.falcato@gmail.com> wrote: > > > > > > > > > > Optimize mseal checks by removing the separate can_modify_mm() step, and > > > > > just doing checks on the individual vmas, when various operations are > > > > > themselves iterating through the tree. This provides a nice speedup. > > > > > > > > > > While I was at it, I found that is_madv_discard() was completely bogus. > > > > > > > > > Thanks for catching this! > > > > Is it possible to separate this fix out from this series and send it > > > > separately and merge first ? > > > > > > Sure. This series is definitely too risky to catch this release, so > > > sending it out as a fix (tomorrow, it's late here) sounds ok. > > > > > Do you mind if I send out a fix ? (I will also include a test case to > > cover that ) > > No need, I'll handle it before the end of the day. > > > > > > > > > > > > Note that my series ignores arch_unmap(), which seems to generally be what we're trending towards[2]. It should > > > > > be applied on top of any powerpc vdso ->close patch to avoid regressions on the PPC architecture. No other > > > > > architecture seems to use arch_unmap. > > > > > > > > > > Note2: This series does not pass all mseal_tests on my end (test_seal_mremap_move_dontunmap_anyaddr fails twice). But the > > > > > top of Linus's tree does not pass these for me either (neither does my Arch Linux 6.10.2 kernel), > > > > > for some reason (mremap regression?). > > > > > > > > > I just sync to Linus's main and I was able to run the test (except two > > > > pkeys related test are skipped because I m on VM) > > > > > > Okay. Fun bug. > > > > > > I was really confused as to why no one could repro this except me :) > > > > > > It looks like recently[1] glibc started consuming the new_address > > > variadic argument when MREMAP_DONTUNMAP. As to the why, > > > MREMAP_DONTUNMAP also seems to take new_address as a hint (this is not > > > documented in the man page, and strace also doesn't know this). > > > However, this trips up some checks that were always fine before > > > (because glibc always passed NULL, and musl still does): > > > > > > if (offset_in_page(new_addr)) > > > if (new_len > TASK_SIZE || new_addr > TASK_SIZE - new_len) > > > if (addr + old_len > new_addr && new_addr + new_len > addr) > > > > > > ^^ These all look at the address without looking at MREMAP_FIXED, and > > > return -EINVAL if they fail. > > > > > > So, test_seal_mremap_move_dontunmap_anyaddr passes 0xdeadbeef For Some > > > Reason (why are you testing mremap in mseal_test.c??), it trips up > > > offset_in_page(new_addr) in mremap_to, and we crash and burn. > > > > > > As to why no one else could repro this: I guess you're not running a > > > glibc new enough ;) > > > > > That makes sense, mystery resolved. > > > > I added sys_ functions for mmap/munmap/mprotect, etc, so that the test > > does not depend on libc, but I didn't do that for mremap, I think the > > fix will be to add sys_mremap as well. > > I disagree, I don't understand why you're doing this test. The test is for testing the return error code should be EPERM in case of sealing. > And even if > you are rightfully doing the test, the test is wrong (and > mremap_dontunmap.c tests agree, and always pass 0 as new_address). the new_address don't need to be 0. E.g. mremap(ptr, size, size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP, 0xdead0000); Will success and relocate memory to 0xdead0000 > The manpage needs to be updated to reflect this, and this test either > needs the 0xdeadbeef removed, or the whole thing. > > Adding a sys_mremap wrapper is inconsequential here, because you'll > need to decide whether to pick up new_address from the flags argument > and, if you do, it'll fail with the same error, but for everyone. > I will send a patch and you can try on your sys (since I don't have the new glibc installed) > -- > Pedro
© 2016 - 2025 Red Hat, Inc.