The nr_hugepgs variable holds the original nr_hugepages value, saved during
the hugepage setup step at the beginning of the test run. After the
userfaultfd tests, a cleanup step resets both
/sys/kernel/mm/hugepages/hugepages-*/nr_hugepages and
/proc/sys/vm/nr_hugepages to what is meant to be their value before the
userfaultfd tests started.

The issue is that the value used to restore /proc/sys/vm/nr_hugepages is
nr_hugepgs, which is the initial value from before run_vmtests.sh ran, not
the value from before the userfaultfd tests started. The
va_high_addr_switch.sh test, which runs afterwards, may then find no
hugepages available and get EINVAL from mmap(MAP_HUGETLB), making its
result invalid.

In addition, before the pkey tests nr_hugepgs is reused as a temporary
variable to save nr_hugepages, which is then restored after the pkey tests
finish. The original nr_hugepages value is no longer tracked after that, so
there is no way to restore it once all tests have finished.

Add a new variable, nr_hugepgs_origin, to save the original nr_hugepages
and restore it after all tests finish. Change nr_hugepgs to hold the value
of /proc/sys/vm/nr_hugepages after hugepage setup; this is also the value
before the userfaultfd tests start, and therefore the correct value to
restore after they finish. This resolves the va_high_addr_switch.sh
breakage.
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 471e539d82b8..f1a7ad3ec6a7 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -172,13 +172,13 @@ fi
# set proper nr_hugepages
if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then
- nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages)
+ nr_hugepgs_origin=$(cat /proc/sys/vm/nr_hugepages)
needpgs=$((needmem_KB / hpgsize_KB))
tries=2
while [ "$tries" -gt 0 ] && [ "$freepgs" -lt "$needpgs" ]; do
lackpgs=$((needpgs - freepgs))
echo 3 > /proc/sys/vm/drop_caches
- if ! echo $((lackpgs + nr_hugepgs)) > /proc/sys/vm/nr_hugepages; then
+ if ! echo $((lackpgs + nr_hugepgs_origin)) > /proc/sys/vm/nr_hugepages; then
echo "Please run this test as root"
exit $ksft_skip
fi
@@ -189,6 +189,7 @@ if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then
done < /proc/meminfo
tries=$((tries - 1))
done
+ nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages)
if [ "$freepgs" -lt "$needpgs" ]; then
printf "Not enough huge pages available (%d < %d)\n" \
"$freepgs" "$needpgs"
@@ -532,6 +533,10 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh aligned
CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned
+if [ "${HAVE_HUGEPAGES}" = 1 ]; then
+ echo "$nr_hugepgs_origin" > /proc/sys/vm/nr_hugepages
+fi
+
echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix
echo "1..${count_total}" | tap_output
--
2.49.0
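The two variable lifetimes described in the commit message can be sketched as follows. This is an illustration only, not the code from run_vmtests.sh: a scratch file stands in for /proc/sys/vm/nr_hugepages so that it runs without root, and the counts 0, 64 and 128 as well as the names knob, after_test and after_all are made up for the example.

```shell
# Sketch only: "$knob" is a scratch file standing in for
# /proc/sys/vm/nr_hugepages. All numbers are hypothetical.
knob=$(mktemp)
echo 0 > "$knob"                    # system value before the test run

nr_hugepgs_origin=$(cat "$knob")    # saved once, restored at the very end
echo 64 > "$knob"                   # hugepage setup raises the count
nr_hugepgs=$(cat "$knob")           # value that later tests rely on

echo 128 > "$knob"                  # a test changes the count for itself
echo "$nr_hugepgs" > "$knob"        # per-test cleanup restores the setup value
after_test=$(cat "$knob")

echo "$nr_hugepgs_origin" > "$knob" # final cleanup restores the system value
after_all=$(cat "$knob")

rm -f "$knob"
echo "after_test=$after_test after_all=$after_all"
```

With a single variable, the per-test cleanup would have written the pre-run value (0 here) instead of the post-setup value, which is the situation the patch describes for the tests that run after userfaultfd.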
On 27.08.25 09:52, Chunyu Hu wrote:
> [...]
> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> index 471e539d82b8..f1a7ad3ec6a7 100755
> --- a/tools/testing/selftests/mm/run_vmtests.sh
> +++ b/tools/testing/selftests/mm/run_vmtests.sh
> @@ -172,13 +172,13 @@ fi
>
> # set proper nr_hugepages
> if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then
> - nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages)
> + nr_hugepgs_origin=$(cat /proc/sys/vm/nr_hugepages)

I'd call this "orig_nr_hugepgs".

But it's a shame that the naming is then out of sync with nr_size_hugepgs?

> [...]
> @@ -532,6 +533,10 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh aligned
>
> CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned
>
> +if [ "${HAVE_HUGEPAGES}" = 1 ]; then
> + echo "$nr_hugepgs_origin" > /proc/sys/vm/nr_hugepages
> +fi

FWIW, I think the tests should maybe be doing that (save+configure+restore)
themselves, like we do with THP settings through

thp_save_settings()
thp_write_settings()

and friends.

This is not really something run_vmtests.sh should bother with.

A bigger rework, though ...

--
Cheers

David / dhildenb
On Wed, Aug 27, 2025 at 7:41 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 27.08.25 09:52, Chunyu Hu wrote:
> > [...]
> > # set proper nr_hugepages
> > if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then
> > - nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages)
> > + nr_hugepgs_origin=$(cat /proc/sys/vm/nr_hugepages)
>
> I'd call this "orig_nr_hugepgs".

Hi David,

Thank you for your review and valuable feedback. I will rename it in a v2
and resend the two patches. Do you have suggestions on patch 2?

> But it's a shame that the naming is then out of sync with nr_size_hugepgs?

nr_size_hugepgs is for uffd-wp-mremap. That test needs hugepages of all
sizes, and the variable is used to save and restore nr_hugepages for every
hugepage size. It's a per-test setup, unlike nr_hugepgs, which is a
global/general setup. They are not the same kind of thing, so maybe they
don't need to be aligned...

> > [...]
> > +if [ "${HAVE_HUGEPAGES}" = 1 ]; then
> > + echo "$nr_hugepgs_origin" > /proc/sys/vm/nr_hugepages
> > +fi
>
> FWIW, I think the tests should maybe be doing that
> (save+configure+restore) themselves, like we do with THP settings through
>
> thp_save_settings()
> thp_write_settings()
>
> and friends.
>
> This is not really something run_vmtests.sh should bother with.
>
> A bigger rework, though ...

Totally agree, doing that through a C interface is better; then
run_vmtests.sh would be clean. It's a bigger rework outside of this
topic...

> --
> Cheers
>
> David / dhildenb

--
Thanks,
Chunyu Hu
On Wed, Aug 27, 2025 at 01:41:34PM +0200, David Hildenbrand wrote:
> On 27.08.25 09:52, Chunyu Hu wrote:
> > [...]
> > # set proper nr_hugepages
> > if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then
> > - nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages)
> > + nr_hugepgs_origin=$(cat /proc/sys/vm/nr_hugepages)
>
> I'd call this "orig_nr_hugepgs".

[RESEND to add back the unexpectedly dropped mailing list, sorry for that]

Hi David,

Thank you for your review and valuable feedback. I will rename it in a v2
and resend the two patches.

> But it's a shame that the naming is then out of sync with nr_size_hugepgs?

nr_size_hugepgs is for uffd-wp-mremap. That test needs hugepages of all
sizes, and the variable is used to save and restore nr_hugepages for every
hugepage size. It's a per-test setup, unlike nr_hugepgs, which is a
global/general setup. They are not the same kind of thing, so maybe they
don't need to be aligned.

> > [...]
> > +if [ "${HAVE_HUGEPAGES}" = 1 ]; then
> > + echo "$nr_hugepgs_origin" > /proc/sys/vm/nr_hugepages
> > +fi
>
> FWIW, I think the tests should maybe be doing that (save+configure+restore)
> themselves, like we do with THP settings through
>
> thp_save_settings()
> thp_write_settings()
>
> and friends.

For this va_high_addr_switch test, it looks like we can do the save,
write and restore of the setting in va_high_addr_switch.sh.

> This is not really something run_vmtests.sh should bother with.
>
> A bigger rework, though ...

Totally agree, doing that through a C interface is better; then
run_vmtests.sh would be clean. It's a bigger rework outside of this topic.

> --
> Cheers
>
> David / dhildenb
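The save+configure+restore pattern discussed in this exchange might look roughly like the sketch below if done inside a test's own wrapper script. The helper names hugepages_save, hugepages_write and hugepages_restore are hypothetical (they are not existing selftest helpers), and the knob path is passed as an argument so the demo can run against a scratch file rather than /proc/sys/vm/nr_hugepages. The counts 8 and 5 are made up.

```shell
# Hypothetical helpers, loosely modelled on the save/write/restore idea
# mentioned for THP settings. $1 is the path of the knob to operate on.
hugepages_save()    { saved_nr_hugepgs=$(cat "$1"); }
hugepages_write()   { echo "$2" > "$1"; }
hugepages_restore() { echo "$saved_nr_hugepgs" > "$1"; }

knob=$(mktemp)                  # scratch stand-in for the real sysctl file
echo 8 > "$knob"                # pretend pre-test system setting

hugepages_save "$knob"          # remember what the system had
hugepages_write "$knob" 5       # configure what this test needs
during=$(cat "$knob")
# ... the test binary would run here ...
hugepages_restore "$knob"       # put the system setting back
after=$(cat "$knob")

rm -f "$knob"
echo "during=$during after=$after"
```

In a real script the restore call would more likely be registered with `trap ... EXIT`, so that the setting is also put back when the test exits early.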
The test fails as below on x86_64 with CPU LA57 support (it is skipped
without LA57 support). Note that the test requires nr_hugepages to be set
first.
# running bash ./va_high_addr_switch.sh
# -------------------------------------
# mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK
# mmap(addr_switch_hint - pagesize, (2 * pagesize)): 0x7f55b60f9000 - OK
# mmap(addr_switch_hint, pagesize): 0x800000000000 - OK
# mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK
# mmap(NULL): 0x7f55b60f9000 - OK
# mmap(low_addr): 0x40000000 - OK
# mmap(high_addr): 0x1000000000000 - OK
# mmap(high_addr) again: 0xffff55b6136000 - OK
# mmap(high_addr, MAP_FIXED): 0x1000000000000 - OK
# mmap(-1): 0xffff55b6134000 - OK
# mmap(-1) again: 0xffff55b6132000 - OK
# mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK
# mmap(addr_switch_hint - pagesize, 2 * pagesize): 0x7f55b60f9000 - OK
# mmap(addr_switch_hint - pagesize/2 , 2 * pagesize): 0x7f55b60f7000 - OK
# mmap(addr_switch_hint, pagesize): 0x800000000000 - OK
# mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK
# mmap(NULL, MAP_HUGETLB): 0x7f55b5c00000 - OK
# mmap(low_addr, MAP_HUGETLB): 0x40000000 - OK
# mmap(high_addr, MAP_HUGETLB): 0x1000000000000 - OK
# mmap(high_addr, MAP_HUGETLB) again: 0xffff55b5e00000 - OK
# mmap(high_addr, MAP_FIXED | MAP_HUGETLB): 0x1000000000000 - OK
# mmap(-1, MAP_HUGETLB): 0x7f55b5c00000 - OK
# mmap(-1, MAP_HUGETLB) again: 0x7f55b5a00000 - OK
# mmap(addr_switch_hint - pagesize, 2*hugepagesize, MAP_HUGETLB): 0x800000000000 - FAILED
# mmap(addr_switch_hint , 2*hugepagesize, MAP_FIXED | MAP_HUGETLB): 0x800000000000 - OK
# [FAIL]
addr_switch_hint is defined as DEFAULT_MAP_WINDOW in the failed test (for
64-bit x86_64, DEFAULT_MAP_WINDOW is defined as (1UL<<47) - pagesize).

Before commit cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*}
functions"), hugetlb_get_unmapped_area() for x86_64 was handled in arch
code, arch/x86/mm/hugetlbpage.c. There, addr was checked with
map_address_hint_valid() after being aligned with
'addr &= huge_page_mask(h)', which rounds down. The check failed because
addr is within DEFAULT_MAP_WINDOW but (addr + len) is above
DEFAULT_MAP_WINDOW, so the code went through
hugetlb_get_unmapped_area_topdown() to find an area within
DEFAULT_MAP_WINDOW.

After commit cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*}
functions"), the addr hint for hugetlb_get_unmapped_area() is rounded up
and aligned to the hugepage size with ALIGN() on all architectures. After
the alignment, addr is above DEFAULT_MAP_WINDOW, and the
map_address_hint_valid() check passes because both the aligned addr (addr0)
and (addr + len) are above DEFAULT_MAP_WINDOW. The aligned hint address
(0x800000000000) is then returned, as a suitable gap is found there in
arch_get_unmapped_area_topdown().

To still cover the case where addr is within DEFAULT_MAP_WINDOW and
addr + len is above DEFAULT_MAP_WINDOW, make the addr hint one hugepage
lower, so that after the alignment it is still within DEFAULT_MAP_WINDOW
while addr + len (2 hugepages) is above DEFAULT_MAP_WINDOW.
Fixes: cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*} functions")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
---
tools/testing/selftests/mm/va_high_addr_switch.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/va_high_addr_switch.c b/tools/testing/selftests/mm/va_high_addr_switch.c
index 896b3f73fc53..bd96dc3b5931 100644
--- a/tools/testing/selftests/mm/va_high_addr_switch.c
+++ b/tools/testing/selftests/mm/va_high_addr_switch.c
@@ -230,10 +230,10 @@ void testcases_init(void)
.msg = "mmap(-1, MAP_HUGETLB) again",
},
{
- .addr = (void *)(addr_switch_hint - pagesize),
+ .addr = (void *)(addr_switch_hint - pagesize - hugepagesize),
.size = 2 * hugepagesize,
.flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS,
- .msg = "mmap(addr_switch_hint - pagesize, 2*hugepagesize, MAP_HUGETLB)",
+ .msg = "mmap(addr_switch_hint - pagesize - hugepagesize, 2*hugepagesize, MAP_HUGETLB)",
.low_addr_required = 1,
.keep_mapped = 1,
},
--
2.49.0
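The alignment arithmetic behind this change can be checked with a few lines of shell. This is a sketch under assumed values, not code from the test: it assumes 4 KiB base pages, 2 MiB hugepages, and addr_switch_hint = 1 << 47 (which matches the 0x800000000000 results in the log above). The helpers align_down and align_up are written for this example and mimic 'addr &= huge_page_mask(h)' and ALIGN() respectively.

```shell
# Assumed values: 4 KiB pages, 2 MiB hugepages, switch hint at 1 << 47.
pagesize=4096
hugepagesize=$((2 * 1024 * 1024))
addr_switch_hint=$((1 << 47))

# Example-only helpers: round an address down or up to a power-of-two size.
align_down() { echo $(( $1 & ~($2 - 1) )); }
align_up()   { echo $(( ($1 + $2 - 1) & ~($2 - 1) )); }

old_hint=$((addr_switch_hint - pagesize))
new_hint=$((addr_switch_hint - pagesize - hugepagesize))

old_down=$(align_down "$old_hint" "$hugepagesize")  # behaviour before the commit
old_up=$(align_up "$old_hint" "$hugepagesize")      # behaviour after the commit
new_up=$(align_up "$new_hint" "$hugepagesize")      # new hint, after the commit
new_end=$((new_up + 2 * hugepagesize))              # end of the 2-hugepage mapping

printf 'old hint, rounded down: 0x%x\n' "$old_down"
printf 'old hint, rounded up:   0x%x\n' "$old_up"
printf 'new hint, rounded up:   0x%x\n' "$new_up"
printf 'new hint, end of range: 0x%x\n' "$new_end"
```

Under these assumptions the old hint rounds up to exactly 0x800000000000, so the whole request sits above the window, while the new hint rounds up to 0x7fffffe00000 and the two-hugepage range ends at 0x800000200000, straddling the boundary as the test case intends.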