[PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling

Chen Wandun posted 2 patches 1 month, 1 week ago
Posted by Chen Wandun 1 month, 1 week ago
madvise_collapse() computes a THP-aligned window from the caller's range:

  hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK  /* round up  */
  hend   =  end   &  HPAGE_PMD_MASK                    /* round down */

When the caller's range is smaller than one PMD (2 MiB on x86-64 with
4 KiB pages) and/or not PMD-aligned, hstart can end up greater than
hend.  In that case the collapsing loop is correctly skipped, but the
expected THP count is still computed as
((hend - hstart) >> HPAGE_PMD_SHIFT).  With hstart > hend the unsigned
subtraction wraps to a huge value, the final check sees that thps (0)
does not match it, and -EINVAL is returned instead of 0.

A concrete example:

  /* both cover less than one THP; both should return 0 */
  madvise(aligned, PAGE_SIZE, MADV_COLLAPSE);             /* OK, returns 0 */
  madvise(aligned + PAGE_SIZE, PAGE_SIZE, MADV_COLLAPSE); /* returns -EINVAL */
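
The arithmetic behind these two calls can be traced in userspace.  The
sketch below is not taken from the patch: it assumes 2 MiB PMDs
(HPAGE_PMD_SHIFT == 21) and 64-bit addresses, and struct pmd_window,
compute_window() and expected_thps() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: x86-64 with 4 KiB base pages and 2 MiB PMDs. */
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (UINT64_C(1) << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct pmd_window {
	uint64_t hstart;
	uint64_t hend;
};

/* Same rounding as in the cover letter: start up, end down. */
static struct pmd_window compute_window(uint64_t start, uint64_t len)
{
	struct pmd_window w;

	assert(len <= UINT64_MAX - start);	/* start + len must not overflow */
	w.hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;	/* round up */
	w.hend = (start + len) & HPAGE_PMD_MASK;		/* round down */
	return w;
}

/* Unsigned subtraction: wraps around when hstart > hend. */
static uint64_t expected_thps(struct pmd_window w)
{
	return (w.hend - w.hstart) >> HPAGE_PMD_SHIFT;
}
```

With a PMD-aligned base of 0x40000000, compute_window(base, 4096)
gives hstart == hend == 0x40000000 and an expected count of 0, while
compute_window(base + 4096, 4096) gives hstart = 0x40200000,
hend = 0x40000000 and an expected count of 0x7FFFFFFFFFF.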

The fix moves the hstart/hend calculation before kmalloc_obj() and
returns 0 early when hstart >= hend.  This also avoids the
kmalloc_obj(), mmgrab(), and lru_add_drain_all() calls for ranges that
trivially contain no PMD window.  Guarding only the final return
expression would fix the return value as well, but the early return
keeps the no-op path free of the allocator and drain overhead.
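
The difference between the two return paths can be modelled in
userspace.  This is only a sketch under stated assumptions, not the
kernel code: it assumes 2 MiB PMDs, pretends that every PMD window in
the loop collapses successfully, and assumes the final check compares
the collapsed count against the expected count.  ret_before(),
ret_after() and count_collapsed() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed geometry: 2 MiB PMDs. */
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (UINT64_C(1) << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

/* Model of the collapse loop: every window is assumed to succeed. */
static uint64_t count_collapsed(uint64_t hstart, uint64_t hend)
{
	uint64_t addr, thps = 0;

	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE)
		thps++;
	return thps;
}

/* Model of the old behaviour: no guard before the final check. */
static int ret_before(uint64_t start, uint64_t end)
{
	uint64_t hstart, hend, thps;

	assert(start <= end);
	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
	hend = end & HPAGE_PMD_MASK;
	thps = count_collapsed(hstart, hend);
	/* hend - hstart wraps when hstart > hend, so the check fails. */
	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0 : -EINVAL;
}

/* Model of the fix: bail out before doing any work. */
static int ret_after(uint64_t start, uint64_t end)
{
	uint64_t hstart, hend, thps;

	assert(start <= end);
	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
	hend = end & HPAGE_PMD_MASK;
	if (hstart >= hend)
		return 0;	/* no PMD window: nothing to collapse */
	thps = count_collapsed(hstart, hend);
	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0 : -EINVAL;
}
```

In this model the guard changes the result only for hstart > hend; the
hstart == hend case and ranges that contain at least one full PMD
window return the same value from both functions.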

Patch 1 fixes the kernel bug.
Patch 2 adds a selftest with two cases: hstart == hend (aligned start,
already correct) and hstart > hend (unaligned start, previously
broken).

Chen Wandun (2):
  mm/khugepaged: fix spurious -EINVAL from sub-PMD MADV_COLLAPSE range
  selftests/mm: add MADV_COLLAPSE sub-PMD range tests

 mm/khugepaged.c                               |   9 +-
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   2 +
 .../selftests/mm/ksft_madv_collapse.sh        |   4 +
 .../selftests/mm/madv_collapse_range.c        | 141 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   5 +
 6 files changed, 159 insertions(+), 3 deletions(-)
 create mode 100755 tools/testing/selftests/mm/ksft_madv_collapse.sh
 create mode 100644 tools/testing/selftests/mm/madv_collapse_range.c

-- 
2.43.0
Re: [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling
Posted by Lance Yang 1 month ago
Hi,

scripts/get_maintainer.pl is your friend :)
Please use it to Cc the relevant maintainers and reviewers next time.

Cheers, Lance

On Thu, May 07, 2026 at 03:05:56PM +0800, Chen Wandun wrote:
>[...]
Re: [PATCH 0/2] mm/khugepaged: fix sub-PMD MADV_COLLAPSE range handling
Posted by Wandun 1 month ago

On 5/9/26 17:47, Lance Yang wrote:
> Hi,
>
> scripts/get_maintainer.pl is your friend :)
> Please use it to Cc the relevant maintainers and reviewers next time.
Many thanks for your kind reminder :)
I will do it next time.

Best regards,
Wandun
>
> Cheers, Lance
>
> On Thu, May 07, 2026 at 03:05:56PM +0800, Chen Wandun wrote:
>> [...]