khugepaged_scan_mm_slot() uses a 'progress' counter to limit the
amount of work performed per call; progress accrues from three
components (sketched below):
1. Transitioning to a new mm (+1).
2. Skipping an unsuitable VMA (+1).
3. Scanning a PMD-sized range (+HPAGE_PMD_NR).
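A minimal userspace sketch of how this accounting bounds one call,
using hypothetical numbers (HPAGE_PMD_NR assumed 512, as on x86-64
with 4K pages; not the upstream loop):

	#include <stdio.h>

	#define HPAGE_PMD_NR 512	/* 4K pages per 2M PMD range, assumed */

	int main(void)
	{
		unsigned int pages = 4096;	/* hypothetical scan budget per call */
		unsigned int progress = 0;

		progress += 1;			/* transitioned to a new mm */
		progress += 3;			/* skipped 3 unsuitable VMAs */
		while (progress < pages)
			progress += HPAGE_PMD_NR;	/* scanned one PMD range */

		/* prints: progress=4100 budget=4096 -> yield */
		printf("progress=%u budget=%u -> yield\n", progress, pages);
		return 0;
	}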
Consider a 1MB VMA sitting between two 2MB alignment boundaries:
    vma1     vma2     vma3
+----------+------+----------+
|    2M    |  1M  |    2M    |
+----------+------+----------+
           ^      ^
         start   end
              ^
         hstart,hend
In this case, for vma2:
hstart = round_up(start, HPAGE_PMD_SIZE) -> Next 2MB alignment
hend = round_down(end, HPAGE_PMD_SIZE) -> Prev 2MB alignment
Currently, since `hend <= hstart`, VMAs that are too small or unaligned
to contain a hugepage are skipped without incrementing 'progress'.
A process containing a large number of such small VMAs will unfairly
consume more CPU cycles before yielding compared to a process with
fewer, larger, or aligned VMAs.
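As a standalone check of the rounding, with hypothetical addresses for
vma2 and HPAGE_PMD_SIZE assumed to be 2MB:

	#include <stdio.h>

	#define HPAGE_PMD_SIZE 0x200000UL	/* 2MB, assumed */
	#define round_up(x, y)   ((((x) + (y) - 1) / (y)) * (y))
	#define round_down(x, y) (((x) / (y)) * (y))

	int main(void)
	{
		/* hypothetical 1MB VMA between two 2MB boundaries */
		unsigned long start = 0x280000UL, end = 0x380000UL;
		unsigned long hstart = round_up(start, HPAGE_PMD_SIZE);
		unsigned long hend = round_down(end, HPAGE_PMD_SIZE);

		/* prints: hstart=0x400000 hend=0x200000 skipped=1 */
		printf("hstart=%#lx hend=%#lx skipped=%d\n",
		       hstart, hend, hend <= hstart);
		return 0;
	}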
Fix this by incrementing progress when the `hend <= hstart` condition
is met.
Additionally, change the type of 'progress' to `unsigned int` to match
both the 'pages' parameter and the function's return type.
Suggested-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
Incorporate comment feedback from Lance:
https://lore.kernel.org/linux-mm/6b408736-978a-4d40-adfc-97819951c3a6@linux.dev
mm/khugepaged.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 984294a16861..93ce39915f4a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2403,7 +2403,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 	struct mm_slot *slot;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
-	int progress = 0;
+	unsigned int progress = 0;
 
 	VM_BUG_ON(!pages);
 	lockdep_assert_held(&khugepaged_mm_lock);
@@ -2447,7 +2447,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		}
 		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
 		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
-		if (khugepaged_scan.address > hend) {
+		if (khugepaged_scan.address > hend || hend <= hstart) {
+			/* VMA already scanned or too small/unaligned for hugepage. */
 			progress++;
 			continue;
 		}
--
2.43.0
On 19/01/26 12:52 am, Shivank Garg wrote:
> [...]
>
> Consider a 1MB VMA sitting between two 2MB alignment boundaries:
>
>     vma1     vma2     vma3
> +----------+------+----------+
> |    2M    |  1M  |    2M    |
> +----------+------+----------+
>            ^      ^
>          start   end
>               ^
>          hstart,hend
Won't such a VMA be skipped by thp_vma_allowable_order()? That internally
checks, apart from eligibility by sysfs, that the extent of the VMA can
map a hugepage.
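For reference, a standalone paraphrase of the extent check Dev is
pointing at (roughly what thp_vma_suitable_order() does internally;
simplified, not the exact upstream code):

	#include <stdbool.h>
	#include <stdio.h>

	#define HPAGE_PMD_SIZE 0x200000UL	/* 2MB, assumed */
	#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

	/* A PMD-sized page at 'addr' fits only if the PMD-aligned
	 * range containing it lies entirely inside [vm_start, vm_end). */
	static bool extent_fits(unsigned long vm_start,
				unsigned long vm_end, unsigned long addr)
	{
		unsigned long haddr = ALIGN_DOWN(addr, HPAGE_PMD_SIZE);

		return haddr >= vm_start &&
		       haddr + HPAGE_PMD_SIZE <= vm_end;
	}

	int main(void)
	{
		/* the 1MB vma2 above: no address in it can fit */
		printf("%d\n", extent_fits(0x280000UL, 0x380000UL,
					   0x280000UL));	/* prints 0 */
		return 0;
	}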
On 2026/1/22 15:32, Dev Jain wrote:
>
> On 19/01/26 12:52 am, Shivank Garg wrote:
>> [...]
>
> Won't such a VMA be skipped by thp_vma_allowable_order()? That internally
> checks, apart from eligibility by sysfs, that the extent of the VMA can
> map a hugepage.
Ah, you're right!
I was worrying about a case that doesn't actually happen.
Thanks,
Lance
On 1/22/2026 2:14 PM, Lance Yang wrote:
> On 2026/1/22 15:32, Dev Jain wrote:
>> Won't such a VMA be skipped by thp_vma_allowable_order()? That internally
>> checks, apart from eligibility by sysfs, that the extent of the VMA can
>> map a hugepage.
>
> Ah, you're right!
>
> I was worrying about a case that doesn't actually happen.

You're right, thp_vma_allowable_order() is taking care of this, making
the hend <= hstart check redundant.

Thank you for catching this.

I'll drop this change and send a revision keeping only the unsigned int
type change for 'progress'.

Thanks,
Shivank
On 1/22/2026 5:56 PM, Garg, Shivank wrote:
>> I was worrying about a case that doesn't actually happen.
>>
> You're right, thp_vma_allowable_order() is taking care of this, making
> the hend <= hstart check redundant.
>
> Thank you for catching this.
>
> I'll drop this change and send a revision keeping only the unsigned int
> type change for 'progress'.

Hi Andrew,

Please drop this patch. As Dev and Lance noted, the hend <= hstart
check is redundant.

The 'progress' change to unsigned int is not critical either.

Other patches from the series remain unchanged.

Sorry for the noise, and thanks.

Regards,
Shivank
On Fri, 23 Jan 2026 16:12:07 +0530 "Garg, Shivank" <shivankg@amd.com> wrote:

> >> I was worrying about a case that doesn't actually happen.
> >>
> > You're right, thp_vma_allowable_order() is taking care of this, making
> > the hend <= hstart check redundant.
> >
> > I'll drop this change and send a revision keeping only the unsigned int
> > type change for 'progress'.
>
> Hi Andrew,
>
> Please drop this patch.

thud.

> As Dev and Lance noted, the hend <= hstart check is redundant.
>
> The 'progress' change to unsigned int is not critical either.

OK.

Were you thinking of adopting Dev's suggestion?
https://lkml.kernel.org/r/6486c6dd-2702-4a4d-9662-09639532ce6f@arm.com.

If so, let's defer until the next cycle, please. Now is the time to be
focusing on stabilization, test, finishing up review, etc.
On 1/23/2026 9:07 PM, Andrew Morton wrote:
> On Fri, 23 Jan 2026 16:12:07 +0530 "Garg, Shivank" <shivankg@amd.com> wrote:
>
>> Please drop this patch.
>
> thud.
>
> Were you thinking of adopting Dev's suggestion?
> https://lkml.kernel.org/r/6486c6dd-2702-4a4d-9662-09639532ce6f@arm.com.
>
> If so, let's defer until the next cycle, please. Now is the time to be
> focusing on stabilization, test, finishing up review, etc.

Thanks Andrew, understood. I'll defer Dev's suggestion to the next cycle.

Thanks for the guidance. :)

Best Regards,
Shivank