khugepaged_scan_mm_slot() uses a 'progress' counter to limit the
amount of work performed per call; progress accrues from three
components (sketched below):
1. Transitioning to a new mm (+1).
2. Skipping an unsuitable VMA (+1).
3. Scanning a PMD-sized range (+HPAGE_PMD_NR).
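A minimal userspace sketch of how this accounting bounds one call,
using hypothetical numbers (HPAGE_PMD_NR assumed 512, as on x86-64
with 4K pages; not the upstream loop):

	#include <stdio.h>

	#define HPAGE_PMD_NR 512	/* 4K pages per 2M PMD range, assumed */

	int main(void)
	{
		unsigned int pages = 4096;	/* hypothetical scan budget per call */
		unsigned int progress = 0;

		progress += 1;			/* transitioned to a new mm */
		progress += 3;			/* skipped 3 unsuitable VMAs */
		while (progress < pages)
			progress += HPAGE_PMD_NR;	/* scanned one PMD range */

		/* prints: progress=4100 budget=4096 -> yield */
		printf("progress=%u budget=%u -> yield\n", progress, pages);
		return 0;
	}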
Consider a 1MB VMA sitting between two 2MB alignment boundaries:
    vma1     vma2     vma3
+----------+------+----------+
|    2M    |  1M  |    2M    |
+----------+------+----------+
           ^      ^
         start   end
              ^
         hstart,hend
In this case, for vma2:
hstart = round_up(start, HPAGE_PMD_SIZE) -> Next 2MB alignment
hend = round_down(end, HPAGE_PMD_SIZE) -> Prev 2MB alignment
Currently, since `hend <= hstart`, VMAs that are too small or unaligned
to contain a hugepage are skipped without incrementing 'progress'.
A process containing a large number of such small VMAs will unfairly
consume more CPU cycles before yielding compared to a process with
fewer, larger, or aligned VMAs.
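As a standalone check of the rounding, with hypothetical addresses for
vma2 and HPAGE_PMD_SIZE assumed to be 2MB:

	#include <stdio.h>

	#define HPAGE_PMD_SIZE 0x200000UL	/* 2MB, assumed */
	#define round_up(x, y)   ((((x) + (y) - 1) / (y)) * (y))
	#define round_down(x, y) (((x) / (y)) * (y))

	int main(void)
	{
		/* hypothetical 1MB VMA between two 2MB boundaries */
		unsigned long start = 0x280000UL, end = 0x380000UL;
		unsigned long hstart = round_up(start, HPAGE_PMD_SIZE);
		unsigned long hend = round_down(end, HPAGE_PMD_SIZE);

		/* prints: hstart=0x400000 hend=0x200000 skipped=1 */
		printf("hstart=%#lx hend=%#lx skipped=%d\n",
		       hstart, hend, hend <= hstart);
		return 0;
	}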
Fix this by incrementing progress when the `hend <= hstart` condition
is met.
Additionally, change the type of 'progress' to `unsigned int` to match
both the 'pages' parameter and the function's return type.
Suggested-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
Incorporate comment feedback from Lance:
https://lore.kernel.org/linux-mm/6b408736-978a-4d40-adfc-97819951c3a6@linux.dev
mm/khugepaged.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 984294a16861..93ce39915f4a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2403,7 +2403,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 	struct mm_slot *slot;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
-	int progress = 0;
+	unsigned int progress = 0;
 
 	VM_BUG_ON(!pages);
 	lockdep_assert_held(&khugepaged_mm_lock);
@@ -2447,7 +2447,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
 		}
 		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
 		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
-		if (khugepaged_scan.address > hend) {
+		if (khugepaged_scan.address > hend || hend <= hstart) {
+			/* VMA already scanned or too small/unaligned for hugepage. */
 			progress++;
 			continue;
 		}
--
2.43.0
On 19/01/26 12:52 am, Shivank Garg wrote:
> [...]
>
> Consider a 1MB VMA sitting between two 2MB alignment boundaries:
>
>     vma1     vma2     vma3
> +----------+------+----------+
> |    2M    |  1M  |    2M    |
> +----------+------+----------+
>            ^      ^
>          start   end
>               ^
>          hstart,hend
Won't such a VMA be skipped by thp_vma_allowable_order()? That internally
checks, apart from eligibility by sysfs, that the extent of the VMA can
map a hugepage.
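For reference, a standalone paraphrase of the extent check Dev is
pointing at (roughly what thp_vma_suitable_order() does internally;
simplified, not the exact upstream code):

	#include <stdbool.h>
	#include <stdio.h>

	#define HPAGE_PMD_SIZE 0x200000UL	/* 2MB, assumed */
	#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

	/* A PMD-sized page at 'addr' fits only if the PMD-aligned
	 * range containing it lies entirely inside [vm_start, vm_end). */
	static bool extent_fits(unsigned long vm_start,
				unsigned long vm_end, unsigned long addr)
	{
		unsigned long haddr = ALIGN_DOWN(addr, HPAGE_PMD_SIZE);

		return haddr >= vm_start &&
		       haddr + HPAGE_PMD_SIZE <= vm_end;
	}

	int main(void)
	{
		/* the 1MB vma2 above: no address in it can fit */
		printf("%d\n", extent_fits(0x280000UL, 0x380000UL,
					   0x280000UL));	/* prints 0 */
		return 0;
	}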
On 2026/1/22 15:32, Dev Jain wrote:
>
> On 19/01/26 12:52 am, Shivank Garg wrote:
>> [...]
>
> Won't such a VMA be skipped by thp_vma_allowable_order()? That internally
> checks, apart from eligibility by sysfs, that the extent of the VMA can
> map a hugepage.
Ah, you're right!
I was worrying about a case that doesn't actually happen.
Thanks,
Lance
On 1/22/2026 2:14 PM, Lance Yang wrote:
> On 2026/1/22 15:32, Dev Jain wrote:
>> Won't such a VMA be skipped by thp_vma_allowable_order()? That internally
>> checks, apart from eligibility by sysfs, that the extent of the VMA can
>> map a hugepage.
>
> Ah, you're right!
>
> I was worrying about a case that doesn't actually happen.

You're right, thp_vma_allowable_order() is taking care of this, making
the hend <= hstart check redundant.

Thank you for catching this.

I'll drop this change and send a revision keeping only the unsigned int
type change for 'progress'.

Thanks,
Shivank
On 1/22/2026 5:56 PM, Garg, Shivank wrote:
>> I was worrying about a case that doesn't actually happen.
>>
> You're right, thp_vma_allowable_order() is taking care of this, making
> the hend <= hstart check redundant.
>
> Thank you for catching this.
>
> I'll drop this change and send a revision keeping only the unsigned int
> type change for 'progress'.

Hi Andrew,

Please drop this patch. As Dev and Lance noted, the hend <= hstart
check is redundant.

The 'progress' change to unsigned int is not critical either.

Other patches from the series remain unchanged.

Sorry for the noise, and thanks.

Regards,
Shivank
On Fri, 23 Jan 2026 16:12:07 +0530 "Garg, Shivank" <shivankg@amd.com> wrote:

> >> I was worrying about a case that doesn't actually happen.
> >>
> > You're right, thp_vma_allowable_order() is taking care of this, making
> > the hend <= hstart check redundant.
> >
> > I'll drop this change and send a revision keeping only the unsigned int
> > type change for 'progress'.
>
> Hi Andrew,
>
> Please drop this patch.

thud.

> As Dev and Lance noted, the hend <= hstart check is redundant.
>
> The 'progress' change to unsigned int is not critical either.

OK.

Were you thinking of adopting Dev's suggestion?
https://lkml.kernel.org/r/6486c6dd-2702-4a4d-9662-09639532ce6f@arm.com.

If so, let's defer until the next cycle, please. Now is the time to be
focusing on stabilization, test, finishing up review, etc.
On 1/23/2026 9:07 PM, Andrew Morton wrote:
> On Fri, 23 Jan 2026 16:12:07 +0530 "Garg, Shivank" <shivankg@amd.com> wrote:
>
>> Please drop this patch.
>
> thud.
>
> Were you thinking of adopting Dev's suggestion?
> https://lkml.kernel.org/r/6486c6dd-2702-4a4d-9662-09639532ce6f@arm.com.
>
> If so, let's defer until the next cycle, please. Now is the time to be
> focusing on stabilization, test, finishing up review, etc.

Thanks Andrew, understood. I'll defer Dev's suggestion to the next cycle.

Thanks for the guidance. :)

Best Regards,
Shivank