[v8] Improve khugepaged scan logic

[PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Vernon Yang 1 month, 1 week ago

From: Vernon Yang <yanglincheng@kylinos.cn>

Currently, each scan always increases "progress" by HPAGE_PMD_NR,
even if only scanning a single PTE/PMD entry.

- When only scanning a sigle PTE entry, let me provide a detailed
  example:

static int hpage_collapse_scan_pmd()
{
	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
	     _pte++, addr += PAGE_SIZE) {
		pte_t pteval = ptep_get(_pte);
		...
		if (pte_uffd_wp(pteval)) { <-- first scan hit
			result = SCAN_PTE_UFFD_WP;
			goto out_unmap;
		}
	}
}

During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
directly. In practice, only one PTE is scanned before termination.
Here, "progress += 1" reflects the actual number of PTEs scanned, but
previously "progress += HPAGE_PMD_NR" always.

- When the memory has been collapsed to PMD, let me provide a detailed
  example:

The following data is traced by bpftrace on a desktop system. After
the system has been left idle for 10 minutes upon booting, a lot of
SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
by khugepaged.

From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
following statuses were observed, with frequency mentioned next to them:

SCAN_SUCCEED          : 1
SCAN_EXCEED_SHARED_PTE: 2
SCAN_PMD_MAPPED       : 142
SCAN_NO_PTE_TABLE     : 178
total progress size   : 674 MB
Total time            : 419 seconds, include khugepaged_scan_sleep_millisecs

The khugepaged_scan list save all task that support collapse into hugepage,
as long as the task is not destroyed, khugepaged will not remove it from
the khugepaged_scan list. This exist a phenomenon where task has already
collapsed all memory regions into hugepage, but khugepaged continues to
scan it, which wastes CPU time and invalid, and due to
khugepaged_scan_sleep_millisecs (default 10s) causes a long wait for
scanning a large number of invalid task, so scanning really valid task
is later.

After applying this patch, when the memory is either SCAN_PMD_MAPPED or
SCAN_NO_PTE_TABLE, just skip it, as follow:

SCAN_EXCEED_SHARED_PTE: 2
SCAN_PMD_MAPPED       : 147
SCAN_NO_PTE_TABLE     : 173
total progress size   : 45 MB
Total time            : 20 seconds

SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq

Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Reviewed-by: Dev Jain <dev.jain@arm.com>
---
 mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e2f6b68a0011..61e25cf5424b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -68,7 +68,10 @@ enum scan_result {
 static struct task_struct *khugepaged_thread __read_mostly;
 static DEFINE_MUTEX(khugepaged_mutex);
 
-/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
+/*
+ * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
+ * every 10 second.
+ */
 static unsigned int khugepaged_pages_to_scan __read_mostly;
 static unsigned int khugepaged_pages_collapsed;
 static unsigned int khugepaged_full_scans;
@@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
 }
 
 static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
-		struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
+		struct vm_area_struct *vma, unsigned long start_addr,
+		bool *mmap_locked, unsigned int *cur_progress,
 		struct collapse_control *cc)
 {
 	pmd_t *pmd;
@@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
 	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
 
 	result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
-	if (result != SCAN_SUCCEED)
+	if (result != SCAN_SUCCEED) {
+		if (cur_progress)
+			*cur_progress = 1;
 		goto out;
+	}
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	nodes_clear(cc->alloc_nmask);
 	pte = pte_offset_map_lock(mm, pmd, start_addr, &ptl);
 	if (!pte) {
+		if (cur_progress)
+			*cur_progress = 1;
 		result = SCAN_NO_PTE_TABLE;
 		goto out;
 	}
 
 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
+		if (cur_progress)
+			*cur_progress += 1;
+
 		pte_t pteval = ptep_get(_pte);
 		if (pte_none_or_zero(pteval)) {
 			++none_or_zero;
@@ -2279,8 +2291,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
-		struct file *file, pgoff_t start, struct collapse_control *cc)
+static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
+		unsigned long addr, struct file *file, pgoff_t start,
+		unsigned int *cur_progress, struct collapse_control *cc)
 {
 	struct folio *folio = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2370,6 +2383,12 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
 		}
 	}
 	rcu_read_unlock();
+	if (cur_progress) {
+		if (result == SCAN_PTE_MAPPED_HUGEPAGE)
+			*cur_progress = 1;
+		else
+			*cur_progress = HPAGE_PMD_NR;
+	}
 
 	if (result == SCAN_SUCCEED) {
 		if (cc->is_khugepaged &&
@@ -2448,6 +2467,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 
 		while (khugepaged_scan.address < hend) {
 			bool mmap_locked = true;
+			unsigned int cur_progress = 0;
 
 			cond_resched();
 			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
@@ -2464,7 +2484,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 				mmap_read_unlock(mm);
 				mmap_locked = false;
 				*result = hpage_collapse_scan_file(mm,
-					khugepaged_scan.address, file, pgoff, cc);
+					khugepaged_scan.address, file, pgoff,
+					&cur_progress, cc);
 				fput(file);
 				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
 					mmap_read_lock(mm);
@@ -2478,7 +2499,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 				}
 			} else {
 				*result = hpage_collapse_scan_pmd(mm, vma,
-					khugepaged_scan.address, &mmap_locked, cc);
+					khugepaged_scan.address, &mmap_locked,
+					&cur_progress, cc);
 			}
 
 			if (*result == SCAN_SUCCEED)
@@ -2486,7 +2508,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
-			progress += HPAGE_PMD_NR;
+			progress += cur_progress;
 			if (!mmap_locked)
 				/*
 				 * We released mmap_lock so break loop.  Note
@@ -2809,7 +2831,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			mmap_locked = false;
 			*lock_dropped = true;
 			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
-							  cc);
+							  NULL, cc);
 
 			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
 			    mapping_can_writeback(file->f_mapping)) {
@@ -2824,7 +2846,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			fput(file);
 		} else {
 			result = hpage_collapse_scan_pmd(mm, vma, addr,
-							 &mmap_locked, cc);
+							 &mmap_locked, NULL, cc);
 		}
 		if (!mmap_locked)
 			*lock_dropped = true;
-- 
2.51.0

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Wei Yang 1 month, 1 week ago

On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
>From: Vernon Yang <yanglincheng@kylinos.cn>
>
>Currently, each scan always increases "progress" by HPAGE_PMD_NR,
>even if only scanning a single PTE/PMD entry.
>
>- When only scanning a sigle PTE entry, let me provide a detailed
>  example:
>
>static int hpage_collapse_scan_pmd()
>{
>	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>	     _pte++, addr += PAGE_SIZE) {
>		pte_t pteval = ptep_get(_pte);
>		...
>		if (pte_uffd_wp(pteval)) { <-- first scan hit
>			result = SCAN_PTE_UFFD_WP;
>			goto out_unmap;
>		}
>	}
>}
>
>During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
>directly. In practice, only one PTE is scanned before termination.
>Here, "progress += 1" reflects the actual number of PTEs scanned, but
>previously "progress += HPAGE_PMD_NR" always.
>
>- When the memory has been collapsed to PMD, let me provide a detailed
>  example:
>
>The following data is traced by bpftrace on a desktop system. After
>the system has been left idle for 10 minutes upon booting, a lot of
>SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
>by khugepaged.
>
>>From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
>following statuses were observed, with frequency mentioned next to them:
>
>SCAN_SUCCEED          : 1
>SCAN_EXCEED_SHARED_PTE: 2
>SCAN_PMD_MAPPED       : 142
>SCAN_NO_PTE_TABLE     : 178
>total progress size   : 674 MB
>Total time            : 419 seconds, include khugepaged_scan_sleep_millisecs
>
>The khugepaged_scan list save all task that support collapse into hugepage,
>as long as the task is not destroyed, khugepaged will not remove it from
>the khugepaged_scan list. This exist a phenomenon where task has already
>collapsed all memory regions into hugepage, but khugepaged continues to
>scan it, which wastes CPU time and invalid, and due to
>khugepaged_scan_sleep_millisecs (default 10s) causes a long wait for
>scanning a large number of invalid task, so scanning really valid task
>is later.
>
>After applying this patch, when the memory is either SCAN_PMD_MAPPED or
>SCAN_NO_PTE_TABLE, just skip it, as follow:
>
>SCAN_EXCEED_SHARED_PTE: 2
>SCAN_PMD_MAPPED       : 147
>SCAN_NO_PTE_TABLE     : 173
>total progress size   : 45 MB
>Total time            : 20 seconds
>
>SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
>https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
>
>Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
>Reviewed-by: Dev Jain <dev.jain@arm.com>
>---
> mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
> 1 file changed, 32 insertions(+), 10 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index e2f6b68a0011..61e25cf5424b 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -68,7 +68,10 @@ enum scan_result {
> static struct task_struct *khugepaged_thread __read_mostly;
> static DEFINE_MUTEX(khugepaged_mutex);
> 
>-/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
>+/*
>+ * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
>+ * every 10 second.
>+ */
> static unsigned int khugepaged_pages_to_scan __read_mostly;
> static unsigned int khugepaged_pages_collapsed;
> static unsigned int khugepaged_full_scans;
>@@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> }
> 
> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>-		struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
>+		struct vm_area_struct *vma, unsigned long start_addr,
>+		bool *mmap_locked, unsigned int *cur_progress,
> 		struct collapse_control *cc)
> {
> 	pmd_t *pmd;
>@@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> 	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
> 
> 	result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
>-	if (result != SCAN_SUCCEED)
>+	if (result != SCAN_SUCCEED) {
>+		if (cur_progress)
>+			*cur_progress = 1;
> 		goto out;
>+	}

How about put cur_progress in struct collapse_control?

Then we don't need to check cur_progress every time before modification.

-- 
Wei Yang
Help you, Help me

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Vernon Yang 1 month ago

On Tue, Feb 24, 2026 at 03:52:47AM +0000, Wei Yang wrote:
> On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
> >From: Vernon Yang <yanglincheng@kylinos.cn>
> >
> >Currently, each scan always increases "progress" by HPAGE_PMD_NR,
> >even if only scanning a single PTE/PMD entry.
> >
> >- When only scanning a sigle PTE entry, let me provide a detailed
> >  example:
> >
> >static int hpage_collapse_scan_pmd()
> >{
> >	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> >	     _pte++, addr += PAGE_SIZE) {
> >		pte_t pteval = ptep_get(_pte);
> >		...
> >		if (pte_uffd_wp(pteval)) { <-- first scan hit
> >			result = SCAN_PTE_UFFD_WP;
> >			goto out_unmap;
> >		}
> >	}
> >}
> >
> >During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
> >directly. In practice, only one PTE is scanned before termination.
> >Here, "progress += 1" reflects the actual number of PTEs scanned, but
> >previously "progress += HPAGE_PMD_NR" always.
> >
> >- When the memory has been collapsed to PMD, let me provide a detailed
> >  example:
> >
> >The following data is traced by bpftrace on a desktop system. After
> >the system has been left idle for 10 minutes upon booting, a lot of
> >SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
> >by khugepaged.
> >
> >>From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
> >following statuses were observed, with frequency mentioned next to them:
> >
> >SCAN_SUCCEED          : 1
> >SCAN_EXCEED_SHARED_PTE: 2
> >SCAN_PMD_MAPPED       : 142
> >SCAN_NO_PTE_TABLE     : 178
> >total progress size   : 674 MB
> >Total time            : 419 seconds, include khugepaged_scan_sleep_millisecs
> >
> >The khugepaged_scan list save all task that support collapse into hugepage,
> >as long as the task is not destroyed, khugepaged will not remove it from
> >the khugepaged_scan list. This exist a phenomenon where task has already
> >collapsed all memory regions into hugepage, but khugepaged continues to
> >scan it, which wastes CPU time and invalid, and due to
> >khugepaged_scan_sleep_millisecs (default 10s) causes a long wait for
> >scanning a large number of invalid task, so scanning really valid task
> >is later.
> >
> >After applying this patch, when the memory is either SCAN_PMD_MAPPED or
> >SCAN_NO_PTE_TABLE, just skip it, as follow:
> >
> >SCAN_EXCEED_SHARED_PTE: 2
> >SCAN_PMD_MAPPED       : 147
> >SCAN_NO_PTE_TABLE     : 173
> >total progress size   : 45 MB
> >Total time            : 20 seconds
> >
> >SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
> >https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
> >
> >Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> >Reviewed-by: Dev Jain <dev.jain@arm.com>
> >---
> > mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
> > 1 file changed, 32 insertions(+), 10 deletions(-)
> >
> >diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >index e2f6b68a0011..61e25cf5424b 100644
> >--- a/mm/khugepaged.c
> >+++ b/mm/khugepaged.c
> >@@ -68,7 +68,10 @@ enum scan_result {
> > static struct task_struct *khugepaged_thread __read_mostly;
> > static DEFINE_MUTEX(khugepaged_mutex);
> >
> >-/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
> >+/*
> >+ * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
> >+ * every 10 second.
> >+ */
> > static unsigned int khugepaged_pages_to_scan __read_mostly;
> > static unsigned int khugepaged_pages_collapsed;
> > static unsigned int khugepaged_full_scans;
> >@@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> > }
> >
> > static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> >-		struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
> >+		struct vm_area_struct *vma, unsigned long start_addr,
> >+		bool *mmap_locked, unsigned int *cur_progress,
> > 		struct collapse_control *cc)
> > {
> > 	pmd_t *pmd;
> >@@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> > 	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
> >
> > 	result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
> >-	if (result != SCAN_SUCCEED)
> >+	if (result != SCAN_SUCCEED) {
> >+		if (cur_progress)
> >+			*cur_progress = 1;
> > 		goto out;
> >+	}
>
> How about put cur_progress in struct collapse_control?
>
> Then we don't need to check cur_progress every time before modification.

Thank you for suggestion.

Placing it inside "struct collapse_control" makes the overall code
simpler, there also coincidentally has a 4-bytes hole, as shown below:

struct collapse_control {
        bool                       is_khugepaged;        /*     0     1 */

        /* XXX 3 bytes hole, try to pack */

        u32                        node_load[64];        /*     4   256 */

        /* XXX 4 bytes hole, try to pack */

        /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
        nodemask_t                 alloc_nmask;          /*   264     8 */

        /* size: 272, cachelines: 5, members: 3 */
        /* sum members: 265, holes: 2, sum holes: 7 */
        /* last cacheline: 16 bytes */
};

But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
will be counted, while madvise(MADV_COLLAPSE) actually does not need to
be counted.

David, do we want to place "cur_progress" inside the "struct collapse_control"?
If Yes, it would be better to rename "cur_progress" to "pmd_progress",
as show below:

struct collapse_control {
        bool is_khugepaged;

        /* Num pages scanned per node */
        u32 node_load[MAX_NUMNODES];

        /*
         * Num pages scanned per pmd, include ptes,
         * pte_mapped_hugepage, pmd_mapped or no_pte_table.
         */
        unsigned int pmd_progress;

        /* nodemask for allocation fallback */
        nodemask_t alloc_nmask;
};

--
Cheers,
Vernon

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by David Hildenbrand (Arm) 1 month ago

On 2/25/26 15:25, Vernon Yang wrote:
> On Tue, Feb 24, 2026 at 03:52:47AM +0000, Wei Yang wrote:
>> On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
>>> From: Vernon Yang <yanglincheng@kylinos.cn>
>>>
>>> Currently, each scan always increases "progress" by HPAGE_PMD_NR,
>>> even if only scanning a single PTE/PMD entry.
>>>
>>> - When only scanning a sigle PTE entry, let me provide a detailed
>>>  example:
>>>
>>> static int hpage_collapse_scan_pmd()
>>> {
>>> 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>> 	     _pte++, addr += PAGE_SIZE) {
>>> 		pte_t pteval = ptep_get(_pte);
>>> 		...
>>> 		if (pte_uffd_wp(pteval)) { <-- first scan hit
>>> 			result = SCAN_PTE_UFFD_WP;
>>> 			goto out_unmap;
>>> 		}
>>> 	}
>>> }
>>>
>>> During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
>>> directly. In practice, only one PTE is scanned before termination.
>>> Here, "progress += 1" reflects the actual number of PTEs scanned, but
>>> previously "progress += HPAGE_PMD_NR" always.
>>>
>>> - When the memory has been collapsed to PMD, let me provide a detailed
>>>  example:
>>>
>>> The following data is traced by bpftrace on a desktop system. After
>>> the system has been left idle for 10 minutes upon booting, a lot of
>>> SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
>>> by khugepaged.
>>>
>>> >From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
>>> following statuses were observed, with frequency mentioned next to them:
>>>
>>> SCAN_SUCCEED          : 1
>>> SCAN_EXCEED_SHARED_PTE: 2
>>> SCAN_PMD_MAPPED       : 142
>>> SCAN_NO_PTE_TABLE     : 178
>>> total progress size   : 674 MB
>>> Total time            : 419 seconds, include khugepaged_scan_sleep_millisecs
>>>
>>> The khugepaged_scan list save all task that support collapse into hugepage,
>>> as long as the task is not destroyed, khugepaged will not remove it from
>>> the khugepaged_scan list. This exist a phenomenon where task has already
>>> collapsed all memory regions into hugepage, but khugepaged continues to
>>> scan it, which wastes CPU time and invalid, and due to
>>> khugepaged_scan_sleep_millisecs (default 10s) causes a long wait for
>>> scanning a large number of invalid task, so scanning really valid task
>>> is later.
>>>
>>> After applying this patch, when the memory is either SCAN_PMD_MAPPED or
>>> SCAN_NO_PTE_TABLE, just skip it, as follow:
>>>
>>> SCAN_EXCEED_SHARED_PTE: 2
>>> SCAN_PMD_MAPPED       : 147
>>> SCAN_NO_PTE_TABLE     : 173
>>> total progress size   : 45 MB
>>> Total time            : 20 seconds
>>>
>>> SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
>>> https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
>>>
>>> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
>>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>> mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
>>> 1 file changed, 32 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index e2f6b68a0011..61e25cf5424b 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -68,7 +68,10 @@ enum scan_result {
>>> static struct task_struct *khugepaged_thread __read_mostly;
>>> static DEFINE_MUTEX(khugepaged_mutex);
>>>
>>> -/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
>>> +/*
>>> + * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
>>> + * every 10 second.
>>> + */
>>> static unsigned int khugepaged_pages_to_scan __read_mostly;
>>> static unsigned int khugepaged_pages_collapsed;
>>> static unsigned int khugepaged_full_scans;
>>> @@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
>>> }
>>>
>>> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>>> -		struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
>>> +		struct vm_area_struct *vma, unsigned long start_addr,
>>> +		bool *mmap_locked, unsigned int *cur_progress,
>>> 		struct collapse_control *cc)
>>> {
>>> 	pmd_t *pmd;
>>> @@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>>> 	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>>>
>>> 	result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
>>> -	if (result != SCAN_SUCCEED)
>>> +	if (result != SCAN_SUCCEED) {
>>> +		if (cur_progress)
>>> +			*cur_progress = 1;
>>> 		goto out;
>>> +	}
>>
>> How about put cur_progress in struct collapse_control?
>>
>> Then we don't need to check cur_progress every time before modification.
> 
> Thank you for suggestion.
> 
> Placing it inside "struct collapse_control" makes the overall code
> simpler, there also coincidentally has a 4-bytes hole, as shown below:
> 
> struct collapse_control {
>         bool                       is_khugepaged;        /*     0     1 */
> 
>         /* XXX 3 bytes hole, try to pack */
> 
>         u32                        node_load[64];        /*     4   256 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
>         nodemask_t                 alloc_nmask;          /*   264     8 */
> 
>         /* size: 272, cachelines: 5, members: 3 */
>         /* sum members: 265, holes: 2, sum holes: 7 */
>         /* last cacheline: 16 bytes */
> };
> 
> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> be counted.
> 
> David, do we want to place "cur_progress" inside the "struct collapse_control"?

Might end up looking nicer code-wise. But the reset semantics (within a
pmd) are a bit weird.

> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> as show below:
> 

"pmd_progress" is misleading. "progress_in_pmd" might be clearer.

Play with it to see if it looks better :)

-- 
Cheers,

David

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Vernon Yang 1 month ago

On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
> On 2/25/26 15:25, Vernon Yang wrote:
> > On Tue, Feb 24, 2026 at 03:52:47AM +0000, Wei Yang wrote:
> >> On Sat, Feb 21, 2026 at 05:39:16PM +0800, Vernon Yang wrote:
> >>> From: Vernon Yang <yanglincheng@kylinos.cn>
> >>>
> >>> Currently, each scan always increases "progress" by HPAGE_PMD_NR,
> >>> even if only scanning a single PTE/PMD entry.
> >>>
> >>> - When only scanning a sigle PTE entry, let me provide a detailed
> >>>  example:
> >>>
> >>> static int hpage_collapse_scan_pmd()
> >>> {
> >>> 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> >>> 	     _pte++, addr += PAGE_SIZE) {
> >>> 		pte_t pteval = ptep_get(_pte);
> >>> 		...
> >>> 		if (pte_uffd_wp(pteval)) { <-- first scan hit
> >>> 			result = SCAN_PTE_UFFD_WP;
> >>> 			goto out_unmap;
> >>> 		}
> >>> 	}
> >>> }
> >>>
> >>> During the first scan, if pte_uffd_wp(pteval) is true, the loop exits
> >>> directly. In practice, only one PTE is scanned before termination.
> >>> Here, "progress += 1" reflects the actual number of PTEs scanned, but
> >>> previously "progress += HPAGE_PMD_NR" always.
> >>>
> >>> - When the memory has been collapsed to PMD, let me provide a detailed
> >>>  example:
> >>>
> >>> The following data is traced by bpftrace on a desktop system. After
> >>> the system has been left idle for 10 minutes upon booting, a lot of
> >>> SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan
> >>> by khugepaged.
> >>>
> >>> >From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, the
> >>> following statuses were observed, with frequency mentioned next to them:
> >>>
> >>> SCAN_SUCCEED          : 1
> >>> SCAN_EXCEED_SHARED_PTE: 2
> >>> SCAN_PMD_MAPPED       : 142
> >>> SCAN_NO_PTE_TABLE     : 178
> >>> total progress size   : 674 MB
> >>> Total time            : 419 seconds, include khugepaged_scan_sleep_millisecs
> >>>
> >>> The khugepaged_scan list save all task that support collapse into hugepage,
> >>> as long as the task is not destroyed, khugepaged will not remove it from
> >>> the khugepaged_scan list. This exist a phenomenon where task has already
> >>> collapsed all memory regions into hugepage, but khugepaged continues to
> >>> scan it, which wastes CPU time and invalid, and due to
> >>> khugepaged_scan_sleep_millisecs (default 10s) causes a long wait for
> >>> scanning a large number of invalid task, so scanning really valid task
> >>> is later.
> >>>
> >>> After applying this patch, when the memory is either SCAN_PMD_MAPPED or
> >>> SCAN_NO_PTE_TABLE, just skip it, as follow:
> >>>
> >>> SCAN_EXCEED_SHARED_PTE: 2
> >>> SCAN_PMD_MAPPED       : 147
> >>> SCAN_NO_PTE_TABLE     : 173
> >>> total progress size   : 45 MB
> >>> Total time            : 20 seconds
> >>>
> >>> SCAN_PTE_MAPPED_HUGEPAGE is the same, for detailed data, refer to
> >>> https://lore.kernel.org/linux-mm/4qdu7owpmxfh3ugsue775fxarw5g2gcggbxdf5psj75nnu7z2u@cv2uu2yocaxq
> >>>
> >>> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> >>> Reviewed-by: Dev Jain <dev.jain@arm.com>
> >>> ---
> >>> mm/khugepaged.c | 42 ++++++++++++++++++++++++++++++++----------
> >>> 1 file changed, 32 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>> index e2f6b68a0011..61e25cf5424b 100644
> >>> --- a/mm/khugepaged.c
> >>> +++ b/mm/khugepaged.c
> >>> @@ -68,7 +68,10 @@ enum scan_result {
> >>> static struct task_struct *khugepaged_thread __read_mostly;
> >>> static DEFINE_MUTEX(khugepaged_mutex);
> >>>
> >>> -/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
> >>> +/*
> >>> + * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
> >>> + * every 10 second.
> >>> + */
> >>> static unsigned int khugepaged_pages_to_scan __read_mostly;
> >>> static unsigned int khugepaged_pages_collapsed;
> >>> static unsigned int khugepaged_full_scans;
> >>> @@ -1231,7 +1234,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >>> }
> >>>
> >>> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> >>> -		struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
> >>> +		struct vm_area_struct *vma, unsigned long start_addr,
> >>> +		bool *mmap_locked, unsigned int *cur_progress,
> >>> 		struct collapse_control *cc)
> >>> {
> >>> 	pmd_t *pmd;
> >>> @@ -1247,19 +1251,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> >>> 	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
> >>>
> >>> 	result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
> >>> -	if (result != SCAN_SUCCEED)
> >>> +	if (result != SCAN_SUCCEED) {
> >>> +		if (cur_progress)
> >>> +			*cur_progress = 1;
> >>> 		goto out;
> >>> +	}
> >>
> >> How about put cur_progress in struct collapse_control?
> >>
> >> Then we don't need to check cur_progress every time before modification.
> >
> > Thank you for suggestion.
> >
> > Placing it inside "struct collapse_control" makes the overall code
> > simpler, there also coincidentally has a 4-bytes hole, as shown below:
> >
> > struct collapse_control {
> >         bool                       is_khugepaged;        /*     0     1 */
> >
> >         /* XXX 3 bytes hole, try to pack */
> >
> >         u32                        node_load[64];        /*     4   256 */
> >
> >         /* XXX 4 bytes hole, try to pack */
> >
> >         /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> >         nodemask_t                 alloc_nmask;          /*   264     8 */
> >
> >         /* size: 272, cachelines: 5, members: 3 */
> >         /* sum members: 265, holes: 2, sum holes: 7 */
> >         /* last cacheline: 16 bytes */
> > };
> >
> > But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> > will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> > be counted.
> >
> > David, do we want to place "cur_progress" inside the "struct collapse_control"?
>
> Might end up looking nicer code-wise. But the reset semantics (within a
> pmd) are a bit weird.
>
> > If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> > as show below:
> >
>
> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
>
> Play with it to see if it looks better :)

Hi Andrew, David,

Based on previous discussions [1], v2 as follow, and testing shows the
same performance benefits. Just make code cleaner, no function changes.

If David has no further revisions, Andrew, could you please squash the
following clean into this patch? If you prefer a new version, please let
me know. Thanks.

[1] https://lore.kernel.org/linux-mm/zdvzmoop5xswqcyiwmvvrdfianm4ccs3gryfecwbm4bhuh7ebo@7an4huwgbuwo

---

From 73e6aa8ffcd5ac1ee510938ff4bdbd24edc86680 Mon Sep 17 00:00:00 2001
From: Vernon Yang <yanglincheng@kylinos.cn>
Date: Thu, 26 Feb 2026 18:24:21 +0800
Subject: [PATCH] mm: khugepaged: simplify scanning progress

Placing "progress" inside "struct collapse_control" makes the overall
code simpler, there also coincidentally has a 4-bytes hole, as shown
below:

struct collapse_control {
        bool                       is_khugepaged;        /*     0     1 */
        /* XXX 3 bytes hole, try to pack */
        u32                        node_load[64];        /*     4   256 */
        /* XXX 4 bytes hole, try to pack */
        /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
        nodemask_t                 alloc_nmask;          /*   264     8 */

        /* size: 272, cachelines: 5, members: 3 */
        /* sum members: 265, holes: 2, sum holes: 7 */
        /* last cacheline: 16 bytes */
};

No function changes.

Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
---
 mm/khugepaged.c | 78 ++++++++++++++++++++++---------------------------
 1 file changed, 35 insertions(+), 43 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 7c1642fbe394..13b0fe50dfc5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -70,8 +70,8 @@ static struct task_struct *khugepaged_thread __read_mostly;
 static DEFINE_MUTEX(khugepaged_mutex);

 /*
- * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
- * every 10 second.
+ * default scan 8*HPAGE_PMD_NR ptes, pte_mapped_hugepage, pmd_mapped,
+ * no_pte_table or vmas every 10 second.
  */
 static unsigned int khugepaged_pages_to_scan __read_mostly;
 static unsigned int khugepaged_pages_collapsed;
@@ -104,6 +104,9 @@ struct collapse_control {
 	/* Num pages scanned per node */
 	u32 node_load[MAX_NUMNODES];

+	/* Num pages scanned (see khugepaged_pages_to_scan) */
+	unsigned int progress;
+
 	/* nodemask for allocation fallback */
 	nodemask_t alloc_nmask;
 };
@@ -1246,8 +1249,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a

 static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		bool *mmap_locked, unsigned int *cur_progress,
-		struct collapse_control *cc)
+		bool *mmap_locked, struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1263,8 +1265,7 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,

 	result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
 	if (result != SCAN_SUCCEED) {
-		if (cur_progress)
-			*cur_progress = 1;
+		cc->progress++;
 		goto out;
 	}

@@ -1272,16 +1273,14 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
 	nodes_clear(cc->alloc_nmask);
 	pte = pte_offset_map_lock(mm, pmd, start_addr, &ptl);
 	if (!pte) {
-		if (cur_progress)
-			*cur_progress = 1;
+		cc->progress++;
 		result = SCAN_NO_PTE_TABLE;
 		goto out;
 	}

 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
-		if (cur_progress)
-			*cur_progress += 1;
+		cc->progress++;

 		pte_t pteval = ptep_get(_pte);
 		if (pte_none_or_zero(pteval)) {
@@ -2314,7 +2313,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,

 static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
 		unsigned long addr, struct file *file, pgoff_t start,
-		unsigned int *cur_progress, struct collapse_control *cc)
+		struct collapse_control *cc)
 {
 	struct folio *folio = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2404,12 +2403,10 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
 		}
 	}
 	rcu_read_unlock();
-	if (cur_progress) {
-		if (result == SCAN_PTE_MAPPED_HUGEPAGE)
-			*cur_progress = 1;
-		else
-			*cur_progress = HPAGE_PMD_NR;
-	}
+	if (result == SCAN_PTE_MAPPED_HUGEPAGE)
+		cc->progress++;
+	else
+		cc->progress += HPAGE_PMD_NR;

 	if (result == SCAN_SUCCEED) {
 		if (cc->is_khugepaged &&
@@ -2425,8 +2422,8 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
 	return result;
 }

-static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result *result,
-					    struct collapse_control *cc)
+static void khugepaged_scan_mm_slot(unsigned int progress_max,
+		enum scan_result *result, struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
 {
@@ -2434,9 +2431,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 	struct mm_slot *slot;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
-	int progress = 0;
+	unsigned int progress_prev = cc->progress;

-	VM_BUG_ON(!pages);
 	lockdep_assert_held(&khugepaged_mm_lock);
 	*result = SCAN_FAIL;

@@ -2459,7 +2455,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 	if (unlikely(!mmap_read_trylock(mm)))
 		goto breakouterloop_mmap_lock;

-	progress++;
+	cc->progress++;
 	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
 		goto breakouterloop;

@@ -2469,17 +2465,17 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result

 		cond_resched();
 		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
-			progress++;
+			cc->progress++;
 			break;
 		}
 		if (!thp_vma_allowable_order(vma, vma->vm_flags, TVA_KHUGEPAGED, PMD_ORDER)) {
-			progress++;
+			cc->progress++;
 			continue;
 		}
 		hstart = round_up(vma->vm_start, HPAGE_PMD_SIZE);
 		hend = round_down(vma->vm_end, HPAGE_PMD_SIZE);
 		if (khugepaged_scan.address > hend) {
-			progress++;
+			cc->progress++;
 			continue;
 		}
 		if (khugepaged_scan.address < hstart)
@@ -2488,7 +2484,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result

 		while (khugepaged_scan.address < hend) {
 			bool mmap_locked = true;
-			unsigned int cur_progress = 0;

 			cond_resched();
 			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
@@ -2505,8 +2500,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 				mmap_read_unlock(mm);
 				mmap_locked = false;
 				*result = hpage_collapse_scan_file(mm,
-					khugepaged_scan.address, file, pgoff,
-					&cur_progress, cc);
+					khugepaged_scan.address, file, pgoff, cc);
 				fput(file);
 				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
 					mmap_read_lock(mm);
@@ -2520,8 +2514,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 				}
 			} else {
 				*result = hpage_collapse_scan_pmd(mm, vma,
-					khugepaged_scan.address, &mmap_locked,
-					&cur_progress, cc);
+					khugepaged_scan.address, &mmap_locked, cc);
 			}

 			if (*result == SCAN_SUCCEED)
@@ -2529,7 +2522,6 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result

 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
-			progress += cur_progress;
 			if (!mmap_locked)
 				/*
 				 * We released mmap_lock so break loop.  Note
@@ -2539,7 +2531,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 				 * correct result back to caller.
 				 */
 				goto breakouterloop_mmap_lock;
-			if (progress >= pages)
+			if (cc->progress >= progress_max)
 				goto breakouterloop;
 		}
 	}
@@ -2570,9 +2562,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 		collect_mm_slot(slot);
 	}

-	trace_mm_khugepaged_scan(mm, progress, khugepaged_scan.mm_slot == NULL);
-
-	return progress;
+	trace_mm_khugepaged_scan(mm, cc->progress - progress_prev,
+				 khugepaged_scan.mm_slot == NULL);
 }

 static int khugepaged_has_work(void)
@@ -2588,13 +2579,14 @@ static int khugepaged_wait_event(void)

 static void khugepaged_do_scan(struct collapse_control *cc)
 {
-	unsigned int progress = 0, pass_through_head = 0;
-	unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
+	const unsigned int progress_max = READ_ONCE(khugepaged_pages_to_scan);
+	unsigned int pass_through_head = 0;
 	bool wait = true;
 	enum scan_result result = SCAN_SUCCEED;

 	lru_add_drain_all();

+	cc->progress = 0;
 	while (true) {
 		cond_resched();

@@ -2606,13 +2598,12 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 			pass_through_head++;
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
-			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &result, cc);
+			khugepaged_scan_mm_slot(progress_max, &result, cc);
 		else
-			progress = pages;
+			cc->progress = progress_max;
 		spin_unlock(&khugepaged_mm_lock);

-		if (progress >= pages)
+		if (cc->progress >= progress_max)
 			break;

 		if (result == SCAN_ALLOC_HUGE_PAGE_FAIL) {
@@ -2818,6 +2809,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	if (!cc)
 		return -ENOMEM;
 	cc->is_khugepaged = false;
+	cc->progress = 0;

 	mmgrab(mm);
 	lru_add_drain_all();
@@ -2852,7 +2844,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			mmap_locked = false;
 			*lock_dropped = true;
 			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
-							  NULL, cc);
+							  cc);

 			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
 			    mapping_can_writeback(file->f_mapping)) {
@@ -2867,7 +2859,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			fput(file);
 		} else {
 			result = hpage_collapse_scan_pmd(mm, vma, addr,
-							 &mmap_locked, NULL, cc);
+							 &mmap_locked, cc);
 		}
 		if (!mmap_locked)
 			*lock_dropped = true;
--
2.51.0

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by David Hildenbrand (Arm) 1 month ago

On 2/26/26 15:31, Vernon Yang wrote:
> On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/25/26 15:25, Vernon Yang wrote:
>>>
>>> Thank you for suggestion.
>>>
>>> Placing it inside "struct collapse_control" makes the overall code
>>> simpler, there also coincidentally has a 4-bytes hole, as shown below:
>>>
>>> struct collapse_control {
>>>         bool                       is_khugepaged;        /*     0     1 */
>>>
>>>         /* XXX 3 bytes hole, try to pack */
>>>
>>>         u32                        node_load[64];        /*     4   256 */
>>>
>>>         /* XXX 4 bytes hole, try to pack */
>>>
>>>         /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
>>>         nodemask_t                 alloc_nmask;          /*   264     8 */
>>>
>>>         /* size: 272, cachelines: 5, members: 3 */
>>>         /* sum members: 265, holes: 2, sum holes: 7 */
>>>         /* last cacheline: 16 bytes */
>>> };
>>>
>>> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
>>> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
>>> be counted.
>>>
>>> David, do we want to place "cur_progress" inside the "struct collapse_control"?
>>
>> Might end up looking nicer code-wise. But the reset semantics (within a
>> pmd) are a bit weird.
>>
>>> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
>>> as show below:
>>>
>>
>> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
>>
>> Play with it to see if it looks better :)
> 
> Hi Andrew, David,
> 
> Based on previous discussions [1], v2 as follow, and testing shows the
> same performance benefits. Just make code cleaner, no function changes.
> 
> If David has no further revisions, Andrew, could you please squash the
> following clean into this patch? If you prefer a new version, please let
> me know. Thanks.

Do we also have to update the resulting patch description? Patch itself
LGTM.


-- 
Cheers,

David

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Vernon Yang 1 month ago

On Thu, Feb 26, 2026 at 11:45 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/26/26 15:31, Vernon Yang wrote:
> > On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
> >> On 2/25/26 15:25, Vernon Yang wrote:
> >>>
> >>> Thank you for suggestion.
> >>>
> >>> Placing it inside "struct collapse_control" makes the overall code
> >>> simpler, there also coincidentally has a 4-bytes hole, as shown below:
> >>>
> >>> struct collapse_control {
> >>>         bool                       is_khugepaged;        /*     0     1 */
> >>>
> >>>         /* XXX 3 bytes hole, try to pack */
> >>>
> >>>         u32                        node_load[64];        /*     4   256 */
> >>>
> >>>         /* XXX 4 bytes hole, try to pack */
> >>>
> >>>         /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> >>>         nodemask_t                 alloc_nmask;          /*   264     8 */
> >>>
> >>>         /* size: 272, cachelines: 5, members: 3 */
> >>>         /* sum members: 265, holes: 2, sum holes: 7 */
> >>>         /* last cacheline: 16 bytes */
> >>> };
> >>>
> >>> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> >>> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> >>> be counted.
> >>>
> >>> David, do we want to place "cur_progress" inside the "struct collapse_control"?
> >>
> >> Might end up looking nicer code-wise. But the reset semantics (within a
> >> pmd) are a bit weird.
> >>
> >>> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> >>> as show below:
> >>>
> >>
> >> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
> >>
> >> Play with it to see if it looks better :)
> >
> > Hi Andrew, David,
> >
> > Based on previous discussions [1], v2 as follow, and testing shows the
> > same performance benefits. Just make code cleaner, no function changes.
> >
> > If David has no further revisions, Andrew, could you please squash the
> > following clean into this patch? If you prefer a new version, please let
> > me know. Thanks.
>
> Do we also have to update the resulting patch description? Patch itself
> LGTM.

No need to update the patch description.

--
Cheers,
Vernon

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week, 1 day ago

On Fri, Feb 27, 2026 at 01:15:24AM +0800, Vernon Yang wrote:
> On Thu, Feb 26, 2026 at 11:45 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > On 2/26/26 15:31, Vernon Yang wrote:
> > > On Wed, Feb 25, 2026 at 03:29:05PM +0100, David Hildenbrand (Arm) wrote:
> > >> On 2/25/26 15:25, Vernon Yang wrote:
> > >>>
> > >>> Thank you for suggestion.
> > >>>
> > >>> Placing it inside "struct collapse_control" makes the overall code
> > >>> simpler, there also coincidentally has a 4-bytes hole, as shown below:
> > >>>
> > >>> struct collapse_control {
> > >>>         bool                       is_khugepaged;        /*     0     1 */
> > >>>
> > >>>         /* XXX 3 bytes hole, try to pack */
> > >>>
> > >>>         u32                        node_load[64];        /*     4   256 */
> > >>>
> > >>>         /* XXX 4 bytes hole, try to pack */
> > >>>
> > >>>         /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
> > >>>         nodemask_t                 alloc_nmask;          /*   264     8 */
> > >>>
> > >>>         /* size: 272, cachelines: 5, members: 3 */
> > >>>         /* sum members: 265, holes: 2, sum holes: 7 */
> > >>>         /* last cacheline: 16 bytes */
> > >>> };
> > >>>
> > >>> But regardless of khugepaged or madvise(MADV_COLLAPSE), "cur_progress"
> > >>> will be counted, while madvise(MADV_COLLAPSE) actually does not need to
> > >>> be counted.
> > >>>
> > >>> David, do we want to place "cur_progress" inside the "struct collapse_control"?
> > >>
> > >> Might end up looking nicer code-wise. But the reset semantics (within a
> > >> pmd) are a bit weird.
> > >>
> > >>> If Yes, it would be better to rename "cur_progress" to "pmd_progress",
> > >>> as show below:
> > >>>
> > >>
> > >> "pmd_progress" is misleading. "progress_in_pmd" might be clearer.
> > >>
> > >> Play with it to see if it looks better :)
> > >
> > > Hi Andrew, David,
> > >
> > > Based on previous discussions [1], v2 as follow, and testing shows the
> > > same performance benefits. Just make code cleaner, no function changes.
> > >
> > > If David has no further revisions, Andrew, could you please squash the
> > > following clean into this patch? If you prefer a new version, please let
> > > me know. Thanks.
> >
> > Do we also have to update the resulting patch description? Patch itself
> > LGTM.
>
> No need to update the patch description.

I will take a look at this (sorry for delay) but general point - while
fix-patches are convenient, they're incredibly anti-reviewer.

I hope at some point in the future we can move away from that so you can look at
a series on list and know that what's shown there is the actual patch.

As it stands, I can't go line-by-line correctly here without quite a bit of
additional effort.

(Not a criticism of you Vernon just a general point about mm process).

>
> --
> Cheers,
> Vernon

Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Andrew Morton 1 week ago

On Wed, 25 Mar 2026 14:10:23 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> > No need to update the patch description.
> 
> I will take a look at this (sorry for delay) but general point - while
> fix-patches are convenient, they're incredibly anti-reviewer.
> 
> I hope at some point in the future we can move away from that so you can look at
> a series on list and know that what's shown there is the actual patch.

Oh.  I've never really received that message, at least not at all
clearly.

I've been hoping that the -fix patches are actually pro-reviewer, for
those reviewers who have looked at the previous version.  A full resend
of something you've already looked at is quite annoying!

I try to mitigate that by sending the
heres-what-you-changed-since-last-time replies.  It's a little more
work at this end, but that's not at all a problem.

I see a couple of options here

a) I can fold the -fix into the base patch then send out the
   resulting diff as a reply-to-all.

b) We can just deprecate the -fix things and ask people for full
   resends.

It depends on what people prefer.  How do we determine that?

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by David Hildenbrand (Arm) 1 week ago

On 3/25/26 16:09, Andrew Morton wrote:
> On Wed, 25 Mar 2026 14:10:23 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> 
>>> No need to update the patch description.
>>
>> I will take a look at this (sorry for delay) but general point - while
>> fix-patches are convenient, they're incredibly anti-reviewer.

+1, I could have sworn we brought that up before. :)

>>
>> I hope at some point in the future we can move away from that so you can look at
>> a series on list and know that what's shown there is the actual patch.
> 
> Oh.  I've never really received that message, at least not at all
> clearly.
> 
> I've been hoping that the -fix patches are actually pro-reviewer, for
> those reviewers who have looked at the previous version.  A full resend
> of something you've already looked at is quite annoying!
> 
> I try to mitigate that by sending the
> heres-what-you-changed-since-last-time replies.  It's a little more
> work at this end, but that's not at all a problem.
> 
> I see a couple of options here
> 
> a) I can fold the -fix into the base patch then send out the
>    resulting diff as a reply-to-all.
> 
> b) We can just deprecate the -fix things and ask people for full
>    resends.
> 
> It depends on what people prefer.  How do we determine that?

I like "fix" for smaller "obvious" stuff where a resend is really just
noise.

But for bigger stuff I prefer a full resend (we can still have these
temporary fixups, but for reviewers a follow-up resend is better).

-- 
Cheers,

David

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
> On 3/25/26 16:09, Andrew Morton wrote:
> > On Wed, 25 Mar 2026 14:10:23 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >
> >>> No need to update the patch description.
> >>
> >> I will take a look at this (sorry for delay) but general point - while
> >> fix-patches are convenient, they're incredibly anti-reviewer.
>
> +1, I could have sworn we brought that up before. :)
>
> >>
> >> I hope at some point in the future we can move away from that so you can look at
> >> a series on list and know that what's shown there is the actual patch.
> >
> > Oh.  I've never really received that message, at least not at all
> > clearly.
> >
> > I've been hoping that the -fix patches are actually pro-reviewer, for
> > those reviewers who have looked at the previous version.  A full resend
> > of something you've already looked at is quite annoying!
> >
> > I try to mitigate that by sending the
> > heres-what-you-changed-since-last-time replies.  It's a little more
> > work at this end, but that's not at all a problem.
> >
> > I see a couple of options here
> >
> > a) I can fold the -fix into the base patch then send out the
> >    resulting diff as a reply-to-all.
> >
> > b) We can just deprecate the -fix things and ask people for full
> >    resends.
> >
> > It depends on what people prefer.  How do we determine that?
>
> I like "fix" for smaller "obvious" stuff where a resend is really just
> noise.
>
> But for bigger stuff I prefer a full resend (we can still have these
> temporary fixups, but for reviewers a follow-up resend is better).

Yeah, it's really about being able to come to a series later and be able to
comment line-by-line.

Really larger stuff should be resent I think, esp. if there's multiple
fixes in the series.

A reply with the same-patch-but-with-fix-applied would definitely be
useful!

>
> --
> Cheers,
>
> David

Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by David Hildenbrand (Arm) 1 week ago

On 3/25/26 16:20, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
>> On 3/25/26 16:09, Andrew Morton wrote:
>>>
>>
>> +1, I could have sworn we brought that up before. :)
>>
>>>
>>> Oh.  I've never really received that message, at least not at all
>>> clearly.
>>>
>>> I've been hoping that the -fix patches are actually pro-reviewer, for
>>> those reviewers who have looked at the previous version.  A full resend
>>> of something you've already looked at is quite annoying!
>>>
>>> I try to mitigate that by sending the
>>> heres-what-you-changed-since-last-time replies.  It's a little more
>>> work at this end, but that's not at all a problem.
>>>
>>> I see a couple of options here
>>>
>>> a) I can fold the -fix into the base patch then send out the
>>>    resulting diff as a reply-to-all.
>>>
>>> b) We can just deprecate the -fix things and ask people for full
>>>    resends.
>>>
>>> It depends on what people prefer.  How do we determine that?
>>
>> I like "fix" for smaller "obvious" stuff where a resend is really just
>> noise.
>>
>> But for bigger stuff I prefer a full resend (we can still have these
>> temporary fixups, but for reviewers a follow-up resend is better).
> 
> Yeah, it's really about being able to come to a series later and be able to
> comment line-by-line.
> 
> Really larger stuff should be resent I think, esp. if there's multiple
> fixes in the series.
> 
> A reply with the same-patch-but-with-fix-applied would definitely be
> useful!

Right, for completeness, this is what we had in an off-list thread:

"
Not sure if that's a problem for others, but I got the feeling that this
escalated a bit lately.

I know, that we prefer fixups to sort out smaller stuff. So far so good.
In the last time there were some series where I was seriously completely
lost which state of the patches would go upstream, or what I should even
review, because there were just fixups over fixups.

Fixups are nice, but for someone reviewing a series, too many fixups
(either as inline patch or even worse, as independent patches) just
causes a mess.

It also gives the impression of "this is mostly done, so don't waste
your time reviewing it anymore." --- "just the finishing touches" ---
"don't jump in late and cause trouble".
"

-- 
Cheers,

David

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Vernon Yang 1 week ago

On Wed, Mar 25, 2026 at 11:23 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 3/25/26 16:20, Lorenzo Stoakes (Oracle) wrote:
> > On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
> >> On 3/25/26 16:09, Andrew Morton wrote:
> >>>
> >>
> >> +1, I could have sworn we brought that up before. :)
> >>
> >>>
> >>> Oh.  I've never really received that message, at least not at all
> >>> clearly.
> >>>
> >>> I've been hoping that the -fix patches are actually pro-reviewer, for
> >>> those reviewers who have looked at the previous version.  A full resend
> >>> of something you've already looked at is quite annoying!
> >>>
> >>> I try to mitigate that by sending the
> >>> heres-what-you-changed-since-last-time replies.  It's a little more
> >>> work at this end, but that's not at all a problem.
> >>>
> >>> I see a couple of options here
> >>>
> >>> a) I can fold the -fix into the base patch then send out the
> >>>    resulting diff as a reply-to-all.
> >>>
> >>> b) We can just deprecate the -fix things and ask people for full
> >>>    resends.
> >>>
> >>> It depends on what people prefer.  How do we determine that?
> >>
> >> I like "fix" for smaller "obvious" stuff where a resend is really just
> >> noise.
> >>
> >> But for bigger stuff I prefer a full resend (we can still have these
> >> temporary fixups, but for reviewers a follow-up resend is better).

Yeah, I completely agree with this policy.

Initially, I thought it was just cleanup without functional changes, but
I didn't consider the new reviewers (haven't seen the previous version).
Sorry.

Although this patchset is already in the mm-stable branch, if we want to
resend V9 version, I'd be happy to do so. Please let me know. Thanks!

> > Yeah, it's really about being able to come to a series later and be able to
> > comment line-by-line.
> >
> > Really larger stuff should be resent I think, esp. if there's multiple
> > fixes in the series.
> >
> > A reply with the same-patch-but-with-fix-applied would definitely be
> > useful!
>
> Right, for completeness, this is what we had in an off-list thread:
>
> "
> Not sure if that's a problem for others, but I got the feeling that this
> escalated a bit lately.
>
> I know, that we prefer fixups to sort out smaller stuff. So far so good.
> In the last time there were some series where I was seriously completely
> lost which state of the patches would go upstream, or what I should even
> review, because there were just fixups over fixups.
>
> Fixups are nice, but for someone reviewing a series, too many fixups
> (either as inline patch or even worse, as independent patches) just
> causes a mess.
>
> It also gives the impression of "this is mostly done, so don't waste
> your time reviewing it anymore." --- "just the finishing touches" ---
> "don't jump in late and cause trouble".
> "

--
Cheers,
Vernon

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Andrew Morton 1 week ago

On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:

> Although this patchset is already in the mm-stable branch, if we want to
> resend V9 version, I'd be happy to do so. Please let me know. Thanks!

Depends on what changed in v9.  If it's a major change then I expect
I'd drop this series from mm-stable and we restart the clock on the
integration and review of this work.

If it's a minor touchup then a standalone patch against mm-stable would
be fine, or just leave things as-is and prepare that change after
7.1-rc1.

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Wed, Mar 25, 2026 at 11:59:32AM -0700, Andrew Morton wrote:
> On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
>
> > Although this patchset is already in the mm-stable branch, if we want to
> > resend V9 version, I'd be happy to do so. Please let me know. Thanks!
>
> Depends on what changed in v9.  If it's a major change then I expect
> I'd drop this series from mm-stable and we restart the clock on the
> integration and review of this work.
>
> If it's a minor touchup then a standalone patch against mm-stable would
> be fine, or just leave things as-is and prepare that change after
> 7.1-rc1.

I think he means just resending this for the sake of reviewability or
whatnot?  Anyway if it's for my sake it's fine there's no need, it's
already in mm-stable and I don't want to make a fuss, I'll just try to
structure review more efficiently in future so I don't end up with BOTH a
backlog AND a random-walk of what I actually review :)

Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Vernon Yang 1 week ago

On Thu, Mar 26, 2026 at 3:04 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> On Wed, Mar 25, 2026 at 11:59:32AM -0700, Andrew Morton wrote:
> > On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
> >
> > > Although this patchset is already in the mm-stable branch, if we want to
> > > resend V9 version, I'd be happy to do so. Please let me know. Thanks!
> >
> > Depends on what changed in v9.  If it's a major change then I expect
> > I'd drop this series from mm-stable and we restart the clock on the
> > integration and review of this work.
> >
> > If it's a minor touchup then a standalone patch against mm-stable would
> > be fine, or just leave things as-is and prepare that change after
> > 7.1-rc1.
>
> I think he means just resending this for the sake of reviewability or

Yes, Just resending this for the sake of reviewability.

> whatnot?  Anyway if it's for my sake it's fine there's no need, it's
> already in mm-stable and I don't want to make a fuss, I'll just try to
> structure review more efficiently in future so I don't end up with BOTH a
> backlog AND a random-walk of what I actually review :)
>
> Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Thu, Mar 26, 2026 at 09:59:46AM +0800, Vernon Yang wrote:
> On Thu, Mar 26, 2026 at 3:04 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
> > On Wed, Mar 25, 2026 at 11:59:32AM -0700, Andrew Morton wrote:
> > > On Thu, 26 Mar 2026 01:00:23 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
> > >
> > > > Although this patchset is already in the mm-stable branch, if we want to
> > > > resend V9 version, I'd be happy to do so. Please let me know. Thanks!
> > >
> > > Depends on what changed in v9.  If it's a major change then I expect
> > > I'd drop this series from mm-stable and we restart the clock on the
> > > integration and review of this work.
> > >
> > > If it's a minor touchup then a standalone patch against mm-stable would
> > > be fine, or just leave things as-is and prepare that change after
> > > 7.1-rc1.
> >
> > I think he means just resending this for the sake of reviewability or
>
> Yes, Just resending this for the sake of reviewability.

Yeah then there's no need, let's leave this as it is.

Separately - thanks very much for doing this Vernon :) paying down technical
debt in THP is _very_ much appreciated!

Cheers, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Thu, Mar 26, 2026 at 01:00:23AM +0800, Vernon Yang wrote:
> On Wed, Mar 25, 2026 at 11:23 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > On 3/25/26 16:20, Lorenzo Stoakes (Oracle) wrote:
> > > On Wed, Mar 25, 2026 at 04:17:21PM +0100, David Hildenbrand (Arm) wrote:
> > >> On 3/25/26 16:09, Andrew Morton wrote:
> > >>>
> > >>
> > >> +1, I could have sworn we brought that up before. :)
> > >>
> > >>>
> > >>> Oh.  I've never really received that message, at least not at all
> > >>> clearly.
> > >>>
> > >>> I've been hoping that the -fix patches are actually pro-reviewer, for
> > >>> those reviewers who have looked at the previous version.  A full resend
> > >>> of something you've already looked at is quite annoying!
> > >>>
> > >>> I try to mitigate that by sending the
> > >>> heres-what-you-changed-since-last-time replies.  It's a little more
> > >>> work at this end, but that's not at all a problem.
> > >>>
> > >>> I see a couple of options here
> > >>>
> > >>> a) I can fold the -fix into the base patch then send out the
> > >>>    resulting diff as a reply-to-all.
> > >>>
> > >>> b) We can just deprecate the -fix things and ask people for full
> > >>>    resends.
> > >>>
> > >>> It depends on what people prefer.  How do we determine that?
> > >>
> > >> I like "fix" for smaller "obvious" stuff where a resend is really just
> > >> noise.
> > >>
> > >> But for bigger stuff I prefer a full resend (we can still have these
> > >> temporary fixups, but for reviewers a follow-up resend is better).
>
> Yeah, I completely agree with this policy.
>
> Initially, I thought it was just cleanup without functional changes, but
> I didn't consider the new reviewers (haven't seen the previous version).
> Sorry.
>
> Although this patchset is already in the mm-stable branch, if we want to
> resend V9 version, I'd be happy to do so. Please let me know. Thanks!

There's no need don't worry :) this isn't about your series, just a general
point.

Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Andrew Morton 1 week ago

On Wed, 25 Mar 2026 16:22:55 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:

> > 
> > Really larger stuff should be resent I think, esp. if there's multiple
> > fixes in the series.
> > 
> > A reply with the same-patch-but-with-fix-applied would definitely be
> > useful!
> 
> Right, for completeness, this is what we had in an off-list thread:
> 
> "
> Not sure if that's a problem for others, but I got the feeling that this
> escalated a bit lately.
> 
> I know, that we prefer fixups to sort out smaller stuff. So far so good.
> In the last time there were some series where I was seriously completely
> lost which state of the patches would go upstream, or what I should even
> review, because there were just fixups over fixups.
> 
> Fixups are nice, but for someone reviewing a series, too many fixups
> (either as inline patch or even worse, as independent patches) just
> causes a mess.
> 
> It also gives the impression of "this is mostly done, so don't waste
> your time reviewing it anymore." --- "just the finishing touches" ---
> "don't jump in late and cause trouble".
> "

hm OK, so what to do.  We're OK with teeny -fixes but anything more
substantial we ask for a full resend and I do the heres-what-changed
reply?

I presently don't fold the -fixes until the very last moment.  Could do
that much earlier if it helps anything?  Possibly useful to people who
are looking at the series in the mm.git tree.

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Wed, Mar 25, 2026 at 09:17:35AM -0700, Andrew Morton wrote:
> On Wed, 25 Mar 2026 16:22:55 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
> > >
> > > Really larger stuff should be resent I think, esp. if there's multiple
> > > fixes in the series.
> > >
> > > A reply with the same-patch-but-with-fix-applied would definitely be
> > > useful!
> >
> > Right, for completeness, this is what we had in an off-list thread:
> >
> > "
> > Not sure if that's a problem for others, but I got the feeling that this
> > escalated a bit lately.
> >
> > I know, that we prefer fixups to sort out smaller stuff. So far so good.
> > In the last time there were some series where I was seriously completely
> > lost which state of the patches would go upstream, or what I should even
> > review, because there were just fixups over fixups.
> >
> > Fixups are nice, but for someone reviewing a series, too many fixups
> > (either as inline patch or even worse, as independent patches) just
> > causes a mess.
> >
> > It also gives the impression of "this is mostly done, so don't waste
> > your time reviewing it anymore." --- "just the finishing touches" ---
> > "don't jump in late and cause trouble".
> > "
>
> hm OK, so what to do.  We're OK with teeny -fixes but anything more
> substantial we ask for a full resend and I do the heres-what-changed
> reply?
>

Yeah that works for me.

> I presently don't fold the -fixes until the very last moment.  Could do
> that much earlier if it helps anything?  Possibly useful to people who
> are looking at the series in the mm.git tree.

It'd generally be easier imo to have those changes folded, but with something
added to the commit message to indicate this so I can know whether or not that
was folded in.

Maybe just directly squash the commits?

Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Andrew Morton 1 week ago

On Wed, 25 Mar 2026 16:26:12 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> > hm OK, so what to do.  We're OK with teeny -fixes but anything more
> > substantial we ask for a full resend and I do the heres-what-changed
> > reply?
> >
> 
> Yeah that works for me.
> 
> > I presently don't fold the -fixes until the very last moment.  Could do
> > that much earlier if it helps anything?  Possibly useful to people who
> > are looking at the series in the mm.git tree.
> 
> It'd generally be easier imo to have those changes folded, but with something
> added to the commit message to indicate this so I can know whether or not that
> was folded in.

I always add a [footer] when folding -fixes, eg:

[sj@kernel.org: verify found biggest system ram]
  Link: https://lkml.kernel.org/r/20260317144725.88524-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20260311052927.93921-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Yang yingliang <yangyingliang@huawei.com>

So one can simply chase the link.


An unfortunate exception is when the -fix is from myself - I don't take
the patch from a mailing list so I have no Link: to include, eg

[akpm@linux-foundation.org: fix spello, add comment]
Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

But these things are usually quite minor and the precipitating
discussion can be found by reading the main Link:.

> Maybe just directly squash the commits?

Not understanding this proposal?

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Wed, Mar 25, 2026 at 11:36:50AM -0700, Andrew Morton wrote:
> On Wed, 25 Mar 2026 16:26:12 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > > hm OK, so what to do.  We're OK with teeny -fixes but anything more
> > > substantial we ask for a full resend and I do the heres-what-changed
> > > reply?
> > >
> >
> > Yeah that works for me.
> >
> > > I presently don't fold the -fixes until the very last moment.  Could do
> > > that much earlier if it helps anything?  Possibly useful to people who
> > > are looking at the series in the mm.git tree.
> >
> > It'd generally be easier imo to have those changes folded, but with something
> > added to the commit message to indicate this so I can know whether or not that
> > was folded in.
>
> I always add a [footer] when folding -fixes, eg:
>
> [sj@kernel.org: verify found biggest system ram]
>   Link: https://lkml.kernel.org/r/20260317144725.88524-1-sj@kernel.org
> Link: https://lkml.kernel.org/r/20260311052927.93921-3-sj@kernel.org
> Signed-off-by: SeongJae Park <sj@kernel.org>
> Cc: Yang yingliang <yangyingliang@huawei.com>
>
> So one can simply chase the link.
>
>
> An unfortunate exception is when the -fix is from myself - I don't take
> the patch from a mailing list so I have no Link: to include, eg
>
> [akpm@linux-foundation.org: fix spello, add comment]
> Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
> Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
> Cc: Joel Granados <joel.granados@kernel.org>
> Cc: Petr Mladek <pmladek@suse.com>
> Cc: Wang Jinchao <wangjinchao600@gmail.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> But these things are usually quite minor and the precipitating
> discussion can be found by reading the main Link:.
>
> > Maybe just directly squash the commits?
>
> Not understanding this proposal?

I mean instead of having a separate commit for the fix, put that fix into the
patch before it and denote it with a footer as you put above.

I guess that translates to what you do when you rebase and fold the fixes into
commits as you do now anyway.

I don't see any reason not to do that right away, as really it's good to see the
combined change in one go for all practical purposes (if I resend, I'll be
combining work, if I can grab it from the tree and avoid a git rebase -i all the
better).

Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Andrew Morton 1 week ago

On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> > [akpm@linux-foundation.org: fix spello, add comment]
> > Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
> > Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
> > Cc: Joel Granados <joel.granados@kernel.org>
> > Cc: Petr Mladek <pmladek@suse.com>
> > Cc: Wang Jinchao <wangjinchao600@gmail.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >
> > But these things are usually quite minor and the precipitating
> > discussion can be found by reading the main Link:.
> >
> > > Maybe just directly squash the commits?
> >
> > Not understanding this proposal?
> 
> I mean instead of having a separate commit for the fix, put that fix into the
> patch before it and denote it with a footer as you put above.
> 
> I guess that translates to what you do when you rebase and fold the fixes into
> commits as you do now anyway.
> 
> I don't see any reason not to do that right away, as really it's good to see the
> combined change in one go for all practical purposes (if I resend, I'll be
> combining work, if I can grab it from the tree and avoid a git rebase -i all the
> better).

OK.  So what have we concluded here?

Is it: if I get a -fix, I add that in the usual way, then temporarily
fold it into the base patch and mail the result out for fyi.  Then
after <period> I permanently fold the fix into the base and add the
footer?

If so, what's <period>?

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Wed, Mar 25, 2026 at 12:15:49PM -0700, Andrew Morton wrote:
> On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > > [akpm@linux-foundation.org: fix spello, add comment]
> > > Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
> > > Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
> > > Cc: Joel Granados <joel.granados@kernel.org>
> > > Cc: Petr Mladek <pmladek@suse.com>
> > > Cc: Wang Jinchao <wangjinchao600@gmail.com>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > >
> > > But these things are usually quite minor and the precipitating
> > > discussion can be found by reading the main Link:.
> > >
> > > > Maybe just directly squash the commits?
> > >
> > > Not understanding this proposal?
> >
> > I mean instead of having a separate commit for the fix, put that fix into the
> > patch before it and denote it with a footer as you put above.
> >
> > I guess that translates to what you do when you rebase and fold the fixes into
> > commits as you do now anyway.
> >
> > I don't see any reason not to do that right away, as really it's good to see the
> > combined change in one go for all practical purposes (if I resend, I'll be
> > combining work, if I can grab it from the tree and avoid a git rebase -i all the
> > better).
>
> OK.  So what have we concluded here?
>
> Is it: if I get a -fix, I add that in the usual way, then temporarily
> fold it into the base patch and mail the result out for fyi.  Then
> after <period> I permanently fold the fix into the base and add the
> footer?
>
> If so, what's <period>?

To me it feels like that should be 0, just squash it in right away, since the
trees are being rebased constantly right?

And that means the tree contains exactly what it would if the series were
re-sent.

David, what do you think?

Thanks, Lorenzo

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by David Hildenbrand (Arm) 1 week ago

On 3/25/26 21:03, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 25, 2026 at 12:15:49PM -0700, Andrew Morton wrote:
>> On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>>
>>>
>>> I mean instead of having a separate commit for the fix, put that fix into the
>>> patch before it and denote it with a footer as you put above.
>>>
>>> I guess that translates to what you do when you rebase and fold the fixes into
>>> commits as you do now anyway.
>>>
>>> I don't see any reason not to do that right away, as really it's good to see the
>>> combined change in one go for all practical purposes (if I resend, I'll be
>>> combining work, if I can grab it from the tree and avoid a git rebase -i all the
>>> better).
>>
>> OK.  So what have we concluded here?
>>
>> Is it: if I get a -fix, I add that in the usual way, then temporarily
>> fold it into the base patch and mail the result out for fyi.  Then
>> after <period> I permanently fold the fix into the base and add the
>> footer?
>>
>> If so, what's <period>?
> 
> To me it feels like that should be 0, just squash it in right away, since the
> trees are being rebased constantly right?
> 
> And that means the tree contains exactly what it would if the series were
> re-sent.
> 
> David, what do you think?

How often was it helpful that a fixup patch would stay separate? I would
assume "not often". :)

-- 
Cheers,

David

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Andrew Morton 1 week ago

On Wed, 25 Mar 2026 21:16:11 +0100 "David Hildenbrand (Arm)" <david@kernel.org> wrote:

> On 3/25/26 21:03, Lorenzo Stoakes (Oracle) wrote:
> > On Wed, Mar 25, 2026 at 12:15:49PM -0700, Andrew Morton wrote:
> >> On Wed, 25 Mar 2026 18:53:40 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >>
> >>>
> >>> I mean instead of having a separate commit for the fix, put that fix into the
> >>> patch before it and denote it with a footer as you put above.
> >>>
> >>> I guess that translates to what you do when you rebase and fold the fixes into
> >>> commits as you do now anyway.
> >>>
> >>> I don't see any reason not to do that right away, as really it's good to see the
> >>> combined change in one go for all practical purposes (if I resend, I'll be
> >>> combining work, if I can grab it from the tree and avoid a git rebase -i all the
> >>> better).
> >>
> >> OK.  So what have we concluded here?
> >>
> >> Is it: if I get a -fix, I add that in the usual way, then temporarily
> >> fold it into the base patch and mail the result out for fyi.  Then
> >> after <period> I permanently fold the fix into the base and add the
> >> footer?
> >>
> >> If so, what's <period>?
> > 
> > To me it feels like that should be 0, just squash it in right away,

OK...

> since the trees are being rebased constantly right?

yup, the mm-*_unstable branches and mm-new are blown away and rebuilt
from quilt each time.  And the quilt patches are rediffed and refreshed
during this.

> > And that means the tree contains exactly what it would if the series were
> > re-sent.
> > 
> > David, what do you think?
> 
> How often was it helpful that a fixup patch would stay separate? I would
> assume "not often". :)

Not often.  Sometimes a -fix is messed up and we grow a -fix-fix.  The
record is something like -fix-fix-fix-fix-fix.  I suppose there's
slight value in tracking this for a while.  Not much though.

Re: [PATCH mm-new v8 2/4] mm: khugepaged: refine scan progress number

Posted by Lorenzo Stoakes (Oracle) 1 week ago

On Wed, Mar 25, 2026 at 02:10:23PM +0000, Lorenzo Stoakes (Oracle) wrote:
>
> I will take a look at this (sorry for delay) but general point - while

OK well on second thoughts this is in mm-stable with 2 weeks to go (I still
have no understanding of how any of the process works) and thus is
immutable, and so I won't be reviewing this then I suppose.

Thanks, Lorenzo