[PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()

Shakeel Butt posted 1 patch 1 week, 1 day ago
[PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Shakeel Butt 1 week, 1 day ago
In META's fleet, we observed high-level cgroups showing zero file memcg
stats while their descendants had non-zero values. Investigation using
drgn revealed that these parent cgroups actually had negative file stats,
aggregated from their children.

This issue became more frequent after deploying thp-always more widely,
pointing to a correlation with THP file collapsing. The root cause is
that collapse_file() assumes old folios and the new THP belong to the
same node and memcg. When this assumption breaks, stats become skewed.
The bug affects not just memcg stats but also per-numa stats, and not
just NR_FILE_PAGES but also NR_SHMEM.

The assumption breaks in scenarios such as:

1. Small folios allocated on one node while the THP gets allocated on a
   different node.

2. A package downloader running in one cgroup populates the page cache,
   while a job in a different cgroup executes the downloaded binary.

3. A file shared between processes in different cgroups, where one
   process faults in the pages and khugepaged (or madvise(COLLAPSE))
   collapses them on behalf of the other.

Fix the accounting by explicitly incrementing stats for the new THP and
decrementing stats for the old folios being replaced.

Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/khugepaged.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1d994b6c58c6..fa1e57fd2c46 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2195,16 +2195,13 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_lock_irq(&xas);
 	}
 
-	if (is_shmem)
+	if (is_shmem) {
+		lruvec_stat_mod_folio(new_folio, NR_SHMEM, HPAGE_PMD_NR);
 		lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
-	else
+	} else {
 		lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
-
-	if (nr_none) {
-		lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
-		/* nr_none is always 0 for non-shmem. */
-		lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
 	}
+	lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, HPAGE_PMD_NR);
 
 	/*
 	 * Mark new_folio as uptodate before inserting it into the
@@ -2238,6 +2235,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 	 */
 	list_for_each_entry_safe(folio, tmp, &pagelist, lru) {
 		list_del(&folio->lru);
+		lruvec_stat_mod_folio(folio, NR_FILE_PAGES,
+				      -folio_nr_pages(folio));
+		if (is_shmem)
+			lruvec_stat_mod_folio(folio, NR_SHMEM,
+					      -folio_nr_pages(folio));
 		folio->mapping = NULL;
 		folio_clear_active(folio);
 		folio_clear_unevictable(folio);
-- 
2.47.3
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by David Hildenbrand (arm) 5 days, 5 hours ago
On 1/30/26 05:29, Shakeel Butt wrote:
> In META's fleet, we observed high-level cgroups showing zero file memcg
> stats while their descendants had non-zero values. Investigation using
> drgn revealed that these parent cgroups actually had negative file stats,
> aggregated from their children.
> 
> This issue became more frequent after deploying thp-always more widely,
> pointing to a correlation with THP file collapsing. The root cause is
> that collapse_file() assumes old folios and the new THP belong to the
> same node and memcg. When this assumption breaks, stats become skewed.
> The bug affects not just memcg stats but also per-numa stats, and not
> just NR_FILE_PAGES but also NR_SHMEM.
> 
> The assumption breaks in scenarios such as:
> 
> 1. Small folios allocated on one node while the THP gets allocated on a
>     different node.
> 
> 2. A package downloader running in one cgroup populates the page cache,
>     while a job in a different cgroup executes the downloaded binary.
> 
> 3. A file shared between processes in different cgroups, where one
>     process faults in the pages and khugepaged (or madvise(COLLAPSE))
>     collapses them on behalf of the other.
> 
> Fix the accounting by explicitly incrementing stats for the new THP and
> decrementing stats for the old folios being replaced.
> 
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---

Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>

-- 
Cheers

David
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by David Hildenbrand (arm) 5 days, 5 hours ago
On 2/2/26 09:54, David Hildenbrand (arm) wrote:
> On 1/30/26 05:29, Shakeel Butt wrote:
>> In META's fleet, we observed high-level cgroups showing zero file memcg
>> stats while their descendants had non-zero values. Investigation using
>> drgn revealed that these parent cgroups actually had negative file stats,
>> aggregated from their children.
>>
>> This issue became more frequent after deploying thp-always more widely,
>> pointing to a correlation with THP file collapsing. The root cause is
>> that collapse_file() assumes old folios and the new THP belong to the
>> same node and memcg. When this assumption breaks, stats become skewed.
>> The bug affects not just memcg stats but also per-numa stats, and not
>> just NR_FILE_PAGES but also NR_SHMEM.
>>
>> The assumption breaks in scenarios such as:
>>
>> 1. Small folios allocated on one node while the THP gets allocated on a
>>      different node.
>>
>> 2. A package downloader running in one cgroup populates the page cache,
>>      while a job in a different cgroup executes the downloaded binary.
>>
>> 3. A file shared between processes in different cgroups, where one
>>      process faults in the pages and khugepaged (or madvise(COLLAPSE))
>>      collapses them on behalf of the other.
>>
>> Fix the accounting by explicitly incrementing stats for the new THP and
>> decrementing stats for the old folios being replaced.
>>
>> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
>> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>> ---
> 
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>

Heh, forgot to adjust the shortcut

Acked-by: David Hildenbrand (arm) <david@kernel.org>

-- 
Cheers

David
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Andrew Morton 6 days, 17 hours ago
On Thu, 29 Jan 2026 20:29:25 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:

> In META's fleet, we observed high-level cgroups showing zero file memcg
> stats while their descendants had non-zero values. Investigation using
> drgn revealed that these parent cgroups actually had negative file stats,
> aggregated from their children.
> 
> This issue became more frequent after deploying thp-always more widely,
> pointing to a correlation with THP file collapsing. The root cause is
> that collapse_file() assumes old folios and the new THP belong to the
> same node and memcg. When this assumption breaks, stats become skewed.
> The bug affects not just memcg stats but also per-numa stats, and not
> just NR_FILE_PAGES but also NR_SHMEM.
> 
> The assumption breaks in scenarios such as:
> 
> 1. Small folios allocated on one node while the THP gets allocated on a
>    different node.
> 
> 2. A package downloader running in one cgroup populates the page cache,
>    while a job in a different cgroup executes the downloaded binary.
> 
> 3. A file shared between processes in different cgroups, where one
>    process faults in the pages and khugepaged (or madvise(COLLAPSE))
>    collapses them on behalf of the other.
> 
> Fix the accounting by explicitly incrementing stats for the new THP and
> decrementing stats for the old folios being replaced.
> 
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")

As the bug is 10 years old I think I'll queue this for 6.20(?)-rc1 with
cc:stable.  Just to get it a bit more time-under-test before -stable
kernels pick it up.  Sound OK?
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Shakeel Butt 6 days, 16 hours ago
January 31, 2026 at 1:15 PM, "Andrew Morton" <akpm@linux-foundation.org> wrote:

> As the bug is 10 years old I think I'll queue this for 6.20(?)-rc1 with
> cc:stable. Just to get it a bit more time-under-test before -stable
> kernels pick it up. Sound OK?
>

Yup, sounds reasonable.
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Kiryl Shutsemau 1 week ago
On Thu, Jan 29, 2026 at 08:29:25PM -0800, Shakeel Butt wrote:
> In META's fleet, we observed high-level cgroups showing zero file memcg
> stats while their descendants had non-zero values. Investigation using
> drgn revealed that these parent cgroups actually had negative file stats,
> aggregated from their children.
> 
> This issue became more frequent after deploying thp-always more widely,
> pointing to a correlation with THP file collapsing. The root cause is
> that collapse_file() assumes old folios and the new THP belong to the
> same node and memcg. When this assumption breaks, stats become skewed.
> The bug affects not just memcg stats but also per-numa stats, and not
> just NR_FILE_PAGES but also NR_SHMEM.
> 
> The assumption breaks in scenarios such as:
> 
> 1. Small folios allocated on one node while the THP gets allocated on a
>    different node.
> 
> 2. A package downloader running in one cgroup populates the page cache,
>    while a job in a different cgroup executes the downloaded binary.
> 
> 3. A file shared between processes in different cgroups, where one
>    process faults in the pages and khugepaged (or madvise(COLLAPSE))
>    collapses them on behalf of the other.
> 
> Fix the accounting by explicitly incrementing stats for the new THP and
> decrementing stats for the old folios being replaced.
> 
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")

My bug survived for almost 10 years!

> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Reviewed-by: Kiryl Shutsemau <kas@kernel.org>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Johannes Weiner 1 week ago
On Thu, Jan 29, 2026 at 08:29:25PM -0800, Shakeel Butt wrote:
> In META's fleet, we observed high-level cgroups showing zero file memcg
> stats while their descendants had non-zero values. Investigation using
> drgn revealed that these parent cgroups actually had negative file stats,
> aggregated from their children.
> 
> This issue became more frequent after deploying thp-always more widely,
> pointing to a correlation with THP file collapsing. The root cause is
> that collapse_file() assumes old folios and the new THP belong to the
> same node and memcg. When this assumption breaks, stats become skewed.
> The bug affects not just memcg stats but also per-numa stats, and not
> just NR_FILE_PAGES but also NR_SHMEM.
> 
> The assumption breaks in scenarios such as:
> 
> 1. Small folios allocated on one node while the THP gets allocated on a
>    different node.
> 
> 2. A package downloader running in one cgroup populates the page cache,
>    while a job in a different cgroup executes the downloaded binary.
> 
> 3. A file shared between processes in different cgroups, where one
>    process faults in the pages and khugepaged (or madvise(COLLAPSE))
>    collapses them on behalf of the other.
> 
> Fix the accounting by explicitly incrementing stats for the new THP and
> decrementing stats for the old folios being replaced.
> 
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Barry Song 1 week, 1 day ago
On Fri, Jan 30, 2026 at 12:29 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> In META's fleet, we observed high-level cgroups showing zero file memcg
> stats while their descendants had non-zero values. Investigation using
> drgn revealed that these parent cgroups actually had negative file stats,
> aggregated from their children.
>
> This issue became more frequent after deploying thp-always more widely,
> pointing to a correlation with THP file collapsing. The root cause is
> that collapse_file() assumes old folios and the new THP belong to the
> same node and memcg. When this assumption breaks, stats become skewed.
> The bug affects not just memcg stats but also per-numa stats, and not
> just NR_FILE_PAGES but also NR_SHMEM.
>
> The assumption breaks in scenarios such as:
>
> 1. Small folios allocated on one node while the THP gets allocated on a
>    different node.
>
> 2. A package downloader running in one cgroup populates the page cache,
>    while a job in a different cgroup executes the downloaded binary.
>
> 3. A file shared between processes in different cgroups, where one
>    process faults in the pages and khugepaged (or madvise(COLLAPSE))
>    collapses them on behalf of the other.
>
> Fix the accounting by explicitly incrementing stats for the new THP and
> decrementing stats for the old folios being replaced.
>
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Thanks!

Reviewed-by: Barry Song <baohua@kernel.org>

> ---
>  mm/khugepaged.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Dev Jain 1 week, 1 day ago
On 30/01/26 9:59 am, Shakeel Butt wrote:
> In META's fleet, we observed high-level cgroups showing zero file memcg
> stats while their descendants had non-zero values. Investigation using
> drgn revealed that these parent cgroups actually had negative file stats,
> aggregated from their children.
>
> This issue became more frequent after deploying thp-always more widely,
> pointing to a correlation with THP file collapsing. The root cause is
> that collapse_file() assumes old folios and the new THP belong to the
> same node and memcg. When this assumption breaks, stats become skewed.
> The bug affects not just memcg stats but also per-numa stats, and not
> just NR_FILE_PAGES but also NR_SHMEM.
>
> The assumption breaks in scenarios such as:
>
> 1. Small folios allocated on one node while the THP gets allocated on a
>    different node.
>
> 2. A package downloader running in one cgroup populates the page cache,
>    while a job in a different cgroup executes the downloaded binary.
>
> 3. A file shared between processes in different cgroups, where one
>    process faults in the pages and khugepaged (or madvise(COLLAPSE))
>    collapses them on behalf of the other.
>
> Fix the accounting by explicitly incrementing stats for the new THP and
> decrementing stats for the old folios being replaced.
>
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---

Thanks.

Reviewed-by: Dev Jain <dev.jain@arm.com>

>  mm/khugepaged.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1d994b6c58c6..fa1e57fd2c46 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2195,16 +2195,13 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		xas_lock_irq(&xas);
>  	}
>  
> -	if (is_shmem)
> +	if (is_shmem) {
> +		lruvec_stat_mod_folio(new_folio, NR_SHMEM, HPAGE_PMD_NR);
>  		lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
> -	else
> +	} else {
>  		lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
> -
> -	if (nr_none) {
> -		lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
> -		/* nr_none is always 0 for non-shmem. */
> -		lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
>  	}
> +	lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, HPAGE_PMD_NR);
>  
>  	/*
>  	 * Mark new_folio as uptodate before inserting it into the
> @@ -2238,6 +2235,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  	 */
>  	list_for_each_entry_safe(folio, tmp, &pagelist, lru) {
>  		list_del(&folio->lru);
> +		lruvec_stat_mod_folio(folio, NR_FILE_PAGES,
> +				      -folio_nr_pages(folio));
> +		if (is_shmem)
> +			lruvec_stat_mod_folio(folio, NR_SHMEM,
> +					      -folio_nr_pages(folio));

I notice here that we don't need to do accounting for NR_SHMEM_THPS or NR_FILE_THPS -
but the following bit:

if (folio_order(folio) == HPAGE_PMD_ORDER && folio->index == start)

in the khugepaged code, seems to suggest that we can reach this stat accounting path
with a PMD order old folio, if folio->index != start. But this condition should not be possible;
a folio is always order-aligned within the file, which means the folio->index here
is PMD-aligned. The entry of collapse_file() asserts that start is also PMD-aligned (guaranteed
by thp_vma_allowable_order in khugepaged_scan_mm_slot). Therefore start must equal folio->index.

If I am not missing something here, I'll send a patch to convert this to a VM_WARN_ON.
 

>  		folio->mapping = NULL;
>  		folio_clear_active(folio);
>  		folio_clear_unevictable(folio);
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Lance Yang 1 week, 1 day ago

On 2026/1/30 16:10, Dev Jain wrote:
> 
> On 30/01/26 9:59 am, Shakeel Butt wrote:
>> In META's fleet, we observed high-level cgroups showing zero file memcg
>> stats while their descendants had non-zero values. Investigation using
>> drgn revealed that these parent cgroups actually had negative file stats,
>> aggregated from their children.
>>
>> This issue became more frequent after deploying thp-always more widely,
>> pointing to a correlation with THP file collapsing. The root cause is
>> that collapse_file() assumes old folios and the new THP belong to the
>> same node and memcg. When this assumption breaks, stats become skewed.
>> The bug affects not just memcg stats but also per-numa stats, and not
>> just NR_FILE_PAGES but also NR_SHMEM.
>>
>> The assumption breaks in scenarios such as:
>>
>> 1. Small folios allocated on one node while the THP gets allocated on a
>>     different node.
>>
>> 2. A package downloader running in one cgroup populates the page cache,
>>     while a job in a different cgroup executes the downloaded binary.
>>
>> 3. A file shared between processes in different cgroups, where one
>>     process faults in the pages and khugepaged (or madvise(COLLAPSE))
>>     collapses them on behalf of the other.
>>
>> Fix the accounting by explicitly incrementing stats for the new THP and
>> decrementing stats for the old folios being replaced.
>>
>> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
>> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>> ---
> 
> Thanks.
> 
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> 
>>   mm/khugepaged.c | 16 +++++++++-------
>>   1 file changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 1d994b6c58c6..fa1e57fd2c46 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2195,16 +2195,13 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>   		xas_lock_irq(&xas);
>>   	}
>>   
>> -	if (is_shmem)
>> +	if (is_shmem) {
>> +		lruvec_stat_mod_folio(new_folio, NR_SHMEM, HPAGE_PMD_NR);
>>   		lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
>> -	else
>> +	} else {
>>   		lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
>> -
>> -	if (nr_none) {
>> -		lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
>> -		/* nr_none is always 0 for non-shmem. */
>> -		lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
>>   	}
>> +	lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, HPAGE_PMD_NR);
>>   
>>   	/*
>>   	 * Mark new_folio as uptodate before inserting it into the
>> @@ -2238,6 +2235,11 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>   	 */
>>   	list_for_each_entry_safe(folio, tmp, &pagelist, lru) {
>>   		list_del(&folio->lru);
>> +		lruvec_stat_mod_folio(folio, NR_FILE_PAGES,
>> +				      -folio_nr_pages(folio));
>> +		if (is_shmem)
>> +			lruvec_stat_mod_folio(folio, NR_SHMEM,
>> +					      -folio_nr_pages(folio));
> 
> I notice here that we don't need to do accounting for NR_SHMEM_THPS or NR_FILE_THPS -
> but the following bit:
> 
> if (folio_order(folio) == HPAGE_PMD_ORDER && folio->index == start)
> 
> in the khugepaged code, seems to suggest that we can reach this stat accounting path
> with a PMD order old folio, if folio->index != start. But this condition should not be possible;
> a folio is always order-aligned within the file, which means the folio->index here
> is PMD-aligned. The entry of collapse_file() asserts that start is also PMD-aligned (guaranteed


Yep, good catch! The checks in __filemap_add_folio():

	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);

and at the top of collapse_file():

	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));

guarantee that any PMD folio in the scan range [start, start + HPAGE_PMD_NR)
must have index == start.

Converting this to a VM_WARN_ON looks good to me :)


Cheers,
Lance

> by thp_vma_allowable_order in khugepaged_scan_mm_slot). Therefore start must equal folio->index.
> 
> If I am not missing something here, I'll send a patch to convert this to a VM_WARN_ON.
>   
> 
>>   		folio->mapping = NULL;
>>   		folio_clear_active(folio);
>>   		folio_clear_unevictable(folio);
Re: [PATCH v2] mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
Posted by Baolin Wang 1 week, 1 day ago

On 1/30/26 12:29 PM, Shakeel Butt wrote:
> In META's fleet, we observed high-level cgroups showing zero file memcg
> stats while their descendants had non-zero values. Investigation using
> drgn revealed that these parent cgroups actually had negative file stats,
> aggregated from their children.
> 
> This issue became more frequent after deploying thp-always more widely,
> pointing to a correlation with THP file collapsing. The root cause is
> that collapse_file() assumes old folios and the new THP belong to the
> same node and memcg. When this assumption breaks, stats become skewed.
> The bug affects not just memcg stats but also per-numa stats, and not
> just NR_FILE_PAGES but also NR_SHMEM.
> 
> The assumption breaks in scenarios such as:
> 
> 1. Small folios allocated on one node while the THP gets allocated on a
>     different node.
> 
> 2. A package downloader running in one cgroup populates the page cache,
>     while a job in a different cgroup executes the downloaded binary.
> 
> 3. A file shared between processes in different cgroups, where one
>     process faults in the pages and khugepaged (or madvise(COLLAPSE))
>     collapses them on behalf of the other.
> 
> Fix the accounting by explicitly incrementing stats for the new THP and
> decrementing stats for the old folios being replaced.
> 
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---

Thanks for the fix. LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>