A filesystem writeback performance issue was discovered by repeatedly
running CPU hotplug operations while a process in a cgroup with memory
and io controllers enabled wrote to an ext4 file in a loop.
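For reference, a rough reproducer sketch (illustrative only; the file path
and CPU number are assumptions rather than the exact setup used): run the
writer below inside a cgroup with the memory and io controllers enabled,
while another task repeatedly writes 0 and 1 to
/sys/devices/system/cpu/cpu1/online to offline and online the CPU.

  /* dirty-writer.c: keeps dirtying an ext4-backed file in a loop */
  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
	char buf[4096];
	int fd = open("/mnt/ext4/dirty-file", O_WRONLY | O_CREAT, 0644);

	if (fd < 0)
		return 1;
	memset(buf, 'a', sizeof(buf));
	for (;;) {
		/* keep the per-cpu NR_FILE_DIRTY delta non-zero */
		if (pwrite(fd, buf, sizeof(buf), 0) < 0)
			return 1;
	}
  }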
When a CPU is offlined, the memcg_hotplug_cpu_dead() callback function
flushes the per-cpu vmstats counters. However, instead of applying each
per-cpu counter once to every cgroup in the hierarchy, the counter is
applied repeatedly to the nested cgroup alone. Under certain conditions,
the per-cpu NR_FILE_DIRTY counter is routinely positive during hotplug
events, so the dirty file count is artificially inflated. Once the dirty
file count grows past the dirty_freerun_ceiling(), balance_dirty_pages()
starts a background writeback each time a file page is marked dirty
within the nested cgroup.
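To make the double-counting concrete, here is a small userspace model of
the flush loop (plain C, not kernel code; names are illustrative). For a
per-cpu delta x flushed on behalf of a cgroup nested two levels below the
root, the buggy loop credits the leaf three times and its ancestors not at
all, while the fixed loop credits every level exactly once:

  #include <stdio.h>

  struct cg {
	struct cg *parent;
	long nr_file_dirty;
  };

  int main(void)
  {
	struct cg root = { NULL, 0 }, a = { &root, 0 }, b = { &a, 0 };
	struct cg *mi;
	long x = 5;

	/* buggy 5.10 loop: walks the hierarchy but always charges the leaf */
	for (mi = &b; mi; mi = mi->parent)
		b.nr_file_dirty += x;		/* should be mi->nr_file_dirty */
	printf("buggy: b=%ld a=%ld root=%ld\n",
	       b.nr_file_dirty, a.nr_file_dirty, root.nr_file_dirty);
	/* prints: buggy: b=15 a=0 root=0 */

	b.nr_file_dirty = 0;

	/* fixed loop: each level receives the delta exactly once */
	for (mi = &b; mi; mi = mi->parent)
		mi->nr_file_dirty += x;
	printf("fixed: b=%ld a=%ld root=%ld\n",
	       b.nr_file_dirty, a.nr_file_dirty, root.nr_file_dirty);
	/* prints: fixed: b=5 a=5 root=5 */
	return 0;
  }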
This change fixes memcg_hotplug_cpu_dead() so that the per-cpu vmstats
and vmevents counters are applied once to each cgroup in the hierarchy,
similar to __mod_memcg_state() and __count_memcg_events().
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Andrew Guerrero <ajgja@amazon.com>
Reviewed-by: Gunnar Kudrjavets <gunnarku@amazon.com>
---
Hey all,
This patch is intended for the 5.10 longterm release branch. It will not
apply cleanly to mainline, where the issue is inadvertently fixed by a larger
series of changes in later release branches:
a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
In 5.15, the counter flushing code is removed entirely. Backporting that may
also be a viable option here, though it is a larger change.
Thanks!
---
mm/memcontrol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 142b4d5e08fe..8e085a4f45b7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2394,7 +2394,7 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
x = this_cpu_xchg(memcg->vmstats_percpu->stat[i], 0);
if (x)
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
- atomic_long_add(x, &memcg->vmstats[i]);
+ atomic_long_add(x, &mi->vmstats[i]);
if (i >= NR_VM_NODE_STAT_ITEMS)
continue;
@@ -2417,7 +2417,7 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
x = this_cpu_xchg(memcg->vmstats_percpu->events[i], 0);
if (x)
for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
- atomic_long_add(x, &memcg->vmevents[i]);
+ atomic_long_add(x, &mi->vmevents[i]);
}
}
base-commit: c30b4019ea89633d790f0bfcbb03234f0d006f87
--
2.47.3
On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
> This patch is intended for the 5.10 longterm release branch. It will not
> apply cleanly to mainline, where the issue is inadvertently fixed by a larger
> series of changes in later release branches:
> a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").

Why can't we take those instead?

> In 5.15, the counter flushing code is removed entirely. Backporting that may
> also be a viable option here, though it is a larger change.

If it's not needed anymore, why not just remove it with the upstream commits
as well?

thanks,

greg k-h