From: jiebin sun <jiebin.sun@intel.com>
Remove the redundant updating of stats_flush_threshold. If the
global var stats_flush_threshold has exceeded the trigger value
for __mem_cgroup_flush_stats, further increment is unnecessary.
Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads).
Score gain: 1.95x
Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%)
CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads)
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---
mm/memcontrol.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..9e8c6f24c694 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -626,7 +626,14 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 	x = __this_cpu_add_return(stats_updates, abs(val));
 	if (x > MEMCG_CHARGE_BATCH) {
-		atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
+		/*
+		 * If stats_flush_threshold exceeds the threshold
+		 * (>num_online_cpus()), cgroup stats update will be triggered
+		 * in __mem_cgroup_flush_stats(). Increasing this var further
+		 * is redundant and simply adds overhead in atomic update.
+		 */
+		if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
+			atomic_add(x / MEMCG_CHARGE_BATCH, &stats_flush_threshold);
 		__this_cpu_write(stats_updates, 0);
 	}
 }
--
2.31.1

On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
> From: jiebin sun <jiebin.sun@intel.com>
>
> Remove the redundant updating of stats_flush_threshold. If the
> global var stats_flush_threshold has exceeded the trigger value
> for __mem_cgroup_flush_stats, further increment is unnecessary.
>
> Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads).
>
> Score gain: 1.95x
> Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%)
>
> CPU: ICX 8380 x 2 sockets
> Core number: 40 x 2 physical cores
> Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads)
>
> Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

Acked-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
> From: jiebin sun <jiebin.sun@intel.com>
>
> Remove the redundant updating of stats_flush_threshold. If the
> global var stats_flush_threshold has exceeded the trigger value
> for __mem_cgroup_flush_stats, further increment is unnecessary.
>
> Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads).
>
> Score gain: 1.95x
> Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%)
>
> CPU: ICX 8380 x 2 sockets
> Core number: 40 x 2 physical cores
> Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads)
>
> Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Good optimization, thanks!

>
>On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
>> From: jiebin sun <jiebin.sun@intel.com>
>>
>> Remove the redundant updating of stats_flush_threshold. If the global
>> var stats_flush_threshold has exceeded the trigger value for
>> __mem_cgroup_flush_stats, further increment is unnecessary.
>>
>> Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads).
>>
>> Score gain: 1.95x
>> Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%)
>>
>> CPU: ICX 8380 x 2 sockets
>> Core number: 40 x 2 physical cores
>> Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads)
>>
>> Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
>
>Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
>
>Good optimization, thanks!

Looks good. Nice performance improvement.

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>

On Sat, Jul 23, 2022 at 12:49:49AM +0800, Jiebin Sun wrote:
> From: jiebin sun <jiebin.sun@intel.com>
>
> Remove the redundant updating of stats_flush_threshold. If the
> global var stats_flush_threshold has exceeded the trigger value
> for __mem_cgroup_flush_stats, further increment is unnecessary.
>
> Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads).
>
> Score gain: 1.95x
> Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%)
>
> CPU: ICX 8380 x 2 sockets
> Core number: 40 x 2 physical cores
> Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads)
>
> Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>

Yes, this makes sense. No need to dirty a cacheline if we are already
over the threshold.

Acked-by: Shakeel Butt <shakeelb@google.com>
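
The cacheline point can be made concrete by counting writes to the shared counter. The sketch below is a single-threaded, deterministic userspace simulation, not a benchmark and not kernel code. LIMIT is a hypothetical trigger value, and simulate() and struct sim_result are illustrative names. It only counts how many atomic read-modify-write operations reach the shared variable with and without the guard. It does not measure cache traffic.

```c
#include <assert.h>
#include <stdatomic.h>

#define LIMIT 80 /* hypothetical trigger value, standing in for num_online_cpus() */

struct sim_result {
	long shared_writes; /* atomic read-modify-write operations issued */
	int final_value;    /* value of the shared counter at the end */
};

/*
 * Replay `batches` publish attempts against a shared counter.
 * With guarded != 0, an attempt becomes a read-only check once the
 * counter is above LIMIT, so no further write is issued.
 */
static struct sim_result simulate(long batches, int guarded)
{
	struct sim_result r = { 0, 0 };
	atomic_int threshold;

	atomic_init(&threshold, 0);
	for (long i = 0; i < batches; i++) {
		if (guarded && atomic_load(&threshold) > LIMIT)
			continue; /* read only, no write to the shared counter */
		atomic_fetch_add(&threshold, 1);
		r.shared_writes++;
	}
	r.final_value = atomic_load(&threshold);
	return r;
}
```

In this model, an unguarded run of one million attempts issues one million writes, while a guarded run stops after LIMIT + 1 writes and only reads from then on.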