[PATCH] memcg: Optimize exit to user space

Thomas Gleixner posted 1 patch 1 month, 3 weeks ago
[PATCH] memcg: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 3 weeks ago
memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
other entities as well.

This results in an unconditional mem_cgroup_handle_over_high() call for
every invocation of resume_user_mode_work(), which is a pointless
exercise as most of the time there is no reclaim work to do.

Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
quite frequently and the empty calls show up in exit path profiling.

Optimize this by doing a quick check of the reclaim condition before
invoking it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/memcontrol.h |    8 +++++++-
 mm/memcontrol.c            |    4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -900,7 +900,13 @@ unsigned long mem_cgroup_get_zone_lru_si
 	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
 }
 
-void mem_cgroup_handle_over_high(gfp_t gfp_mask);
+void __mem_cgroup_handle_over_high(gfp_t gfp_mask);
+
+static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask)
+{
+	if (unlikely(current->memcg_nr_pages_over_high))
+		__mem_cgroup_handle_over_high(gfp_mask);
+}
 
 unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg);
 
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2203,7 +2203,7 @@ static unsigned long calculate_high_dela
  * try_charge() (context permitting), as well as from the userland
  * return path where reclaim is always able to block.
  */
-void mem_cgroup_handle_over_high(gfp_t gfp_mask)
+void __mem_cgroup_handle_over_high(gfp_t gfp_mask)
 {
 	unsigned long penalty_jiffies;
 	unsigned long pflags;
@@ -2486,7 +2486,7 @@ static int try_charge_memcg(struct mem_c
 	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
 	    !(current->flags & PF_MEMALLOC) &&
 	    gfpflags_allow_blocking(gfp_mask))
-		mem_cgroup_handle_over_high(gfp_mask);
+		__mem_cgroup_handle_over_high(gfp_mask);
 	return 0;
 }
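
For illustration only, the fast-path check added above can be modelled in user space. This is a minimal sketch, not kernel code: struct task_model, slow_handle_over_high(), handle_over_high() and simulate_exits() are hypothetical names invented here, and unlikely() is redefined locally using the GCC/Clang builtin. It only shows that the out-of-line function is entered when the per-task counter is non-zero and skipped otherwise.

```c
#include <assert.h>

/* Userspace model only; none of this is kernel code. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the relevant field of task_struct. */
struct task_model {
	unsigned int memcg_nr_pages_over_high;
};

/* Counts how often the out-of-line slow path was really entered. */
static unsigned long slow_path_calls;

/* Models __mem_cgroup_handle_over_high(): the expensive, out-of-line part. */
static void slow_handle_over_high(struct task_model *t)
{
	slow_path_calls++;
	t->memcg_nr_pages_over_high = 0;	/* work done, reset the counter */
}

/* Models the new inline wrapper: one load and one branch on the fast path. */
static inline void handle_over_high(struct task_model *t)
{
	if (unlikely(t->memcg_nr_pages_over_high))
		slow_handle_over_high(t);
}

/*
 * Simulate 'exits' returns to user space in which only every 'period'-th
 * one has reclaim work pending (period == 0 means never).
 * Returns the number of slow path calls.
 */
static unsigned long simulate_exits(unsigned int exits, unsigned int period)
{
	struct task_model t = { 0 };

	slow_path_calls = 0;
	for (unsigned int i = 1; i <= exits; i++) {
		if (period && i % period == 0)
			t.memcg_nr_pages_over_high = 64;
		handle_over_high(&t);
	}
	return slow_path_calls;
}
```

In this model, simulate_exits(1000, 100) enters the slow path 10 times instead of 1000, which is the effect the patch aims for on the real exit path.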
Re: [PATCH] memcg: Optimize exit to user space
Posted by Shakeel Butt 1 month, 3 weeks ago
On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> other entities as well.
> 
> This results in an unconditional mem_cgroup_handle_over_high() call for
> every invocation of resume_user_mode_work(), which is a pointless
> exercise as most of the time there is no reclaim work to do.
> 
> Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> quite frequently and the empty calls show up in exit path profiling.
> 
> Optimize this by doing a quick check of the reclaim condition before
> invoking it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrew Morton <akpm@linux-foundation.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

Since this is seen in profiling data and it is simple enough, I think it
is worth backporting to stable trees as well.

In the followup cleanup, we can remove the (!nr_pages) check inside
__mem_cgroup_handle_over_high() as well.
Re: [PATCH] memcg: Optimize exit to user space
Posted by Andrew Morton 1 month, 3 weeks ago
On Wed, 13 Aug 2025 10:19:03 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:

> On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> > memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> > space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> > other entities as well.
> > 
> > This results in an unconditional mem_cgroup_handle_over_high() call for
> > every invocation of resume_user_mode_work(), which is a pointless
> > exercise as most of the time there is no reclaim work to do.
> > 
> > Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> > quite frequently and the empty calls show up in exit path profiling.
> > 
> > Optimize this by doing a quick check of the reclaim condition before
> > invoking it.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Muchun Song <muchun.song@linux.dev>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> 
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> 
> Since this is seen in profiling data and it is simple enough, I think it
> is worth backporting to stable trees as well.

People will probably do this, but it's a big break of -stable rules.

If it is a regression fix (ie, has a Fixes:) and if it makes a big
difference (ie, comes with impressive quantitative testing results)
then maybe we could push it into -stable anyway...

> In the followup cleanup, we can remove the (!nr_pages) check inside
> __mem_cgroup_handle_over_high() as well.

yup, how about we do that now

--- a/mm/memcontrol.c~memcg-optimize-exit-to-user-space-fix
+++ a/mm/memcontrol.c
@@ -2213,9 +2213,6 @@ void __mem_cgroup_handle_over_high(gfp_t
 	struct mem_cgroup *memcg;
 	bool in_retry = false;
 
-	if (likely(!nr_pages))
-		return;
-
 	memcg = get_mem_cgroup_from_mm(current->mm);
 	current->memcg_nr_pages_over_high = 0;
 
_
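
As a rough illustration of why the removed check is redundant once the original patch is applied: both remaining callers only reach the out-of-line function with a non-zero counter. The sketch below is a simplified user-space model under stated assumptions. All names are hypothetical, the batch constant is assumed to be 64, and the PF_MEMALLOC and gfpflags_allow_blocking() conditions of try_charge_memcg() are left out.

```c
#include <assert.h>

/* Assumed value, modelling MEMCG_CHARGE_BATCH. */
#define CHARGE_BATCH_MODEL 64U

static unsigned int over_high;		/* models current->memcg_nr_pages_over_high */
static unsigned int slow_entries;	/* total entries into the slow path */
static unsigned int zero_entries;	/* entries made with a zero counter */

/* Models __mem_cgroup_handle_over_high() without the (!nr_pages) check. */
static void slow_path(void)
{
	slow_entries++;
	if (!over_high)
		zero_entries++;		/* would be a wasted call */
	over_high = 0;
}

/* Models the inline wrapper used on the exit to user space path. */
static void exit_to_user_path(void)
{
	if (over_high)
		slow_path();
}

/* Simplified model of the call site in try_charge_memcg(). */
static void charge_path(unsigned int nr_pages)
{
	over_high += nr_pages;
	if (over_high > CHARGE_BATCH_MODEL)
		slow_path();
}

/* Mix both call paths and report how often the slow path saw zero. */
static unsigned int run_model(void)
{
	over_high = 0;
	slow_entries = 0;
	zero_entries = 0;

	for (unsigned int i = 0; i < 1000; i++) {
		if (i % 7 == 0)
			charge_path(40);	/* stays below the batch limit */
		if (i % 11 == 0)
			charge_path(100);	/* exceeds the batch limit */
		exit_to_user_path();
	}
	return zero_entries;
}
```

In this model run_model() returns 0: the wrapper filters the zero case, and a counter above the batch limit is non-zero by definition.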
Re: [PATCH] memcg: Optimize exit to user space
Posted by Thomas Gleixner 1 month, 3 weeks ago
On Wed, Aug 13 2025 at 10:19, Shakeel Butt wrote:
> On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> Since this is seen in profiling data and it is simple enough, I think it
> is worth backporting to stable trees as well.

Your call.

> In the followup cleanup, we can remove the (!nr_pages) check inside
> __mem_cgroup_handle_over_high() as well.

Yes. I did not want to do that in one go, but that's an obvious follow
up.

Thanks,

        tglx
Re: [PATCH] memcg: Optimize exit to user space
Posted by Johannes Weiner 1 month, 3 weeks ago
On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> other entities as well.
> 
> This results in an unconditional mem_cgroup_handle_over_high() call for
> every invocation of resume_user_mode_work(), which is a pointless
> exercise as most of the time there is no reclaim work to do.
> 
> Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> quite frequently and the empty calls show up in exit path profiling.
> 
> Optimize this by doing a quick check of the reclaim condition before
> invoking it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrew Morton <akpm@linux-foundation.org>

Nice!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Re: [PATCH] memcg: Optimize exit to user space
Posted by Roman Gushchin 1 month, 3 weeks ago
Thomas Gleixner <tglx@linutronix.de> writes:

> memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> other entities as well.
>
> This results in an unconditional mem_cgroup_handle_over_high() call for
> every invocation of resume_user_mode_work(), which is a pointless
> exercise as most of the time there is no reclaim work to do.
>
> Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> quite frequently and the empty calls show up in exit path profiling.
>
> Optimize this by doing a quick check of the reclaim condition before
> invoking it.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrew Morton <akpm@linux-foundation.org>

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!