memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
other entities as well.
This results in an unconditional mem_cgroup_handle_over_high() call for
every invocation of resume_user_mode_work(), which is a pointless
exercise as most of the time there is no reclaim work to do.
Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
quite frequently and the empty calls show up in exit path profiling.
Optimize this by doing a quick check of the reclaim condition before
invoking it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/memcontrol.h | 8 +++++++-
mm/memcontrol.c | 4 ++--
2 files changed, 9 insertions(+), 3 deletions(-)
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -900,7 +900,13 @@ unsigned long mem_cgroup_get_zone_lru_si
 	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
 }
 
-void mem_cgroup_handle_over_high(gfp_t gfp_mask);
+void __mem_cgroup_handle_over_high(gfp_t gfp_mask);
+
+static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask)
+{
+	if (unlikely(current->memcg_nr_pages_over_high))
+		__mem_cgroup_handle_over_high(gfp_mask);
+}
 
 unsigned long mem_cgroup_get_max(struct mem_cgroup *memcg);
 
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2203,7 +2203,7 @@ static unsigned long calculate_high_dela
  * try_charge() (context permitting), as well as from the userland
  * return path where reclaim is always able to block.
  */
-void mem_cgroup_handle_over_high(gfp_t gfp_mask)
+void __mem_cgroup_handle_over_high(gfp_t gfp_mask)
 {
 	unsigned long penalty_jiffies;
 	unsigned long pflags;
@@ -2486,7 +2486,7 @@ static int try_charge_memcg(struct mem_c
 	if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
 	    !(current->flags & PF_MEMALLOC) &&
 	    gfpflags_allow_blocking(gfp_mask))
-		mem_cgroup_handle_over_high(gfp_mask);
+		__mem_cgroup_handle_over_high(gfp_mask);
 	return 0;
 }
 
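
The fast-path split in the patch above can be modelled in user space. The following is only a minimal sketch, not kernel code: `struct task_model`, `handle_over_high_slow()`, `handle_over_high()` and `simulate_exits()` are hypothetical names invented for illustration, and `__builtin_expect()` (a GCC/Clang builtin) stands in for the kernel's `unlikely()`. It shows that with the inline check in place, the out-of-line function only runs on exits where reclaim work is actually pending.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-task state; not the kernel's task_struct. */
struct task_model {
	unsigned int nr_pages_over_high;	/* pending reclaim debt, usually 0 */
	unsigned long slow_calls;		/* how often the slow path ran */
};

/* Out-of-line slow path: models __mem_cgroup_handle_over_high(). */
static void handle_over_high_slow(struct task_model *t)
{
	t->slow_calls++;
	t->nr_pages_over_high = 0;	/* pretend the debt was reclaimed */
}

/* Inline fast path: models the new mem_cgroup_handle_over_high() wrapper. */
static inline void handle_over_high(struct task_model *t)
{
	/* Assumption: GCC/Clang builtin used in place of unlikely(). */
	if (__builtin_expect(t->nr_pages_over_high != 0, 0))
		handle_over_high_slow(t);
}

/*
 * Simulate 'exits' returns to user space. Every 'period'-th exit has
 * reclaim debt pending; period == 0 means debt is never pending.
 * Returns how many times the slow path was entered.
 */
static unsigned long simulate_exits(unsigned long exits, unsigned long period)
{
	struct task_model t = { 0, 0 };

	for (unsigned long i = 1; i <= exits; i++) {
		if (period && i % period == 0)
			t.nr_pages_over_high = 32;
		handle_over_high(&t);
		/* After the handler, no debt may remain pending. */
		assert(t.nr_pages_over_high == 0);
	}
	return t.slow_calls;
}
```

In this model, `simulate_exits(1000, 250)` enters the slow path on 4 of 1000 exits; the other 996 cost only the inlined compare and branch, which is the effect the patch aims for on the real exit path.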
On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> other entities as well.
>
> This results in a unconditional mem_cgroup_handle_over_high() call for
> every invocation of resume_user_mode_work(), which is a pointless
> exercise as most of the time there is no reclaim work to do.
>
> Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> quite frequently and the empty calls show up in exit path profiling.
>
> Optimize this by doing a quick check of the reclaim condition before
> invoking it.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrew Morton <akpm@linux-foundation.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

Since this is seen in profiling data and it is simple enough, I think it
is worth backporting to stable trees as well.

In the followup cleanup, we can remove the (!nr_pages) check inside
__mem_cgroup_handle_over_high() as well.

On Wed, 13 Aug 2025 10:19:03 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> > memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> > space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> > other entities as well.
> >
> > This results in a unconditional mem_cgroup_handle_over_high() call for
> > every invocation of resume_user_mode_work(), which is a pointless
> > exercise as most of the time there is no reclaim work to do.
> >
> > Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> > quite frequently and the empty calls show up in exit path profiling.
> >
> > Optimize this by doing a quick check of the reclaim condition before
> > invoking it.
> >
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Muchun Song <muchun.song@linux.dev>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> Since this is seen in profiling data and it is simple enough, I think it
> is worth backporting to stable trees as well.

People will probably do this, but it's a big break of -stable rules. If
it is a regression fix (ie, has a Fixes:) and if it makes a big
difference (ie, comes with impressive quantitative testing results) then
maybe we could push it into -stable anyway...

> In the followup cleanup, we can remove the (!nr_pages) check inside
> __mem_cgroup_handle_over_high() as well.

yup, how about we do that now

--- a/mm/memcontrol.c~memcg-optimize-exit-to-user-space-fix
+++ a/mm/memcontrol.c
@@ -2213,9 +2213,6 @@ void __mem_cgroup_handle_over_high(gfp_t
 	struct mem_cgroup *memcg;
 	bool in_retry = false;
 
-	if (likely(!nr_pages))
-		return;
-
 	memcg = get_mem_cgroup_from_mm(current->mm);
 	current->memcg_nr_pages_over_high = 0;
 
_

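
Why dropping the inner check looks safe can be sketched with a small user-space model. This is an illustration under stated assumptions, not the kernel implementation: `pending` stands in for `current->memcg_nr_pages_over_high`, `CHARGE_BATCH_MODEL` is a hypothetical threshold standing in for `MEMCG_CHARGE_BATCH`, and the `PF_MEMALLOC` and `gfpflags_allow_blocking()` conditions of the real charge path are omitted. The point is that both call sites shown in the patch already establish a non-zero count before entering the slow path, so an early return for zero inside it can no longer trigger.

```c
#include <assert.h>

/* Hypothetical threshold standing in for MEMCG_CHARGE_BATCH. */
#define CHARGE_BATCH_MODEL 64U

static unsigned int pending;	/* models current->memcg_nr_pages_over_high */
static unsigned int slow_runs;	/* number of slow-path entries */

/* Slow path as it would look after the follow-up: no early return for zero. */
static void slow_path(void)
{
	assert(pending != 0);	/* invariant that every caller must provide */
	slow_runs++;
	pending = 0;
}

/* Call site 1: exit to user space, guarded by the inline wrapper's check. */
static void exit_to_user(void)
{
	if (pending)
		slow_path();
}

/* Call site 2: charge path; "above the batch threshold" implies non-zero. */
static void charge(unsigned int nr_pages)
{
	pending += nr_pages;	/* simplification: every charge adds debt */
	if (pending > CHARGE_BATCH_MODEL)
		slow_path();
}

/* Drive both call sites and report how often the slow path ran. */
static unsigned int run_model(void)
{
	pending = 0;
	slow_runs = 0;

	exit_to_user();		/* nothing pending: slow path skipped */
	charge(10);		/* below threshold: handling deferred */
	exit_to_user();		/* 10 pages pending: slow path runs */
	charge(100);		/* above threshold: slow path runs */
	exit_to_user();		/* already cleared: slow path skipped */

	return slow_runs;
}
```

In this model `run_model()` returns 2 and the assertion never fires. Any future caller that invoked the double-underscore function directly without such a pre-check would lose the protection the removed lines used to give.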
On Wed, Aug 13 2025 at 10:19, Shakeel Butt wrote:
> On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> Since this is seen in profiling data and it is simple enough, I think it
> is worth backporting to stable trees as well.

Your call.

> In the followup cleanup, we can remove the (!nr_pages) check inside
> __mem_cgroup_handle_over_high() as well.

Yes. I did not want to do that in one go, but that's an obvious follow up.

Thanks,

        tglx

On Wed, Aug 13, 2025 at 04:57:55PM +0200, Thomas Gleixner wrote:
> memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> other entities as well.
>
> This results in a unconditional mem_cgroup_handle_over_high() call for
> every invocation of resume_user_mode_work(), which is a pointless
> exercise as most of the time there is no reclaim work to do.
>
> Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> quite frequently and the empty calls show up in exit path profiling.
>
> Optimize this by doing a quick check of the reclaim condition before
> invoking it.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrew Morton <akpm@linux-foundation.org>

Nice!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thomas Gleixner <tglx@linutronix.de> writes:

> memcg uses TIF_NOTIFY_RESUME to handle reclaiming on exit to user
> space. TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is utilized by
> other entities as well.
>
> This results in a unconditional mem_cgroup_handle_over_high() call for
> every invocation of resume_user_mode_work(), which is a pointless
> exercise as most of the time there is no reclaim work to do.
>
> Especially since RSEQ is used by glibc, TIF_NOTIFY_RESUME is raised
> quite frequently and the empty calls show up in exit path profiling.
>
> Optimize this by doing a quick check of the reclaim condition before
> invoking it.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Andrew Morton <akpm@linux-foundation.org>

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!