From: Michal Hocko <mhocko@suse.com>
Leonardo Bras has noticed that pcp charge cache draining might be
disruptive for workloads relying on 'isolated cpus', a feature commonly
used by workloads that are sensitive to interruptions and context
switching, such as vRAN and Industrial Control Systems.
There are essentially two ways to approach the issue. We can either
allow the pcp cache to be drained from a cpu other than the one it
belongs to, or we can avoid the remote flushing on isolated cpus
altogether.
The current pcp charge cache is really optimized for high performance
and relies on always being accessed from its own cpu. That means it only
requires a local_lock (preempt_disable on !RT) and draining is handed
over to the pcp WQ so that it runs locally as well.
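For illustration, the local fast path looks roughly like this (a
simplified sketch paraphrased from mm/memcontrol.c rather than quoted
verbatim):

	/* simplified sketch of the local charge fast path */
	static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
	{
		struct memcg_stock_pcp *stock;
		unsigned long flags;
		bool ret = false;

		/* only a local_lock, no cross-cpu synchronization */
		local_lock_irqsave(&memcg_stock.stock_lock, flags);

		stock = this_cpu_ptr(&memcg_stock);
		if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
			/* plain arithmetic on the per-cpu counter */
			stock->nr_pages -= nr_pages;
			ret = true;
		}

		local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
		return ret;
	}

Any scheme which allows a remote cpu to modify stock->nr_pages would
have to replace this plain arithmetic with something stronger.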
The former solution (remote draining) would require adding extra locking
to prevent local charges from racing with the draining. This adds an
atomic operation to the otherwise simple arithmetic fast path in
try_charge. Another concern is that remote draining can cause lock
contention for the isolated workloads and therefore interfere with them
indirectly via user space interfaces.
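To make that cost concrete, here is a purely hypothetical sketch (not
this patch, and not necessarily how a remote-draining series would do
it) of a fast path whose cached counter may be consumed by a remote
drainer; nr_pages_atomic is a made-up field standing in for today's
plain nr_pages:

	/* hypothetical: the per-cpu counter becomes an atomic_t so that a
	 * remote drainer can take it away safely */
	static bool consume_stock_remote_drainable(struct memcg_stock_pcp *stock,
						   struct mem_cgroup *memcg,
						   unsigned int nr_pages)
	{
		if (READ_ONCE(stock->cached) != memcg)
			return false;

		/* atomic RMW on every charge instead of a plain subtraction */
		if (atomic_sub_return(nr_pages, &stock->nr_pages_atomic) >= 0)
			return true;

		/* lost a race with a (remote) drain, undo and take the slow path */
		atomic_add(nr_pages, &stock->nr_pages_atomic);
		return false;
	}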
The other option is to avoid scheduling the draining on isolated cpus
altogether. That means those remote cpus would keep their charges even
after drain_all_stock returns. This is certainly not optimal either, but
it shouldn't really cause any major problems. In the worst case (many
isolated cpus with cached charges, each of them holding up to
MEMCG_CHARGE_BATCH, i.e. 64 pages) the memory consumption of a memcg
would be artificially higher than what can be immediately used from
other cpus; e.g. 16 such cpus with 4kB pages amount to at most
16 * 64 * 4kB = 4MB of stranded charges per memcg.
Theoretically a memcg OOM killer could be triggered prematurely.
Currently it is not really clear whether this is a practical problem
though. A tight memcg limit would be counterproductive for cpu-isolated
workloads pretty much by definition, because any memory reclaim induced
by the limit could break user space timing expectations, as those
workloads usually expect to execute in user space most of the time.
Charges could also be left behind on memcg removal. Any future charge on
those isolated cpus will drain the pcp cache, so this will not be a
permanent leak.
Considering the pros and cons of both approaches, this patch implements
the second option and simply does not schedule remote draining if the
target cpu is isolated. This solution is much simpler. It doesn't add
any new locking and it is more predictable from the user space POV.
Should the premature memcg OOM become a real life problem, we can
revisit this decision.
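For reference, cpu_is_isolated() is introduced by patch 1/2 of this
series in <linux/sched/isolation.h>; its check is expected to look
roughly like the following (paraphrased sketch, see that patch for the
authoritative implementation):

	static inline bool cpu_is_isolated(int cpu)
	{
		/* isolated == excluded from scheduler-domain or
		 * unbound-workqueue housekeeping */
		return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
		       !housekeeping_test_cpu(cpu, HK_TYPE_WQ);
	}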
Cc: Leonardo Brás <leobras@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Reported-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0524add35cae..12559c08d976 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2366,7 +2366,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
-			else
+			else if (!cpu_is_isolated(cpu))
 				schedule_work_on(cpu, &stock->work);
 		}
 	}
--
2.30.2
Hi Michal,
I love your patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org
patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
config: x86_64-randconfig-a013 (https://download.01.org/0day-ci/archive/20230318/202303180617.7E3aIlHf-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
git checkout b376cfcf40a43276e1280950bb926fdf3521940a
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303180617.7E3aIlHf-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/memcontrol.c: In function 'drain_all_stock':
>> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
2369 | else if (!cpu_is_isolated(cpu))
| ^~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/cpu_is_isolated +2369 mm/memcontrol.c
  2331	
  2332	/*
  2333	 * Drains all per-CPU charge caches for given root_memcg resp. subtree
  2334	 * of the hierarchy under it.
  2335	 */
  2336	static void drain_all_stock(struct mem_cgroup *root_memcg)
  2337	{
  2338		int cpu, curcpu;
  2339	
  2340		/* If someone's already draining, avoid adding running more workers. */
  2341		if (!mutex_trylock(&percpu_charge_mutex))
  2342			return;
  2343		/*
  2344		 * Notify other cpus that system-wide "drain" is running
  2345		 * We do not care about races with the cpu hotplug because cpu down
  2346		 * as well as workers from this path always operate on the local
  2347		 * per-cpu data. CPU up doesn't touch memcg_stock at all.
  2348		 */
  2349		migrate_disable();
  2350		curcpu = smp_processor_id();
  2351		for_each_online_cpu(cpu) {
  2352			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
  2353			struct mem_cgroup *memcg;
  2354			bool flush = false;
  2355	
  2356			rcu_read_lock();
  2357			memcg = stock->cached;
  2358			if (memcg && stock->nr_pages &&
  2359			    mem_cgroup_is_descendant(memcg, root_memcg))
  2360				flush = true;
  2361			else if (obj_stock_flush_required(stock, root_memcg))
  2362				flush = true;
  2363			rcu_read_unlock();
  2364	
  2365			if (flush &&
  2366			    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
  2367				if (cpu == curcpu)
  2368					drain_local_stock(&stock->work);
> 2369				else if (!cpu_is_isolated(cpu))
  2370					schedule_work_on(cpu, &stock->work);
  2371			}
  2372		}
  2373		migrate_enable();
  2374		mutex_unlock(&percpu_charge_mutex);
  2375	}
  2376	
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote:

> mm/memcontrol.c: In function 'drain_all_stock':
> >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
>  2369 |                 else if (!cpu_is_isolated(cpu))
>       |                          ^~~~~~~~~~~~~~~

Thanks.

--- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix
+++ a/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/sched/isolation.h>
 #include "internal.h"
 #include <net/sock.h>
 #include <net/ip.h>
_
On Fri 17-03-23 16:32:29, Andrew Morton wrote:
> On Sat, 18 Mar 2023 06:22:40 +0800 kernel test robot <lkp@intel.com> wrote:
>
> > mm/memcontrol.c: In function 'drain_all_stock':
> > >> mm/memcontrol.c:2369:35: error: implicit declaration of function 'cpu_is_isolated' [-Werror=implicit-function-declaration]
> >  2369 |                 else if (!cpu_is_isolated(cpu))
> >       |                          ^~~~~~~~~~~~~~~
>
> Thanks.
>
> --- a/mm/memcontrol.c~memcg-do-not-drain-charge-pcp-caches-on-remote-isolated-cpus-fix
> +++ a/mm/memcontrol.c
> @@ -63,6 +63,7 @@
>  #include <linux/resume_user_mode.h>
>  #include <linux/psi.h>
>  #include <linux/seq_buf.h>
> +#include <linux/sched/isolation.h>
>  #include "internal.h"
>  #include <net/sock.h>
>  #include <net/ip.h>

Thanks a lot Andrew!

> _

--
Michal Hocko
SUSE Labs
Hi Michal,
I love your patch! Yet something to improve:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230317134448.11082-3-mhocko%40kernel.org
patch subject: [PATCH 2/2] memcg: do not drain charge pcp caches on remote isolated cpus
config: x86_64-randconfig-a014 (https://download.01.org/0day-ci/archive/20230318/202303180520.5lKAJwrg-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/b376cfcf40a43276e1280950bb926fdf3521940a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Michal-Hocko/sched-isolation-Add-cpu_is_isolated-API/20230317-214621
git checkout b376cfcf40a43276e1280950bb926fdf3521940a
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303180520.5lKAJwrg-lkp@intel.com/
All errors (new ones prefixed by >>):
>> mm/memcontrol.c:2369:14: error: implicit declaration of function 'cpu_is_isolated' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
else if (!cpu_is_isolated(cpu))
^
1 error generated.
vim +/cpu_is_isolated +2369 mm/memcontrol.c
  2331	
  2332	/*
  2333	 * Drains all per-CPU charge caches for given root_memcg resp. subtree
  2334	 * of the hierarchy under it.
  2335	 */
  2336	static void drain_all_stock(struct mem_cgroup *root_memcg)
  2337	{
  2338		int cpu, curcpu;
  2339	
  2340		/* If someone's already draining, avoid adding running more workers. */
  2341		if (!mutex_trylock(&percpu_charge_mutex))
  2342			return;
  2343		/*
  2344		 * Notify other cpus that system-wide "drain" is running
  2345		 * We do not care about races with the cpu hotplug because cpu down
  2346		 * as well as workers from this path always operate on the local
  2347		 * per-cpu data. CPU up doesn't touch memcg_stock at all.
  2348		 */
  2349		migrate_disable();
  2350		curcpu = smp_processor_id();
  2351		for_each_online_cpu(cpu) {
  2352			struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
  2353			struct mem_cgroup *memcg;
  2354			bool flush = false;
  2355	
  2356			rcu_read_lock();
  2357			memcg = stock->cached;
  2358			if (memcg && stock->nr_pages &&
  2359			    mem_cgroup_is_descendant(memcg, root_memcg))
  2360				flush = true;
  2361			else if (obj_stock_flush_required(stock, root_memcg))
  2362				flush = true;
  2363			rcu_read_unlock();
  2364	
  2365			if (flush &&
  2366			    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
  2367				if (cpu == curcpu)
  2368					drain_local_stock(&stock->work);
> 2369				else if (!cpu_is_isolated(cpu))
  2370					schedule_work_on(cpu, &stock->work);
  2371			}
  2372		}
  2373		migrate_enable();
  2374		mutex_unlock(&percpu_charge_mutex);
  2375	}
  2376	
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
On Fri, Mar 17, 2023 at 6:44 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> Leonardo Bras has noticed that pcp charge cache draining might be
> disruptive on workloads relying on 'isolated cpus', a feature commonly
> used on workloads that are sensitive to interruption and context
> switching such as vRAN and Industrial Control Systems.
>
[...]
>
> Cc: Leonardo Brás <leobras@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Reported-by: Leonardo Bras <leobras@redhat.com>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Acked-by: Shakeel Butt <shakeelb@google.com>