[PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change

Frederic Weisbecker posted 33 patches 1 week, 5 days ago
[PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change
Posted by Frederic Weisbecker 1 week, 5 days ago
The HK_TYPE_DOMAIN housekeeping cpumask will soon be made modifiable at
runtime. To synchronize against the memcg workqueue and make sure that
no asynchronous draining is pending or executing on a newly isolated
CPU, read the housekeeping cpumask and queue the drain work within the
same RCU critical section.

Whenever housekeeping updates the HK_TYPE_DOMAIN cpumask, a memcg
workqueue flush will also be issued, in a further change, to make sure
that no work remains pending after a CPU has been made isolated.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 mm/memcontrol.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index be810c1fbfc3..2289a0299331 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2003,6 +2003,19 @@ static bool is_memcg_drain_needed(struct memcg_stock_pcp *stock,
 	return flush;
 }
 
+static void schedule_drain_work(int cpu, struct work_struct *work)
+{
+	/*
+	 * Protect housekeeping cpumask read and work enqueue together
+	 * in the same RCU critical section so that a later cpuset isolated
+	 * partition update only needs to wait for an RCU GP and flush the
+	 * pending work on newly isolated CPUs.
+	 */
+	guard(rcu)();
+	if (!cpu_is_isolated(cpu))
+		schedule_work_on(cpu, work);
+}
+
 /*
  * Drains all per-CPU charge caches for given root_memcg resp. subtree
  * of the hierarchy under it.
@@ -2032,8 +2045,8 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 				      &memcg_st->flags)) {
 			if (cpu == curcpu)
 				drain_local_memcg_stock(&memcg_st->work);
-			else if (!cpu_is_isolated(cpu))
-				schedule_work_on(cpu, &memcg_st->work);
+			else
+				schedule_drain_work(cpu, &memcg_st->work);
 		}
 
 		if (!test_bit(FLUSHING_CACHED_CHARGE, &obj_st->flags) &&
@@ -2042,8 +2055,8 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 				      &obj_st->flags)) {
 			if (cpu == curcpu)
 				drain_local_obj_stock(&obj_st->work);
-			else if (!cpu_is_isolated(cpu))
-				schedule_work_on(cpu, &obj_st->work);
+			else
+				schedule_drain_work(cpu, &obj_st->work);
 		}
 	}
 	migrate_enable();
-- 
2.51.1
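
The scheme described in the changelog above can be pictured with a small
userspace model. This is only a sketch under loud assumptions: a reader
counter stands in for RCU read-side critical sections, two arrays stand in
for the housekeeping cpumask and the queued drain work, and
model_isolate_cpu() is a hypothetical updater, since the real flush is only
announced for a later change in the series. None of the model_* names exist
in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Toy stand-ins, NOT kernel code. */
static atomic_int  model_readers;                 /* ~ active RCU readers    */
static atomic_bool model_isolated[MODEL_NR_CPUS]; /* ~ !HK_TYPE_DOMAIN mask  */
static atomic_bool model_pending[MODEL_NR_CPUS];  /* ~ queued drain work     */

/* Reader side: same shape as schedule_drain_work() in the patch. */
static bool model_schedule_drain_work(int cpu)
{
	bool queued = false;

	atomic_fetch_add(&model_readers, 1);      /* ~ rcu_read_lock()    */
	if (!atomic_load(&model_isolated[cpu])) { /* ~ cpu_is_isolated()  */
		atomic_store(&model_pending[cpu], true); /* ~ schedule_work_on() */
		queued = true;
	}
	atomic_fetch_sub(&model_readers, 1);      /* ~ rcu_read_unlock()  */
	return queued;
}

/* Updater side: hypothetical, the real one is not part of this patch. */
static void model_isolate_cpu(int cpu)
{
	/* 1. Publish the new mask: later readers will skip this CPU. */
	atomic_store(&model_isolated[cpu], true);

	/* 2. Wait for readers that may have seen the old mask
	 *    (toy replacement for waiting for an RCU grace period). */
	while (atomic_load(&model_readers) != 0)
		;

	/* 3. Anything queued by those readers is visible now; clearing
	 *    the flag models the flushed work having completed. */
	atomic_store(&model_pending[cpu], false);
}
```

Calling model_schedule_drain_work(1), then model_isolate_cpu(1), then
model_schedule_drain_work(1) again leaves nothing pending on CPU 1. The
second call is refused because the isolation check and the enqueue happen
inside the same read-side section, which is the property the patch prepares
for.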
Re: [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change
Posted by Michal Hocko 1 week, 4 days ago
On Sun 25-01-26 23:45:10, Frederic Weisbecker wrote:
> +static void schedule_drain_work(int cpu, struct work_struct *work)
> +{
> +	/*
> +	 * Protect housekeeping cpumask read and work enqueue together
> +	 * in the same RCU critical section so that later cpuset isolated
> +	 * partition update only need to wait for an RCU GP and flush the
> +	 * pending work on newly isolated CPUs.
> +	 */
> +	guard(rcu)();
> +	if (!cpu_is_isolated(cpu))
> +		schedule_work_on(cpu, work);

Shouldn't this be in the guarded RCU section?
-- 
Michal Hocko
SUSE Labs
Re: [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change
Posted by Frederic Weisbecker 1 week, 4 days ago
On Mon, Jan 26, 2026 at 05:41:38PM +0100, Michal Hocko wrote:
> On Sun 25-01-26 23:45:10, Frederic Weisbecker wrote:
> > +static void schedule_drain_work(int cpu, struct work_struct *work)
> > +{
> > +	/*
> > +	 * Protect housekeeping cpumask read and work enqueue together
> > +	 * in the same RCU critical section so that later cpuset isolated
> > +	 * partition update only need to wait for an RCU GP and flush the
> > +	 * pending work on newly isolated CPUs.
> > +	 */
> > +	guard(rcu)();
> > +	if (!cpu_is_isolated(cpu))
> > +		schedule_work_on(cpu, work);
> 
> Shouldn't this be in the guarded RCU section?

This is what guard(rcu)() does, right?
Or am I missing something?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change
Posted by Michal Hocko 1 week, 3 days ago
On Tue 27-01-26 13:45:06, Frederic Weisbecker wrote:
> On Mon, Jan 26, 2026 at 05:41:38PM +0100, Michal Hocko wrote:
> > On Sun 25-01-26 23:45:10, Frederic Weisbecker wrote:
> > > +static void schedule_drain_work(int cpu, struct work_struct *work)
> > > +{
> > > +	/*
> > > +	 * Protect housekeeping cpumask read and work enqueue together
> > > +	 * in the same RCU critical section so that later cpuset isolated
> > > +	 * partition update only need to wait for an RCU GP and flush the
> > > +	 * pending work on newly isolated CPUs.
> > > +	 */
> > > +	guard(rcu)();
> > > +	if (!cpu_is_isolated(cpu))
> > > +		schedule_work_on(cpu, work);
> > 
> > Shouldn't this be in the guarded RCU section?
> 
> This is what guard(rcu)() does, right?
> Or am I missing something?

I am probably misreading the patch. But I've had the following in mind

	scoped_guard(rcu) {
		if (!cpu_is_isolated(cpu))
			schedule_work_on(cpu, work);
	}
-- 
Michal Hocko
SUSE Labs
Re: [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change
Posted by Frederic Weisbecker 1 week, 3 days ago
On Wed, Jan 28, 2026 at 09:45:03AM +0100, Michal Hocko wrote:
> On Tue 27-01-26 13:45:06, Frederic Weisbecker wrote:
> > On Mon, Jan 26, 2026 at 05:41:38PM +0100, Michal Hocko wrote:
> > > On Sun 25-01-26 23:45:10, Frederic Weisbecker wrote:
> > > > +static void schedule_drain_work(int cpu, struct work_struct *work)
> > > > +{
> > > > +	/*
> > > > +	 * Protect housekeeping cpumask read and work enqueue together
> > > > +	 * in the same RCU critical section so that later cpuset isolated
> > > > +	 * partition update only need to wait for an RCU GP and flush the
> > > > +	 * pending work on newly isolated CPUs.
> > > > +	 */
> > > > +	guard(rcu)();
> > > > +	if (!cpu_is_isolated(cpu))
> > > > +		schedule_work_on(cpu, work);
> > > 
> > > Shouldn't this be in the guarded RCU section?
> > 
> > This is what guard(rcu)() does, right?
> > Or am I missing something?
> 
> I am probably misreading the patch. But I've had the following in mind
> 
> 	scoped_guard(rcu) {
> 		if (!cpu_is_isolated(cpu))
> 			schedule_work_on(cpu, work);
> 	}

guard(...)() protects everything that follows within the same block
(here the whole function) whereas scoped_guard only applies to the
following scope (here what is inside the {} in your example).

So both work.
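
For readers less familiar with the two forms, here is a minimal userspace
sketch of the scoping difference. It assumes GCC or Clang, since it relies on
__attribute__((cleanup)), the compiler feature that the kernel's guard
helpers in include/linux/cleanup.h are built on. The FAKE_* macros and the
depth counter are hypothetical stand-ins for illustration, not the kernel's
definitions.

```c
#include <assert.h>

/* Stand-in for the RCU read-side nesting depth; NOT kernel code. */
static int fake_rcu_depth;

static void fake_rcu_read_lock(void)
{
	fake_rcu_depth++;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static void fake_rcu_read_unlock(int *unused)
{
	(void)unused;
	fake_rcu_depth--;
}

/* Like guard(rcu)(): released when the enclosing block is left. */
#define FAKE_GUARD_RCU()						\
	int fake_guard							\
	__attribute__((cleanup(fake_rcu_read_unlock), unused)) =	\
		(fake_rcu_read_lock(), 0)

/* Like scoped_guard(rcu): released when the attached statement ends. */
#define FAKE_SCOPED_GUARD_RCU()						\
	for (int fake_scope						\
	     __attribute__((cleanup(fake_rcu_read_unlock), unused)) =	\
		(fake_rcu_read_lock(), 0), fake_once = 1;		\
	     fake_once; fake_once = 0)

/* Whole function body after the guard runs inside the section. */
static int depth_with_guard(void)
{
	FAKE_GUARD_RCU();
	return fake_rcu_depth;	/* evaluated before the cleanup runs */
}

/* Only the braces are inside the section. */
static int depth_with_scoped_guard(int *depth_after_scope)
{
	int depth_inside = 0;

	FAKE_SCOPED_GUARD_RCU() {
		depth_inside = fake_rcu_depth;
	}
	*depth_after_scope = fake_rcu_depth;	/* already released here */
	return depth_inside;
}
```

In schedule_drain_work() the guard is the first statement and nothing
follows the enqueue, so the two forms cover exactly the same code there.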

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
Re: [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change
Posted by Michal Hocko 1 week, 2 days ago
On Wed 28-01-26 12:27:22, Frederic Weisbecker wrote:
> On Wed, Jan 28, 2026 at 09:45:03AM +0100, Michal Hocko wrote:
> > On Tue 27-01-26 13:45:06, Frederic Weisbecker wrote:
> > > On Mon, Jan 26, 2026 at 05:41:38PM +0100, Michal Hocko wrote:
> > > > On Sun 25-01-26 23:45:10, Frederic Weisbecker wrote:
> > > > > +static void schedule_drain_work(int cpu, struct work_struct *work)
> > > > > +{
> > > > > +	/*
> > > > > +	 * Protect housekeeping cpumask read and work enqueue together
> > > > > +	 * in the same RCU critical section so that later cpuset isolated
> > > > > +	 * partition update only need to wait for an RCU GP and flush the
> > > > > +	 * pending work on newly isolated CPUs.
> > > > > +	 */
> > > > > +	guard(rcu)();
> > > > > +	if (!cpu_is_isolated(cpu))
> > > > > +		schedule_work_on(cpu, work);
> > > > 
> > > > Shouldn't this be in the guarded RCU section?
> > > 
> > > This is what guard(rcu)() does, right?
> > > Or am I missing something?
> > 
> > I am probably misreading the patch. But I've had the following in mind
> > 
> > 	scoped_guard(rcu) {
> > 		if (!cpu_is_isolated(cpu))
> > 			schedule_work_on(cpu, work);
> > 	}
> 
> guard(...)() protects everything that follows within the same block
> (here the whole function) whereas scoped_guard only applies to the
> following scope (here what is inside the {} in your example).
> 
> So both work.

I see. Thanks for the clarification. I would probably prefer a more
explicit call convention but no strong opinion on that.
-- 
Michal Hocko
SUSE Labs
Re: [PATCH 03/33] memcg: Prepare to protect against concurrent isolated cpuset change
Posted by Michal Hocko 1 week, 2 days ago
On Wed 28-01-26 22:18:04, Michal Hocko wrote:
> On Wed 28-01-26 12:27:22, Frederic Weisbecker wrote:
> > On Wed, Jan 28, 2026 at 09:45:03AM +0100, Michal Hocko wrote:
> > > On Tue 27-01-26 13:45:06, Frederic Weisbecker wrote:
> > > > On Mon, Jan 26, 2026 at 05:41:38PM +0100, Michal Hocko wrote:
> > > > > On Sun 25-01-26 23:45:10, Frederic Weisbecker wrote:
> > > > > > +static void schedule_drain_work(int cpu, struct work_struct *work)
> > > > > > +{
> > > > > > +	/*
> > > > > > +	 * Protect housekeeping cpumask read and work enqueue together
> > > > > > +	 * in the same RCU critical section so that later cpuset isolated
> > > > > > +	 * partition update only need to wait for an RCU GP and flush the
> > > > > > +	 * pending work on newly isolated CPUs.
> > > > > > +	 */
> > > > > > +	guard(rcu)();
> > > > > > +	if (!cpu_is_isolated(cpu))
> > > > > > +		schedule_work_on(cpu, work);
> > > > > 
> > > > > Shouldn't this be in the guarded RCU section?
> > > > 
> > > > This is what guard(rcu)() does, right?
> > > > Or am I missing something?
> > > 
> > > I am probably misreading the patch. But I've had the following in mind
> > > 
> > > 	scoped_guard(rcu) {
> > > 		if (!cpu_is_isolated(cpu))
> > > 			schedule_work_on(cpu, work);
> > > 	}
> > 
> > guard(...)() protects everything that follows within the same block
> > (here the whole function) whereas scoped_guard only applies to the
> > following scope (here what is inside the {} in your example).
> > 
> > So both work.
> 
> I see. Thanks for the clarification. I would probably prefer a more
> explicit call convention but no strong opinion on that.

Forgot to add
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
-- 
Michal Hocko
SUSE Labs