[PATCH v2 08/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()

Qi Zheng posted 28 patches 2 days, 8 hours ago
Posted by Qi Zheng 2 days, 8 hours ago
From: Muchun Song <songmuchun@bytedance.com>

In the near future, a folio will no longer pin its corresponding
memory cgroup. Callers will then need to either hold the RCU read
lock or acquire a reference to the memory cgroup returned by
folio_memcg() to prevent it from being released.

This patch uses the RCU read lock to guard against the release of
the memory cgroup in get_mem_cgroup_from_folio().

This serves as a preparatory measure for the reparenting of the
LRU pages.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/memcontrol.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 21b5aad34cae7..431b3154c70c5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -973,14 +973,19 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
  */
 struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
 {
-	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct mem_cgroup *memcg;
 
 	if (mem_cgroup_disabled())
 		return NULL;
 
+	if (!folio_memcg_charged(folio))
+		return root_mem_cgroup;
+
 	rcu_read_lock();
-	if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css)))
-		memcg = root_mem_cgroup;
+retry:
+	memcg = folio_memcg(folio);
+	if (unlikely(!css_tryget(&memcg->css)))
+		goto retry;
 	rcu_read_unlock();
 	return memcg;
 }
-- 
2.20.1
Re: [PATCH v2 08/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()
Posted by Johannes Weiner 1 day, 18 hours ago
On Wed, Dec 17, 2025 at 03:27:32PM +0800, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
> 
> In the near future, a folio will no longer pin its corresponding
> memory cgroup. To ensure safety, it will only be appropriate to
> hold the rcu read lock or acquire a reference to the memory cgroup
> returned by folio_memcg(), thereby preventing it from being released.
> 
> In the current patch, the rcu read lock is employed to safeguard
> against the release of the memory cgroup in get_mem_cgroup_from_folio().
> 
> This serves as a preparatory measure for the reparenting of the
> LRU pages.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>  mm/memcontrol.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 21b5aad34cae7..431b3154c70c5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -973,14 +973,19 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
>   */
>  struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
>  {
> -	struct mem_cgroup *memcg = folio_memcg(folio);
> +	struct mem_cgroup *memcg;
>  
>  	if (mem_cgroup_disabled())
>  		return NULL;
>  
> +	if (!folio_memcg_charged(folio))
> +		return root_mem_cgroup;
> +
>  	rcu_read_lock();
> -	if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css)))
> -		memcg = root_mem_cgroup;
> +retry:
> +	memcg = folio_memcg(folio);
> +	if (unlikely(!css_tryget(&memcg->css)))
> +		goto retry;

So starting in patch 27, the tryget can fail if the memcg is offlined,
and the folio's objcg is reparented concurrently. We'll retry until we
find a memcg that isn't dead yet. There's always root_mem_cgroup.

It makes sense, but a loop like this raises the question of how it is
bounded. I pieced it together by looking ahead in the series. Since
this is a small diff, it would be nicer to fold it into patch 27. I
didn't see anything in between that depends on it, but correct me if
I'm wrong.

Minor style preference:

	/* Comment explaining the above */
	do {
		memcg = folio_memcg(folio);
	} while (!css_tryget(&memcg->css));
Re: [PATCH v2 08/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()
Posted by Shakeel Butt 13 hours ago
On Wed, Dec 17, 2025 at 04:45:06PM -0500, Johannes Weiner wrote:
> On Wed, Dec 17, 2025 at 03:27:32PM +0800, Qi Zheng wrote:
> > From: Muchun Song <songmuchun@bytedance.com>
> > 
> > In the near future, a folio will no longer pin its corresponding
> > memory cgroup. To ensure safety, it will only be appropriate to
> > hold the rcu read lock or acquire a reference to the memory cgroup
> > returned by folio_memcg(), thereby preventing it from being released.
> > 
> > In the current patch, the rcu read lock is employed to safeguard
> > against the release of the memory cgroup in get_mem_cgroup_from_folio().
> > 
> > This serves as a preparatory measure for the reparenting of the
> > LRU pages.
> > 
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> > Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> >  mm/memcontrol.c | 11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 21b5aad34cae7..431b3154c70c5 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -973,14 +973,19 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
> >   */
> >  struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
> >  {
> > -	struct mem_cgroup *memcg = folio_memcg(folio);
> > +	struct mem_cgroup *memcg;
> >  
> >  	if (mem_cgroup_disabled())
> >  		return NULL;
> >  
> > +	if (!folio_memcg_charged(folio))
> > +		return root_mem_cgroup;
> > +
> >  	rcu_read_lock();
> > -	if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css)))
> > -		memcg = root_mem_cgroup;
> > +retry:
> > +	memcg = folio_memcg(folio);
> > +	if (unlikely(!css_tryget(&memcg->css)))
> > +		goto retry;
> 
> So starting in patch 27, the tryget can fail if the memcg is offlined,

Offlined, or on its way to being freed? This is css_tryget(), not
css_tryget_online().

> and the folio's objcg is reparented concurrently. We'll retry until we
> find a memcg that isn't dead yet. There's always root_mem_cgroup.
> 
> It makes sense, but a loop like this begs the question of how it is
> bounded. I pieced it together looking ahead. Since this is a small
> diff, it would be nicer to fold it into 27. I didn't see anything in
> between depending on it, but correct me if I'm wrong.

I agree with folding it into the patch where it is needed. At this
point in the series, I don't see how css_tryget() can fail here.

> 
> Minor style preference:
> 
> 	/* Comment explaining the above */
> 	do {
> 		memcg = folio_memcg(folio);
> 	} while (!css_tryget(&memcg->css));
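
The distinction raised above, css_tryget() versus the online variant,
can be illustrated with a small userspace model. This is only a sketch
under stated assumptions: struct fake_css and the fake_*() helpers are
hypothetical stand-ins built on C11 atomics, whereas the kernel's css
refcount is a percpu_ref. The "dying" flag loosely models a css whose
removal has begun. The point it shows is that the plain tryget keeps
succeeding until the count actually reaches zero, while the online
variant starts failing earlier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a css: a plain counter plus a "dying" flag. */
struct fake_css {
	atomic_int  refcnt;	/* 0 models a css on its way to being freed */
	atomic_bool dying;	/* set once removal has started */
};

/* Models css_tryget(): take a reference unless the count is already zero. */
static bool fake_tryget(struct fake_css *css)
{
	int old = atomic_load(&css->refcnt);

	while (old > 0) {
		/* On failure 'old' is reloaded, so the zero check repeats. */
		if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Models css_tryget_online(): also refuses a css whose removal has begun.
 * The flag check and the increment are not one atomic step here; that is
 * good enough for a single-threaded illustration only.
 */
static bool fake_tryget_online(struct fake_css *css)
{
	if (atomic_load(&css->dying))
		return false;
	return fake_tryget(css);
}

static void fake_put(struct fake_css *css)
{
	atomic_fetch_sub(&css->refcnt, 1);
}

/* Walks one css through three stages and checks what each variant returns. */
int demo_tryget_variants(void)
{
	struct fake_css css;

	atomic_init(&css.refcnt, 1);	/* base reference */
	atomic_init(&css.dying, false);

	/* Live css: both variants succeed. */
	assert(fake_tryget(&css));
	assert(fake_tryget_online(&css));
	fake_put(&css);
	fake_put(&css);

	/* Removal started, references remain: only the plain variant succeeds. */
	atomic_store(&css.dying, true);
	assert(!fake_tryget_online(&css));
	assert(fake_tryget(&css));
	fake_put(&css);

	/* Last reference dropped: the plain variant fails and revives nothing. */
	fake_put(&css);
	assert(!fake_tryget(&css));
	assert(atomic_load(&css.refcnt) == 0);
	return 0;
}
```

In this model, a failing plain tryget means the count has already hit
zero, which matches the "on its way to free" reading of the question.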
Re: [PATCH v2 08/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()
Posted by Johannes Weiner 12 hours ago
On Thu, Dec 18, 2025 at 06:09:50PM -0800, Shakeel Butt wrote:
> On Wed, Dec 17, 2025 at 04:45:06PM -0500, Johannes Weiner wrote:
> > On Wed, Dec 17, 2025 at 03:27:32PM +0800, Qi Zheng wrote:
> > > From: Muchun Song <songmuchun@bytedance.com>
> > > 
> > > In the near future, a folio will no longer pin its corresponding
> > > memory cgroup. To ensure safety, it will only be appropriate to
> > > hold the rcu read lock or acquire a reference to the memory cgroup
> > > returned by folio_memcg(), thereby preventing it from being released.
> > > 
> > > In the current patch, the rcu read lock is employed to safeguard
> > > against the release of the memory cgroup in get_mem_cgroup_from_folio().
> > > 
> > > This serves as a preparatory measure for the reparenting of the
> > > LRU pages.
> > > 
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> > > Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> > > ---
> > >  mm/memcontrol.c | 11 ++++++++---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 21b5aad34cae7..431b3154c70c5 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -973,14 +973,19 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
> > >   */
> > >  struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
> > >  {
> > > -	struct mem_cgroup *memcg = folio_memcg(folio);
> > > +	struct mem_cgroup *memcg;
> > >  
> > >  	if (mem_cgroup_disabled())
> > >  		return NULL;
> > >  
> > > +	if (!folio_memcg_charged(folio))
> > > +		return root_mem_cgroup;
> > > +
> > >  	rcu_read_lock();
> > > -	if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css)))
> > > -		memcg = root_mem_cgroup;
> > > +retry:
> > > +	memcg = folio_memcg(folio);
> > > +	if (unlikely(!css_tryget(&memcg->css)))
> > > +		goto retry;
> > 
> > So starting in patch 27, the tryget can fail if the memcg is offlined,
> 
> offlined or on its way to free? It is css_tryget() without online.

Sorry, I did mean freeing.

But in the new scheme, offlining and freeing will happen much closer
together than before, since charges no longer hold a reference to the
css.

So when css_killed_work_fn() does

		offline_css(css);
		css_put(css);

on rmdir, that's now the css_put() we expect to drop the refcount to 0
even with folios in circulation.

The race is then:

	get_mem_cgroup_from_folio()	cgroup_rmdir()
	  memcg = folio_memcg(folio);
            folio->objcg->memcg
					  offline_css()
                                            reparent_objcgs()
					      objcg->memcg = objcg->memcg->parent
					  css_put() -> 0
	  !css_tryget(&memcg->css)

and the retry ensures we'll look up objcg->memcg again and find the
live parent and new owner of the folio.
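
The interleaving in the diagram above can be replayed deterministically
in userspace. What follows is a sketch, not kernel code: the fake_*
types and helpers are hypothetical, the refcount is a plain C11 atomic
rather than a percpu_ref, and the rmdir side is injected through a
callback that runs between the reader's first lookup and its first
tryget. In the kernel, the RCU read lock is what keeps the stale memcg
memory valid during that window; here the objects simply live on the
stack.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct fake_memcg {
	atomic_int refcnt;		/* 0 models a css whose last ref is gone */
	struct fake_memcg *parent;	/* NULL for the root */
};

struct fake_objcg {
	_Atomic(struct fake_memcg *) memcg;	/* repointed on reparenting */
};

struct fake_folio {
	struct fake_objcg *objcg;
};

/* Models css_tryget(): take a reference unless the count is already zero. */
static bool fake_tryget(struct fake_memcg *memcg)
{
	int old = atomic_load(&memcg->refcnt);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&memcg->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Models the rmdir side: reparent the objcg, then drop the last reference. */
static void fake_rmdir(struct fake_objcg *objcg)
{
	struct fake_memcg *dying = atomic_load(&objcg->memcg);

	atomic_store(&objcg->memcg, dying->parent);	/* reparent_objcgs() */
	atomic_fetch_sub(&dying->refcnt, 1);		/* css_put() -> 0 */
}

/*
 * Models the retry loop. 'racer' runs once, between the first lookup and
 * the first tryget, to force the interleaving from the diagram.
 */
static struct fake_memcg *fake_get_from_folio(struct fake_folio *folio,
					      void (*racer)(struct fake_objcg *),
					      int *lookups)
{
	struct fake_memcg *memcg;

	*lookups = 0;
	do {
		memcg = atomic_load(&folio->objcg->memcg);
		(*lookups)++;
		if (racer) {
			racer(folio->objcg);
			racer = NULL;
		}
	} while (!fake_tryget(memcg));

	return memcg;
}

/* Replays the race; returns the number of lookups the reader needed. */
int replay_race(void)
{
	struct fake_memcg root, child;
	struct fake_objcg objcg;
	struct fake_folio folio = { .objcg = &objcg };
	struct fake_memcg *got;
	int lookups;

	atomic_init(&root.refcnt, 1);
	root.parent = NULL;
	atomic_init(&child.refcnt, 1);
	child.parent = &root;
	atomic_init(&objcg.memcg, &child);

	got = fake_get_from_folio(&folio, fake_rmdir, &lookups);

	assert(got == &root);				/* live parent, not dead child */
	assert(atomic_load(&child.refcnt) == 0);	/* stale child not revived */
	assert(atomic_load(&root.refcnt) == 2);		/* caller holds a root ref */
	return lookups;
}
```

In this replay the reader needs exactly two lookups: the first returns
the stale child and the tryget on it fails, the second returns the
parent the objcg was moved to.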
Re: [PATCH v2 08/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()
Posted by Johannes Weiner 12 hours ago
On Thu, Dec 18, 2025 at 10:53:57PM -0500, Johannes Weiner wrote:
> On Thu, Dec 18, 2025 at 06:09:50PM -0800, Shakeel Butt wrote:
> > On Wed, Dec 17, 2025 at 04:45:06PM -0500, Johannes Weiner wrote:
> > > So starting in patch 27, the tryget can fail if the memcg is offlined,
> > 
> > offlined or on its way to free? It is css_tryget() without online.
> 
> Sorry, I did mean freeing.
> 
> But in the new scheme, they will happen much closer together than
> before, since charges don't hold a reference to the css anymore.
> 
> So when css_killed_work_fn() does
> 
> 		offline_css(css);
> 		css_put(css);
> 
> on rmdir, that's now the css_put() we expect to drop the refcount to 0
> even with folios in circulation.
> 
> The race is then:
> 
> 	get_mem_cgroup_from_folio()	cgroup_rmdir()
> 	  memcg = folio_memcg(folio);
>             folio->objcg->memcg
> 					  offline_css()
>                                             reparent_objcgs()
> 					      objcg->memcg = objcg->memcg->parent
> 					  css_put() -> 0
> 	  !css_tryget(&memcg->css)
> 
> and the retry ensures we'll look up objcg->memcg again and find the
> live parent and new owner of the folio.

But yes, none of this happens until patch 27.
Re: [PATCH v2 08/28] mm: memcontrol: prevent memory cgroup release in get_mem_cgroup_from_folio()
Posted by Qi Zheng 1 day, 9 hours ago

On 12/18/25 5:45 AM, Johannes Weiner wrote:
> On Wed, Dec 17, 2025 at 03:27:32PM +0800, Qi Zheng wrote:
>> From: Muchun Song <songmuchun@bytedance.com>
>>
>> In the near future, a folio will no longer pin its corresponding
>> memory cgroup. To ensure safety, it will only be appropriate to
>> hold the rcu read lock or acquire a reference to the memory cgroup
>> returned by folio_memcg(), thereby preventing it from being released.
>>
>> In the current patch, the rcu read lock is employed to safeguard
>> against the release of the memory cgroup in get_mem_cgroup_from_folio().
>>
>> This serves as a preparatory measure for the reparenting of the
>> LRU pages.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
>> ---
>>   mm/memcontrol.c | 11 ++++++++---
>>   1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 21b5aad34cae7..431b3154c70c5 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -973,14 +973,19 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
>>    */
>>   struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
>>   {
>> -	struct mem_cgroup *memcg = folio_memcg(folio);
>> +	struct mem_cgroup *memcg;
>>   
>>   	if (mem_cgroup_disabled())
>>   		return NULL;
>>   
>> +	if (!folio_memcg_charged(folio))
>> +		return root_mem_cgroup;
>> +
>>   	rcu_read_lock();
>> -	if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css)))
>> -		memcg = root_mem_cgroup;
>> +retry:
>> +	memcg = folio_memcg(folio);
>> +	if (unlikely(!css_tryget(&memcg->css)))
>> +		goto retry;
> 
> So starting in patch 27, the tryget can fail if the memcg is offlined,
> and the folio's objcg is reparented concurrently. We'll retry until we
> find a memcg that isn't dead yet. There's always root_mem_cgroup.
> 
> It makes sense, but a loop like this begs the question of how it is
> bounded. I pieced it together looking ahead. Since this is a small
> diff, it would be nicer to fold it into 27. I didn't see anything in
> between depending on it, but correct me if I'm wrong.

Right, will fold it into #27 in the next version.

> 
> Minor style preference:
> 
> 	/* Comment explaining the above */
> 	do {
> 		memcg = folio_memcg(folio);
> 	} while (!css_tryget(&memcg->css));

OK, will do.

Thanks,
Qi