In memory_max_write() we first set memcg->memory.max and only then
try to enforce it in a loop. What if, while we are in that loop, someone
else has changed memcg->memory.max but we are still trying to enforce
the old value? I believe this can lead to nasty consequences like getting
an OOM on a perfectly fine cgroup within its limits, or excess reclaim.

We also have exactly the same thing in memory_high_write().

So let's stop enforcing the old limit if we already have a new one.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
mm/memcontrol.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 61932c9215e7..81b303728491 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6769,6 +6769,9 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		unsigned long nr_pages = page_counter_read(&memcg->memory);
 		unsigned long reclaimed;
 
+		if (memcg->memory.high != high)
+			break;
+
 		if (nr_pages <= high)
 			break;
 
@@ -6817,6 +6820,9 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 	for (;;) {
 		unsigned long nr_pages = page_counter_read(&memcg->memory);
 
+		if (memcg->memory.max != max)
+			break;
+
 		if (nr_pages <= max)
 			break;
 
--
2.43.0
On 3/20/24 06:03, Pavel Tikhomirov wrote:
> In memory_max_write() we first set memcg->memory.max and only then
> try to enforce it in a loop. What if, while we are in that loop, someone
> else has changed memcg->memory.max but we are still trying to enforce
> the old value? I believe this can lead to nasty consequences like getting
> an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
Concurrent writes to the same cgroup control file are not possible as the
underlying kernfs_open_file structure has a mutex that serializes access
to the file. Concurrent writes to different cgroup control files are
possible, though.
Cheers,
Longman
>
> We also have exactly the same thing in memory_high_write().
>
> So let's stop enforcing the old limit if we already have a new one.
>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> ---
> mm/memcontrol.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 61932c9215e7..81b303728491 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6769,6 +6769,9 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
> unsigned long nr_pages = page_counter_read(&memcg->memory);
> unsigned long reclaimed;
>
> + if (memcg->memory.high != high)
> + break;
> +
> if (nr_pages <= high)
> break;
>
> @@ -6817,6 +6820,9 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
> for (;;) {
> unsigned long nr_pages = page_counter_read(&memcg->memory);
>
> + if (memcg->memory.max != max)
> + break;
> +
> if (nr_pages <= max)
> break;
>
On 21/03/2024 01:09, Waiman Long wrote:
>
> On 3/20/24 06:03, Pavel Tikhomirov wrote:
>> In memory_max_write() we first set memcg->memory.max and only then
>> try to enforce it in a loop. What if, while we are in that loop, someone
>> else has changed memcg->memory.max but we are still trying to enforce
>> the old value? I believe this can lead to nasty consequences like getting
>> an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
>
> Concurrent writes to the same cgroup control file are not possible as the
> underlying kernfs_open_file structure has a mutex that serializes access
> to the file. Concurrent writes to different cgroup control files are
> possible, though.
Thanks for pointing this out, now I see it: in kernfs_fop_write_iter()
we take of->mutex before ops->write() -> cgroup_file_write(). That means
the patch is not needed.
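
Roughly, the serialization looks like this (heavily simplified from
fs/kernfs/file.c, not the literal code):

/* Heavily simplified sketch of kernfs_fop_write_iter(); not verbatim. */
static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
{
	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
	size_t len = iov_iter_count(iter);
	char *buf;
	ssize_t ret;

	buf = kmalloc(len + 1, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;
	if (copy_from_iter(buf, len, iter) != len) {
		kfree(buf);
		return -EFAULT;
	}
	buf[len] = '\0';

	/* of->mutex serializes all writers on this open file ... */
	mutex_lock(&of->mutex);
	/*
	 * ... so cgroup_file_write() -> memory_max_write() never runs
	 * concurrently for the same open file.
	 */
	ret = kernfs_ops(of->kn)->write(of, buf, len, iocb->ki_pos);
	mutex_unlock(&of->mutex);

	kfree(buf);
	return ret;
}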
>
> Cheers,
> Longman
>
>>
>> We also have exactly the same thing in memory_high_write().
>>
>> So let's stop enforcing the old limit if we already have a new one.
>>
>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>> ---
>> mm/memcontrol.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 61932c9215e7..81b303728491 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -6769,6 +6769,9 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>> unsigned long nr_pages = page_counter_read(&memcg->memory);
>> unsigned long reclaimed;
>> + if (memcg->memory.high != high)
>> + break;
>> +
>> if (nr_pages <= high)
>> break;
>> @@ -6817,6 +6820,9 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>> for (;;) {
>> unsigned long nr_pages = page_counter_read(&memcg->memory);
>> + if (memcg->memory.max != max)
>> + break;
>> +
>> if (nr_pages <= max)
>> break;
>
--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.
On Wed 20-03-24 13:09:07, Waiman Long wrote:
> 
> On 3/20/24 06:03, Pavel Tikhomirov wrote:
> > In memory_max_write() we first set memcg->memory.max and only then
> > try to enforce it in a loop. What if, while we are in that loop, someone
> > else has changed memcg->memory.max but we are still trying to enforce
> > the old value? I believe this can lead to nasty consequences like getting
> > an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
> 
> Concurrent writes to the same cgroup control file are not possible as the
> underlying kernfs_open_file structure has a mutex that serializes access
> to the file.

This is good to know and I was not aware of that. Thanks!
-- 
Michal Hocko
SUSE Labs
On Wed 20-03-24 18:12:50, Michal Hocko wrote:
> On Wed 20-03-24 13:09:07, Waiman Long wrote:
> > 
> > On 3/20/24 06:03, Pavel Tikhomirov wrote:
> > > In memory_max_write() we first set memcg->memory.max and only then
> > > try to enforce it in a loop. What if, while we are in that loop, someone
> > > else has changed memcg->memory.max but we are still trying to enforce
> > > the old value? I believe this can lead to nasty consequences like getting
> > > an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
> > 
> > Concurrent writes to the same cgroup control file are not possible as the
> > underlying kernfs_open_file structure has a mutex that serializes access
> > to the file.
> 
> This is good to know and I was not aware of that. Thanks!

Btw. even if the interface itself is serialized, uncoordinated userspace is
still timing dependent and therefore fundamentally racy.
-- 
Michal Hocko
SUSE Labs
On Wed 20-03-24 18:03:30, Pavel Tikhomirov wrote:
> In memory_max_write() we first set memcg->memory.max and only then
> try to enforce it in a loop. What if, while we are in that loop, someone
> else has changed memcg->memory.max but we are still trying to enforce
> the old value? I believe this can lead to nasty consequences like getting
> an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
I would argue that uncoordinated hard limit configuration can cause
problems of its own. Besides, how is this any different from changing
the high limit while we are inside the reclaim loop?
> We also have exactly the same thing in memory_high_write().
>
> So let's stop enforcing the old limit if we already have a new one.
I do not see any reason why this would be harmful, I just do not see why
this is a real thing or why the new behavior is any better for racing
updaters, as those are not deterministic anyway. If you have an actual
use case then more details would really help to justify this change.

The existing behavior makes some sense as it enforces the given limit
deterministically.
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> ---
> mm/memcontrol.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 61932c9215e7..81b303728491 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6769,6 +6769,9 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
> unsigned long nr_pages = page_counter_read(&memcg->memory);
> unsigned long reclaimed;
>
> + if (memcg->memory.high != high)
> + break;
> +
> if (nr_pages <= high)
> break;
>
> @@ -6817,6 +6820,9 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
> for (;;) {
> unsigned long nr_pages = page_counter_read(&memcg->memory);
>
> + if (memcg->memory.max != max)
> + break;
> +
> if (nr_pages <= max)
> break;
>
> --
> 2.43.0
--
Michal Hocko
SUSE Labs
On 20/03/2024 18:28, Michal Hocko wrote:
> On Wed 20-03-24 18:03:30, Pavel Tikhomirov wrote:
>> In memory_max_write() we first set memcg->memory.max and only then
>> try to enforce it in a loop. What if, while we are in that loop, someone
>> else has changed memcg->memory.max but we are still trying to enforce
>> the old value? I believe this can lead to nasty consequences like getting
>> an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
>
> I would argue that uncoordinated hard limit configuration can cause
> problems of its own.
Sorry, didn't know that.
> Besides, how is this any different from changing
> the high limit while we are inside the reclaim loop?
I believe the reclaim loop rereads the limit on each iteration, e.g. in
reclaim_high(), so it should always be enforcing the right limit.
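
For reference, a condensed sketch of reclaim_high() (simplified, details
vary by kernel version):

/* Condensed sketch of mm/memcontrol.c:reclaim_high(); not verbatim. */
static void reclaim_high(struct mem_cgroup *memcg, unsigned int nr_pages,
			 gfp_t gfp_mask)
{
	do {
		/* the high limit is reread from the memcg on every pass ... */
		if (page_counter_read(&memcg->memory) <=
		    READ_ONCE(memcg->memory.high))
			continue;

		memcg_memory_event(memcg, MEMCG_HIGH);

		/* ... so reclaim always targets the currently configured value */
		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask,
					     MEMCG_RECLAIM_MAY_SWAP);
	} while ((memcg = parent_mem_cgroup(memcg)) &&
		 !mem_cgroup_is_root(memcg));
}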
>
>> We also have exactly the same thing in memory_high_write().
>>
>> So let's stop enforcing the old limit if we already have a new one.
>
> I do not see any reason why this would be harmful, I just do not see why
> this is a real thing or why the new behavior is any better for racing
> updaters, as those are not deterministic anyway. If you have an actual
> use case then more details would really help to justify this change.
>
> The existing behavior makes some sense as it enforces the given limit
> deterministically.
I don't have any actual problem, use case, or reproducer at hand, I only
see a potential problem:

Let's imagine that:

a) We set the cgroup max limit to some small value, memory_max_write()
updates memcg->memory.max and starts spinning in the loop as it wants to
reclaim some memory which does not fit in the new limit.

b) We don't need the small limit anymore and we raise it to a big value,
but memory_max_write() from (a) is still spinning. If we are unlucky
enough and the processes of the cgroup keep consuming memory fast enough
to compensate for the reclaim done by memory_max_write() from (a), it
will continue spinning there forever.

Yes, it is not that bad, because memory_max/high_write() also constantly
checks for pending signals in the loop, so they won't actually get
irreversibly stuck. But I just thought it was worth fixing.
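
To make the scenario concrete, a condensed sketch of the memory_max_write()
enforcement loop with the proposed check, with the drain/retry/OOM handling
collapsed into comments (not the literal code):

/* Condensed sketch of memory_max_write()'s loop; not the literal code. */
xchg(&memcg->memory.max, max);		/* (a) publishes the new limit first */

for (;;) {
	unsigned long nr_pages = page_counter_read(&memcg->memory);

	/* proposed check: give up if another writer installed a newer limit */
	if (memcg->memory.max != max)
		break;

	if (nr_pages <= max)
		break;

	/* the loop stays killable, which bounds the worst case today */
	if (signal_pending(current))
		break;

	/*
	 * Real code: drain per-cpu charge caches once, retry direct reclaim
	 * a few times, and finally invoke the memcg OOM killer. If the
	 * workload keeps charging as fast as we reclaim against a stale
	 * 'max', this loop keeps going (scenario (b) above).
	 */
	if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max, GFP_KERNEL,
					  MEMCG_RECLAIM_MAY_SWAP))
		break;
}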
>
>> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
>> ---
>> mm/memcontrol.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 61932c9215e7..81b303728491 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -6769,6 +6769,9 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>> unsigned long nr_pages = page_counter_read(&memcg->memory);
>> unsigned long reclaimed;
>>
>> + if (memcg->memory.high != high)
>> + break;
>> +
>> if (nr_pages <= high)
>> break;
>>
>> @@ -6817,6 +6820,9 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>> for (;;) {
>> unsigned long nr_pages = page_counter_read(&memcg->memory);
>>
>> + if (memcg->memory.max != max)
>> + break;
>> +
>> if (nr_pages <= max)
>> break;
>>
>> --
>> 2.43.0
>
--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.
On Wed, Mar 20, 2024 at 06:55:05PM +0800, Pavel Tikhomirov wrote:
> 
> 
> On 20/03/2024 18:28, Michal Hocko wrote:
> > On Wed 20-03-24 18:03:30, Pavel Tikhomirov wrote:
> > > In memory_max_write() we first set memcg->memory.max and only then
> > > try to enforce it in a loop. What if, while we are in that loop, someone
> > > else has changed memcg->memory.max but we are still trying to enforce
> > > the old value? I believe this can lead to nasty consequences like getting
> > > an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
> > 
> > I would argue that uncoordinated hard limit configuration can cause
> > problems of its own.
> 
> Sorry, didn't know that.
> 
> > Besides, how is this any different from changing
> > the high limit while we are inside the reclaim loop?
> 
> I believe the reclaim loop rereads the limit on each iteration, e.g. in
> reclaim_high(), so it should always be enforcing the right limit.
> 
> > 
> > > We also have exactly the same thing in memory_high_write().
> > > 
> > > So let's stop enforcing the old limit if we already have a new one.
> > 
> > I do not see any reason why this would be harmful, I just do not see why
> > this is a real thing or why the new behavior is any better for racing
> > updaters, as those are not deterministic anyway. If you have an actual
> > use case then more details would really help to justify this change.
> > 
> > The existing behavior makes some sense as it enforces the given limit
> > deterministically.
> 
> I don't have any actual problem, use case, or reproducer at hand, I only
> see a potential problem:

If the problem is only potential and also not very severe (it's not a crash
or memory corruption or something like this), I'd say let's keep things as
they are.

Thanks!
On Wed 20-03-24 18:55:05, Pavel Tikhomirov wrote:
> 
> 
> On 20/03/2024 18:28, Michal Hocko wrote:
> > On Wed 20-03-24 18:03:30, Pavel Tikhomirov wrote:
> > > In memory_max_write() we first set memcg->memory.max and only then
> > > try to enforce it in a loop. What if, while we are in that loop, someone
> > > else has changed memcg->memory.max but we are still trying to enforce
> > > the old value? I believe this can lead to nasty consequences like getting
> > > an OOM on a perfectly fine cgroup within its limits, or excess reclaim.
> > 
> > I would argue that uncoordinated hard limit configuration can cause
> > problems of its own.
> 
> Sorry, didn't know that.

Well, just consider potential over-reclaim as a result of several competing
actors setting the same limit. Or a completely indeterministic final value
of the limit depending on timing. This simply cannot work reliably.

> > Besides, how is this any different from changing
> > the high limit while we are inside the reclaim loop?
> 
> I believe the reclaim loop rereads the limit on each iteration, e.g. in
> reclaim_high(), so it should always be enforcing the right limit.

The reclaim loop might happen to take quite some time...

> > > We also have exactly the same thing in memory_high_write().
> > > 
> > > So let's stop enforcing the old limit if we already have a new one.
> > 
> > I do not see any reason why this would be harmful, I just do not see why
> > this is a real thing or why the new behavior is any better for racing
> > updaters, as those are not deterministic anyway. If you have an actual
> > use case then more details would really help to justify this change.
> > 
> > The existing behavior makes some sense as it enforces the given limit
> > deterministically.
> 
> I don't have any actual problem, use case, or reproducer at hand, I only
> see a potential problem:
> 
> Let's imagine that:
> 
> a) We set the cgroup max limit to some small value, memory_max_write()
> updates memcg->memory.max and starts spinning in the loop as it wants to
> reclaim some memory which does not fit in the new limit.
> 
> b) We don't need the small limit anymore and we raise it to a big value,
> but memory_max_write() from (a) is still spinning. If we are unlucky
> enough and the processes of the cgroup keep consuming memory fast enough
> to compensate for the reclaim done by memory_max_write() from (a), it
> will continue spinning there forever.

This is a killable operation, so if you change your mind about the limit
while the current update is still in progress, just terminate it rather
than overriding it from a different process.

> Yes, it is not that bad, because memory_max/high_write() also constantly
> checks for pending signals in the loop, so they won't actually get
> irreversibly stuck. But I just thought it was worth fixing.

If we want to fix this parallel limit setting then we should also think
about reasonable and predictable behavior, and that would likely require
some sort of locking IMO.
-- 
Michal Hocko
SUSE Labs
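
For illustration only, serializing competing limit writers could look
roughly like this; none of these names exist in the tree, the mutex and
function names are made up for the sketch:

/* Hypothetical sketch only -- not existing kernel code. */
static DEFINE_MUTEX(memcg_max_mutex);

static ssize_t memory_max_write_serialized(struct kernfs_open_file *of,
					   char *buf, size_t nbytes, loff_t off)
{
	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
	unsigned long max;
	int err;

	buf = strstrip(buf);
	err = page_counter_memparse(buf, "max", &max);
	if (err)
		return err;

	/*
	 * Hold the mutex across both publishing the new limit and the
	 * enforcement loop, so two writers cannot interleave and leave the
	 * final limit, or the amount of reclaim performed, timing dependent.
	 */
	mutex_lock(&memcg_max_mutex);
	xchg(&memcg->memory.max, max);
	/* ... enforcement loop as in memory_max_write() ... */
	mutex_unlock(&memcg_max_mutex);

	return nbytes;
}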