mm/oom_kill: break evaluation when a task has been selected

[PATCH] mm/oom_kill: break evaluation when a task has been selected

Posted by Zhaoyu Liu 4 years, 1 month ago

oom points no longer need to be calculated if a task is oom_task_origin(),
so return 1 to stop the oom_evaluate_task().

Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com>
---
 mm/oom_kill.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3996301450e8..b407fba21d19 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -308,7 +308,7 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
 static int oom_evaluate_task(struct task_struct *task, void *arg)
 {
 	struct oom_control *oc = arg;
-	long points;
+	long points = 0;
 
 	if (oom_unkillable_task(task))
 		goto next;
@@ -349,7 +349,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	oc->chosen = task;
 	oc->chosen_points = points;
 next:
-	return 0;
+	return points == LONG_MAX;
 abort:
 	if (oc->chosen)
 		put_task_struct(oc->chosen);
-- 
2.17.1

Re: [PATCH] mm/oom_kill: break evaluation when a task has been selected

Posted by Michal Hocko 4 years ago

On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote:
> oom points no longer need to be calculated if a task is oom_task_origin(),
> so return 1 to stop the oom_evaluate_task().

This doesn't really explain why this is really desired. Is this a fix,
optimization?

Please also note that this change has some side effects. For one, the
task marked as oom origin will get killed even if there is still a
pending oom victim which hasn't been fully dismantled. Is this
intentional?

> Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com>
> ---
>  mm/oom_kill.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 3996301450e8..b407fba21d19 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -308,7 +308,7 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
>  static int oom_evaluate_task(struct task_struct *task, void *arg)
>  {
>  	struct oom_control *oc = arg;
> -	long points;
> +	long points = 0;
>  
>  	if (oom_unkillable_task(task))
>  		goto next;
> @@ -349,7 +349,7 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
>  	oc->chosen = task;
>  	oc->chosen_points = points;
>  next:
> -	return 0;
> +	return points == LONG_MAX;
>  abort:
>  	if (oc->chosen)
>  		put_task_struct(oc->chosen);
> -- 
> 2.17.1

-- 
Michal Hocko
SUSE Labs

Re: [PATCH] mm/oom_kill: break evaluation when a task has been selected

Posted by Zackary Liu 4 years ago

On Jun 1 2022, at 3:45 pm, Michal Hocko <mhocko@suse.com> wrote:

> On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote:
>> oom points no longer need to be calculated if a task is oom_task_origin(),
>> so return 1 to stop the oom_evaluate_task().
> 
> This doesn't really explain why this is really desired. Is this a fix,
> optimization?
> 
> Please also note that this change has some side effects. For one, the
> task marked as oom origin will get killed even if there is still a
> pending oom victim which hasn't been fully dismantled. Is this
> intentional?

Thank you very much for reminding.

From my point of view, the victim was marked in the last oom, and now it
has entered the oom again, which means that the system still has no
deprecated memory available. In order to ensure that the system can
return to normal as soon as possible, killing the origin task
immediately should be A good choice, and the role of this patch is to
end oom_evaluate_task and return true as soon as the origin task is found.
Maybe this patch is not the optimal solution, it has a trade-off.

It is an honor to discuss with you, thank you very much!

zackary

>> Signed-off-by: Zhaoyu Liu <zackary.liu.pro@gmail.com>
>> ---
>>  mm/oom_kill.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 3996301450e8..b407fba21d19 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -308,7 +308,7 @@ static enum oom_constraint
>> constrained_alloc(struct oom_control *oc)
>>  static int oom_evaluate_task(struct task_struct *task, void *arg)
>>  {
>>  	struct oom_control *oc = arg;
>> -	long points;
>> +	long points = 0;
>>  
>>  	if (oom_unkillable_task(task))
>>  		goto next;
>> @@ -349,7 +349,7 @@ static int oom_evaluate_task(struct task_struct
>> *task, void *arg)
>>  	oc->chosen = task;
>>  	oc->chosen_points = points;
>>  next:
>> -	return 0;
>> +	return points == LONG_MAX;
>>  abort:
>>  	if (oc->chosen)
>>  		put_task_struct(oc->chosen);
>> -- 
>> 2.17.1
> 
> -- 
> Michal Hocko
> SUSE Labs
>

Re: [PATCH] mm/oom_kill: break evaluation when a task has been selected

Posted by Michal Hocko 4 years ago

On Sat 04-06-22 18:35:19, Zackary Liu wrote:
> 
> On Jun 1 2022, at 3:45 pm, Michal Hocko <mhocko@suse.com> wrote:
> 
> > On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote:
> >> oom points no longer need to be calculated if a task is oom_task_origin(),
> >> so return 1 to stop the oom_evaluate_task().
> > 
> > This doesn't really explain why this is really desired. Is this a fix,
> > optimization?
> > 
> > Please also note that this change has some side effects. For one, the
> > task marked as oom origin will get killed even if there is still a
> > pending oom victim which hasn't been fully dismantled. Is this
> > intentional?
> 
> Thank you very much for reminding.
> 
> From my point of view, the victim was marked in the last oom, and now it
> has entered the oom again, which means that the system still has no
> deprecated memory available.

This is not an unusual situation. OOM victims can take some time to die
and release their memory. The oom_reaper is there to fast forward that
process and guarantee a forward progress. But this can still take some
time. Our general policy is to back off when there is an alive oom
victim encountered. Have a look at the tsk_is_oom_victim test in
oom_evaluate_task. For that heuristic to be effective the whole task
list (wether the global one or memcg) has to be evaluated.

> In order to ensure that the system can
> return to normal as soon as possible, killing the origin task
> immediately should be A good choice, and the role of this patch is to
> end oom_evaluate_task and return true as soon as the origin task is found.

Could you be more specific how does this patch guarantees a forward
progress? What is the actual usecase that benefits from this change?

These are all important information for future reference. Please note I
am not saying the patch is wrong. I just still do not see why it is
useful.

> Maybe this patch is not the optimal solution, it has a trade-off.

If there are trade-offs, please document them in the changelog.

The way I see it is that oom_task_origin heuristic has been introduced
to help killing swapoff operation because the swapped out memory doesn't
fit into memory. This is a very reasonable thing to do in general but it
also represents an early failure visible to the userspace. If there is a
pre-existing oom victim then I would argue that we should try to avoid
the failure.

-- 
Michal Hocko
SUSE Labs

Re: [PATCH] mm/oom_kill: break evaluation when a task has been selected

Posted by Michal Hocko 4 years ago

On Mon 06-06-22 10:33:30, Michal Hocko wrote:
> On Sat 04-06-22 18:35:19, Zackary Liu wrote:
> > 
> > On Jun 1 2022, at 3:45 pm, Michal Hocko <mhocko@suse.com> wrote:
> > 
> > > On Sat 14-05-22 15:52:28, Zhaoyu Liu wrote:
> > >> oom points no longer need to be calculated if a task is oom_task_origin(),
> > >> so return 1 to stop the oom_evaluate_task().
> > > 
> > > This doesn't really explain why this is really desired. Is this a fix,
> > > optimization?
> > > 
> > > Please also note that this change has some side effects. For one, the
> > > task marked as oom origin will get killed even if there is still a
> > > pending oom victim which hasn't been fully dismantled. Is this
> > > intentional?
> > 
> > Thank you very much for reminding.
> > 
> > From my point of view, the victim was marked in the last oom, and now it
> > has entered the oom again, which means that the system still has no
> > deprecated memory available.
> 
> This is not an unusual situation. OOM victims can take some time to die
> and release their memory. The oom_reaper is there to fast forward that
> process and guarantee a forward progress. But this can still take some
> time. Our general policy is to back off when there is an alive oom
> victim encountered. Have a look at the tsk_is_oom_victim test in
> oom_evaluate_task. For that heuristic to be effective the whole task
> list (wether the global one or memcg) has to be evaluated.
> 
> > In order to ensure that the system can
> > return to normal as soon as possible, killing the origin task
> > immediately should be A good choice, and the role of this patch is to
> > end oom_evaluate_task and return true as soon as the origin task is found.
> 
> Could you be more specific how does this patch guarantees a forward
> progress? What is the actual usecase that benefits from this change?
> 
> These are all important information for future reference. Please note I
> am not saying the patch is wrong. I just still do not see why it is
> useful.
> 
> > Maybe this patch is not the optimal solution, it has a trade-off.
> 
> If there are trade-offs, please document them in the changelog.
> 
> The way I see it is that oom_task_origin heuristic has been introduced
> to help killing swapoff operation because the swapped out memory doesn't
> fit into memory. This is a very reasonable thing to do in general but it
> also represents an early failure visible to the userspace. If there is a
> pre-existing oom victim then I would argue that we should try to avoid
> the failure.

Andrew, please drop this patch from your tree. I do not see any real
justification here.

Thanks!
-- 
Michal Hocko
SUSE Labs