[PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen

zhongjinji posted 2 patches 1 month ago
[PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by zhongjinji 1 month ago
The oom reaper is a mechanism to guarantee a forward process during OOM
situation when the oom victim cannot terminate on its own (e.g. being
blocked in uninterruptible state or frozen by cgroup freezer). In order
to give the victim some time to terminate properly the oom reaper is
delayed in its invocation. This is particularly beneficial when the oom
victim is holding robust futex resources as the anonymous memory tear
down can break those. [1]

On the other hand deliberately frozen tasks by the freezer cgroup will
not wake up until they are thawed in the userspace and delay is
effectively pointless. Therefore opt out from the delay for cgroup
frozen oom victims.

Reference:
[1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u

Signed-off-by: zhongjinji <zhongjinji@honor.com>
---
 mm/oom_kill.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..a5e9074896a1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk)
 
 	get_task_struct(tsk);
 	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
-	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
+	tsk->oom_reaper_timer.expires = jiffies;
+
+	/*
+	 * If the task is frozen by the cgroup freezer, the delay is unnecessary
+	 * because it cannot exit until thawed. Skip the delay for frozen victims.
+	 */
+	if (!frozen(tsk))
+		tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
 	add_timer(&tsk->oom_reaper_timer);
 }
 
-- 
2.17.1
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by Michal Hocko 1 month ago
On Fri 29-08-25 14:55:49, zhongjinji wrote:
> The oom reaper is a mechanism to guarantee a forward process during OOM
> situation when the oom victim cannot terminate on its own (e.g. being
> blocked in uninterruptible state or frozen by cgroup freezer). In order
> to give the victim some time to terminate properly the oom reaper is
> delayed in its invocation. This is particularly beneficial when the oom
> victim is holding robust futex resources as the anonymous memory tear
> down can break those. [1]
> 
> On the other hand deliberately frozen tasks by the freezer cgroup will
> not wake up until they are thawed in the userspace and delay is
> effectively pointless. Therefore opt out from the delay for cgroup
> frozen oom victims.
> 
> Reference:
> [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u
> 
> Signed-off-by: zhongjinji <zhongjinji@honor.com>

Acked-by: Michal Hocko <mhocko@suse.com>
Thanks

> ---
>  mm/oom_kill.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..a5e9074896a1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk)
>  
>  	get_task_struct(tsk);
>  	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
> -	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
> +	tsk->oom_reaper_timer.expires = jiffies;
> +
> +	/*
> +	 * If the task is frozen by the cgroup freezer, the delay is unnecessary
> +	 * because it cannot exit until thawed. Skip the delay for frozen victims.
> +	 */
> +	if (!frozen(tsk))
> +		tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
>  	add_timer(&tsk->oom_reaper_timer);
>  }
>  
> -- 
> 2.17.1

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by zhongjinji 1 month ago
> On Fri 29-08-25 14:55:49, zhongjinji wrote:
> > The oom reaper is a mechanism to guarantee a forward process during OOM
> > situation when the oom victim cannot terminate on its own (e.g. being
> > blocked in uninterruptible state or frozen by cgroup freezer). In order
> > to give the victim some time to terminate properly the oom reaper is
> > delayed in its invocation. This is particularly beneficial when the oom
> > victim is holding robust futex resources as the anonymous memory tear
> > down can break those. [1]
> > 
> > On the other hand deliberately frozen tasks by the freezer cgroup will
> > not wake up until they are thawed in the userspace and delay is
> > effectively pointless. Therefore opt out from the delay for cgroup
> > frozen oom victims.
> > 
> > Reference:
> > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u
> > 
> > Signed-off-by: zhongjinji <zhongjinji@honor.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> Thanks

Sorry, I found that it doesn't work now (because I previously tested it by
simulating OOM, which made testing easier but also caused the mistake. I will
re-run the new test). Calling __thaw_task in mark_oom_victim will change the
victim's state to running. However, other threads are still in the frozen state,
so the process still can't exit. We should update it again by moving __thaw_task
to after frozen (this way, executing __thaw_task and frozen in the same function
looks more reasonable). Since mark_oom_victim and queue_oom_reaper always appear
in pairs, this won't introduce any risky changes.

 static void queue_oom_reaper(struct task_struct *tsk)
 {
+       bool delay = !frozen(tsk);
+
+       /*
+        * Make sure that the task is woken up from uninterruptible sleep
+        * if it is frozen because OOM killer wouldn't be able to free
+        * any memory and livelock. freezing_slow_path will tell the freezer
+        * that TIF_MEMDIE tasks should be ignored.
+        */
+       __thaw_task(tsk);
+
        /* mm is already queued? */
        if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
                return;
@@ -711,7 +721,7 @@ static void queue_oom_reaper(struct task_struct *tsk)
         * If the task is frozen by the cgroup freezer, the delay is unnecessary
         * because it cannot exit until thawed. Skip the delay for frozen victims.
         */
-       if (!frozen(tsk))
+       if (delay)
                tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
        add_timer(&tsk->oom_reaper_timer);
 }
@@ -783,13 +793,6 @@ static void mark_oom_victim(struct task_struct *tsk)
        if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
                mmgrab(tsk->signal->oom_mm);

-       /*
-        * Make sure that the task is woken up from uninterruptible sleep
-        * if it is frozen because OOM killer wouldn't be able to free
-        * any memory and livelock. freezing_slow_path will tell the freezer
-        * that TIF_MEMDIE tasks should be ignored.
-        */
-       __thaw_task(tsk);
        atomic_inc(&oom_victims);
        cred = get_task_cred(tsk);
        trace_mark_victim(tsk, cred->uid.val);

> 
> > ---
> >  mm/oom_kill.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 25923cfec9c6..a5e9074896a1 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk)
> >  
> >  	get_task_struct(tsk);
> >  	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
> > -	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
> > +	tsk->oom_reaper_timer.expires = jiffies;
> > +
> > +	/*
> > +	 * If the task is frozen by the cgroup freezer, the delay is unnecessary
> > +	 * because it cannot exit until thawed. Skip the delay for frozen victims.
> > +	 */
> > +	if (!frozen(tsk))
> > +		tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
> >  	add_timer(&tsk->oom_reaper_timer);
> >  }
> >  
> > -- 
> > 2.17.1
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by Michal Hocko 1 month ago
On Mon 01-09-25 17:30:57, zhongjinji wrote:
> > On Fri 29-08-25 14:55:49, zhongjinji wrote:
> > > The oom reaper is a mechanism to guarantee a forward process during OOM
> > > situation when the oom victim cannot terminate on its own (e.g. being
> > > blocked in uninterruptible state or frozen by cgroup freezer). In order
> > > to give the victim some time to terminate properly the oom reaper is
> > > delayed in its invocation. This is particularly beneficial when the oom
> > > victim is holding robust futex resources as the anonymous memory tear
> > > down can break those. [1]
> > > 
> > > On the other hand deliberately frozen tasks by the freezer cgroup will
> > > not wake up until they are thawed in the userspace and delay is
> > > effectively pointless. Therefore opt out from the delay for cgroup
> > > frozen oom victims.
> > > 
> > > Reference:
> > > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u
> > > 
> > > Signed-off-by: zhongjinji <zhongjinji@honor.com>
> > 
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > Thanks
> 
> Sorry, I found that it doesn't work now (because I previously tested it by
> simulating OOM, which made testing easier but also caused the mistake. I will
> re-run the new test). Calling __thaw_task in mark_oom_victim will change the
> victim's state to running. However, other threads are still in the frozen state,
> so the process still can't exit. We should update it again by moving __thaw_task
> to after frozen (this way, executing __thaw_task and frozen in the same function
> looks more reasonable). Since mark_oom_victim and queue_oom_reaper always appear
> in pairs, this won't introduce any risky changes.

Hmm, I must have completely forgot that we are actually thawing the
frozen task! That means that the actual argument for not delaying the
oom reaper doesn't hold.
Now I do see why the existing implementation doesn't really work as you
would expect though. Is there any reason why we are not thawing the
whole process group? I guess I just didn't realize that __thaw_task is
per thread rather than per process back then when I have introduced it.
Because thread specific behavior makes very little sense to me TBH.
So rather than plaing with __thaw_task placement which doesn't really
make much sense wrt to delaying the reaper we should look into that
part.

Sorry, I should have realized earlier when proposing that.

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by zhongjinji 1 month ago
> > Sorry, I found that it doesn't work now (because I previously tested it by
> > simulating OOM, which made testing easier but also caused the mistake. I will
> > re-run the new test). Calling __thaw_task in mark_oom_victim will change the
> > victim's state to running. However, other threads are still in the frozen state,
> > so the process still can't exit. We should update it again by moving __thaw_task
> > to after frozen (this way, executing __thaw_task and frozen in the same function
> > looks more reasonable). Since mark_oom_victim and queue_oom_reaper always appear
> > in pairs, this won't introduce any risky changes.
> 
> Hmm, I must have completely forgot that we are actually thawing the
> frozen task! That means that the actual argument for not delaying the
> oom reaper doesn't hold.
> Now I do see why the existing implementation doesn't really work as you
> would expect though. Is there any reason why we are not thawing the
> whole process group? I guess I just didn't realize that __thaw_task is
> per thread rather than per process back then when I have introduced it.

Previously, I didn't know why we needed to call __thaw_task() in 
mark_oom_victim(). Now I understand.

> Because thread specific behavior makes very little sense to me TBH.
> So rather than plaing with __thaw_task placement which doesn't really
> make much sense wrt to delaying the reaper we should look into that
> part.
> 
> Sorry, I should have realized earlier when proposing that.

Is this modification acceptable?

This change only thaws the process previously identified as the victim,
and does not thaw the process being killed in for_each_process.

The reason is that the process being killed in for_each_process is usually
a vfork process, which is only temporary and rarely encountered.

@@ -772,12 +773,18 @@ static void mark_oom_victim(struct task_struct *tsk)
                mmgrab(tsk->signal->oom_mm);

        /*
-        * Make sure that the task is woken up from uninterruptible sleep
+        * Make sure that the process is woken up from uninterruptible sleep
         * if it is frozen because OOM killer wouldn't be able to free
         * any memory and livelock. freezing_slow_path will tell the freezer
-        * that TIF_MEMDIE tasks should be ignored.
+        * that TIF_MEMDIE thread should be ignored.
         */
-       __thaw_task(tsk);
+       rcu_read_lock();
+       for_each_thread(tsk, t) {
+               set_tsk_thread_flag(t, TIF_MEMDIE);
+               __thaw_task(t);
+       }
+       rcu_read_unlock();
+
        atomic_inc(&oom_victims);
        cred = get_task_cred(tsk);
        trace_mark_victim(tsk, cred->uid.val);
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by Michal Hocko 1 month ago
On Wed 03-09-25 00:01:29, zhongjinji wrote:
[...]
> @@ -772,12 +773,18 @@ static void mark_oom_victim(struct task_struct *tsk)
>                 mmgrab(tsk->signal->oom_mm);
> 
>         /*
> -        * Make sure that the task is woken up from uninterruptible sleep
> +        * Make sure that the process is woken up from uninterruptible sleep
>          * if it is frozen because OOM killer wouldn't be able to free
>          * any memory and livelock. freezing_slow_path will tell the freezer
> -        * that TIF_MEMDIE tasks should be ignored.
> +        * that TIF_MEMDIE thread should be ignored.
>          */
> -       __thaw_task(tsk);
> +       rcu_read_lock();
> +       for_each_thread(tsk, t) {
> +               set_tsk_thread_flag(t, TIF_MEMDIE);
> +               __thaw_task(t);
> +       }
> +       rcu_read_unlock();
> +

I would prefer if we had thaw_process() rather than open code it here.
But the implementation matches what I would expect it to do.

>         atomic_inc(&oom_victims);
>         cred = get_task_cred(tsk);
>         trace_mark_victim(tsk, cred->uid.val);

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by Shakeel Butt 1 month ago
On Fri, Aug 29, 2025 at 02:55:49PM +0800, zhongjinji wrote:
> The oom reaper is a mechanism to guarantee a forward process during OOM
> situation when the oom victim cannot terminate on its own (e.g. being
> blocked in uninterruptible state or frozen by cgroup freezer). In order
> to give the victim some time to terminate properly the oom reaper is
> delayed in its invocation. This is particularly beneficial when the oom
> victim is holding robust futex resources as the anonymous memory tear
> down can break those. [1]
> 
> On the other hand deliberately frozen tasks by the freezer cgroup will
> not wake up until they are thawed in the userspace and delay is
> effectively pointless. Therefore opt out from the delay for cgroup
> frozen oom victims.
> 
> Reference:
> [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u
> 
> Signed-off-by: zhongjinji <zhongjinji@honor.com>
> ---
>  mm/oom_kill.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..a5e9074896a1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk)
>  
>  	get_task_struct(tsk);
>  	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
> -	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
> +	tsk->oom_reaper_timer.expires = jiffies;
> +
> +	/*
> +	 * If the task is frozen by the cgroup freezer, the delay is unnecessary
> +	 * because it cannot exit until thawed. Skip the delay for frozen victims.
> +	 */
> +	if (!frozen(tsk))

Can you please change the above condition with the following to handle
v2 as well?

	if (!frozen(tsk) && !(READ_ONCE(tsk->frozen)))

> +		tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
>  	add_timer(&tsk->oom_reaper_timer);
>  }
>  
> -- 
> 2.17.1
>
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by zhongjinji 1 month ago
> >  	get_task_struct(tsk);
> >  	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
> > -	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
> > +	tsk->oom_reaper_timer.expires = jiffies;
> > +
> > +	/*
> > +	 * If the task is frozen by the cgroup freezer, the delay is unnecessary
> > +	 * because it cannot exit until thawed. Skip the delay for frozen victims.
> > +	 */
> > +	if (!frozen(tsk))
> 
> Can you please change the above condition with the following to handle
> v2 as well?

Thank you, but I think the cgroupv2 check isn't needed, since a process frozen
by the cgroup v2 freezer won't block exit after being killed. Would it be
better to note in the comment or changelog that this change is for cgroup v1?

> 	if (!frozen(tsk) && !(READ_ONCE(tsk->frozen)))
> 
> > +		tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
> >  	add_timer(&tsk->oom_reaper_timer);
> >  }
> >  
> > -- 
> > 2.17.1
> >
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by Liam R. Howlett 1 month ago
* zhongjinji <zhongjinji@honor.com> [250829 02:56]:
> The oom reaper is a mechanism to guarantee a forward process during OOM
> situation when the oom victim cannot terminate on its own (e.g. being
> blocked in uninterruptible state or frozen by cgroup freezer). In order
> to give the victim some time to terminate properly the oom reaper is
> delayed in its invocation. This is particularly beneficial when the oom
> victim is holding robust futex resources as the anonymous memory tear
> down can break those. [1]
> 
> On the other hand deliberately frozen tasks by the freezer cgroup will
> not wake up until they are thawed in the userspace and delay is
> effectively pointless. Therefore opt out from the delay for cgroup
> frozen oom victims.
> 
> Reference:
> [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u
> 
> Signed-off-by: zhongjinji <zhongjinji@honor.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  mm/oom_kill.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..a5e9074896a1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk)
>  
>  	get_task_struct(tsk);
>  	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
> -	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
> +	tsk->oom_reaper_timer.expires = jiffies;
> +
> +	/*
> +	 * If the task is frozen by the cgroup freezer, the delay is unnecessary
> +	 * because it cannot exit until thawed. Skip the delay for frozen victims.
> +	 */
> +	if (!frozen(tsk))
> +		tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
>  	add_timer(&tsk->oom_reaper_timer);
>  }
>  
> -- 
> 2.17.1
> 
>
Re: [PATCH v6 1/2] mm/oom_kill: Do not delay oom reaper when the victim is frozen
Posted by Lorenzo Stoakes 1 month ago
On Fri, Aug 29, 2025 at 02:55:49PM +0800, zhongjinji wrote:
> The oom reaper is a mechanism to guarantee a forward process during OOM
> situation when the oom victim cannot terminate on its own (e.g. being
> blocked in uninterruptible state or frozen by cgroup freezer). In order
> to give the victim some time to terminate properly the oom reaper is
> delayed in its invocation. This is particularly beneficial when the oom
> victim is holding robust futex resources as the anonymous memory tear
> down can break those. [1]
>
> On the other hand deliberately frozen tasks by the freezer cgroup will
> not wake up until they are thawed in the userspace and delay is
> effectively pointless. Therefore opt out from the delay for cgroup
> frozen oom victims.
>
> Reference:
> [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u
>
> Signed-off-by: zhongjinji <zhongjinji@honor.com>

Nice :) now this is very simple.

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  mm/oom_kill.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..a5e9074896a1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk)
>
>  	get_task_struct(tsk);
>  	timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
> -	tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
> +	tsk->oom_reaper_timer.expires = jiffies;
> +
> +	/*
> +	 * If the task is frozen by the cgroup freezer, the delay is unnecessary
> +	 * because it cannot exit until thawed. Skip the delay for frozen victims.
> +	 */
> +	if (!frozen(tsk))
> +		tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
>  	add_timer(&tsk->oom_reaper_timer);
>  }
>
> --
> 2.17.1
>