The oom reaper is a mechanism to guarantee a forward process during OOM
situation when the oom victim cannot terminate on its own (e.g. being
blocked in uninterruptible state or frozen by cgroup freezer). In order
to give the victim some time to terminate properly the oom reaper is
delayed in its invocation. This is particularly beneficial when the oom
victim is holding robust futex resources as the anonymous memory tear
down can break those. [1]
On the other hand deliberately frozen tasks by the freezer cgroup will
not wake up until they are thawed in the userspace and delay is
effectively pointless. Therefore opt out from the delay for cgroup
frozen oom victims.
Reference:
[1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u
Signed-off-by: zhongjinji <zhongjinji@honor.com>
---
mm/oom_kill.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..a5e9074896a1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk)
get_task_struct(tsk);
timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
- tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
+ tsk->oom_reaper_timer.expires = jiffies;
+
+ /*
+ * If the task is frozen by the cgroup freezer, the delay is unnecessary
+ * because it cannot exit until thawed. Skip the delay for frozen victims.
+ */
+ if (!frozen(tsk))
+ tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY;
add_timer(&tsk->oom_reaper_timer);
}
--
2.17.1
On Fri 29-08-25 14:55:49, zhongjinji wrote: > The oom reaper is a mechanism to guarantee a forward process during OOM > situation when the oom victim cannot terminate on its own (e.g. being > blocked in uninterruptible state or frozen by cgroup freezer). In order > to give the victim some time to terminate properly the oom reaper is > delayed in its invocation. This is particularly beneficial when the oom > victim is holding robust futex resources as the anonymous memory tear > down can break those. [1] > > On the other hand deliberately frozen tasks by the freezer cgroup will > not wake up until they are thawed in the userspace and delay is > effectively pointless. Therefore opt out from the delay for cgroup > frozen oom victims. > > Reference: > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u > > Signed-off-by: zhongjinji <zhongjinji@honor.com> Acked-by: Michal Hocko <mhocko@suse.com> Thanks > --- > mm/oom_kill.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 25923cfec9c6..a5e9074896a1 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk) > > get_task_struct(tsk); > timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); > - tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; > + tsk->oom_reaper_timer.expires = jiffies; > + > + /* > + * If the task is frozen by the cgroup freezer, the delay is unnecessary > + * because it cannot exit until thawed. Skip the delay for frozen victims. > + */ > + if (!frozen(tsk)) > + tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY; > add_timer(&tsk->oom_reaper_timer); > } > > -- > 2.17.1 -- Michal Hocko SUSE Labs
> On Fri 29-08-25 14:55:49, zhongjinji wrote: > > The oom reaper is a mechanism to guarantee a forward process during OOM > > situation when the oom victim cannot terminate on its own (e.g. being > > blocked in uninterruptible state or frozen by cgroup freezer). In order > > to give the victim some time to terminate properly the oom reaper is > > delayed in its invocation. This is particularly beneficial when the oom > > victim is holding robust futex resources as the anonymous memory tear > > down can break those. [1] > > > > On the other hand deliberately frozen tasks by the freezer cgroup will > > not wake up until they are thawed in the userspace and delay is > > effectively pointless. Therefore opt out from the delay for cgroup > > frozen oom victims. > > > > Reference: > > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u > > > > Signed-off-by: zhongjinji <zhongjinji@honor.com> > > Acked-by: Michal Hocko <mhocko@suse.com> > Thanks Sorry, I found that it doesn't work now (because I previously tested it by simulating OOM, which made testing easier but also caused the mistake. I will re-run the new test). Calling __thaw_task in mark_oom_victim will change the victim's state to running. However, other threads are still in the frozen state, so the process still can't exit. We should update it again by moving __thaw_task to after frozen (this way, executing __thaw_task and frozen in the same function looks more reasonable). Since mark_oom_victim and queue_oom_reaper always appear in pairs, this won't introduce any risky changes. static void queue_oom_reaper(struct task_struct *tsk) { + bool delay = !frozen(tsk); + + /* + * Make sure that the task is woken up from uninterruptible sleep + * if it is frozen because OOM killer wouldn't be able to free + * any memory and livelock. freezing_slow_path will tell the freezer + * that TIF_MEMDIE tasks should be ignored. + */ + __thaw_task(tsk); + /* mm is already queued? */ if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags)) return; @@ -711,7 +721,7 @@ static void queue_oom_reaper(struct task_struct *tsk) * If the task is frozen by the cgroup freezer, the delay is unnecessary * because it cannot exit until thawed. Skip the delay for frozen victims. */ - if (!frozen(tsk)) + if (delay) tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY; add_timer(&tsk->oom_reaper_timer); } @@ -783,13 +793,6 @@ static void mark_oom_victim(struct task_struct *tsk) if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) mmgrab(tsk->signal->oom_mm); - /* - * Make sure that the task is woken up from uninterruptible sleep - * if it is frozen because OOM killer wouldn't be able to free - * any memory and livelock. freezing_slow_path will tell the freezer - * that TIF_MEMDIE tasks should be ignored. - */ - __thaw_task(tsk); atomic_inc(&oom_victims); cred = get_task_cred(tsk); trace_mark_victim(tsk, cred->uid.val); > > > --- > > mm/oom_kill.c | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index 25923cfec9c6..a5e9074896a1 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk) > > > > get_task_struct(tsk); > > timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); > > - tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; > > + tsk->oom_reaper_timer.expires = jiffies; > > + > > + /* > > + * If the task is frozen by the cgroup freezer, the delay is unnecessary > > + * because it cannot exit until thawed. Skip the delay for frozen victims. > > + */ > > + if (!frozen(tsk)) > > + tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY; > > add_timer(&tsk->oom_reaper_timer); > > } > > > > -- > > 2.17.1
On Mon 01-09-25 17:30:57, zhongjinji wrote: > > On Fri 29-08-25 14:55:49, zhongjinji wrote: > > > The oom reaper is a mechanism to guarantee a forward process during OOM > > > situation when the oom victim cannot terminate on its own (e.g. being > > > blocked in uninterruptible state or frozen by cgroup freezer). In order > > > to give the victim some time to terminate properly the oom reaper is > > > delayed in its invocation. This is particularly beneficial when the oom > > > victim is holding robust futex resources as the anonymous memory tear > > > down can break those. [1] > > > > > > On the other hand deliberately frozen tasks by the freezer cgroup will > > > not wake up until they are thawed in the userspace and delay is > > > effectively pointless. Therefore opt out from the delay for cgroup > > > frozen oom victims. > > > > > > Reference: > > > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u > > > > > > Signed-off-by: zhongjinji <zhongjinji@honor.com> > > > > Acked-by: Michal Hocko <mhocko@suse.com> > > Thanks > > Sorry, I found that it doesn't work now (because I previously tested it by > simulating OOM, which made testing easier but also caused the mistake. I will > re-run the new test). Calling __thaw_task in mark_oom_victim will change the > victim's state to running. However, other threads are still in the frozen state, > so the process still can't exit. We should update it again by moving __thaw_task > to after frozen (this way, executing __thaw_task and frozen in the same function > looks more reasonable). Since mark_oom_victim and queue_oom_reaper always appear > in pairs, this won't introduce any risky changes. Hmm, I must have completely forgot that we are actually thawing the frozen task! That means that the actual argument for not delaying the oom reaper doesn't hold. Now I do see why the existing implementation doesn't really work as you would expect though. Is there any reason why we are not thawing the whole process group? I guess I just didn't realize that __thaw_task is per thread rather than per process back then when I have introduced it. Because thread specific behavior makes very little sense to me TBH. So rather than plaing with __thaw_task placement which doesn't really make much sense wrt to delaying the reaper we should look into that part. Sorry, I should have realized earlier when proposing that. -- Michal Hocko SUSE Labs
> > Sorry, I found that it doesn't work now (because I previously tested it by > > simulating OOM, which made testing easier but also caused the mistake. I will > > re-run the new test). Calling __thaw_task in mark_oom_victim will change the > > victim's state to running. However, other threads are still in the frozen state, > > so the process still can't exit. We should update it again by moving __thaw_task > > to after frozen (this way, executing __thaw_task and frozen in the same function > > looks more reasonable). Since mark_oom_victim and queue_oom_reaper always appear > > in pairs, this won't introduce any risky changes. > > Hmm, I must have completely forgot that we are actually thawing the > frozen task! That means that the actual argument for not delaying the > oom reaper doesn't hold. > Now I do see why the existing implementation doesn't really work as you > would expect though. Is there any reason why we are not thawing the > whole process group? I guess I just didn't realize that __thaw_task is > per thread rather than per process back then when I have introduced it. Previously, I didn't know why we needed to call __thaw_task() in mark_oom_victim(). Now I understand. > Because thread specific behavior makes very little sense to me TBH. > So rather than plaing with __thaw_task placement which doesn't really > make much sense wrt to delaying the reaper we should look into that > part. > > Sorry, I should have realized earlier when proposing that. Is this modification acceptable? This change only thaws the process previously identified as the victim, and does not thaw the process being killed in for_each_process. The reason is that the process being killed in for_each_process is usually a vfork process, which is only temporary and rarely encountered. @@ -772,12 +773,18 @@ static void mark_oom_victim(struct task_struct *tsk) mmgrab(tsk->signal->oom_mm); /* - * Make sure that the task is woken up from uninterruptible sleep + * Make sure that the process is woken up from uninterruptible sleep * if it is frozen because OOM killer wouldn't be able to free * any memory and livelock. freezing_slow_path will tell the freezer - * that TIF_MEMDIE tasks should be ignored. + * that TIF_MEMDIE thread should be ignored. */ - __thaw_task(tsk); + rcu_read_lock(); + for_each_thread(tsk, t) { + set_tsk_thread_flag(t, TIF_MEMDIE); + __thaw_task(t); + } + rcu_read_unlock(); + atomic_inc(&oom_victims); cred = get_task_cred(tsk); trace_mark_victim(tsk, cred->uid.val);
On Wed 03-09-25 00:01:29, zhongjinji wrote: [...] > @@ -772,12 +773,18 @@ static void mark_oom_victim(struct task_struct *tsk) > mmgrab(tsk->signal->oom_mm); > > /* > - * Make sure that the task is woken up from uninterruptible sleep > + * Make sure that the process is woken up from uninterruptible sleep > * if it is frozen because OOM killer wouldn't be able to free > * any memory and livelock. freezing_slow_path will tell the freezer > - * that TIF_MEMDIE tasks should be ignored. > + * that TIF_MEMDIE thread should be ignored. > */ > - __thaw_task(tsk); > + rcu_read_lock(); > + for_each_thread(tsk, t) { > + set_tsk_thread_flag(t, TIF_MEMDIE); > + __thaw_task(t); > + } > + rcu_read_unlock(); > + I would prefer if we had thaw_process() rather than open code it here. But the implementation matches what I would expect it to do. > atomic_inc(&oom_victims); > cred = get_task_cred(tsk); > trace_mark_victim(tsk, cred->uid.val); -- Michal Hocko SUSE Labs
On Fri, Aug 29, 2025 at 02:55:49PM +0800, zhongjinji wrote: > The oom reaper is a mechanism to guarantee a forward process during OOM > situation when the oom victim cannot terminate on its own (e.g. being > blocked in uninterruptible state or frozen by cgroup freezer). In order > to give the victim some time to terminate properly the oom reaper is > delayed in its invocation. This is particularly beneficial when the oom > victim is holding robust futex resources as the anonymous memory tear > down can break those. [1] > > On the other hand deliberately frozen tasks by the freezer cgroup will > not wake up until they are thawed in the userspace and delay is > effectively pointless. Therefore opt out from the delay for cgroup > frozen oom victims. > > Reference: > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u > > Signed-off-by: zhongjinji <zhongjinji@honor.com> > --- > mm/oom_kill.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 25923cfec9c6..a5e9074896a1 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk) > > get_task_struct(tsk); > timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); > - tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; > + tsk->oom_reaper_timer.expires = jiffies; > + > + /* > + * If the task is frozen by the cgroup freezer, the delay is unnecessary > + * because it cannot exit until thawed. Skip the delay for frozen victims. > + */ > + if (!frozen(tsk)) Can you please change the above condition with the following to handle v2 as well? if (!frozen(tsk) && !(READ_ONCE(tsk->frozen))) > + tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY; > add_timer(&tsk->oom_reaper_timer); > } > > -- > 2.17.1 >
> > get_task_struct(tsk); > > timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); > > - tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; > > + tsk->oom_reaper_timer.expires = jiffies; > > + > > + /* > > + * If the task is frozen by the cgroup freezer, the delay is unnecessary > > + * because it cannot exit until thawed. Skip the delay for frozen victims. > > + */ > > + if (!frozen(tsk)) > > Can you please change the above condition with the following to handle > v2 as well? Thank you, but I think the cgroupv2 check isn't needed, since a process frozen by the cgroup v2 freezer won't block exit after being killed. Would it be better to note in the comment or changelog that this change is for cgroup v1? > if (!frozen(tsk) && !(READ_ONCE(tsk->frozen))) > > > + tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY; > > add_timer(&tsk->oom_reaper_timer); > > } > > > > -- > > 2.17.1 > >
* zhongjinji <zhongjinji@honor.com> [250829 02:56]: > The oom reaper is a mechanism to guarantee a forward process during OOM > situation when the oom victim cannot terminate on its own (e.g. being > blocked in uninterruptible state or frozen by cgroup freezer). In order > to give the victim some time to terminate properly the oom reaper is > delayed in its invocation. This is particularly beneficial when the oom > victim is holding robust futex resources as the anonymous memory tear > down can break those. [1] > > On the other hand deliberately frozen tasks by the freezer cgroup will > not wake up until they are thawed in the userspace and delay is > effectively pointless. Therefore opt out from the delay for cgroup > frozen oom victims. > > Reference: > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u > > Signed-off-by: zhongjinji <zhongjinji@honor.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> > --- > mm/oom_kill.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 25923cfec9c6..a5e9074896a1 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk) > > get_task_struct(tsk); > timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); > - tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; > + tsk->oom_reaper_timer.expires = jiffies; > + > + /* > + * If the task is frozen by the cgroup freezer, the delay is unnecessary > + * because it cannot exit until thawed. Skip the delay for frozen victims. > + */ > + if (!frozen(tsk)) > + tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY; > add_timer(&tsk->oom_reaper_timer); > } > > -- > 2.17.1 > >
On Fri, Aug 29, 2025 at 02:55:49PM +0800, zhongjinji wrote: > The oom reaper is a mechanism to guarantee a forward process during OOM > situation when the oom victim cannot terminate on its own (e.g. being > blocked in uninterruptible state or frozen by cgroup freezer). In order > to give the victim some time to terminate properly the oom reaper is > delayed in its invocation. This is particularly beneficial when the oom > victim is holding robust futex resources as the anonymous memory tear > down can break those. [1] > > On the other hand deliberately frozen tasks by the freezer cgroup will > not wake up until they are thawed in the userspace and delay is > effectively pointless. Therefore opt out from the delay for cgroup > frozen oom victims. > > Reference: > [1] https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u > > Signed-off-by: zhongjinji <zhongjinji@honor.com> Nice :) now this is very simple. LGTM, so: Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > --- > mm/oom_kill.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 25923cfec9c6..a5e9074896a1 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -700,7 +700,14 @@ static void queue_oom_reaper(struct task_struct *tsk) > > get_task_struct(tsk); > timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0); > - tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY; > + tsk->oom_reaper_timer.expires = jiffies; > + > + /* > + * If the task is frozen by the cgroup freezer, the delay is unnecessary > + * because it cannot exit until thawed. Skip the delay for frozen victims. > + */ > + if (!frozen(tsk)) > + tsk->oom_reaper_timer.expires += OOM_REAPER_DELAY; > add_timer(&tsk->oom_reaper_timer); > } > > -- > 2.17.1 >
© 2016 - 2025 Red Hat, Inc.