OOM killer is a mechanism that selects and kills processes when the system
runs out of memory to reclaim resources and keep the system stable.
However, the oom victim cannot terminate on its own when it is frozen,
because __thaw_task() only thaws one thread of the victim, while
the other threads remain in the frozen state.
This change will thaw the entire victim process when OOM occurs,
ensuring that the oom victim can terminate on its own.
Signed-off-by: zhongjinji <zhongjinji@honor.com>
---
mm/oom_kill.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..ffa50a1f0132 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -772,12 +772,11 @@ static void mark_oom_victim(struct task_struct *tsk)
mmgrab(tsk->signal->oom_mm);
/*
- * Make sure that the task is woken up from uninterruptible sleep
+ * Make sure that the process is woken up from uninterruptible sleep
* if it is frozen because OOM killer wouldn't be able to free
- * any memory and livelock. freezing_slow_path will tell the freezer
- * that TIF_MEMDIE tasks should be ignored.
+ * any memory and livelock.
*/
- __thaw_task(tsk);
+ thaw_oom_process(tsk);
atomic_inc(&oom_victims);
cred = get_task_cred(tsk);
trace_mark_victim(tsk, cred->uid.val);
--
2.17.1
On Tue 09-09-25 17:06:58, zhongjinji wrote: > OOM killer is a mechanism that selects and kills processes when the system > runs out of memory to reclaim resources and keep the system stable. > However, the oom victim cannot terminate on its own when it is frozen, > because __thaw_task() only thaws one thread of the victim, while > the other threads remain in the frozen state. > > This change will thaw the entire victim process when OOM occurs, > ensuring that the oom victim can terminate on its own. fold this into patch 1. > > Signed-off-by: zhongjinji <zhongjinji@honor.com> > --- > mm/oom_kill.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 25923cfec9c6..ffa50a1f0132 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -772,12 +772,11 @@ static void mark_oom_victim(struct task_struct *tsk) > mmgrab(tsk->signal->oom_mm); > > /* > - * Make sure that the task is woken up from uninterruptible sleep > + * Make sure that the process is woken up from uninterruptible sleep > * if it is frozen because OOM killer wouldn't be able to free > - * any memory and livelock. freezing_slow_path will tell the freezer > - * that TIF_MEMDIE tasks should be ignored. > + * any memory and livelock. > */ > - __thaw_task(tsk); > + thaw_oom_process(tsk); > atomic_inc(&oom_victims); > cred = get_task_cred(tsk); > trace_mark_victim(tsk, cred->uid.val); > -- > 2.17.1 -- Michal Hocko SUSE Labs
> On Tue 09-09-25 17:06:57, zhongjinji wrote: > > OOM killer is a mechanism that selects and kills processes when the system > > runs out of memory to reclaim resources and keep the system stable. > > However, the oom victim cannot terminate on its own when it is frozen, > > because __thaw_task() only thaws one thread of the victim, while > > the other threads remain in the frozen state. > > > > Since __thaw_task did not fully thaw the OOM victim for self-termination, > > introduce thaw_oom_process() to properly thaw OOM victims. > > You will need s@thaw_oom_process@thaw_processes@ The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the thawed thread will be set, which means this function can only be used to thaw processes terminated by the OOM killer. thaw_processes has already been defined in kernel/power/process.c. Would it be better to use thaw_process instead? I am concerned that others might misunderstand the thaw_process function. thaw_process sets all threads to the TIF_MEMDIE state, so it can only be used to thaw processes killed by the OOM killer. If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed regardless of the cgroup state. Should we add a function to set the TIF_MEMDIE state for all threads, like the implementation below? -/* - * thaw_oom_process - thaw the OOM victim process - * @p: process to be thawed - * - * Sets TIF_MEMDIE for all threads in the process group and thaws them. - * Threads with TIF_MEMDIE are ignored by the freezer. - */ -void thaw_oom_process(struct task_struct *p) +void thaw_process(struct task_struct *p) { struct task_struct *t; rcu_read_lock(); for_each_thread(p, t) { - set_tsk_thread_flag(t, TIF_MEMDIE); __thaw_task(t); } rcu_read_unlock(); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 52d285da5ba4..67b65b249757 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -753,6 +753,17 @@ static inline void queue_oom_reaper(struct task_struct *tsk) } #endif /* CONFIG_MMU */ +void mark_oom_victim_die(struct task_struct *p) +{ + struct task_struct *t; + + rcu_read_lock(); + for_each_thread(p, t) { + set_tsk_thread_flag(t, TIF_MEMDIE); + } + rcu_read_unlock(); +} + /** * mark_oom_victim - mark the given task as OOM victim * @tsk: task to mark @@ -782,7 +793,8 @@ static void mark_oom_victim(struct task_struct *tsk) * if it is frozen because OOM killer wouldn't be able to free * any memory and livelock. */ - thaw_oom_process(tsk); + mark_oom_victim_die(tsk); + thaw_process(tsk); > I would also add the caller in this patch. > > > Signed-off-by: zhongjinji <zhongjinji@honor.com> > > Other than that looks good to me. With the above fixed feel free to add > Acked-by: Michal Hocko <mhocko@suse.com>
On Tue 09-09-25 19:41:31, zhongjinji wrote: > > On Tue 09-09-25 17:06:57, zhongjinji wrote: > > > OOM killer is a mechanism that selects and kills processes when the system > > > runs out of memory to reclaim resources and keep the system stable. > > > However, the oom victim cannot terminate on its own when it is frozen, > > > because __thaw_task() only thaws one thread of the victim, while > > > the other threads remain in the frozen state. > > > > > > Since __thaw_task did not fully thaw the OOM victim for self-termination, > > > introduce thaw_oom_process() to properly thaw OOM victims. > > > > You will need s@thaw_oom_process@thaw_processes@ > > The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the > thawed thread will be set, which means this function can only be used to > thaw processes terminated by the OOM killer. Just do not set the flag inside the function. I would even say do not set TIF_MEMDIE to the rest of the thread group at all. More on that below > thaw_processes has already been defined in kernel/power/process.c. > Would it be better to use thaw_process instead? Sorry I meant thaw_process as thaw_processes is handling all the processes. > I am concerned that others might misunderstand the thaw_process function. > thaw_process sets all threads to the TIF_MEMDIE state, so it can only be > used to thaw processes killed by the OOM killer. And that is the reason why it shouldn't be doing that. It should thaw the whole thread group. That's it. > If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed > regardless of the cgroup state. Why would that be the case. TIF_MEMDIE should only denote the victim should be able to access memory reserves. Why the whole thread group needs that? While more threads could be caught in the allocation path this is a sort of boost at best. It cannot guarantee any forward progress and we have kept marking only the first thread that way without any issues. > Should we add a function to set the TIF_MEMDIE > state for all threads, like the implementation below? > > -/* > - * thaw_oom_process - thaw the OOM victim process > - * @p: process to be thawed > - * > - * Sets TIF_MEMDIE for all threads in the process group and thaws them. > - * Threads with TIF_MEMDIE are ignored by the freezer. > - */ > -void thaw_oom_process(struct task_struct *p) > +void thaw_process(struct task_struct *p) > { > struct task_struct *t; > > rcu_read_lock(); > for_each_thread(p, t) { > - set_tsk_thread_flag(t, TIF_MEMDIE); > __thaw_task(t); > } > rcu_read_unlock(); > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 52d285da5ba4..67b65b249757 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -753,6 +753,17 @@ static inline void queue_oom_reaper(struct task_struct *tsk) > } > #endif /* CONFIG_MMU */ > > +void mark_oom_victim_die(struct task_struct *p) > +{ > + struct task_struct *t; > + > + rcu_read_lock(); > + for_each_thread(p, t) { > + set_tsk_thread_flag(t, TIF_MEMDIE); > + } > + rcu_read_unlock(); > +} > + > /** > * mark_oom_victim - mark the given task as OOM victim > * @tsk: task to mark > @@ -782,7 +793,8 @@ static void mark_oom_victim(struct task_struct *tsk) > * if it is frozen because OOM killer wouldn't be able to free > * any memory and livelock. > */ > - thaw_oom_process(tsk); > + mark_oom_victim_die(tsk); > + thaw_process(tsk); > > > I would also add the caller in this patch. > > > > > Signed-off-by: zhongjinji <zhongjinji@honor.com> > > > > Other than that looks good to me. With the above fixed feel free to add > > Acked-by: Michal Hocko <mhocko@suse.com> -- Michal Hocko SUSE Labs
> On Tue 09-09-25 19:41:31, zhongjinji wrote: > > > On Tue 09-09-25 17:06:57, zhongjinji wrote: > > > > OOM killer is a mechanism that selects and kills processes when the system > > > > runs out of memory to reclaim resources and keep the system stable. > > > > However, the oom victim cannot terminate on its own when it is frozen, > > > > because __thaw_task() only thaws one thread of the victim, while > > > > the other threads remain in the frozen state. > > > > > > > > Since __thaw_task did not fully thaw the OOM victim for self-termination, > > > > introduce thaw_oom_process() to properly thaw OOM victims. > > > > > > You will need s@thaw_oom_process@thaw_processes@ > > > > The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the > > thawed thread will be set, which means this function can only be used to > > thaw processes terminated by the OOM killer. > > Just do not set the flag inside the function. I would even say do not > set TIF_MEMDIE to the rest of the thread group at all. More on that > below > > > thaw_processes has already been defined in kernel/power/process.c. > > Would it be better to use thaw_process instead? > > Sorry I meant thaw_process as thaw_processes is handling all the > processes. > > > I am concerned that others might misunderstand the thaw_process function. > > thaw_process sets all threads to the TIF_MEMDIE state, so it can only be > > used to thaw processes killed by the OOM killer. > > And that is the reason why it shouldn't be doing that. It should thaw > the whole thread group. That's it. > > > If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed > > regardless of the cgroup state. > > Why would that be the case. TIF_MEMDIE should only denote the victim > should be able to access memory reserves. Why the whole thread group > needs that? While more threads could be caught in the allocation path > this is a sort of boost at best. It cannot guarantee any forward > progress and we have kept marking only the first thread that way without > any issues. When a process is frozen, all its threads enter __refrigerator() (in kernel/freezer.c). When __thaw_task is called, the threads are woken up and check the freezing(current) state (in __refrigerator). The freezing check is implemented via freezing_slow_path. When TIF_MEMDIE is set for a thread, freezing_slow_path will return false, allowing the thread to exit the infinite loop in __refrigerator(), and thus the thread will be thawed. The following code can explain how TIF_MEMDIE works in thread thawing. __refrigerator for (;;) { freezing = freezing(current) freezing_slow_path if (test_tsk_thread_flag(p, TIF_MEMDIE)) return false; if (!freezing) break; schedule(); } Since thread_info is not shared within a thread group, TIF_MEMDIE for each thread must be set so that all threads can be thawed.
On Tue 09-09-25 21:51:52, zhongjinji wrote: > > On Tue 09-09-25 19:41:31, zhongjinji wrote: > > > > On Tue 09-09-25 17:06:57, zhongjinji wrote: > > > > > OOM killer is a mechanism that selects and kills processes when the system > > > > > runs out of memory to reclaim resources and keep the system stable. > > > > > However, the oom victim cannot terminate on its own when it is frozen, > > > > > because __thaw_task() only thaws one thread of the victim, while > > > > > the other threads remain in the frozen state. > > > > > > > > > > Since __thaw_task did not fully thaw the OOM victim for self-termination, > > > > > introduce thaw_oom_process() to properly thaw OOM victims. > > > > > > > > You will need s@thaw_oom_process@thaw_processes@ > > > > > > The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the > > > thawed thread will be set, which means this function can only be used to > > > thaw processes terminated by the OOM killer. > > > > Just do not set the flag inside the function. I would even say do not > > set TIF_MEMDIE to the rest of the thread group at all. More on that > > below > > > > > thaw_processes has already been defined in kernel/power/process.c. > > > Would it be better to use thaw_process instead? > > > > Sorry I meant thaw_process as thaw_processes is handling all the > > processes. > > > > > I am concerned that others might misunderstand the thaw_process function. > > > thaw_process sets all threads to the TIF_MEMDIE state, so it can only be > > > used to thaw processes killed by the OOM killer. > > > > And that is the reason why it shouldn't be doing that. It should thaw > > the whole thread group. That's it. > > > > > If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed > > > regardless of the cgroup state. > > > > Why would that be the case. TIF_MEMDIE should only denote the victim > > should be able to access memory reserves. Why the whole thread group > > needs that? While more threads could be caught in the allocation path > > this is a sort of boost at best. It cannot guarantee any forward > > progress and we have kept marking only the first thread that way without > > any issues. > > When a process is frozen, all its threads enter __refrigerator() (in kernel/freezer.c). > When __thaw_task is called, the threads are woken up and check the freezing(current) > state (in __refrigerator). The freezing check is implemented via freezing_slow_path. > When TIF_MEMDIE is set for a thread, freezing_slow_path will return false, allowing > the thread to exit the infinite loop in __refrigerator(), and thus the thread will > be thawed. > > The following code can explain how TIF_MEMDIE works in thread thawing. > __refrigerator > for (;;) { > freezing = freezing(current) > freezing_slow_path > if (test_tsk_thread_flag(p, TIF_MEMDIE)) > return false; > if (!freezing) > break; > schedule(); > } OK, I see. We could deal with that by checking tsk_is_oom_victim() instead of TIF_MEMDIE > Since thread_info is not shared within a thread group, TIF_MEMDIE for each thread > must be set so that all threads can be thawed. -- Michal Hocko SUSE Labs
> > > On Tue 09-09-25 19:41:31, zhongjinji wrote: > > > > > On Tue 09-09-25 17:06:57, zhongjinji wrote: > > > > > > OOM killer is a mechanism that selects and kills processes when the system > > > > > > runs out of memory to reclaim resources and keep the system stable. > > > > > > However, the oom victim cannot terminate on its own when it is frozen, > > > > > > because __thaw_task() only thaws one thread of the victim, while > > > > > > the other threads remain in the frozen state. > > > > > > > > > > > > Since __thaw_task did not fully thaw the OOM victim for self-termination, > > > > > > introduce thaw_oom_process() to properly thaw OOM victims. > > > > > > > > > > You will need s@thaw_oom_process@thaw_processes@ > > > > > > > > The reason for using thaw_oom_process is that the TIF_MEMDIE flag of the > > > > thawed thread will be set, which means this function can only be used to > > > > thaw processes terminated by the OOM killer. > > > > > > Just do not set the flag inside the function. I would even say do not > > > set TIF_MEMDIE to the rest of the thread group at all. More on that > > > below > > > > > > > thaw_processes has already been defined in kernel/power/process.c. > > > > Would it be better to use thaw_process instead? > > > > > > Sorry I meant thaw_process as thaw_processes is handling all the > > > processes. > > > > > > > I am concerned that others might misunderstand the thaw_process function. > > > > thaw_process sets all threads to the TIF_MEMDIE state, so it can only be > > > > used to thaw processes killed by the OOM killer. > > > > > > And that is the reason why it shouldn't be doing that. It should thaw > > > the whole thread group. That's it. > > > > > > > If the TIF_MEMDIE flag of a thread is not set, the thread cannot be thawed > > > > regardless of the cgroup state. > > > > > > Why would that be the case. TIF_MEMDIE should only denote the victim > > > should be able to access memory reserves. Why the whole thread group > > > needs that? While more threads could be caught in the allocation path > > > this is a sort of boost at best. It cannot guarantee any forward > > > progress and we have kept marking only the first thread that way without > > > any issues. > > > > When a process is frozen, all its threads enter __refrigerator() (in kernel/freezer.c). > > When __thaw_task is called, the threads are woken up and check the freezing(current) > > state (in __refrigerator). The freezing check is implemented via freezing_slow_path. > > When TIF_MEMDIE is set for a thread, freezing_slow_path will return false, allowing > > the thread to exit the infinite loop in __refrigerator(), and thus the thread will > > be thawed. > > > > The following code can explain how TIF_MEMDIE works in thread thawing. > > __refrigerator > > for (;;) { > > freezing = freezing(current) > > freezing_slow_path > > if (test_tsk_thread_flag(p, TIF_MEMDIE)) > > return false; > > if (!freezing) > > break; > > schedule(); > > } > > OK, I see. We could deal with that by checking tsk_is_oom_victim() > instead of TIF_MEMDIE Thank you, this looks great. It seems that oom_reserves_allowed implies that tsk_is_oom_victim is not always effective (in page_alloc.c). I will check it.
On Tue, Sep 9, 2025 at 2:15 AM Michal Hocko <mhocko@suse.com> wrote: > > On Tue 09-09-25 17:06:58, zhongjinji wrote: > > OOM killer is a mechanism that selects and kills processes when the system > > runs out of memory to reclaim resources and keep the system stable. > > However, the oom victim cannot terminate on its own when it is frozen, > > because __thaw_task() only thaws one thread of the victim, while > > the other threads remain in the frozen state. > > > > This change will thaw the entire victim process when OOM occurs, > > ensuring that the oom victim can terminate on its own. > > fold this into patch 1. +1 With that done, Reviewed-by: Suren Baghdasaryan <surenb@google.com> > > > > > Signed-off-by: zhongjinji <zhongjinji@honor.com> > > --- > > mm/oom_kill.c | 7 +++---- > > 1 file changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index 25923cfec9c6..ffa50a1f0132 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -772,12 +772,11 @@ static void mark_oom_victim(struct task_struct *tsk) > > mmgrab(tsk->signal->oom_mm); > > > > /* > > - * Make sure that the task is woken up from uninterruptible sleep > > + * Make sure that the process is woken up from uninterruptible sleep > > * if it is frozen because OOM killer wouldn't be able to free > > - * any memory and livelock. freezing_slow_path will tell the freezer > > - * that TIF_MEMDIE tasks should be ignored. > > + * any memory and livelock. > > */ > > - __thaw_task(tsk); > > + thaw_oom_process(tsk); > > atomic_inc(&oom_victims); > > cred = get_task_cred(tsk); > > trace_mark_victim(tsk, cred->uid.val); > > -- > > 2.17.1 > > -- > Michal Hocko > SUSE Labs
© 2016 - 2025 Red Hat, Inc.