[PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Chen Ridong 11 months, 4 weeks ago
From: Chen Ridong <chenridong@huawei.com>

A soft lockup was observed in production with about 56,000 tasks in an
OOM cgroup; the lockup triggered while the OOM killer was traversing
them.

watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 vprintk_emit+0x193/0x280
 printk+0x52/0x6e
 dump_task+0x114/0x130
 mem_cgroup_scan_tasks+0x76/0x100
 dump_header+0x1fe/0x210
 oom_kill_process+0xd1/0x100
 out_of_memory+0x125/0x570
 mem_cgroup_out_of_memory+0xb5/0xd0
 try_charge+0x720/0x770
 mem_cgroup_try_charge+0x86/0x180
 mem_cgroup_try_charge_delay+0x1c/0x40
 do_anonymous_page+0xb5/0x390
 handle_mm_fault+0xc4/0x1f0

This happens because thousands of processes are in the OOM cgroup and it
takes a long time to traverse them all, which leads to a soft lockup
during the OOM handling.

To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
function every 1024 iterations. For global OOM, call
'touch_softlockup_watchdog' every 1024 iterations instead, since that
traversal runs under rcu_read_lock() and must not sleep.

Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 mm/memcontrol.c | 7 ++++++-
 mm/oom_kill.c   | 8 +++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 65fb5eee1466..46f8b372d212 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 {
 	struct mem_cgroup *iter;
 	int ret = 0;
+	int i = 0;
 
 	BUG_ON(mem_cgroup_is_root(memcg));
 
@@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 		struct task_struct *task;
 
 		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
-		while (!ret && (task = css_task_iter_next(&it)))
+		while (!ret && (task = css_task_iter_next(&it))) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				cond_resched();
 			ret = fn(task, arg);
+		}
 		css_task_iter_end(&it);
 		if (ret) {
 			mem_cgroup_iter_break(memcg, iter);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1c485beb0b93..044ebab2c941 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,6 +44,7 @@
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
 #include <linux/cred.h>
+#include <linux/nmi.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
 	else {
 		struct task_struct *p;
+		int i = 0;
 
 		rcu_read_lock();
-		for_each_process(p)
+		for_each_process(p) {
+			/* Avoid potential softlockup warning */
+			if ((++i & 1023) == 0)
+				touch_softlockup_watchdog();
 			dump_task(p, oc);
+		}
 		rcu_read_unlock();
 	}
 }
-- 
2.34.1
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Michal Hocko 11 months, 2 weeks ago
On Tue 24-12-24 02:52:38, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.
> 
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  vprintk_emit+0x193/0x280
>  printk+0x52/0x6e
>  dump_task+0x114/0x130
>  mem_cgroup_scan_tasks+0x76/0x100
>  dump_header+0x1fe/0x210
>  oom_kill_process+0xd1/0x100
>  out_of_memory+0x125/0x570
>  mem_cgroup_out_of_memory+0xb5/0xd0
>  try_charge+0x720/0x770
>  mem_cgroup_try_charge+0x86/0x180
>  mem_cgroup_try_charge_delay+0x1c/0x40
>  do_anonymous_page+0xb5/0x390
>  handle_mm_fault+0xc4/0x1f0
> 
> This is because thousands of processes are in the OOM cgroup, it takes a
> long time to traverse all of them. As a result, this lead to soft lockup
> in the OOM process.
> 
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
> 
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>

LGTM, I would really not overthink this that much. PREEMPT_NONE and soft
lockups will hopefully soon become a non-issue.

Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Vlastimil Babka 11 months, 2 weeks ago
On 12/24/24 03:52, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>

+CC RCU

> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.
> 
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  vprintk_emit+0x193/0x280
>  printk+0x52/0x6e
>  dump_task+0x114/0x130
>  mem_cgroup_scan_tasks+0x76/0x100
>  dump_header+0x1fe/0x210
>  oom_kill_process+0xd1/0x100
>  out_of_memory+0x125/0x570
>  mem_cgroup_out_of_memory+0xb5/0xd0
>  try_charge+0x720/0x770
>  mem_cgroup_try_charge+0x86/0x180
>  mem_cgroup_try_charge_delay+0x1c/0x40
>  do_anonymous_page+0xb5/0x390
>  handle_mm_fault+0xc4/0x1f0
> 
> This is because thousands of processes are in the OOM cgroup, it takes a
> long time to traverse all of them. As a result, this lead to soft lockup
> in the OOM process.
> 
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
> 
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/memcontrol.c | 7 ++++++-
>  mm/oom_kill.c   | 8 +++++++-
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 65fb5eee1466..46f8b372d212 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>  {
>  	struct mem_cgroup *iter;
>  	int ret = 0;
> +	int i = 0;
>  
>  	BUG_ON(mem_cgroup_is_root(memcg));
>  
> @@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>  		struct task_struct *task;
>  
>  		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
> -		while (!ret && (task = css_task_iter_next(&it)))
> +		while (!ret && (task = css_task_iter_next(&it))) {
> +			/* Avoid potential softlockup warning */
> +			if ((++i & 1023) == 0)
> +				cond_resched();
>  			ret = fn(task, arg);
> +		}
>  		css_task_iter_end(&it);
>  		if (ret) {
>  			mem_cgroup_iter_break(memcg, iter);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1c485beb0b93..044ebab2c941 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,6 +44,7 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> +#include <linux/nmi.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> +		int i = 0;
>  
>  		rcu_read_lock();
> -		for_each_process(p)
> +		for_each_process(p) {
> +			/* Avoid potential softlockup warning */
> +			if ((++i & 1023) == 0)
> +				touch_softlockup_watchdog();

This might suppress the soft lockup, but won't a rcu stall still be detected?

>  			dump_task(p, oc);
> +		}
>  		rcu_read_unlock();
>  	}
>  }
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Chen Ridong 11 months, 1 week ago

On 2025/1/6 16:45, Vlastimil Babka wrote:
> On 12/24/24 03:52, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
> 
> +CC RCU
> 
>> A soft lockup issue was found in the product with about 56,000 tasks were
>> in the OOM cgroup, it was traversing them when the soft lockup was
>> triggered.
>>
>> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
>> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
>> Hardware name: Huawei Cloud OpenStack Nova, BIOS
>> RIP: 0010:console_unlock+0x343/0x540
>> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
>> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
>> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
>> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
>> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>>  vprintk_emit+0x193/0x280
>>  printk+0x52/0x6e
>>  dump_task+0x114/0x130
>>  mem_cgroup_scan_tasks+0x76/0x100
>>  dump_header+0x1fe/0x210
>>  oom_kill_process+0xd1/0x100
>>  out_of_memory+0x125/0x570
>>  mem_cgroup_out_of_memory+0xb5/0xd0
>>  try_charge+0x720/0x770
>>  mem_cgroup_try_charge+0x86/0x180
>>  mem_cgroup_try_charge_delay+0x1c/0x40
>>  do_anonymous_page+0xb5/0x390
>>  handle_mm_fault+0xc4/0x1f0
>>
>> This is because thousands of processes are in the OOM cgroup, it takes a
>> long time to traverse all of them. As a result, this lead to soft lockup
>> in the OOM process.
>>
>> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
>> function per 1000 iterations. For global OOM, call
>> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
>>
>> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  mm/memcontrol.c | 7 ++++++-
>>  mm/oom_kill.c   | 8 +++++++-
>>  2 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 65fb5eee1466..46f8b372d212 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>>  {
>>  	struct mem_cgroup *iter;
>>  	int ret = 0;
>> +	int i = 0;
>>  
>>  	BUG_ON(mem_cgroup_is_root(memcg));
>>  
>> @@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>>  		struct task_struct *task;
>>  
>>  		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
>> -		while (!ret && (task = css_task_iter_next(&it)))
>> +		while (!ret && (task = css_task_iter_next(&it))) {
>> +			/* Avoid potential softlockup warning */
>> +			if ((++i & 1023) == 0)
>> +				cond_resched();
>>  			ret = fn(task, arg);
>> +		}
>>  		css_task_iter_end(&it);
>>  		if (ret) {
>>  			mem_cgroup_iter_break(memcg, iter);
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 1c485beb0b93..044ebab2c941 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,6 +44,7 @@
>>  #include <linux/init.h>
>>  #include <linux/mmu_notifier.h>
>>  #include <linux/cred.h>
>> +#include <linux/nmi.h>
>>  
>>  #include <asm/tlb.h>
>>  #include "internal.h"
>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>  	else {
>>  		struct task_struct *p;
>> +		int i = 0;
>>  
>>  		rcu_read_lock();
>> -		for_each_process(p)
>> +		for_each_process(p) {
>> +			/* Avoid potential softlockup warning */
>> +			if ((++i & 1023) == 0)
>> +				touch_softlockup_watchdog();
> 
> This might suppress the soft lockup, but won't a rcu stall still be detected?

Yes, an RCU stall was still detected.
For global OOM the system is likely to be struggling anyway; do we need
to do some work to suppress the RCU stall detector?

Best regards,
Ridong

> 
>>  			dump_task(p, oc);
>> +		}
>>  		rcu_read_unlock();
>>  	}
>>  }
>
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Andrew Morton 11 months, 1 week ago
On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:

> 
> 
> On 2025/1/6 16:45, Vlastimil Babka wrote:
> > On 12/24/24 03:52, Chen Ridong wrote:
> >> From: Chen Ridong <chenridong@huawei.com>
> > 
> > +CC RCU
> > 
> >> A soft lockup issue was found in the product with about 56,000 tasks were
> >> in the OOM cgroup, it was traversing them when the soft lockup was
> >> triggered.
> >>
>
> ...
>
> >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> >>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> >>  	else {
> >>  		struct task_struct *p;
> >> +		int i = 0;
> >>  
> >>  		rcu_read_lock();
> >> -		for_each_process(p)
> >> +		for_each_process(p) {
> >> +			/* Avoid potential softlockup warning */
> >> +			if ((++i & 1023) == 0)
> >> +				touch_softlockup_watchdog();
> > 
> > This might suppress the soft lockup, but won't a rcu stall still be detected?
> 
> Yes, rcu stall was still detected.
> For global OOM, system is likely to struggle, do we have to do some
> works to suppress RCU detete?

rcu_cpu_stall_reset()?
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Michal Hocko 11 months, 1 week ago
On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
> 
> > 
> > 
> > On 2025/1/6 16:45, Vlastimil Babka wrote:
> > > On 12/24/24 03:52, Chen Ridong wrote:
> > >> From: Chen Ridong <chenridong@huawei.com>
> > > 
> > > +CC RCU
> > > 
> > >> A soft lockup issue was found in the product with about 56,000 tasks were
> > >> in the OOM cgroup, it was traversing them when the soft lockup was
> > >> triggered.
> > >>
> >
> > ...
> >
> > >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> > >>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> > >>  	else {
> > >>  		struct task_struct *p;
> > >> +		int i = 0;
> > >>  
> > >>  		rcu_read_lock();
> > >> -		for_each_process(p)
> > >> +		for_each_process(p) {
> > >> +			/* Avoid potential softlockup warning */
> > >> +			if ((++i & 1023) == 0)
> > >> +				touch_softlockup_watchdog();
> > > 
> > > This might suppress the soft lockup, but won't a rcu stall still be detected?
> > 
> > Yes, rcu stall was still detected.
> > For global OOM, system is likely to struggle, do we have to do some
> > works to suppress RCU detete?
> 
> rcu_cpu_stall_reset()?

Do we really care about those? The code to iterate over all processes
under RCU is there (basically) since ever and yet we do not seem to have
many reports of stalls? Chen's situation is specific to memcg OOM and
touching the global case was mostly for consistency reasons.
-- 
Michal Hocko
SUSE Labs
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Vlastimil Babka 11 months, 1 week ago
On 1/14/25 09:40, Michal Hocko wrote:
> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>> 
>> > >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>> > >>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>> > >>  	else {
>> > >>  		struct task_struct *p;
>> > >> +		int i = 0;
>> > >>  
>> > >>  		rcu_read_lock();
>> > >> -		for_each_process(p)
>> > >> +		for_each_process(p) {
>> > >> +			/* Avoid potential softlockup warning */
>> > >> +			if ((++i & 1023) == 0)
>> > >> +				touch_softlockup_watchdog();
>> > > 
>> > > This might suppress the soft lockup, but won't a rcu stall still be detected?
>> > 
>> > Yes, rcu stall was still detected.

"was" or "would be"? I thought only the memcg case was observed, or was that
some deliberate stress test of the global case? (or the pr_info() console
stress test mentioned earlier, but created outside of the oom code?)

>> > For global OOM, system is likely to struggle, do we have to do some
>> > works to suppress RCU detete?
>> 
>> rcu_cpu_stall_reset()?
> 
> Do we really care about those? The code to iterate over all processes
> under RCU is there (basically) since ever and yet we do not seem to have
> many reports of stalls? Chen's situation is specific to memcg OOM and
> touching the global case was mostly for consistency reasons.

Then I'd rather not touch the global case if it's theoretical? It's not
even exactly consistent, given that the memcg code gets a cond_resched()
(which can eventually be removed automatically once/if lazy preempt
becomes the sole implementation), while the touch_softlockup_watchdog()
would remain, doing only half of the job?
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Chen Ridong 11 months, 1 week ago

On 2025/1/14 17:20, Vlastimil Babka wrote:
> On 1/14/25 09:40, Michal Hocko wrote:
>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>
>>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>>>>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>>>>>  	else {
>>>>>>  		struct task_struct *p;
>>>>>> +		int i = 0;
>>>>>>  
>>>>>>  		rcu_read_lock();
>>>>>> -		for_each_process(p)
>>>>>> +		for_each_process(p) {
>>>>>> +			/* Avoid potential softlockup warning */
>>>>>> +			if ((++i & 1023) == 0)
>>>>>> +				touch_softlockup_watchdog();
>>>>>
>>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
>>>>
>>>> Yes, rcu stall was still detected.
> 
> "was" or "would be"? I thought only the memcg case was observed, or was that
> some deliberate stress test of the global case? (or the pr_info() console
> stress test mentioned earlier, but created outside of the oom code?)
> 

It's not easy to reproduce for global OOM, because the pr_info() console
stress test can also lead to other soft lockups or RCU warnings (not
caused by the OOM process) while the whole system is struggling. However,
if I add mdelay(1) in the dump_task() function (just to slow down
dump_task, assuming it is slowed by pr_info()) and trigger a global
OOM, RCU warnings can be observed.

I think this verifies that global OOM can trigger RCU warnings in
specific scenarios.

>>>> For global OOM, system is likely to struggle, do we have to do some
>>>> works to suppress RCU detete?
>>>
>>> rcu_cpu_stall_reset()?
>>
>> Do we really care about those? The code to iterate over all processes
>> under RCU is there (basically) since ever and yet we do not seem to have
>> many reports of stalls? Chen's situation is specific to memcg OOM and
>> touching the global case was mostly for consistency reasons.
> 
> Then I'd rather not touch the global case then if it's theoretical? It's not
> even exactly consistent, given it's a cond_resched() in the memcg code (that
> can be eventually automatically removed once/if lazy preempt becomes the
> sole implementation), but the touch_softlockup_watchdog() would remain,
> while doing only half of the job?
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Paul E. McKenney 11 months, 1 week ago
On Tue, Jan 14, 2025 at 08:13:37PM +0800, Chen Ridong wrote:
> 
> 
> On 2025/1/14 17:20, Vlastimil Babka wrote:
> > On 1/14/25 09:40, Michal Hocko wrote:
> >> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> >>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>>
> >>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> >>>>>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> >>>>>>  	else {
> >>>>>>  		struct task_struct *p;
> >>>>>> +		int i = 0;
> >>>>>>  
> >>>>>>  		rcu_read_lock();
> >>>>>> -		for_each_process(p)
> >>>>>> +		for_each_process(p) {
> >>>>>> +			/* Avoid potential softlockup warning */
> >>>>>> +			if ((++i & 1023) == 0)
> >>>>>> +				touch_softlockup_watchdog();
> >>>>>
> >>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
> >>>>
> >>>> Yes, rcu stall was still detected.
> > 
> > "was" or "would be"? I thought only the memcg case was observed, or was that
> > some deliberate stress test of the global case? (or the pr_info() console
> > stress test mentioned earlier, but created outside of the oom code?)
> > 
> 
> It's not easy to reproduce for global OOM. Because the pr_info() console
> stress test can also lead to other softlockups or RCU warnings(not
> causeed by OOM process) because the whole system is struggling.However,
> if I add mdelay(1) in the dump_task() function (just to slow down
> dump_task, assuming this is slowed by pr_info()) and trigger a global
> OOM, RCU warnings can be observed.
> 
> I think this can verify that global OOM can trigger RCU warnings in the
> specific scenarios.

We do have a recently upstreamed rcutree.csd_lock_suppress_rcu_stall
kernel boot parameter that causes RCU CPU stall warnings to suppress
most of the output when there is an ongoing CSD-lock stall.

Would it make sense to do something similar when the system is in OOM,
give or take the traditional difficulty of determining exactly when OOM
starts and ends?

1dd01c06506c ("rcu: Summarize RCU CPU stall warnings during CSD-lock stalls")

							Thanx, Paul

> >>>> For global OOM, system is likely to struggle, do we have to do some
> >>>> works to suppress RCU detete?
> >>>
> >>> rcu_cpu_stall_reset()?
> >>
> >> Do we really care about those? The code to iterate over all processes
> >> under RCU is there (basically) since ever and yet we do not seem to have
> >> many reports of stalls? Chen's situation is specific to memcg OOM and
> >> touching the global case was mostly for consistency reasons.
> > 
> > Then I'd rather not touch the global case then if it's theoretical? It's not
> > even exactly consistent, given it's a cond_resched() in the memcg code (that
> > can be eventually automatically removed once/if lazy preempt becomes the
> > sole implementation), but the touch_softlockup_watchdog() would remain,
> > while doing only half of the job?
> 
>
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by chenridong 11 months ago

On 2025/1/15 2:42, Paul E. McKenney wrote:
> On Tue, Jan 14, 2025 at 08:13:37PM +0800, Chen Ridong wrote:
>>
>>
>> On 2025/1/14 17:20, Vlastimil Babka wrote:
>>> On 1/14/25 09:40, Michal Hocko wrote:
>>>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>>>>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>>>
>>>>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>>>>>>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>>>>>>>  	else {
>>>>>>>>  		struct task_struct *p;
>>>>>>>> +		int i = 0;
>>>>>>>>  
>>>>>>>>  		rcu_read_lock();
>>>>>>>> -		for_each_process(p)
>>>>>>>> +		for_each_process(p) {
>>>>>>>> +			/* Avoid potential softlockup warning */
>>>>>>>> +			if ((++i & 1023) == 0)
>>>>>>>> +				touch_softlockup_watchdog();
>>>>>>>
>>>>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
>>>>>>
>>>>>> Yes, rcu stall was still detected.
>>>
>>> "was" or "would be"? I thought only the memcg case was observed, or was that
>>> some deliberate stress test of the global case? (or the pr_info() console
>>> stress test mentioned earlier, but created outside of the oom code?)
>>>
>>
>> It's not easy to reproduce for global OOM. Because the pr_info() console
>> stress test can also lead to other softlockups or RCU warnings(not
>> causeed by OOM process) because the whole system is struggling.However,
>> if I add mdelay(1) in the dump_task() function (just to slow down
>> dump_task, assuming this is slowed by pr_info()) and trigger a global
>> OOM, RCU warnings can be observed.
>>
>> I think this can verify that global OOM can trigger RCU warnings in the
>> specific scenarios.
> 
> We do have a recently upstreamed rcutree.csd_lock_suppress_rcu_stall
> kernel boot parameter that causes RCU CPU stall warnings to suppress
> most of the output when there is an ongoing CSD-lock stall.
> 
> Would it make sense to do something similar when the system is in OOM,
> give or take the traditional difficulty of determining exactly when OOM
> starts and ends?
> 
> 1dd01c06506c ("rcu: Summarize RCU CPU stall warnings during CSD-lock stalls")
> 
> 							Thanx, Paul
> 

I prefer to just drop it.

Unlike memcg OOM, global OOM doesn't usually happen. Although the test
'verified' that the RCU warning can be observed, we haven't encountered
it in practice. Besides, other RCU warnings may also be observed during
global OOM, and it's difficult to circumvent all of them.

Best regards,
Ridong

>>>>>> For global OOM, system is likely to struggle, do we have to do some
>>>>>> works to suppress RCU detete?
>>>>>
>>>>> rcu_cpu_stall_reset()?
>>>>
>>>> Do we really care about those? The code to iterate over all processes
>>>> under RCU is there (basically) since ever and yet we do not seem to have
>>>> many reports of stalls? Chen's situation is specific to memcg OOM and
>>>> touching the global case was mostly for consistency reasons.
>>>
>>> Then I'd rather not touch the global case then if it's theoretical? It's not
>>> even exactly consistent, given it's a cond_resched() in the memcg code (that
>>> can be eventually automatically removed once/if lazy preempt becomes the
>>> sole implementation), but the touch_softlockup_watchdog() would remain,
>>> while doing only half of the job?
>>
>>
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Michal Hocko 11 months, 1 week ago
On Tue 14-01-25 10:20:28, Vlastimil Babka wrote:
> On 1/14/25 09:40, Michal Hocko wrote:
> > On Mon 13-01-25 19:45:46, Andrew Morton wrote:
[...]
> >> > For global OOM, system is likely to struggle, do we have to do some
> >> > works to suppress RCU detete?
> >> 
> >> rcu_cpu_stall_reset()?
> > 
> > Do we really care about those? The code to iterate over all processes
> > under RCU is there (basically) since ever and yet we do not seem to have
> > many reports of stalls? Chen's situation is specific to memcg OOM and
> > touching the global case was mostly for consistency reasons.
> 
> Then I'd rather not touch the global case then if it's theoretical?

No strong opinion on this on my side. The only actual reason
touch_softlockup_watchdog is there is because the original version
incorrectly had cond_resched there. If half silencing (soft lockup
detector only) disturbs people then let's just drop that hunk.
-- 
Michal Hocko
SUSE Labs
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Chen Ridong 11 months, 1 week ago

On 2025/1/14 17:30, Michal Hocko wrote:
> On Tue 14-01-25 10:20:28, Vlastimil Babka wrote:
>> On 1/14/25 09:40, Michal Hocko wrote:
>>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> [...]
>>>>> For global OOM, system is likely to struggle, do we have to do some
>>>>> works to suppress RCU detete?
>>>>
>>>> rcu_cpu_stall_reset()?
>>>
>>> Do we really care about those? The code to iterate over all processes
>>> under RCU is there (basically) since ever and yet we do not seem to have
>>> many reports of stalls? Chen's situation is specific to memcg OOM and
>>> touching the global case was mostly for consistency reasons.
>>
>> Then I'd rather not touch the global case then if it's theoretical?
> 
> No strong opinion on this on my side. The only actual reason
> touch_softlockup_watchdog is there is becuase it originally had
> incorrectly cond_resched there. If half silencing (soft lock up
> detector only) disturbs people then let's just drop that hunk.

Same here. If there are no other opinions, I will drop it.

Best regards,
Ridong
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Michal Koutný 11 months, 2 weeks ago
Hello.

On Tue, Dec 24, 2024 at 02:52:38AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.

Why is this soft lockup a problem?
It's a lot of tasks after all, and possibly a slow console (given that
looking for a victim among a comparable number didn't trigger it).

> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.

This only hides the issue. It could be similarly fixed by simply
decreasing loglevel= ;-)

cond_resched() in the memcg case may be OK but the arbitrary touch for
global situation may hide possibly useful troubleshooting information.
(Yeah, cond_resched() won't fit inside RCU section as in other global
task iterations.)

0.02€,
Michal
Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by Chen Ridong 11 months, 2 weeks ago

On 2025/1/4 0:18, Michal Koutný wrote:
> Hello.
> 
> On Tue, Dec 24, 2024 at 02:52:38AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> A soft lockup issue was found in the product with about 56,000 tasks were
>> in the OOM cgroup, it was traversing them when the soft lockup was
>> triggered.
> 
> Why is this softlockup a problem? 
> It's lot of tasks afterall and possibly a slow console (given looking
> for a victim among the comparable number didn't trigger it).
> 

It's not a slow console, but rather console pressure. When a lot of
tasks write to the console, that can make 'pr_info' slow. In my case,
these tasks write to the console. I reproduced this issue using a test
module (ko) that creates many tasks, all of which just call 'pr_info'.

Best regards,
Ridong

>> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
>> function per 1000 iterations. For global OOM, call
>> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
> 
> This only hides the issue. It could be similarly fixed by simply
> decreasing loglevel= ;-)
> 
> cond_resched() in the memcg case may be OK but the arbitrary touch for
> global situation may hide possibly useful troubleshooting information.
> (Yeah, cond_resched() won't fit inside RCU section as in other global
> task iterations.)
> 
> 0.02€,
> Michal

Re: [PATCH v3] memcg: fix soft lockup in the OOM process
Posted by David Rientjes 11 months, 3 weeks ago
On Tue, 24 Dec 2024, Chen Ridong wrote:

> From: Chen Ridong <chenridong@huawei.com>
> 
> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.
> 
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  vprintk_emit+0x193/0x280
>  printk+0x52/0x6e
>  dump_task+0x114/0x130
>  mem_cgroup_scan_tasks+0x76/0x100
>  dump_header+0x1fe/0x210
>  oom_kill_process+0xd1/0x100
>  out_of_memory+0x125/0x570
>  mem_cgroup_out_of_memory+0xb5/0xd0
>  try_charge+0x720/0x770
>  mem_cgroup_try_charge+0x86/0x180
>  mem_cgroup_try_charge_delay+0x1c/0x40
>  do_anonymous_page+0xb5/0x390
>  handle_mm_fault+0xc4/0x1f0
> 
> This is because thousands of processes are in the OOM cgroup, it takes a
> long time to traverse all of them. As a result, this lead to soft lockup
> in the OOM process.
> 
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
> 
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>

Looks fine to me, although we do a lot of process traversals for oom 
kill selection as well and this hasn't ever popped up as a significant 
concern.  We have cases far beyond 56k processes.  No objection to the 
approach, however.