From: Chen Ridong <chenridong@huawei.com>
A soft lockup was observed on a production system with about 56,000 tasks
in the OOM cgroup; the lockup triggered while the OOM handler was
traversing them.
watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
Hardware name: Huawei Cloud OpenStack Nova, BIOS
RIP: 0010:console_unlock+0x343/0x540
RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
vprintk_emit+0x193/0x280
printk+0x52/0x6e
dump_task+0x114/0x130
mem_cgroup_scan_tasks+0x76/0x100
dump_header+0x1fe/0x210
oom_kill_process+0xd1/0x100
out_of_memory+0x125/0x570
mem_cgroup_out_of_memory+0xb5/0xd0
try_charge+0x720/0x770
mem_cgroup_try_charge+0x86/0x180
mem_cgroup_try_charge_delay+0x1c/0x40
do_anonymous_page+0xb5/0x390
handle_mm_fault+0xc4/0x1f0
This happens because thousands of processes are in the OOM cgroup and
traversing all of them takes a long time, which leads to the soft lockup
during OOM handling.
To fix this issue, call cond_resched() in mem_cgroup_scan_tasks() once
every 1024 iterations. For the global OOM path, call
touch_softlockup_watchdog() at the same interval to avoid this issue.
Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/memcontrol.c | 7 ++++++-
mm/oom_kill.c | 8 +++++++-
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 65fb5eee1466..46f8b372d212 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
{
struct mem_cgroup *iter;
int ret = 0;
+ int i = 0;
BUG_ON(mem_cgroup_is_root(memcg));
@@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
struct task_struct *task;
css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
- while (!ret && (task = css_task_iter_next(&it)))
+ while (!ret && (task = css_task_iter_next(&it))) {
+ /* Avoid potential softlockup warning */
+ if ((++i & 1023) == 0)
+ cond_resched();
ret = fn(task, arg);
+ }
css_task_iter_end(&it);
if (ret) {
mem_cgroup_iter_break(memcg, iter);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1c485beb0b93..044ebab2c941 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,6 +44,7 @@
#include <linux/init.h>
#include <linux/mmu_notifier.h>
#include <linux/cred.h>
+#include <linux/nmi.h>
#include <asm/tlb.h>
#include "internal.h"
@@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
else {
struct task_struct *p;
+ int i = 0;
rcu_read_lock();
- for_each_process(p)
+ for_each_process(p) {
+ /* Avoid potential softlockup warning */
+ if ((++i & 1023) == 0)
+ touch_softlockup_watchdog();
dump_task(p, oc);
+ }
rcu_read_unlock();
}
}
--
2.34.1
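A side note on the throttling pattern in the diff above: (++i & 1023) == 0
fires once every 1024 iterations, since masking with a power of two minus
one is a cheap stand-in for a modulo. A tiny stand-alone C illustration of
the pattern (illustration only, not kernel code):

#include <stdio.h>

int main(void)
{
	int i = 0;

	for (int n = 0; n < 3000; n++) {
		if ((++i & 1023) == 0)
			printf("throttle point at iteration %d\n", i);
	}
	return 0;	/* prints iterations 1024 and 2048 */
}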
On Tue 24-12-24 02:52:38, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.
>
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> vprintk_emit+0x193/0x280
> printk+0x52/0x6e
> dump_task+0x114/0x130
> mem_cgroup_scan_tasks+0x76/0x100
> dump_header+0x1fe/0x210
> oom_kill_process+0xd1/0x100
> out_of_memory+0x125/0x570
> mem_cgroup_out_of_memory+0xb5/0xd0
> try_charge+0x720/0x770
> mem_cgroup_try_charge+0x86/0x180
> mem_cgroup_try_charge_delay+0x1c/0x40
> do_anonymous_page+0xb5/0x390
> handle_mm_fault+0xc4/0x1f0
>
> This is because thousands of processes are in the OOM cgroup, it takes a
> long time to traverse all of them. As a result, this lead to soft lockup
> in the OOM process.
>
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
>
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
LGTM, I would really not overthink this too much. PREEMPT_NONE and soft
lockups will hopefully soon become a non-issue.
Acked-by: Michal Hocko <mhocko@suse.com>
--
Michal Hocko
SUSE Labs
On 12/24/24 03:52, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
+CC RCU
> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.
>
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> vprintk_emit+0x193/0x280
> printk+0x52/0x6e
> dump_task+0x114/0x130
> mem_cgroup_scan_tasks+0x76/0x100
> dump_header+0x1fe/0x210
> oom_kill_process+0xd1/0x100
> out_of_memory+0x125/0x570
> mem_cgroup_out_of_memory+0xb5/0xd0
> try_charge+0x720/0x770
> mem_cgroup_try_charge+0x86/0x180
> mem_cgroup_try_charge_delay+0x1c/0x40
> do_anonymous_page+0xb5/0x390
> handle_mm_fault+0xc4/0x1f0
>
> This is because thousands of processes are in the OOM cgroup, it takes a
> long time to traverse all of them. As a result, this lead to soft lockup
> in the OOM process.
>
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
>
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> mm/memcontrol.c | 7 ++++++-
> mm/oom_kill.c | 8 +++++++-
> 2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 65fb5eee1466..46f8b372d212 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
> {
> struct mem_cgroup *iter;
> int ret = 0;
> + int i = 0;
>
> BUG_ON(mem_cgroup_is_root(memcg));
>
> @@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
> struct task_struct *task;
>
> css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
> - while (!ret && (task = css_task_iter_next(&it)))
> + while (!ret && (task = css_task_iter_next(&it))) {
> + /* Avoid potential softlockup warning */
> + if ((++i & 1023) == 0)
> + cond_resched();
> ret = fn(task, arg);
> + }
> css_task_iter_end(&it);
> if (ret) {
> mem_cgroup_iter_break(memcg, iter);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 1c485beb0b93..044ebab2c941 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,6 +44,7 @@
> #include <linux/init.h>
> #include <linux/mmu_notifier.h>
> #include <linux/cred.h>
> +#include <linux/nmi.h>
>
> #include <asm/tlb.h>
> #include "internal.h"
> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> else {
> struct task_struct *p;
> + int i = 0;
>
> rcu_read_lock();
> - for_each_process(p)
> + for_each_process(p) {
> + /* Avoid potential softlockup warning */
> + if ((++i & 1023) == 0)
> + touch_softlockup_watchdog();
This might suppress the soft lockup, but won't a rcu stall still be detected?
> dump_task(p, oc);
> + }
> rcu_read_unlock();
> }
> }
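To make the concern above concrete, here is an annotated sketch contrasting
the two hunks of the patch (the comments are this reviewer-style reading of
the code, not a proposed change):

/*
 * memcg path: the css task iterator is not walked under rcu_read_lock()
 * in mem_cgroup_scan_tasks(), so cond_resched() is legal here; actually
 * yielding the CPU keeps both the softlockup watchdog and, in practice,
 * the RCU stall detector happy.
 */
while (!ret && (task = css_task_iter_next(&it))) {
	if ((++i & 1023) == 0)
		cond_resched();			/* may sleep */
	ret = fn(task, arg);
}

/*
 * global path: the whole tasklist walk sits inside one RCU read-side
 * critical section, so sleeping is not allowed. Touching the softlockup
 * watchdog silences that detector, but the read-side section stays open
 * for the entire walk, which is why an RCU CPU stall can still be
 * reported.
 */
rcu_read_lock();
for_each_process(p) {
	if ((++i & 1023) == 0)
		touch_softlockup_watchdog();	/* no scheduling here */
	dump_task(p, oc);
}
rcu_read_unlock();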
On 2025/1/6 16:45, Vlastimil Babka wrote:
> On 12/24/24 03:52, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>
> +CC RCU
>
>> A soft lockup issue was found in the product with about 56,000 tasks were
>> in the OOM cgroup, it was traversing them when the soft lockup was
>> triggered.
>>
>> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
>> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
>> Hardware name: Huawei Cloud OpenStack Nova, BIOS
>> RIP: 0010:console_unlock+0x343/0x540
>> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
>> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
>> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
>> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
>> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
>> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> vprintk_emit+0x193/0x280
>> printk+0x52/0x6e
>> dump_task+0x114/0x130
>> mem_cgroup_scan_tasks+0x76/0x100
>> dump_header+0x1fe/0x210
>> oom_kill_process+0xd1/0x100
>> out_of_memory+0x125/0x570
>> mem_cgroup_out_of_memory+0xb5/0xd0
>> try_charge+0x720/0x770
>> mem_cgroup_try_charge+0x86/0x180
>> mem_cgroup_try_charge_delay+0x1c/0x40
>> do_anonymous_page+0xb5/0x390
>> handle_mm_fault+0xc4/0x1f0
>>
>> This is because thousands of processes are in the OOM cgroup, it takes a
>> long time to traverse all of them. As a result, this lead to soft lockup
>> in the OOM process.
>>
>> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
>> function per 1000 iterations. For global OOM, call
>> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
>>
>> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> mm/memcontrol.c | 7 ++++++-
>> mm/oom_kill.c | 8 +++++++-
>> 2 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 65fb5eee1466..46f8b372d212 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1161,6 +1161,7 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>> {
>> struct mem_cgroup *iter;
>> int ret = 0;
>> + int i = 0;
>>
>> BUG_ON(mem_cgroup_is_root(memcg));
>>
>> @@ -1169,8 +1170,12 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
>> struct task_struct *task;
>>
>> css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
>> - while (!ret && (task = css_task_iter_next(&it)))
>> + while (!ret && (task = css_task_iter_next(&it))) {
>> + /* Avoid potential softlockup warning */
>> + if ((++i & 1023) == 0)
>> + cond_resched();
>> ret = fn(task, arg);
>> + }
>> css_task_iter_end(&it);
>> if (ret) {
>> mem_cgroup_iter_break(memcg, iter);
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 1c485beb0b93..044ebab2c941 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,6 +44,7 @@
>> #include <linux/init.h>
>> #include <linux/mmu_notifier.h>
>> #include <linux/cred.h>
>> +#include <linux/nmi.h>
>>
>> #include <asm/tlb.h>
>> #include "internal.h"
>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>> else {
>> struct task_struct *p;
>> + int i = 0;
>>
>> rcu_read_lock();
>> - for_each_process(p)
>> + for_each_process(p) {
>> + /* Avoid potential softlockup warning */
>> + if ((++i & 1023) == 0)
>> + touch_softlockup_watchdog();
>
> This might suppress the soft lockup, but won't a rcu stall still be detected?
Yes, an RCU stall was still detected.
For a global OOM, the system is likely struggling anyway; do we need to do
extra work to suppress the RCU stall detector as well?
Best regards,
Ridong
>
>> dump_task(p, oc);
>> + }
>> rcu_read_unlock();
>> }
>> }
>
On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>
>
> On 2025/1/6 16:45, Vlastimil Babka wrote:
> > On 12/24/24 03:52, Chen Ridong wrote:
> >> From: Chen Ridong <chenridong@huawei.com>
> >
> > +CC RCU
> >
> >> A soft lockup issue was found in the product with about 56,000 tasks were
> >> in the OOM cgroup, it was traversing them when the soft lockup was
> >> triggered.
> >>
>
> ...
>
> >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> >> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> >> else {
> >> struct task_struct *p;
> >> + int i = 0;
> >>
> >> rcu_read_lock();
> >> - for_each_process(p)
> >> + for_each_process(p) {
> >> + /* Avoid potential softlockup warning */
> >> + if ((++i & 1023) == 0)
> >> + touch_softlockup_watchdog();
> >
> > This might suppress the soft lockup, but won't a rcu stall still be detected?
>
> Yes, rcu stall was still detected.
> For global OOM, system is likely to struggle, do we have to do some
> works to suppress RCU detete?
rcu_cpu_stall_reset()?
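For illustration, a hedged sketch of how that suggestion could be applied to
the global-OOM hunk above; whether doing so is desirable is exactly what the
rest of this thread debates, so treat it as an assumption rather than a
recommendation:

/* Hypothetical variant of the dump_tasks() hunk -- not what was merged. */
rcu_read_lock();
for_each_process(p) {
	/* Avoid potential softlockup and RCU stall warnings */
	if ((++i & 1023) == 0) {
		touch_softlockup_watchdog();
		rcu_cpu_stall_reset();	/* push out the RCU stall timeout */
	}
	dump_task(p, oc);
}
rcu_read_unlock();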
On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>
> >
> >
> > On 2025/1/6 16:45, Vlastimil Babka wrote:
> > > On 12/24/24 03:52, Chen Ridong wrote:
> > >> From: Chen Ridong <chenridong@huawei.com>
> > >
> > > +CC RCU
> > >
> > >> A soft lockup issue was found in the product with about 56,000 tasks were
> > >> in the OOM cgroup, it was traversing them when the soft lockup was
> > >> triggered.
> > >>
> >
> > ...
> >
> > >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> > >> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> > >> else {
> > >> struct task_struct *p;
> > >> + int i = 0;
> > >>
> > >> rcu_read_lock();
> > >> - for_each_process(p)
> > >> + for_each_process(p) {
> > >> + /* Avoid potential softlockup warning */
> > >> + if ((++i & 1023) == 0)
> > >> + touch_softlockup_watchdog();
> > >
> > > This might suppress the soft lockup, but won't a rcu stall still be detected?
> >
> > Yes, rcu stall was still detected.
> > For global OOM, system is likely to struggle, do we have to do some
> > works to suppress RCU detete?
>
> rcu_cpu_stall_reset()?
Do we really care about those? The code to iterate over all processes
under RCU is there (basically) since ever and yet we do not seem to have
many reports of stalls? Chen's situation is specific to memcg OOM and
touching the global case was mostly for consistency reasons.
--
Michal Hocko
SUSE Labs
On 1/14/25 09:40, Michal Hocko wrote:
> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>
>> > >> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>> > >> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>> > >> else {
>> > >> struct task_struct *p;
>> > >> + int i = 0;
>> > >>
>> > >> rcu_read_lock();
>> > >> - for_each_process(p)
>> > >> + for_each_process(p) {
>> > >> + /* Avoid potential softlockup warning */
>> > >> + if ((++i & 1023) == 0)
>> > >> + touch_softlockup_watchdog();
>> > >
>> > > This might suppress the soft lockup, but won't a rcu stall still be detected?
>> >
>> > Yes, rcu stall was still detected.
"was" or "would be"? I thought only the memcg case was observed, or was that
some deliberate stress test of the global case? (or the pr_info() console
stress test mentioned earlier, but created outside of the oom code?)
>> > For global OOM, system is likely to struggle, do we have to do some
>> > works to suppress RCU detete?
>>
>> rcu_cpu_stall_reset()?
>
> Do we really care about those? The code to iterate over all processes
> under RCU is there (basically) since ever and yet we do not seem to have
> many reports of stalls? Chen's situation is specific to memcg OOM and
> touching the global case was mostly for consistency reasons.
Then I'd rather not touch the global case if it's theoretical? It's not
even exactly consistent: the memcg code gets a cond_resched() (which can
eventually be removed automatically once/if lazy preempt becomes the sole
implementation), while the touch_softlockup_watchdog() would remain,
doing only half of the job?
On 2025/1/14 17:20, Vlastimil Babka wrote:
> On 1/14/25 09:40, Michal Hocko wrote:
>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>
>>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>>>>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>>>>> else {
>>>>>> struct task_struct *p;
>>>>>> + int i = 0;
>>>>>>
>>>>>> rcu_read_lock();
>>>>>> - for_each_process(p)
>>>>>> + for_each_process(p) {
>>>>>> + /* Avoid potential softlockup warning */
>>>>>> + if ((++i & 1023) == 0)
>>>>>> + touch_softlockup_watchdog();
>>>>>
>>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
>>>>
>>>> Yes, rcu stall was still detected.
>
> "was" or "would be"? I thought only the memcg case was observed, or was that
> some deliberate stress test of the global case? (or the pr_info() console
> stress test mentioned earlier, but created outside of the oom code?)
>
It's not easy to reproduce with a global OOM, because the pr_info() console
stress test also leads to other soft lockups or RCU warnings (not caused by
the OOM process) since the whole system is struggling. However, if I add
mdelay(1) in the dump_task() function (just to slow down dump_task(),
assuming it is being slowed by pr_info()) and trigger a global OOM, RCU
warnings can be observed.
I think this verifies that a global OOM can trigger RCU warnings in
specific scenarios.
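For reference, a minimal sketch of the experiment described above (debug
instrumentation only, not part of the patch; the wrapper name and the exact
hook point are assumptions):

#include <linux/delay.h>	/* mdelay() */

/* Hypothetical debug wrapper around the existing dump_task() callback,
 * used to emulate a console slowed down by pressure. */
static int dump_task_slowed(struct task_struct *p, void *arg)
{
	int ret = dump_task(p, arg);	/* existing per-task pr_info() output */

	mdelay(1);			/* stand-in for a slow console */
	return ret;
}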
>>>> For global OOM, system is likely to struggle, do we have to do some
>>>> works to suppress RCU detete?
>>>
>>> rcu_cpu_stall_reset()?
>>
>> Do we really care about those? The code to iterate over all processes
>> under RCU is there (basically) since ever and yet we do not seem to have
>> many reports of stalls? Chen's situation is specific to memcg OOM and
>> touching the global case was mostly for consistency reasons.
>
> Then I'd rather not touch the global case then if it's theoretical? It's not
> even exactly consistent, given it's a cond_resched() in the memcg code (that
> can be eventually automatically removed once/if lazy preempt becomes the
> sole implementation), but the touch_softlockup_watchdog() would remain,
> while doing only half of the job?
On Tue, Jan 14, 2025 at 08:13:37PM +0800, Chen Ridong wrote:
>
>
> On 2025/1/14 17:20, Vlastimil Babka wrote:
> > On 1/14/25 09:40, Michal Hocko wrote:
> >> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> >>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
> >>>
> >>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
> >>>>>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> >>>>>> else {
> >>>>>> struct task_struct *p;
> >>>>>> + int i = 0;
> >>>>>>
> >>>>>> rcu_read_lock();
> >>>>>> - for_each_process(p)
> >>>>>> + for_each_process(p) {
> >>>>>> + /* Avoid potential softlockup warning */
> >>>>>> + if ((++i & 1023) == 0)
> >>>>>> + touch_softlockup_watchdog();
> >>>>>
> >>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
> >>>>
> >>>> Yes, rcu stall was still detected.
> >
> > "was" or "would be"? I thought only the memcg case was observed, or was that
> > some deliberate stress test of the global case? (or the pr_info() console
> > stress test mentioned earlier, but created outside of the oom code?)
> >
>
> It's not easy to reproduce for global OOM. Because the pr_info() console
> stress test can also lead to other softlockups or RCU warnings(not
> causeed by OOM process) because the whole system is struggling.However,
> if I add mdelay(1) in the dump_task() function (just to slow down
> dump_task, assuming this is slowed by pr_info()) and trigger a global
> OOM, RCU warnings can be observed.
>
> I think this can verify that global OOM can trigger RCU warnings in the
> specific scenarios.
We do have a recently upstreamed rcutree.csd_lock_suppress_rcu_stall
kernel boot parameter that causes RCU CPU stall warnings to suppress
most of the output when there is an ongoing CSD-lock stall.
Would it make sense to do something similar when the system is in OOM,
give or take the traditional difficulty of determining exactly when OOM
starts and ends?
1dd01c06506c ("rcu: Summarize RCU CPU stall warnings during CSD-lock stalls")
Thanx, Paul
> >>>> For global OOM, system is likely to struggle, do we have to do some
> >>>> works to suppress RCU detete?
> >>>
> >>> rcu_cpu_stall_reset()?
> >>
> >> Do we really care about those? The code to iterate over all processes
> >> under RCU is there (basically) since ever and yet we do not seem to have
> >> many reports of stalls? Chen's situation is specific to memcg OOM and
> >> touching the global case was mostly for consistency reasons.
> >
> > Then I'd rather not touch the global case then if it's theoretical? It's not
> > even exactly consistent, given it's a cond_resched() in the memcg code (that
> > can be eventually automatically removed once/if lazy preempt becomes the
> > sole implementation), but the touch_softlockup_watchdog() would remain,
> > while doing only half of the job?
>
>
On 2025/1/15 2:42, Paul E. McKenney wrote:
> On Tue, Jan 14, 2025 at 08:13:37PM +0800, Chen Ridong wrote:
>>
>>
>> On 2025/1/14 17:20, Vlastimil Babka wrote:
>>> On 1/14/25 09:40, Michal Hocko wrote:
>>>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
>>>>> On Mon, 13 Jan 2025 14:51:55 +0800 Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>>>
>>>>>>>> @@ -430,10 +431,15 @@ static void dump_tasks(struct oom_control *oc)
>>>>>>>> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>>>>>>> else {
>>>>>>>> struct task_struct *p;
>>>>>>>> + int i = 0;
>>>>>>>>
>>>>>>>> rcu_read_lock();
>>>>>>>> - for_each_process(p)
>>>>>>>> + for_each_process(p) {
>>>>>>>> + /* Avoid potential softlockup warning */
>>>>>>>> + if ((++i & 1023) == 0)
>>>>>>>> + touch_softlockup_watchdog();
>>>>>>>
>>>>>>> This might suppress the soft lockup, but won't a rcu stall still be detected?
>>>>>>
>>>>>> Yes, rcu stall was still detected.
>>>
>>> "was" or "would be"? I thought only the memcg case was observed, or was that
>>> some deliberate stress test of the global case? (or the pr_info() console
>>> stress test mentioned earlier, but created outside of the oom code?)
>>>
>>
>> It's not easy to reproduce for global OOM. Because the pr_info() console
>> stress test can also lead to other softlockups or RCU warnings(not
>> causeed by OOM process) because the whole system is struggling.However,
>> if I add mdelay(1) in the dump_task() function (just to slow down
>> dump_task, assuming this is slowed by pr_info()) and trigger a global
>> OOM, RCU warnings can be observed.
>>
>> I think this can verify that global OOM can trigger RCU warnings in the
>> specific scenarios.
>
> We do have a recently upstreamed rcutree.csd_lock_suppress_rcu_stall
> kernel boot parameter that causes RCU CPU stall warnings to suppress
> most of the output when there is an ongoing CSD-lock stall.
>
> Would it make sense to do something similar when the system is in OOM,
> give or take the traditional difficulty of determining exactly when OOM
> starts and ends?
>
> 1dd01c06506c ("rcu: Summarize RCU CPU stall warnings during CSD-lock stalls")
>
> Thanx, Paul
>
I prefer to just drop it.
Unlike memcg OOM, global OOM rarely happens. Although the test above
'verified' that the RCU warning can be observed, we haven't encountered it
in practice. Besides, other RCU warnings may also show up during a global
OOM, and it's difficult to work around all of them.
Best regards,
Ridong
>>>>>> For global OOM, system is likely to struggle, do we have to do some
>>>>>> works to suppress RCU detete?
>>>>>
>>>>> rcu_cpu_stall_reset()?
>>>>
>>>> Do we really care about those? The code to iterate over all processes
>>>> under RCU is there (basically) since ever and yet we do not seem to have
>>>> many reports of stalls? Chen's situation is specific to memcg OOM and
>>>> touching the global case was mostly for consistency reasons.
>>>
>>> Then I'd rather not touch the global case then if it's theoretical? It's not
>>> even exactly consistent, given it's a cond_resched() in the memcg code (that
>>> can be eventually automatically removed once/if lazy preempt becomes the
>>> sole implementation), but the touch_softlockup_watchdog() would remain,
>>> while doing only half of the job?
>>
>>
On Tue 14-01-25 10:20:28, Vlastimil Babka wrote:
> On 1/14/25 09:40, Michal Hocko wrote:
> > On Mon 13-01-25 19:45:46, Andrew Morton wrote:
[...]
> >> > For global OOM, system is likely to struggle, do we have to do some
> >> > works to suppress RCU detete?
> >>
> >> rcu_cpu_stall_reset()?
> >
> > Do we really care about those? The code to iterate over all processes
> > under RCU is there (basically) since ever and yet we do not seem to have
> > many reports of stalls? Chen's situation is specific to memcg OOM and
> > touching the global case was mostly for consistency reasons.
>
> Then I'd rather not touch the global case then if it's theoretical?

No strong opinion on this on my side. The only actual reason
touch_softlockup_watchdog() is there is because it originally had an
incorrect cond_resched() there. If half-silencing (soft lockup detector
only) disturbs people, then let's just drop that hunk.
--
Michal Hocko
SUSE Labs
On 2025/1/14 17:30, Michal Hocko wrote:
> On Tue 14-01-25 10:20:28, Vlastimil Babka wrote:
>> On 1/14/25 09:40, Michal Hocko wrote:
>>> On Mon 13-01-25 19:45:46, Andrew Morton wrote:
> [...]
>>>>> For global OOM, system is likely to struggle, do we have to do some
>>>>> works to suppress RCU detete?
>>>>
>>>> rcu_cpu_stall_reset()?
>>>
>>> Do we really care about those? The code to iterate over all processes
>>> under RCU is there (basically) since ever and yet we do not seem to have
>>> many reports of stalls? Chen's situation is specific to memcg OOM and
>>> touching the global case was mostly for consistency reasons.
>>
>> Then I'd rather not touch the global case then if it's theoretical?
>
> No strong opinion on this on my side. The only actual reason
> touch_softlockup_watchdog is there is becuase it originally had
> incorrectly cond_resched there. If half silencing (soft lock up
> detector only) disturbs people then let's just drop that hunk.

Same here. If there are no other opinions, I will drop that hunk.

Best regards,
Ridong
Hello.

On Tue, Dec 24, 2024 at 02:52:38AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.

Why is this softlockup a problem?
It's a lot of tasks after all, and possibly a slow console (given that
looking for a victim among a comparable number of tasks didn't trigger it).

> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.

This only hides the issue. It could be similarly fixed by simply
decreasing loglevel= ;-)

cond_resched() in the memcg case may be OK, but the arbitrary touch for
the global situation may hide possibly useful troubleshooting information.
(Yeah, cond_resched() won't fit inside an RCU section, as in other global
task iterations.)

0.02€,
Michal
On 2025/1/4 0:18, Michal Koutný wrote:
> Hello.
>
> On Tue, Dec 24, 2024 at 02:52:38AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> A soft lockup issue was found in the product with about 56,000 tasks were
>> in the OOM cgroup, it was traversing them when the soft lockup was
>> triggered.
>
> Why is this softlockup a problem?
> It's lot of tasks afterall and possibly a slow console (given looking
> for a victim among the comparable number didn't trigger it).
>

It's not a slow console, but rather 'console pressure'. When a lot of
tasks are writing to the console, pr_info() becomes slow. In my case,
those tasks do write to the console. I reproduced this issue using a
test ko (kernel module) that creates many tasks, all of which just call
pr_info().

Best regards,
Ridong

>> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
>> function per 1000 iterations. For global OOM, call
>> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
>
> This only hides the issue. It could be similarly fixed by simply
> decreasing loglevel= ;-)
>
> cond_resched() in the memcg case may be OK but the arbitrary touch for
> global situation may hide possibly useful troubleshooting information.
> (Yeah, cond_resched() won't fit inside RCU section as in other global
> task iterations.)
>
> 0.02€,
> Michal
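For completeness, a hedged sketch of the kind of test module described
above; the module name, thread count, and loop body are assumptions, not
the actual test code:

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/delay.h>

#define NR_SPAMMERS 512

static struct task_struct *spammers[NR_SPAMMERS];

/* Each thread does nothing but print, creating console pressure. */
static int spam_fn(void *data)
{
	while (!kthread_should_stop()) {
		pr_info("console pressure from %s\n", current->comm);
		msleep(1);
	}
	return 0;
}

static int __init printk_spam_init(void)
{
	int i;

	for (i = 0; i < NR_SPAMMERS; i++)
		spammers[i] = kthread_run(spam_fn, NULL, "printk-spam-%d", i);
	return 0;
}

static void __exit printk_spam_exit(void)
{
	int i;

	for (i = 0; i < NR_SPAMMERS; i++)
		if (!IS_ERR_OR_NULL(spammers[i]))
			kthread_stop(spammers[i]);
}

module_init(printk_spam_init);
module_exit(printk_spam_exit);
MODULE_LICENSE("GPL");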
On Tue, 24 Dec 2024, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A soft lockup issue was found in the product with about 56,000 tasks were
> in the OOM cgroup, it was traversing them when the soft lockup was
> triggered.
>
> watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [VM Thread:1503066]
> CPU: 2 PID: 1503066 Comm: VM Thread Kdump: loaded Tainted: G
> Hardware name: Huawei Cloud OpenStack Nova, BIOS
> RIP: 0010:console_unlock+0x343/0x540
> RSP: 0000:ffffb751447db9a0 EFLAGS: 00000247 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000247
> RBP: ffffffffafc71f90 R08: 0000000000000000 R09: 0000000000000040
> R10: 0000000000000080 R11: 0000000000000000 R12: ffffffffafc74bd0
> R13: ffffffffaf60a220 R14: 0000000000000247 R15: 0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2fe6ad91f0 CR3: 00000004b2076003 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> vprintk_emit+0x193/0x280
> printk+0x52/0x6e
> dump_task+0x114/0x130
> mem_cgroup_scan_tasks+0x76/0x100
> dump_header+0x1fe/0x210
> oom_kill_process+0xd1/0x100
> out_of_memory+0x125/0x570
> mem_cgroup_out_of_memory+0xb5/0xd0
> try_charge+0x720/0x770
> mem_cgroup_try_charge+0x86/0x180
> mem_cgroup_try_charge_delay+0x1c/0x40
> do_anonymous_page+0xb5/0x390
> handle_mm_fault+0xc4/0x1f0
>
> This is because thousands of processes are in the OOM cgroup, it takes a
> long time to traverse all of them. As a result, this lead to soft lockup
> in the OOM process.
>
> To fix this issue, call 'cond_resched' in the 'mem_cgroup_scan_tasks'
> function per 1000 iterations. For global OOM, call
> 'touch_softlockup_watchdog' per 1000 iterations to avoid this issue.
>
> Fixes: 9cbb78bb3143 ("mm, memcg: introduce own oom handler to iterate only over its own threads")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
Looks fine to me, although we do a lot of process traversals for oom
kill selection as well and this has never popped up as a significant
concern. We have cases far beyond 56k processes. No objection to the
approach, however.