[PATCH] mm/oom_kill: revert watchdog reset in global OOM process

Posted by Chen Ridong 10 months, 1 week ago
From: Chen Ridong <chenridong@huawei.com>

Unlike memcg OOM, which is relatively common, global OOM events are rare
and typically indicate that the entire system is under severe memory
pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
process") added the touch_softlockup_watchdog in the global OOM handler to
suppress the soft lockup issues. However, while this change can suppress
soft lockup warnings, it does not address RCU stalls, which can still be
detected and may cause unnecessary disturbances. Simply remove the
modification from the global OOM handler.

Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 mm/oom_kill.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..2d8b27604ef8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,7 +44,6 @@
 #include <linux/init.h>
 #include <linux/mmu_notifier.h>
 #include <linux/cred.h>
-#include <linux/nmi.h>
 
 #include <asm/tlb.h>
 #include "internal.h"
@@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
 	else {
 		struct task_struct *p;
-		int i = 0;
 
 		rcu_read_lock();
-		for_each_process(p) {
-			/* Avoid potential softlockup warning */
-			if ((++i & 1023) == 0)
-				touch_softlockup_watchdog();
+		for_each_process(p)
 			dump_task(p, oc);
-		}
 		rcu_read_unlock();
 	}
 }
-- 
2.34.1
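
For readers following the thread, the throttling idiom that this patch removes can be tried out in userspace. The following is only a minimal sketch, not kernel code: fake_touch_softlockup_watchdog(), fake_dump_task() and walk_tasks_throttled() are hypothetical stand-ins invented for illustration, and the real loop runs under rcu_read_lock(), which this sketch does not model. It only shows how often the `(++i & 1023) == 0` test fires for a given number of tasks.

```c
/* Hypothetical stand-in for touch_softlockup_watchdog(): only counts calls. */
static unsigned int watchdog_touches;

static void fake_touch_softlockup_watchdog(void)
{
	watchdog_touches++;
}

/* Hypothetical stand-in for dump_task(): only counts visited tasks. */
static unsigned int tasks_dumped;

static void fake_dump_task(void)
{
	tasks_dumped++;
}

/*
 * Mirrors the shape of the removed loop body: visit nr_tasks entries and
 * touch the watchdog on every 1024th iteration. Returns the number of
 * watchdog touches that occurred during the walk.
 */
unsigned int walk_tasks_throttled(unsigned int nr_tasks)
{
	unsigned int i = 0;
	unsigned int n;

	watchdog_touches = 0;
	tasks_dumped = 0;

	for (n = 0; n < nr_tasks; n++) {
		/* Fires only when i is a multiple of 1024. */
		if ((++i & 1023) == 0)
			fake_touch_softlockup_watchdog();
		fake_dump_task();
	}

	return watchdog_touches;
}
```

Calling walk_tasks_throttled() with fewer than 1024 tasks never touches the watchdog, and larger counts touch it once per 1024 tasks visited. Nothing in this pattern ends the RCU read-side critical section, which is consistent with the commit message's point that it cannot address RCU stalls.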
Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
Posted by Michal Hocko 10 months, 1 week ago
On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
> 
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")

But this is not really fixing anything, is it? While this doesn't
address a potential RCU stall it doesn't address any actual problem.
So why do we want to do this?

> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>  
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }
> -- 
> 2.34.1

-- 
Michal Hocko
SUSE Labs
Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
Posted by Chen Ridong 10 months, 1 week ago

On 2025/2/12 16:57, Michal Hocko wrote:
> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>> and typically indicate that the entire system is under severe memory
>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>> process") added the touch_softlockup_watchdog in the global OOM handler to
>> suppress the soft lockup issues. However, while this change can suppress
>> soft lockup warnings, it does not address RCU stalls, which can still be
>> detected and may cause unnecessary disturbances. Simply remove the
>> modification from the global OOM handler.
>>
>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> 
> But this is not really fixing anything, is it? While this doesn't
> address a potential RCU stall it doesn't address any actual problem.
> So why do we want to do this?
> 


[1]
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/

As previously discussed, the work I have done on the global OOM is 'half
of the job'. Based on our discussions, I thought that it would be best
to abandon this approach for global OOM. Therefore, I am sending this
patch to revert the changes.

Or just leave it?

Best regards,
Ridong

>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>>  mm/oom_kill.c | 8 +-------
>>  1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 25923cfec9c6..2d8b27604ef8 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -44,7 +44,6 @@
>>  #include <linux/init.h>
>>  #include <linux/mmu_notifier.h>
>>  #include <linux/cred.h>
>> -#include <linux/nmi.h>
>>  
>>  #include <asm/tlb.h>
>>  #include "internal.h"
>> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>>  	else {
>>  		struct task_struct *p;
>> -		int i = 0;
>>  
>>  		rcu_read_lock();
>> -		for_each_process(p) {
>> -			/* Avoid potential softlockup warning */
>> -			if ((++i & 1023) == 0)
>> -				touch_softlockup_watchdog();
>> +		for_each_process(p)
>>  			dump_task(p, oc);
>> -		}
>>  		rcu_read_unlock();
>>  	}
>>  }
>> -- 
>> 2.34.1
>
Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
Posted by Vlastimil Babka 10 months, 1 week ago
On 2/12/25 10:19, Chen Ridong wrote:
> 
> 
> On 2025/2/12 16:57, Michal Hocko wrote:
>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>> and typically indicate that the entire system is under severe memory
>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>> suppress the soft lockup issues. However, while this change can suppress
>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>> detected and may cause unnecessary disturbances. Simply remove the
>>> modification from the global OOM handler.
>>>
>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>> 
>> But this is not really fixing anything, is it? While this doesn't
>> address a potential RCU stall it doesn't address any actual problem.
>> So why do we want to do this?
>> 
> 
> 
> [1]
> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> 
> As previously discussed, the work I have done on the global OOM is 'half
> of the job'. Based on our discussions, I thought that it would be best
> to abandon this approach for global OOM. Therefore, I am sending this
> patch to revert the changes.
> 
> Or just leave it?

I suggested that part doesn't need to be in the patch, but if it was merged
with it, we can just leave it there. Thanks.
Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
Posted by Chen Ridong 10 months, 1 week ago

On 2025/2/12 17:34, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
>>
>>
>> On 2025/2/12 16:57, Michal Hocko wrote:
>>> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> Unlike memcg OOM, which is relatively common, global OOM events are rare
>>>> and typically indicate that the entire system is under severe memory
>>>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
>>>> process") added the touch_softlockup_watchdog in the global OOM handler to
>>>> suppress the soft lockup issues. However, while this change can suppress
>>>> soft lockup warnings, it does not address RCU stalls, which can still be
>>>> detected and may cause unnecessary disturbances. Simply remove the
>>>> modification from the global OOM handler.
>>>>
>>>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
>>>
>>> But this is not really fixing anything, is it? While this doesn't
>>> address a potential RCU stall it doesn't address any actual problem.
>>> So why do we want to do this?
>>>
>>
>>
>> [1]
>> https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
>>
>> As previously discussed, the work I have done on the global OOM is 'half
>> of the job'. Based on our discussions, I thought that it would be best
>> to abandon this approach for global OOM. Therefore, I am sending this
>> patch to revert the changes.
>>
>> Or just leave it?
> 
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

I see. Thank you very much.

Best regards,
Ridong
Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
Posted by Michal Hocko 10 months, 1 week ago
On Wed 12-02-25 10:34:06, Vlastimil Babka wrote:
> On 2/12/25 10:19, Chen Ridong wrote:
> > 
> > 
> > On 2025/2/12 16:57, Michal Hocko wrote:
> >> On Wed 12-02-25 02:57:07, Chen Ridong wrote:
> >>> From: Chen Ridong <chenridong@huawei.com>
> >>>
> >>> Unlike memcg OOM, which is relatively common, global OOM events are rare
> >>> and typically indicate that the entire system is under severe memory
> >>> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> >>> process") added the touch_softlockup_watchdog in the global OOM handler to
> >>> suppress the soft lockup issues. However, while this change can suppress
> >>> soft lockup warnings, it does not address RCU stalls, which can still be
> >>> detected and may cause unnecessary disturbances. Simply remove the
> >>> modification from the global OOM handler.
> >>>
> >>> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> >> 
> >> But this is not really fixing anything, is it? While this doesn't
> >> address a potential RCU stall it doesn't address any actual problem.
> >> So why do we want to do this?
> >> 
> > 
> > 
> > [1]
> > https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/
> > 
> > As previously discussed, the work I have done on the global OOM is 'half
> > of the job'. Based on our discussions, I thought that it would be best
> > to abandon this approach for global OOM. Therefore, I am sending this
> > patch to revert the changes.
> > 
> > Or just leave it?
> 
> I suggested that part doesn't need to be in the patch, but if it was merged
> with it, we can just leave it there. Thanks.

Agreed!

-- 
Michal Hocko
SUSE Labs
Re: [PATCH] mm/oom_kill: revert watchdog reset in global OOM process
Posted by Chen Ridong 10 months, 1 week ago

On 2025/2/12 10:57, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
> 
> Unlike memcg OOM, which is relatively common, global OOM events are rare
> and typically indicate that the entire system is under severe memory
> pressure. The commit ade81479c7dd ("memcg: fix soft lockup in the OOM
> process") added the touch_softlockup_watchdog in the global OOM handler to
> suppress the soft lockup issues. However, while this change can suppress
> soft lockup warnings, it does not address RCU stalls, which can still be
> detected and may cause unnecessary disturbances. Simply remove the
> modification from the global OOM handler.
> 
> Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>  mm/oom_kill.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 25923cfec9c6..2d8b27604ef8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -44,7 +44,6 @@
>  #include <linux/init.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/cred.h>
> -#include <linux/nmi.h>
>  
>  #include <asm/tlb.h>
>  #include "internal.h"
> @@ -431,15 +430,10 @@ static void dump_tasks(struct oom_control *oc)
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
>  	else {
>  		struct task_struct *p;
> -		int i = 0;
>  
>  		rcu_read_lock();
> -		for_each_process(p) {
> -			/* Avoid potential softlockup warning */
> -			if ((++i & 1023) == 0)
> -				touch_softlockup_watchdog();
> +		for_each_process(p)
>  			dump_task(p, oc);
> -		}
>  		rcu_read_unlock();
>  	}
>  }

Adding the discussion link:
https://lore.kernel.org/cgroups/0d9ea655-5c1a-4ba9-9eeb-b45d74cc68d0@huaweicloud.com/