[RFC 2/2] ACPI: APEI: fix reboot caused by synchronous error loop because of memory_failure() failed

Lv Ying posted 2 patches 2 years, 9 months ago
[RFC 2/2] ACPI: APEI: fix reboot caused by synchronous error loop because of memory_failure() failed
Posted by Lv Ying 2 years, 9 months ago
Synchronous error was detected as a result of user-space accessing a
corrupt memory location the CPU may take an abort instead. On arm64 this
is a 'synchronous external abort' which can be notified by SEA.

If memory_failure() failed, we return to user-space will trigger SEA again,
such loop may cause platform firmware to exceed some threshold and reboot
when Linux could have recovered from this error. Not all memory_failure()
processing failures will cause the reboot, VM_FAULT_HWPOISON[_LARGE]
handling in arm64 page fault will send SIGBUS signal to the user-space
accessing process to terminate this loop.

If process mapping fault page, but memory_failure() abnormal return before
try_to_unmap(), for example, the fault page process mapping is KSM page.
In this case, arm64 cannot use the page fault process to terminate the
loop.

Add judgement of memory_failure() result in task_work before returning to
user-space. If memory_failure() failed, send SIGBUS signal to the current
process to avoid SEA loop.

Signed-off-by: Lv Ying <lvying6@huawei.com>
---
 mm/memory-failure.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3b6ac3694b8d..4c1c558f7161 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2266,7 +2266,11 @@ static void __memory_failure_work_func(struct work_struct *work, bool sync)
 			break;
 		if (entry.flags & MF_SOFT_OFFLINE)
 			soft_offline_page(entry.pfn, entry.flags);
-		else if (!sync || (entry.flags & MF_ACTION_REQUIRED))
+		else if (sync) {
+			if ((entry.flags & MF_ACTION_REQUIRED) &&
+					memory_failure(entry.pfn, entry.flags))
+				force_sig_mceerr(BUS_MCEERR_AR, 0, 0);
+		} else
 			memory_failure(entry.pfn, entry.flags);
 	}
 }
-- 
2.33.0
Re: [RFC 2/2] ACPI: APEI: fix reboot caused by synchronous error loop because of memory_failure() failed
Posted by Bixuan Cui 2 years, 9 months ago

在 2022/12/5 19:51, Lv Ying 写道:
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 3b6ac3694b8d..4c1c558f7161 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2266,7 +2266,11 @@ static void __memory_failure_work_func(struct work_struct *work, bool sync)
>   			break;
>   		if (entry.flags & MF_SOFT_OFFLINE)
>   			soft_offline_page(entry.pfn, entry.flags);
> -		else if (!sync || (entry.flags & MF_ACTION_REQUIRED))
> +		else if (sync) {
> +			if ((entry.flags & MF_ACTION_REQUIRED) &&
> +					memory_failure(entry.pfn, entry.flags))
> +				force_sig_mceerr(BUS_MCEERR_AR, 0, 0);
> +		} else
>   			memory_failure(entry.pfn, entry.flags);
Hi,

Some of the ideas in this patch are wrong :-(

1. As Shuai Xue said, it is wrong to judge synchronization error and 
asynchronization error through functions such as 
memory_failure_queue_kick()/ghes_proc()/ghes_proc_in_irq(), because both 
synchronization error and asynchronization error may go to the same 
notification.

2. There is no need to pass 'sync' to __memory_failure_work_func(), 
because memory_failure() can directly handle synchronous and 
asynchronous errors according to entry.flags & MF_ACTION_REQUIRED:

entry.flags & MF_ACTION_REQUIRED == 1: Action: poison page and kill task 
for synchronous error
entry.flags & MF_ACTION_REQUIRED == 0: Action: poison page for 
asynchronous error

Reference x86:
do_machine_check # MCE, synchronous
    ->kill_me_maybe
      ->memory_failure(p->mce_addr >> PAGE_SHIFT, MF_ACTION_REQUIRED);

uc_decode_notifier # CMCI, asynchronous
    ->memory_failure(pfn, 0)

At the same time, the modification here is repeated with your patch 01
  	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
-		flags = 0;
+		flags = sync ? MF_ACTION_REQUIRED : 0;

3. Why add 'force_sig_mceerr(BUS_MCEERR_AR, 0, 0)' after 
memory_failure(pfn, MF_ACTION_REQUIRED)?
The task will be killed in memory_failure():
if poisoned, kill_accessing_process()->kill_proc()
if not poisoned, hwpoison_user_mappings()->collect_procs()->kill_procs()

Reference x86 to handle synchronous error:
kill_me_maybe()
{
     int flags = MF_ACTION_REQUIRED;
     ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
     if (!ret) {
	...
         return;
     }
     if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
         return;

     pr_err("Memory error not recovered");
     kill_me_now(cb);
}


Thanks,
Bixuan Cui

Re: [RFC 2/2] ACPI: APEI: fix reboot caused by synchronous error loop because of memory_failure() failed
Posted by Lv Ying 2 years, 9 months ago
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 3b6ac3694b8d..4c1c558f7161 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2266,7 +2266,11 @@ static void __memory_failure_work_func(struct 
>> work_struct *work, bool sync)
>>               break;
>>           if (entry.flags & MF_SOFT_OFFLINE)
>>               soft_offline_page(entry.pfn, entry.flags);
>> -        else if (!sync || (entry.flags & MF_ACTION_REQUIRED))
>> +        else if (sync) {
>> +            if ((entry.flags & MF_ACTION_REQUIRED) &&
>> +                    memory_failure(entry.pfn, entry.flags))
>> +                force_sig_mceerr(BUS_MCEERR_AR, 0, 0);
>> +        } else
>>               memory_failure(entry.pfn, entry.flags);
> Hi,
> 
> Some of the ideas in this patch are wrong :-(
> 
> 1. As Shuai Xue said, it is wrong to judge synchronization error and 
> asynchronization error through functions such as 
> memory_failure_queue_kick()/ghes_proc()/ghes_proc_in_irq(), because both 
> synchronization error and asynchronization error may go to the same 
> notification.
> 
Hi Bixuan:

Thanks for your review. I agree with you that ghes_proc_in_irq() is
called in SDEI, SEA, NMI notify type, they are NMI-like notify, this
function run some job which may not be NMI safe in IRQ context. And NMI
may be asynchronous error.

However, cureent kernel use ghes_kick_task_work in ghes_proc_in_irq(),
there is an assumption here that ghes_proc_in_irq() are currently in the
context of a synchronous exception, although this is not appropriate.

The challenge for my patch is to prove the rationality of distinguishing
synchronous errors. I do not have a good idea yet of distinguishing 
synchronous error by looking through ACPI/UEFI spec, so I sent this
patchset for more input. And I resent RFC PATCH v1 [1]add this as TODO.

> 2. There is no need to pass 'sync' to __memory_failure_work_func(), 
> because memory_failure() can directly handle synchronous and 
> asynchronous errors according to entry.flags & MF_ACTION_REQUIRED:
> 
> entry.flags & MF_ACTION_REQUIRED == 1: Action: poison page and kill task 
> for synchronous error
> entry.flags & MF_ACTION_REQUIRED == 0: Action: poison page for 
> asynchronous error
> 
> Reference x86:
> do_machine_check # MCE, synchronous
>     ->kill_me_maybe
>       ->memory_failure(p->mce_addr >> PAGE_SHIFT, MF_ACTION_REQUIRED);
> 
> uc_decode_notifier # CMCI, asynchronous
>     ->memory_failure(pfn, 0)
> 
> At the same time, the modification here is repeated with your patch 01
>       if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
> -        flags = 0;
> +        flags = sync ? MF_ACTION_REQUIRED : 0;
> 

Thanks, there is indeed no need to pass 'sync' to
__memory_failure_work_func(). MF_ACTION_REQUIRED can cover this, I will 
update it in the next version patchset.

> 3. Why add 'force_sig_mceerr(BUS_MCEERR_AR, 0, 0)' after 
> memory_failure(pfn, MF_ACTION_REQUIRED)?
> The task will be killed in memory_failure():
> if poisoned, kill_accessing_process()->kill_proc()
> if not poisoned, hwpoison_user_mappings()->collect_procs()->kill_procs()
> 
> Reference x86 to handle synchronous error:
> kill_me_maybe()
> {
>      int flags = MF_ACTION_REQUIRED;
>      ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
>      if (!ret) {
>      ...
>          return;
>      }
>      if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
>          return;
> 
>      pr_err("Memory error not recovered");
>      kill_me_now(cb);
> }
> 

Thanks again, this patch is based on synchronous error is not 
distinguished from
asynchronous  error, in that case, kill_accessing_process() run in 
kthread worker may not kill current thread. Now, based on the first 
patch, this SEA loop can be handled. But this patch is also needed 
reference x86 kill_me_maybe(), I update this patch in RFC PATCH v1[1].
I will integrate this patch into the first patch, because this patch 
commit message is not suitable based on the first patch.

[1] 
https://lore.kernel.org/linux-mm/20221207093935.1972530-1-lvying6@huawei.com/T/


-- 
Thanks!
Lv Ying