[PATCH 5/6] mm, hwpoison: kill procs if unmap fails

Miaohe Lin posted 6 patches 3 years, 7 months ago
There is a newer version of this series
[PATCH 5/6] mm, hwpoison: kill procs if unmap fails
Posted by Miaohe Lin 3 years, 7 months ago
If try_to_unmap() fails, the hwpoisoned page still resides in the address
space of some processes. We should kill these processes or the hwpoisoned
page might be consumed later. collect_procs() is always called to collect
relevant processes now so they can be killed later if unmap fails.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory-failure.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a2f4e8b00a26..5f9615a86296 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1396,7 +1396,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	struct address_space *mapping;
 	LIST_HEAD(tokill);
 	bool unmap_success;
-	int kill = 1, forcekill;
+	int forcekill;
 	bool mlocked = PageMlocked(hpage);
 
 	/*
@@ -1437,7 +1437,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 		if (page_mkclean(hpage)) {
 			SetPageDirty(hpage);
 		} else {
-			kill = 0;
 			ttu |= TTU_IGNORE_HWPOISON;
 			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
 				pfn);
@@ -1452,8 +1451,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * Error handling: We ignore errors here because
 	 * there's nothing that can be done.
 	 */
-	if (kill)
-		collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
+	collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
 
 	if (PageHuge(hpage) && !PageAnon(hpage)) {
 		/*
@@ -1495,7 +1493,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * use a more force-full uncatchable kill to prevent
 	 * any accesses to the poisoned memory.
 	 */
-	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
+	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) ||
+		    !unmap_success;
 	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
 
 	return unmap_success;
-- 
2.23.0
Re: [PATCH 5/6] mm, hwpoison: kill procs if unmap fails
Posted by HORIGUCHI NAOYA(堀口 直也) 3 years, 7 months ago
On Thu, Aug 18, 2022 at 09:00:15PM +0800, Miaohe Lin wrote:
> If try_to_unmap() fails, the hwpoisoned page still resides in the address
> space of some processes. We should kill these processes or the hwpoisoned
> page might be consumed later. collect_procs() is always called to collect
> relevant processes now so they can be killed later if unmap fails.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory-failure.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index a2f4e8b00a26..5f9615a86296 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1396,7 +1396,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	struct address_space *mapping;
>  	LIST_HEAD(tokill);
>  	bool unmap_success;
> -	int kill = 1, forcekill;
> +	int forcekill;
>  	bool mlocked = PageMlocked(hpage);
>  
>  	/*
> @@ -1437,7 +1437,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  		if (page_mkclean(hpage)) {
>  			SetPageDirty(hpage);
>  		} else {
> -			kill = 0;
>  			ttu |= TTU_IGNORE_HWPOISON;
>  			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
>  				pfn);
> @@ -1452,8 +1451,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	 * Error handling: We ignore errors here because
>  	 * there's nothing that can be done.

This above comment might be deprecated now (I'm not sure what this really mean),
so could you drop or update this?

Anyway, the patch looks good to me.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

>  	 */
> -	if (kill)
> -		collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
> +	collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
>  
>  	if (PageHuge(hpage) && !PageAnon(hpage)) {
>  		/*
> @@ -1495,7 +1493,8 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	 * use a more force-full uncatchable kill to prevent
>  	 * any accesses to the poisoned memory.
>  	 */
> -	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL);
> +	forcekill = PageDirty(hpage) || (flags & MF_MUST_KILL) ||
> +		    !unmap_success;
>  	kill_procs(&tokill, forcekill, !unmap_success, pfn, flags);
>  
>  	return unmap_success;
> -- 
> 2.23.0
Re: [PATCH 5/6] mm, hwpoison: kill procs if unmap fails
Posted by Miaohe Lin 3 years, 7 months ago
On 2022/8/19 13:24, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, Aug 18, 2022 at 09:00:15PM +0800, Miaohe Lin wrote:
>> If try_to_unmap() fails, the hwpoisoned page still resides in the address
>> space of some processes. We should kill these processes or the hwpoisoned
>> page might be consumed later. collect_procs() is always called to collect
>> relevant processes now so they can be killed later if unmap fails.
>>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory-failure.c | 9 ++++-----
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index a2f4e8b00a26..5f9615a86296 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1396,7 +1396,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>  	struct address_space *mapping;
>>  	LIST_HEAD(tokill);
>>  	bool unmap_success;
>> -	int kill = 1, forcekill;
>> +	int forcekill;
>>  	bool mlocked = PageMlocked(hpage);
>>  
>>  	/*
>> @@ -1437,7 +1437,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>  		if (page_mkclean(hpage)) {
>>  			SetPageDirty(hpage);
>>  		} else {
>> -			kill = 0;
>>  			ttu |= TTU_IGNORE_HWPOISON;
>>  			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
>>  				pfn);
>> @@ -1452,8 +1451,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>  	 * Error handling: We ignore errors here because
>>  	 * there's nothing that can be done.
> 
> This above comment might be deprecated now (I'm not sure what this really mean),
> so could you drop or update this?

Do you mean remove the below comment? In fact, this doesn't make much sense for me.

* Error handling: We ignore errors here because
* there's nothing that can be done.

> 
> Anyway, the patch looks good to me.
> 
> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

Many thanks for review.

Thanks,
Miaohe Lin

Re: [PATCH 5/6] mm, hwpoison: kill procs if unmap fails
Posted by HORIGUCHI NAOYA(堀口 直也) 3 years, 7 months ago
On Fri, Aug 19, 2022 at 03:37:23PM +0800, Miaohe Lin wrote:
> On 2022/8/19 13:24, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Thu, Aug 18, 2022 at 09:00:15PM +0800, Miaohe Lin wrote:
> >> If try_to_unmap() fails, the hwpoisoned page still resides in the address
> >> space of some processes. We should kill these processes or the hwpoisoned
> >> page might be consumed later. collect_procs() is always called to collect
> >> relevant processes now so they can be killed later if unmap fails.
> >>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/memory-failure.c | 9 ++++-----
> >>  1 file changed, 4 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index a2f4e8b00a26..5f9615a86296 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1396,7 +1396,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> >>  	struct address_space *mapping;
> >>  	LIST_HEAD(tokill);
> >>  	bool unmap_success;
> >> -	int kill = 1, forcekill;
> >> +	int forcekill;
> >>  	bool mlocked = PageMlocked(hpage);
> >>  
> >>  	/*
> >> @@ -1437,7 +1437,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> >>  		if (page_mkclean(hpage)) {
> >>  			SetPageDirty(hpage);
> >>  		} else {
> >> -			kill = 0;
> >>  			ttu |= TTU_IGNORE_HWPOISON;
> >>  			pr_info("%#lx: corrupted page was clean: dropped without side effects\n",
> >>  				pfn);
> >> @@ -1452,8 +1451,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> >>  	 * Error handling: We ignore errors here because
> >>  	 * there's nothing that can be done.
> > 
> > This above comment might be deprecated now (I'm not sure what this really mean),
> > so could you drop or update this?
> 
> Do you mean remove the below comment? In fact, this doesn't make much sense for me.
> 
> * Error handling: We ignore errors here because
> * there's nothing that can be done.

Yes, that's what I meant.

Thanks,
Naoya Horiguchi