[PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.

Posted by Zi Yan 2 months ago
Large block size (LBS) folios cannot be split all the way to order-0
folios, only down to min_order_for_split(). The current code fails the
split outright, which is not optimal. Split the folio to
min_order_for_split() instead, so that, after the split, only the smaller
folio containing the poisoned page becomes unusable rather than the whole
original folio.

For soft offline, do not split the large folio if it cannot be split to
order-0, since the folio is still accessible from userspace and a
premature split might cause a performance loss.
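
For intuition, a small standalone illustration (editorial addition, not
part of the patch; the 4KB page and 16KB block sizes are assumed values)
of why an LBS mapping puts a floor on the split order: a folio must stay
at least as large as the filesystem block.

/* Userspace sketch, not kernel code. */
#include <stdio.h>

int main(void)
{
	unsigned long page_size = 4096;		/* assumed base page size */
	unsigned long block_size = 16384;	/* assumed 16KB LBS block */
	unsigned int min_order = 0;

	/* Smallest order whose folio still covers one block: order 2. */
	while ((page_size << min_order) < block_size)
		min_order++;

	printf("min split order: %u\n", min_order);
	return 0;
}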

Suggested-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/memory-failure.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f698df156bf8..443df9581c24 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
  * there is still more to do, hence the page refcount we took earlier
  * is still needed.
  */
-static int try_to_split_thp_page(struct page *page, bool release)
+static int try_to_split_thp_page(struct page *page, unsigned int new_order,
+		bool release)
 {
 	int ret;
 
 	lock_page(page);
-	ret = split_huge_page(page);
+	ret = split_huge_page_to_list_to_order(page, NULL, new_order);
 	unlock_page(page);
 
 	if (ret && release)
@@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
 	folio_unlock(folio);
 
 	if (folio_test_large(folio)) {
+		int new_order = min_order_for_split(folio);
 		/*
 		 * The flag must be set after the refcount is bumped
 		 * otherwise it may race with THP split.
@@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
 		 * page is a valid handlable page.
 		 */
 		folio_set_has_hwpoisoned(folio);
-		if (try_to_split_thp_page(p, false) < 0) {
+		/*
+		 * If the folio cannot be split to order-0, kill the process,
+		 * but split the folio anyway to minimize the amount of unusable
+		 * pages.
+		 */
+		if (try_to_split_thp_page(p, new_order, false) || new_order) {
+			/* get folio again in case the original one is split */
+			folio = page_folio(p);
 			res = -EHWPOISON;
 			kill_procs_now(p, pfn, flags, folio);
 			put_page(p);
@@ -2621,7 +2630,15 @@ static int soft_offline_in_use_page(struct page *page)
 	};
 
 	if (!huge && folio_test_large(folio)) {
-		if (try_to_split_thp_page(page, true)) {
+		int new_order = min_order_for_split(folio);
+
+		/*
+		 * If the folio cannot be split to order-0, do not split it at
+		 * all to retain the still accessible large folio.
+		 * NOTE: if getting free memory is preferred, split it like it
+		 * is done in memory_failure().
+		 */
+		if (new_order || try_to_split_thp_page(page, new_order, true)) {
 			pr_info("%#lx: thp split failed\n", pfn);
 			return -EBUSY;
 		}
-- 
2.51.0
Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Yang Shi 2 months ago
On Wed, Oct 15, 2025 at 8:38 PM Zi Yan <ziy@nvidia.com> wrote:
> [...]
> @@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
>                  * page is a valid handlable page.
>                  */
>                 folio_set_has_hwpoisoned(folio);
> -               if (try_to_split_thp_page(p, false) < 0) {
> +               /*
> +                * If the folio cannot be split to order-0, kill the process,
> +                * but split the folio anyway to minimize the amount of unusable
> +                * pages.
> +                */
> +               if (try_to_split_thp_page(p, new_order, false) || new_order) {

folio split will clear PG_has_hwpoisoned flag. It is ok for splitting
to order-0 folios because the PG_hwpoisoned flag is set on the
poisoned page. But if you split the folio to some smaller order large
folios, it seems you need to keep PG_has_hwpoisoned flag on the
poisoned folio.
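
To spell out the flag semantics at stake, an editorial sketch (the helper
name is made up; the page-flag accessors are the standard ones):

/*
 * Sketch, not posted code: PG_hwpoisoned marks the exact poisoned
 * page, while PG_has_hwpoisoned marks a large folio that contains at
 * least one such page. An order-0 split result needs no folio-level
 * flag; a still-large result does, and it has to be recomputed because
 * the split clears the flag on the original folio.
 */
static bool folio_needs_has_hwpoisoned(struct folio *folio)
{
	long i;

	if (!folio_test_large(folio))
		return false;	/* per-page PG_hwpoisoned is enough */

	for (i = 0; i < folio_nr_pages(folio); i++)
		if (PageHWPoison(folio_page(folio, i)))
			return true;
	return false;
}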

Yang


Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Zi Yan 1 month, 4 weeks ago
On 17 Oct 2025, at 15:11, Yang Shi wrote:

> On Wed, Oct 15, 2025 at 8:38 PM Zi Yan <ziy@nvidia.com> wrote:
>> [...]
>> +               if (try_to_split_thp_page(p, new_order, false) || new_order) {
>
> folio split will clear PG_has_hwpoisoned flag. It is ok for splitting
> to order-0 folios because the PG_hwpoisoned flag is set on the
> poisoned page. But if you split the folio to some smaller order large
> folios, it seems you need to keep PG_has_hwpoisoned flag on the
> poisoned folio.

OK, this means every page of a folio with folio_test_has_hwpoisoned() set
should be checked, so that each after-split folio's flag can be set
properly. The current folio split code does not do that. I am thinking
about whether that causes any issue. Probably not, because:

1. before Patch 1 is applied, large after-split folios are already causing
a warning in memory_failure(). That kinda masks this issue.
2. after Patch 1 is applied, no large after-split folios will appear,
since the split will fail.

@Miaohe and @Jane, please let me know if my above reasoning makes sense or not.

To make this patch right, folio's has_hwpoisoned flag needs to be preserved
like what Yang described above. My current plan is to move
folio_clear_has_hwpoisoned(folio) into __split_folio_to_order() and
scan every page in the folio if the folio's has_hwpoisoned is set.
There will be redundant scans in the non-uniform split case, since a has_hwpoisoned
folio can be split multiple times (leading to multiple page scans), unless
the scan result is stored.
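
A minimal sketch of that plan (hypothetical helper name and placement;
the real patch may differ):

/*
 * Hypothetical: after __split_folio_to_order() clears has_hwpoisoned
 * on the original folio, re-derive the flag for each resulting folio
 * that is still large by scanning its pages.
 */
static void split_fixup_has_hwpoisoned(struct folio *new_folio)
{
	long i;

	if (!folio_test_large(new_folio))
		return;	/* order-0: per-page PG_hwpoisoned suffices */

	for (i = 0; i < folio_nr_pages(new_folio); i++) {
		if (PageHWPoison(folio_page(new_folio, i))) {
			folio_set_has_hwpoisoned(new_folio);
			return;
		}
	}
}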

@Miaohe and @Jane, is it possible to have multiple HW poisoned pages in
a folio? Is the memory failure process like: 1) a page access causes an
MCE, 2) memory_failure() handles it and splits the large folio containing
the page? Or can multiple MCEs be received, so that multiple pages in a
folio are marked before a split happens?



--
Best Regards,
Yan, Zi
Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Miaohe Lin 1 month, 4 weeks ago
On 2025/10/21 3:46, Zi Yan wrote:
> On 17 Oct 2025, at 15:11, Yang Shi wrote:
> 
>> On Wed, Oct 15, 2025 at 8:38 PM Zi Yan <ziy@nvidia.com> wrote:
>>>
>>> Large block size (LBS) folios cannot be split to order-0 folios but
>>> min_order_for_folio(). Current split fails directly, but that is not
>>> optimal. Split the folio to min_order_for_folio(), so that, after split,
>>> only the folio containing the poisoned page becomes unusable instead.
>>>
>>> For soft offline, do not split the large folio if it cannot be split to
>>> order-0. Since the folio is still accessible from userspace and premature
>>> split might lead to potential performance loss.
>>>
>>> Suggested-by: Jane Chu <jane.chu@oracle.com>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
>>> ---
>>>  mm/memory-failure.c | 25 +++++++++++++++++++++----
>>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index f698df156bf8..443df9581c24 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>>   * there is still more to do, hence the page refcount we took earlier
>>>   * is still needed.
>>>   */
>>> -static int try_to_split_thp_page(struct page *page, bool release)
>>> +static int try_to_split_thp_page(struct page *page, unsigned int new_order,
>>> +               bool release)
>>>  {
>>>         int ret;
>>>
>>>         lock_page(page);
>>> -       ret = split_huge_page(page);
>>> +       ret = split_huge_page_to_list_to_order(page, NULL, new_order);
>>>         unlock_page(page);
>>>
>>>         if (ret && release)
>>> @@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
>>>         folio_unlock(folio);
>>>
>>>         if (folio_test_large(folio)) {
>>> +               int new_order = min_order_for_split(folio);
>>>                 /*
>>>                  * The flag must be set after the refcount is bumped
>>>                  * otherwise it may race with THP split.
>>> @@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
>>>                  * page is a valid handlable page.
>>>                  */
>>>                 folio_set_has_hwpoisoned(folio);
>>> -               if (try_to_split_thp_page(p, false) < 0) {
>>> +               /*
>>> +                * If the folio cannot be split to order-0, kill the process,
>>> +                * but split the folio anyway to minimize the amount of unusable
>>> +                * pages.
>>> +                */
>>> +               if (try_to_split_thp_page(p, new_order, false) || new_order) {
>>
>> folio split will clear PG_has_hwpoisoned flag. It is ok for splitting
>> to order-0 folios because the PG_hwpoisoned flag is set on the
>> poisoned page. But if you split the folio to some smaller order large
>> folios, it seems you need to keep PG_has_hwpoisoned flag on the
>> poisoned folio.
> 
> OK, this means all pages in a folio with folio_test_has_hwpoisoned() should be
> checked to be able to set after-split folio's flag properly. Current folio
> split code does not do that. I am thinking about whether that causes any
> issue. Probably not, because:
> 
> 1. before Patch 1 is applied, large after-split folios are already causing
> a warning in memory_failure(). That kinda masks this issue.
> 2. after Patch 1 is applied, no large after-split folios will appear,
> since the split will fail.
> 
> @Miaohe and @Jane, please let me know if my above reasoning makes sense or not.
> 
> To make this patch right, folio's has_hwpoisoned flag needs to be preserved
> like what Yang described above. My current plan is to move
> folio_clear_has_hwpoisoned(folio) into __split_folio_to_order() and
> scan every page in the folio if the folio's has_hwpoisoned is set.
> There will be redundant scans in non uniform split case, since a has_hwpoisoned
> folio can be split multiple times (leading to multiple page scans), unless
> the scan result is stored.
> 
> @Miaohe and @Jane, is it possible to have multiple HW poisoned pages in
> a folio? Is the memory failure process like: 1) a page access causes an
> MCE, 2) memory_failure() handles it and splits the large folio containing
> the page? Or can multiple MCEs be received, so that multiple pages in a
> folio are marked before a split happens?
memory_failure() is called with mf_mutex held. So I think even if multiple
pages in a folio trigger multiple MCEs at the same time, only one page will
have the HWPoison flag set when the folio is split. If the folio is
successfully split, things look fine. But if the split fails due to e.g. an
extra refcount held by others, the subsequent calls will see that multiple
pages in the folio are already marked as HWPoison. This is the scenario I
can think of at the moment.
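
A simplified sketch of that serialization (editorial; handle_one_pfn()
is a hypothetical stand-in, only the mutex pattern mirrors mf_mutex):

static DEFINE_MUTEX(mf_mutex_sketch);	/* stand-in for the real mf_mutex */

/* Hypothetical helper: identify and isolate one poisoned page. */
static int handle_one_pfn(unsigned long pfn)
{
	return 0;
}

/* Illustration only, not the kernel's memory_failure(). */
static int memory_failure_sketch(unsigned long pfn)
{
	int ret;

	mutex_lock(&mf_mutex_sketch);
	/*
	 * One poisoned pfn is handled at a time: by the time a second
	 * MCE on the same folio reaches this point, the first handler
	 * has already split the folio or failed to.
	 */
	ret = handle_one_pfn(pfn);
	mutex_unlock(&mf_mutex_sketch);
	return ret;
}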

Thanks.

Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Yang Shi 1 month, 4 weeks ago
On Mon, Oct 20, 2025 at 12:46 PM Zi Yan <ziy@nvidia.com> wrote:
> [...]
>
> > folio split will clear PG_has_hwpoisoned flag. It is ok for splitting
> > to order-0 folios because the PG_hwpoisoned flag is set on the
> > poisoned page. But if you split the folio to some smaller order large
> > folios, it seems you need to keep PG_has_hwpoisoned flag on the
> > poisoned folio.
>
> OK, this means every page of a folio with folio_test_has_hwpoisoned() set
> should be checked, so that each after-split folio's flag can be set
> properly. The current folio split code does not do that. I am thinking
> about whether that causes any issue. Probably not, because:
>
> 1. before Patch 1 is applied, large after-split folios are already causing
> a warning in memory_failure(). That kinda masks this issue.
> 2. after Patch 1 is applied, no large after-split folios will appear,
> since the split will fail.

I'm a little bit confused. Didn't this patch split large folio to
new-order-large-folio (new order is min order)? So this patch had
code:
if (try_to_split_thp_page(p, new_order, false) || new_order) {

Thanks,
Yang

Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Zi Yan 1 month, 4 weeks ago
On 20 Oct 2025, at 19:41, Yang Shi wrote:

> On Mon, Oct 20, 2025 at 12:46 PM Zi Yan <ziy@nvidia.com> wrote:
>> [...]
>>
>> 1. before Patch 1 is applied, large after-split folios are already causing
>> a warning in memory_failure(). That kinda masks this issue.
>> 2. after Patch 1 is applied, no large after-split folios will appear,
>> since the split will fail.
>
> I'm a little bit confused. Didn't this patch split large folio to
> new-order-large-folio (new order is min order)? So this patch had
> code:
> if (try_to_split_thp_page(p, new_order, false) || new_order) {

Yes, but this is Patch 2 in this series. Patch 1 is
"mm/huge_memory: do not change split_huge_page*() target order silently."
and sent separately as a hotfix[1].

Patch 2 and 3 in this series will be sent later when 1) Patch 1 is merged,
and 2) a prerequisite patch to address the issue you mentioned above is added
along with them.

[1] https://lore.kernel.org/linux-mm/20251017013630.139907-1-ziy@nvidia.com/



--
Best Regards,
Yan, Zi
Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by David Hildenbrand 1 month, 4 weeks ago
On 21.10.25 03:23, Zi Yan wrote:
>> [...]
>>
>> I'm a little bit confused. Didn't this patch split large folio to
>> new-order-large-folio (new order is min order)? So this patch had
>> code:
>> if (try_to_split_thp_page(p, new_order, false) || new_order) {
> 
> Yes, but this is Patch 2 in this series. Patch 1 is
> "mm/huge_memory: do not change split_huge_page*() target order silently."
> and sent separately as a hotfix[1].

I'm confused now as well. I'd like to review; will there be a v3 that
only contains patches #2+#3?

Thanks!

-- 
Cheers

David / dhildenb

Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Zi Yan 1 month, 4 weeks ago
On 21 Oct 2025, at 11:44, David Hildenbrand wrote:

>> [...]
>>
>> Yes, but this is Patch 2 in this series. Patch 1 is
>> "mm/huge_memory: do not change split_huge_page*() target order silently."
>> and sent separately as a hotfix[1].
>
> I'm confused now as well. I'd like to review; will there be a v3 that only contains patches #2+#3?

Yes. The new V3 will have 3 patches:
1. a new patch addresses Yang’s concern on setting has_hwpoisoned on after-split
large folios.
2. patch#2,
3. patch#3.

The plan is to send them out once patch 1 is upstreamed. Let me know if you think
it is OK to send them out earlier as Andrew already picked up patch 1.

I also would like to get some feedback on my approach to setting has_hwpoisoned:

folio's has_hwpoisoned flag needs to be preserved
like what Yang described above. My current plan is to move
folio_clear_has_hwpoisoned(folio) into __split_folio_to_order() and
scan every page in the folio if the folio's has_hwpoisoned is set.
There will be redundant scans in the non-uniform split case, since a has_hwpoisoned
folio can be split multiple times (leading to multiple page scans), unless
the scan result is stored.

Best Regards,
Yan, Zi
Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by David Hildenbrand 1 month, 4 weeks ago
On 21.10.25 17:55, Zi Yan wrote:
> [...]
> 
> Yes. The new V3 will have 3 patches:
> 1. a new patch addresses Yang’s concern on setting has_hwpoisoned on after-split
> large folios.
> 2. patch#2,
> 3. patch#3.

Okay, I'll wait with the review until you resend :)

> 
> The plan is to send them out once patch 1 is upstreamed. Let me know if you think
> it is OK to send them out earlier as Andrew already picked up patch 1.

It's in mm/mm-new + mm/mm-unstable, AFAICT. So sure, send it against one
of the trees (I prefer mm-unstable but usually we should target mm-new).

> 
> I also would like to get some feedback on my approach to setting has_hwpoisoned:
> 
> folio's has_hwpoisoned flag needs to be preserved
> like what Yang described above. My current plan is to move
> folio_clear_has_hwpoisoned(folio) into __split_folio_to_order() and
> scan every page in the folio if the folio's has_hwpoisoned is set.

Oh, that's nasty indeed ... will have to think about that a bit.

Maybe we can keep it simple and always call folio_set_has_hwpoisoned() on
all split folios? Essentially turning it into "maybe_has" semantics.

IIUC, the existing folio_test_has_hwpoisoned() users can deal with that?
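
A sketch of what that simplification could look like (editorial; it
assumes the split code can still see the original folio's flag at the
point the new folios are produced):

/* Editorial sketch of the "maybe_has" idea, not merged code. */
static void propagate_maybe_has_hwpoisoned(struct folio *orig,
					   struct folio *new_folio)
{
	/*
	 * "May contain a poisoned page": readers must tolerate false
	 * positives and re-check per page where it matters.
	 */
	if (folio_test_has_hwpoisoned(orig) && folio_test_large(new_folio))
		folio_set_has_hwpoisoned(new_folio);
}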

-- 
Cheers

David / dhildenb

Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Zi Yan 1 month, 4 weeks ago
On 21 Oct 2025, at 14:28, David Hildenbrand wrote:

> [...]
>>
>> The plan is to send them out once patch 1 is upstreamed. Let me know if you think
>> it is OK to send them out earlier as Andrew already picked up patch 1.
>
> It's in mm/mm-new + mm/mm-unstable, AFAICT. So sure, send it against one of the trees (I prefer mm-unstable but usually we should target mm-new).

Sure.
>
>>
>> I also would like to get some feedback on my approach to setting has_hwpoisoned:
>>
>> folio's has_hwpoisoned flag needs to be preserved
>> like what Yang described above. My current plan is to move
>> folio_clear_has_hwpoisoned(folio) into __split_folio_to_order() and
>> scan every page in the folio if the folio's has_hwpoisoned is set.
>
> Oh, that's nasty indeed ... will have to think about that a bit.
>
> Maybe we can keep it simple and always call folio_set_has_hwpoisoned() on all split folios? Essentially turning it into "maybe_has" semantics.
>
> IIUC, the existing folio_test_has_hwpoisoned() users can deal with that?

The direct users of folio_test_has_hwpoisoned() are fine. They are in
shmem.c and memory.c: the former would copy data PAGE_SIZE at a time
instead of at the folio size, and the latter would not install a PMD entry
for the folio (impossible to hit until we have larger-than-PMD mTHPs and
split them to PMD THPs).

The callers of folio_contain_hwpoisoned_page(), which calls
folio_test_has_hwpoisoned(), would have issues:

1. shmem_write_begin() in shmem.c: it returns -EIO for shmem writes.
2. thp_underused() in huge_memory.c: it does not scan the folio.
3. shrink_folio_list() in vmscan.c: it does not reclaim large hwpoisoned folios.
4. do_migrate_range() in memory_hotplug.c: it skips the large hwpoisoned folios.

These behaviors are fine for folios that truly contain hwpoisoned pages,
but might not be desirable in the false-positive case. A scan to make sure
hwpoisoned pages are indeed present is inevitable. Rather than making
all callers do the scan, scanning at split time might be better, IMHO.
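
For reference, a from-memory approximation of the helper those four
callers share (it may differ from the current tree):

static inline bool folio_contain_hwpoisoned_page(struct folio *folio)
{
	/*
	 * Head page itself poisoned, or a large folio carrying the
	 * folio-level has_hwpoisoned marker.
	 */
	return folio_test_hwpoison(folio) ||
	       (folio_test_large(folio) && folio_test_has_hwpoisoned(folio));
}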

Let me send a patchset with scanning at split time. Hopefully more people
can chime in to provide feedback.


--
Best Regards,
Yan, Zi
Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Yang Shi 1 month, 4 weeks ago
On Tue, Oct 21, 2025 at 11:58 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 21 Oct 2025, at 14:28, David Hildenbrand wrote:
>
> > On 21.10.25 17:55, Zi Yan wrote:
> >> On 21 Oct 2025, at 11:44, David Hildenbrand wrote:
> >>
> >>> On 21.10.25 03:23, Zi Yan wrote:
> >>>> On 20 Oct 2025, at 19:41, Yang Shi wrote:
> >>>>
> >>>>> On Mon, Oct 20, 2025 at 12:46 PM Zi Yan <ziy@nvidia.com> wrote:
> >>>>>>
> >>>>>> On 17 Oct 2025, at 15:11, Yang Shi wrote:
> >>>>>>
> >>>>>>> On Wed, Oct 15, 2025 at 8:38 PM Zi Yan <ziy@nvidia.com> wrote:
> >>>>>>>>
> >>>>>>>> Large block size (LBS) folios cannot be split to order-0 folios but
> >>>>>>>> min_order_for_folio(). Current split fails directly, but that is not
> >>>>>>>> optimal. Split the folio to min_order_for_folio(), so that, after split,
> >>>>>>>> only the folio containing the poisoned page becomes unusable instead.
> >>>>>>>>
> >>>>>>>> For soft offline, do not split the large folio if it cannot be split to
> >>>>>>>> order-0. Since the folio is still accessible from userspace and premature
> >>>>>>>> split might lead to potential performance loss.
> >>>>>>>>
> >>>>>>>> Suggested-by: Jane Chu <jane.chu@oracle.com>
> >>>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>>>>>> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> >>>>>>>> ---
> >>>>>>>>    mm/memory-failure.c | 25 +++++++++++++++++++++----
> >>>>>>>>    1 file changed, 21 insertions(+), 4 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >>>>>>>> index f698df156bf8..443df9581c24 100644
> >>>>>>>> --- a/mm/memory-failure.c
> >>>>>>>> +++ b/mm/memory-failure.c
> >>>>>>>> @@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
> >>>>>>>>     * there is still more to do, hence the page refcount we took earlier
> >>>>>>>>     * is still needed.
> >>>>>>>>     */
> >>>>>>>> -static int try_to_split_thp_page(struct page *page, bool release)
> >>>>>>>> +static int try_to_split_thp_page(struct page *page, unsigned int new_order,
> >>>>>>>> +               bool release)
> >>>>>>>>    {
> >>>>>>>>           int ret;
> >>>>>>>>
> >>>>>>>>           lock_page(page);
> >>>>>>>> -       ret = split_huge_page(page);
> >>>>>>>> +       ret = split_huge_page_to_list_to_order(page, NULL, new_order);
> >>>>>>>>           unlock_page(page);
> >>>>>>>>
> >>>>>>>>           if (ret && release)
> >>>>>>>> @@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
> >>>>>>>>           folio_unlock(folio);
> >>>>>>>>
> >>>>>>>>           if (folio_test_large(folio)) {
> >>>>>>>> +               int new_order = min_order_for_split(folio);
> >>>>>>>>                   /*
> >>>>>>>>                    * The flag must be set after the refcount is bumped
> >>>>>>>>                    * otherwise it may race with THP split.
> >>>>>>>> @@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
> >>>>>>>>                    * page is a valid handlable page.
> >>>>>>>>                    */
> >>>>>>>>                   folio_set_has_hwpoisoned(folio);
> >>>>>>>> -               if (try_to_split_thp_page(p, false) < 0) {
> >>>>>>>> +               /*
> >>>>>>>> +                * If the folio cannot be split to order-0, kill the process,
> >>>>>>>> +                * but split the folio anyway to minimize the amount of unusable
> >>>>>>>> +                * pages.
> >>>>>>>> +                */
> >>>>>>>> +               if (try_to_split_thp_page(p, new_order, false) || new_order) {
> >>>>>>>
> >>>>>>> folio split will clear PG_has_hwpoisoned flag. It is ok for splitting
> >>>>>>> to order-0 folios because the PG_hwpoisoned flag is set on the
> >>>>>>> poisoned page. But if you split the folio to some smaller order large
> >>>>>>> folios, it seems you need to keep PG_has_hwpoisoned flag on the
> >>>>>>> poisoned folio.
> >>>>>>
> >>>>>> OK, this means all pages in a folio with folio_test_has_hwpoisoned() should
> >>>>>> be checked so that the after-split folios' flags can be set properly. The
> >>>>>> current folio split code does not do that. I am thinking about whether that
> >>>>>> causes any issue. Probably not, because:
> >>>>>>
> >>>>>> 1. before Patch 1 is applied, large after-split folios are already causing
> >>>>>> a warning in memory_failure(). That kinda masks this issue.
> >>>>>> 2. after Patch 1 is applied, no large after-split folios will appear,
> >>>>>> since the split will fail.
> >>>>>
> >>>>> I'm a little bit confused. Didn't this patch split the large folio to a
> >>>>> new-order large folio (the new order being the min order)? So this patch
> >>>>> had the code:
> >>>>> if (try_to_split_thp_page(p, new_order, false) || new_order) {
> >>>>
> >>>> Yes, but this is Patch 2 in this series. Patch 1 is
> >>>> "mm/huge_memory: do not change split_huge_page*() target order silently."
> >>>> and sent separately as a hotfix[1].
> >>>
> >>> I'm confused now as well. I'd like to review; will there be a v3 that only contains patches #2+#3?
> >>
> >> Yes. The new v3 will have 3 patches:
> >> 1. a new patch that addresses Yang’s concern on setting has_hwpoisoned on
> >> after-split large folios,
> >> 2. patch #2,
> >> 3. patch #3.
> >
> > Okay, I'll wait with the review until you resend :)
> >
> >>
> >> The plan is to send them out once patch 1 is upstreamed. Let me know if you think
> >> it is OK to send them out earlier as Andrew already picked up patch 1.
> >
> > It's in mm/mm-new + mm/mm-unstable, AFAICT. So sure, send it against one of the trees (I prefer mm-unstable but usually we should target mm-new).
>
> Sure.
> >
> >>
> >> I also would like to get some feedback on my approach to setting has_hwpoisoned:
> >>
> >> The folio's has_hwpoisoned flag needs to be preserved,
> >> as Yang described above. My current plan is to move
> >> folio_clear_has_hwpoisoned(folio) into __split_folio_to_order() and
> >> scan every page in the folio if the folio's has_hwpoisoned flag is set.
> >
> > Oh, that's nasty indeed ... will have to think about that a bit.
> >
> > Maybe we can keep it simple and always call folio_set_has_hwpoisoned() on all after-split folios? Essentially turning it into "maybe_has" semantics.
> >
> > IIUC, the existing folio_test_has_hwpoisoned() users can deal with that?
>
> The direct users of folio_test_has_hwpoisoned() are fine. They are shmem.c
> and memory.c, where the former would copy data in PAGE_SIZE chunks instead of
> the folio size and the latter would not install a PMD entry for the folio
> (impossible to hit this until we have > PMD mTHPs and split them to PMD THPs).
>
> The callers of folio_contain_hwpoisoned_page(), which calls
> folio_test_has_hwpoisoned(), would have issues:
>
> 1. shmem_write_begin() in shmem.c: it returns -EIO for shmem writes.
> 2. thp_underused() in huge_memory.c: it does not scan the folio.
> 3. shrink_folio_list() in vmscan.c: it does not reclaim large hwpoisoned folios.
> 4. do_migrate_range() in memory_hotplug.c: it skips the large hwpoisoned folios.
>
> These behaviors are fine for folios truly containing hwpoisoned pages,
> but might not be desirable for false positive cases. A scan to make sure
> hwpoisoned pages are indeed present is inevitable. Rather than making
> all callers do the scan, scanning at split time might be better, IMHO.
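
For reference, folio_contain_hwpoisoned_page() in current trees is roughly
the following (quoted from memory, so treat it as a sketch):

	static inline bool folio_contain_hwpoisoned_page(struct folio *folio)
	{
		/*
		 * Order-0 folios use the per-page PG_hwpoison flag; large
		 * folios additionally carry the folio-level has_hwpoisoned
		 * flag.
		 */
		return folio_test_hwpoison(folio) ||
			(folio_test_large(folio) &&
			 folio_test_has_hwpoisoned(folio));
	}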

Yeah, I was trying to figure out a simpler way too. For example, we
could defer setting this flag to page fault time, when the fault
handler sees the poisoned page while installing PTEs. But that can't
cover most of the cases Zi Yan mentioned above; we may run into them
before any page fault happens.

Thanks,
Yang

>
> Let me send a patchset with scanning at split time. Hopefully, more people
> can chime in to provide feedback.
>
>
> --
> Best Regards,
> Yan, Zi
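
To make the "scan at split time" plan concrete, a minimal sketch follows.
The helper name folio_split_fixup_hwpoison() is illustrative, not the
eventual patch; the idea is that __split_folio_to_order() would call it for
each after-split folio whenever the original folio had has_hwpoisoned set:

	/*
	 * Keep has_hwpoisoned only on after-split folios that actually
	 * contain a poisoned page. Order-0 folios rely on the per-page
	 * PG_hwpoison bit instead, so they are skipped.
	 */
	static void folio_split_fixup_hwpoison(struct folio *new_folio)
	{
		long i;

		if (!folio_test_large(new_folio))
			return;

		for (i = 0; i < folio_nr_pages(new_folio); i++) {
			if (PageHWPoison(folio_page(new_folio, i))) {
				folio_set_has_hwpoisoned(new_folio);
				return;
			}
		}
	}
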
Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Lorenzo Stoakes 2 months ago
On Wed, Oct 15, 2025 at 11:34:51PM -0400, Zi Yan wrote:
> Large block size (LBS) folios cannot be split to order-0 folios but
> min_order_for_folio(). Current split fails directly, but that is not
> optimal. Split the folio to min_order_for_folio(), so that, after split,
> only the folio containing the poisoned page becomes unusable instead.
>
> For soft offline, do not split the large folio if it cannot be split to
> order-0. Since the folio is still accessible from userspace and premature
> split might lead to potential performance loss.
>
> Suggested-by: Jane Chu <jane.chu@oracle.com>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  mm/memory-failure.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index f698df156bf8..443df9581c24 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>   * there is still more to do, hence the page refcount we took earlier
>   * is still needed.
>   */
> -static int try_to_split_thp_page(struct page *page, bool release)
> +static int try_to_split_thp_page(struct page *page, unsigned int new_order,
> +		bool release)
>  {
>  	int ret;
>
>  	lock_page(page);
> -	ret = split_huge_page(page);
> +	ret = split_huge_page_to_list_to_order(page, NULL, new_order);

I wonder if we need a wrapper for these list==NULL cases, as
split_huge_page_to_list_to_order suggests you always have a list provided... and
this is ugly :)

split_huge_page_to_order() seems good.
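
For reference, the suggested wrapper would be trivial; a sketch, assuming
the split_huge_page_to_list_to_order() signature shown in the diff (the
name split_huge_page_to_order() is only a proposal here, not an existing
function):

	static inline int split_huge_page_to_order(struct page *page,
			unsigned int new_order)
	{
		return split_huge_page_to_list_to_order(page, NULL, new_order);
	}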

>  	unlock_page(page);
>
>  	if (ret && release)
> @@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
>  	folio_unlock(folio);
>
>  	if (folio_test_large(folio)) {
> +		int new_order = min_order_for_split(folio);

Newline after decl?

>  		/*
>  		 * The flag must be set after the refcount is bumped
>  		 * otherwise it may race with THP split.
> @@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
>  		 * page is a valid handlable page.
>  		 */
>  		folio_set_has_hwpoisoned(folio);
> -		if (try_to_split_thp_page(p, false) < 0) {
> +		/*
> +		 * If the folio cannot be split to order-0, kill the process,
> +		 * but split the folio anyway to minimize the amount of unusable
> +		 * pages.
> +		 */
> +		if (try_to_split_thp_page(p, new_order, false) || new_order) {

Please use /* release= */false here


I'm also not sure about the logic here; it feels unclear.

Something like:

	err = try_to_split_thp_page(p, new_order, /* release= */false);

	/*
	 * If the folio cannot be split, kill the process.
	 * If it can be split, but not to order-0, then this defeats the
	 * expectation that we do so, but we want the split to have been
	 * made anyway to minimize the number of unusable pages.
	 */
	if (err || new_order > 0) {
	}


> +			/* get folio again in case the original one is split */
> +			folio = page_folio(p);
>  			res = -EHWPOISON;
>  			kill_procs_now(p, pfn, flags, folio);
>  			put_page(p);
> @@ -2621,7 +2630,15 @@ static int soft_offline_in_use_page(struct page *page)
>  	};
>
>  	if (!huge && folio_test_large(folio)) {
> -		if (try_to_split_thp_page(page, true)) {
> +		int new_order = min_order_for_split(folio);
> +
> +		/*
> +		 * If the folio cannot be split to order-0, do not split it at
> +		 * all to retain the still accessible large folio.
> +		 * NOTE: if getting free memory is perferred, split it like it

Typo perferred -> preferred.


> +		 * is done in memory_failure().

I'm confused by your comment here though; we're not splitting it like
memory_failure(). We're splitting (a) with release and (b) only if we can
target order-0.

So how would this preference in any way be a thing that happens? :) I may be
missing something here.

> +		 */
> +		if (new_order || try_to_split_thp_page(page, new_order, true)) {

Same comment as above with /* release= */true.

You should pass 0, not new_order, to try_to_split_thp_page() here, as it has
to be 0 for the function to be invoked; that's just obviously clearer.


>  			pr_info("%#lx: thp split failed\n", pfn);
>  			return -EBUSY;
>  		}
> --
> 2.51.0
>
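
Putting the review points above together, the hard-offline branch might end
up reading roughly as follows. This is only a sketch against the v2 diff,
not the actual v3 code:

	if (folio_test_large(folio)) {
		int new_order = min_order_for_split(folio);
		int err;

		/* ... refcount comment and checks as before ... */
		folio_set_has_hwpoisoned(folio);
		/*
		 * Split anyway to minimize the number of unusable pages.
		 * If the folio cannot reach order-0 (new_order > 0), the
		 * poisoned page still sits in a large folio, so the
		 * process must be killed.
		 */
		err = try_to_split_thp_page(p, new_order, /* release= */ false);
		if (err || new_order > 0) {
			/* get the folio again in case the original was split */
			folio = page_folio(p);
			res = -EHWPOISON;
			kill_procs_now(p, pfn, flags, folio);
			put_page(p);
			/* ... existing error-return path ... */
		}
	}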
Re: [PATCH v2 2/3] mm/memory-failure: improve large block size folio handling.
Posted by Zi Yan 1 month, 4 weeks ago
On 17 Oct 2025, at 5:33, Lorenzo Stoakes wrote:

> On Wed, Oct 15, 2025 at 11:34:51PM -0400, Zi Yan wrote:
>> Large block size (LBS) folios cannot be split to order-0 folios but
>> min_order_for_folio(). Current split fails directly, but that is not
>> optimal. Split the folio to min_order_for_folio(), so that, after split,
>> only the folio containing the poisoned page becomes unusable instead.
>>
>> For soft offline, do not split the large folio if it cannot be split to
>> order-0. Since the folio is still accessible from userspace and premature
>> split might lead to potential performance loss.
>>
>> Suggested-by: Jane Chu <jane.chu@oracle.com>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
>> ---
>>  mm/memory-failure.c | 25 +++++++++++++++++++++----
>>  1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index f698df156bf8..443df9581c24 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
>>   * there is still more to do, hence the page refcount we took earlier
>>   * is still needed.
>>   */
>> -static int try_to_split_thp_page(struct page *page, bool release)
>> +static int try_to_split_thp_page(struct page *page, unsigned int new_order,
>> +		bool release)
>>  {
>>  	int ret;
>>
>>  	lock_page(page);
>> -	ret = split_huge_page(page);
>> +	ret = split_huge_page_to_list_to_order(page, NULL, new_order);
>
> I wonder if we need a wrapper for these list==NULL cases, as
> split_huge_page_to_list_to_order suggests you always have a list provided... and
> this is ugly :)
>
> split_huge_page_to_order() seems good.

Yes, this suggestion motivated me to remove the unused list parameter (always
NULL) from try_folio_split_to_order(). Thanks.

>
>>  	unlock_page(page);
>>
>>  	if (ret && release)
>> @@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
>>  	folio_unlock(folio);
>>
>>  	if (folio_test_large(folio)) {
>> +		int new_order = min_order_for_split(folio);
>
> Newline after decl?

Sure.

>
>>  		/*
>>  		 * The flag must be set after the refcount is bumped
>>  		 * otherwise it may race with THP split.
>> @@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
>>  		 * page is a valid handlable page.
>>  		 */
>>  		folio_set_has_hwpoisoned(folio);
>> -		if (try_to_split_thp_page(p, false) < 0) {
>> +		/*
>> +		 * If the folio cannot be split to order-0, kill the process,
>> +		 * but split the folio anyway to minimize the amount of unusable
>> +		 * pages.
>> +		 */
>> +		if (try_to_split_thp_page(p, new_order, false) || new_order) {
>
> Please use /* release= */false here

OK.

>
>
> I'm also not sure about the logic here; it feels unclear.
>
> Something like:
>
> 	err = try_to_split_thp_page(p, new_order, /* release= */false);
>
> 	/*
> 	 * If the folio cannot be split, kill the process.
> 	 * If it can be split, but not to order-0, then this defeats the
> 	 * expectation that we do so, but we want the split to have been
> 	 * made anyway to minimize the number of unusable pages.
> 	 */
> 	if (err || new_order > 0) {
> 	}

Will make the change.

>
>
>> +			/* get folio again in case the original one is split */
>> +			folio = page_folio(p);
>>  			res = -EHWPOISON;
>>  			kill_procs_now(p, pfn, flags, folio);
>>  			put_page(p);
>> @@ -2621,7 +2630,15 @@ static int soft_offline_in_use_page(struct page *page)
>>  	};
>>
>>  	if (!huge && folio_test_large(folio)) {
>> -		if (try_to_split_thp_page(page, true)) {
>> +		int new_order = min_order_for_split(folio);
>> +
>> +		/*
>> +		 * If the folio cannot be split to order-0, do not split it at
>> +		 * all to retain the still accessible large folio.
>> +		 * NOTE: if getting free memory is perferred, split it like it
>
> Typo perferred -> preferred.

Got it.

>
>
>> +		 * is done in memory_failure().
>
> I'm confused by your comment here though; we're not splitting it like
> memory_failure(). We're splitting (a) with release and (b) only if we can
> target order-0.
>
> So how would this preference in any way be a thing that happens? :) I may be
> missing something here.

For non-LBS folios, min_order_for_split() returns 0; in that case the split
does happen.
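
A simplified sketch of those semantics (the real helper also handles
truncated folios, which is omitted here):

	static int min_order_for_split(struct folio *folio)
	{
		/* Anonymous folios can always be split to order-0. */
		if (folio_test_anon(folio))
			return 0;

		/*
		 * File-backed folios are bounded below by the mapping's
		 * minimum order, which is non-zero for LBS filesystems.
		 */
		return mapping_min_folio_order(folio->mapping);
	}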

>
>> +		 */
>> +		if (new_order || try_to_split_thp_page(page, new_order, true)) {
>
> Same comment as above with /* release= */true.

Sure.

>
> You should pass 0, not new_order, to try_to_split_thp_page() here, as it has
> to be 0 for the function to be invoked; that's just obviously clearer.

OK. How about try_to_split_thp_page(page, /* new_order= */ 0, /* release= */ true),
so that readers can tell 0 is the value of new_order?
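
With that, the soft-offline branch would read roughly as below (again a
sketch, not the final v3):

	if (!huge && folio_test_large(folio)) {
		int new_order = min_order_for_split(folio);

		/*
		 * Do not split at all if the folio cannot reach order-0;
		 * keep the still-accessible large folio intact instead.
		 */
		if (new_order ||
		    try_to_split_thp_page(page, /* new_order= */ 0,
					  /* release= */ true)) {
			pr_info("%#lx: thp split failed\n", pfn);
			return -EBUSY;
		}
	}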

>
>
>>  			pr_info("%#lx: thp split failed\n", pfn);
>>  			return -EBUSY;
>>  		}

Thank you for the feedback.


--
Best Regards,
Yan, Zi