[v4] Optimize folio split in memory failure

[PATCH v4 0/3] Optimize folio split in memory failure

Posted by Zi Yan 3 months, 1 week ago

Hi all,

This patchset is a follow-up of "[PATCH v3] mm/huge_memory: do not change
split_huge_page*() target order silently."[1] and
"[PATCH v4] mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split
to >0 order"[2], since both are separated out as hotfixes. It improves how
memory failure code handles large block size (LBS) folios with
min_order_for_split() > 0. By splitting a large folio containing HW
poisoned pages to min_order_for_split(), the after-split folios without
HW poisoned pages can be freed for reuse. To achieve this, the folio split
code needs to set has_hwpoisoned on after-split folios containing HW
poisoned pages, which is done in the hotfix in [2].

This patchset includes:
1. A patch that adds split_huge_page_to_order().
2. Patch 2 and Patch 3 of "[PATCH v2 0/3] Do not change split folio target
   order"[3].

This patchset is based on mm-new.

Changelog
===
From V3[4]:
1. Patch, mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split
   to >0 order, is sent separately as a hotfix[2].
2. made newly added new_order const in memory_failure() and
   soft_offline_in_use_page().
3. explained in a comment why in memory_failure() after-split >0 order
   folios are still treated as if the split failed.


From V2[3]:
1. Patch 1 is sent separately as a hotfix[1].
2. set has_hwpoisoned on after-split folios if any contains HW poisoned
   pages.
3. added split_huge_page_to_order().
4. added a missing newline after variable declaration.
5. added /* release= */ to try_to_split_thp_page().
6. restructured try_to_split_thp_page() in memory_failure().
7. fixed a typo.
8. reworded the comment in soft_offline_in_use_page() for better
   understanding.


Link: https://lore.kernel.org/all/20251017013630.139907-1-ziy@nvidia.com/ [1]
Link: https://lore.kernel.org/all/20251023030521.473097-1-ziy@nvidia.com/ [2]
Link: https://lore.kernel.org/all/20251016033452.125479-1-ziy@nvidia.com/ [3]
Link: https://lore.kernel.org/all/20251022033531.389351-1-ziy@nvidia.com/ [4]

Zi Yan (3):
  mm/huge_memory: add split_huge_page_to_order()
  mm/memory-failure: improve large block size folio handling.
  mm/huge_memory: fix kernel-doc comments for folio_split() and related.

 include/linux/huge_mm.h | 22 ++++++++++++++++------
 mm/huge_memory.c        | 27 +++++++++++++++------------
 mm/memory-failure.c     | 31 +++++++++++++++++++++++++++----
 3 files changed, 58 insertions(+), 22 deletions(-)

-- 
2.43.0

Re: [PATCH v4 0/3] Optimize folio split in memory failure

Posted by Andrew Morton 3 months, 1 week ago

On Wed, 29 Oct 2025 21:40:17 -0400 Zi Yan <ziy@nvidia.com> wrote:

> This patchset is a follow-up of "[PATCH v3] mm/huge_memory: do not change
> split_huge_page*() target order silently."[1] and
> [PATCH v4] mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split
> to >0 order[2], since both are separated out as hotfixes. It improves how
> memory failure code handles large block size(LBS) folios with
> min_order_for_split() > 0. By splitting a large folio containing HW
> poisoned pages to min_order_for_split(), the after-split folios without
> HW poisoned pages could be freed for reuse. To achieve this, folio split
> code needs to set has_hwpoisoned on after-split folios containing HW
> poisoned pages and it is done in the hotfix in [2].
> 
> This patchset includes:
> 1. A patch adds split_huge_page_to_order(),
> 2. Patch 2 and Patch 3 of "[PATCH v2 0/3] Do not change split folio target
>    order"[3],

Sorry, but best I can tell, none of this tells anyone anything about
this patchset!

Could we please have a [0/N] which provides the usual overview of these
three patches?

Please put yourself in the position of someone reading Linus's tree in
2028 wondering "hm, what does this series do".  All this short-term
transient patch-timing development-time stuff is of no interest to
them and is best placed below the ^---$ separator.

Thanks.

Re: [PATCH v4 0/3] Optimize folio split in memory failure

Posted by Zi Yan 3 months, 1 week ago

On 30 Oct 2025, at 23:42, Andrew Morton wrote:

> On Wed, 29 Oct 2025 21:40:17 -0400 Zi Yan <ziy@nvidia.com> wrote:
>
>> This patchset is a follow-up of "[PATCH v3] mm/huge_memory: do not change
>> split_huge_page*() target order silently."[1] and
>> [PATCH v4] mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split
>> to >0 order[2], since both are separated out as hotfixes. It improves how
>> memory failure code handles large block size(LBS) folios with
>> min_order_for_split() > 0. By splitting a large folio containing HW
>> poisoned pages to min_order_for_split(), the after-split folios without
>> HW poisoned pages could be freed for reuse. To achieve this, folio split
>> code needs to set has_hwpoisoned on after-split folios containing HW
>> poisoned pages and it is done in the hotfix in [2].
>>
>> This patchset includes:
>> 1. A patch adds split_huge_page_to_order(),
>> 2. Patch 2 and Patch 3 of "[PATCH v2 0/3] Do not change split folio target
>>    order"[3],
>
> Sorry, but best I can tell, none of this tells anyone anything about
> this patchset!
>
> Could we please have a [0/N] which provides the usual overview of these
> three patches?
>
> Please put yourself in the position of someone reading Linus's tree in
> 2028 wondering "hm, what does this series do".  All this short-term
> transient patch-timing development-time stuff is of no interest to
> them and is best placed below the ^---$ separator.
>

How about?

The patchset optimizes folio split operations in memory failure code by
always splitting a folio to min_order_for_split() to minimize unusable
pages, even if min_order_for_split() is non-zero and memory failure code
would eventually take the failure path. This means that, instead of
making the entire original folio unusable, memory failure code would only
make unusable the after-split folio that has order min_order_for_split()
and contains the page marked as HWPoisoned. For the soft offline case,
since the original folio is still accessible, do not split it. In
addition, add split_huge_page_to_order() to improve code readability and
fix the kernel-doc comment format for folio_split() and other related
functions.
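
To make the page accounting above concrete, here is a minimal userspace
sketch. It is not kernel code: the function names are hypothetical, and it
assumes exactly one HW poisoned page per folio and a uniform split in which
every after-split folio has the same order.

```c
#include <assert.h>

/*
 * Userspace model only, not kernel code. All names are hypothetical.
 * A folio of order N spans (1UL << N) pages. Assumptions: exactly one
 * page in the folio is HW poisoned, and the split is uniform, so every
 * after-split folio has order split_order.
 */

/* Pages left unusable: the one after-split folio holding the poisoned page. */
unsigned long unusable_pages(unsigned int folio_order, unsigned int split_order)
{
	assert(split_order <= folio_order);
	return 1UL << split_order;
}

/* Pages in the remaining after-split folios, which could be freed for reuse. */
unsigned long reusable_pages(unsigned int folio_order, unsigned int split_order)
{
	return (1UL << folio_order) - unusable_pages(folio_order, split_order);
}

/* Index (from 0) of the after-split folio that contains the poisoned page. */
unsigned long poisoned_chunk(unsigned long poisoned_page_idx,
			     unsigned int split_order)
{
	return poisoned_page_idx >> split_order;
}
```

Under these assumptions, an order-9 folio (512 pages) split to a minimum
order of 2 leaves 4 pages unusable and 508 reusable, and a poisoned page at
index 37 ends up in after-split folio 9. Passing split_order equal to
folio_order models the no-split case, where all 512 pages stay unusable.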

--
Best Regards,
Yan, Zi