At the end of dissolve_free_hugetlb_folio, when a free HugeTLB
folio becomes non-HugeTLB, it is released to the buddy allocator
as a high-order folio, e.g. a folio that contains 262144 pages
if the folio was a 1G HugeTLB hugepage.

This is problematic if the HugeTLB hugepage contained HWPoison
subpages. In that case, since the buddy allocator does not check
HWPoison for non-zero-order folios, the raw HWPoison page can
be given out with its buddy page and be re-used by either the
kernel or userspace.

Memory failure recovery (MFR) in the kernel does attempt to take
the raw HWPoison page off the buddy allocator after
dissolve_free_hugetlb_folio. However, there is always a time
window between dissolve_free_hugetlb_folio freeing a HWPoison
high-order folio to the buddy allocator and MFR taking the
HWPoison raw page off the buddy allocator.

One obvious way to avoid this problem is to add page sanity
checks in the page allocation or free path. However, that goes
against the past efforts to reduce sanity check overhead [1,2,3].

Introduce free_has_hwpoison_pages to free only the healthy
pages and exclude the HWPoison ones in the high-order folio.
The idea is to iterate through the sub-pages of the folio to
identify contiguous ranges of healthy pages. Instead of freeing
pages one by one, decompose healthy ranges into the largest
possible blocks. Each block meets the requirements to be freed
to the buddy allocator by calling __free_frozen_pages directly.

free_has_hwpoison_pages has linear time complexity O(N) wrt the
number of pages in the folio. While the power-of-two decomposition
ensures that the number of calls to the buddy allocator is
logarithmic for each contiguous healthy range, the mandatory
linear scan of pages to identify PageHWPoison defines the
overall time complexity.

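For readers who want to see the decomposition concretely, here is a
minimal userspace sketch of the scan-and-split idea described above.
It is not the code from the patch: the poison state is modelled as a
plain bool array instead of PageHWPoison, sketch_free_block is a
hypothetical stand-in for __free_frozen_pages that only records its
arguments, and SKETCH_MAX_ORDER is an assumed upper bound on the block
order. All sketch_* names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the largest block order the allocator would accept. */
#define SKETCH_MAX_ORDER 10
#define SKETCH_MAX_BLOCKS 1024

/* Record of one simulated free call (hypothetical, for inspection only). */
struct freed_block {
	unsigned long pfn;
	unsigned int order;
};

static struct freed_block freed[SKETCH_MAX_BLOCKS];
static size_t nr_freed;

/* Hypothetical stand-in for __free_frozen_pages(): just log the block. */
static void sketch_free_block(unsigned long pfn, unsigned int order)
{
	assert(nr_freed < SKETCH_MAX_BLOCKS);
	freed[nr_freed].pfn = pfn;
	freed[nr_freed].order = order;
	nr_freed++;
}

/* Free [start, end) as the largest naturally aligned power-of-two blocks. */
static void sketch_free_range(unsigned long start, unsigned long end)
{
	while (start < end) {
		unsigned int order = SKETCH_MAX_ORDER;

		/* Shrink until the block is aligned at start and fits. */
		while (order > 0 &&
		       ((start & ((1UL << order) - 1)) ||
			start + (1UL << order) > end))
			order--;

		sketch_free_block(start, order);
		start += 1UL << order;
	}
}

/* Linear scan: free each maximal run of healthy pages, skip poisoned ones. */
static void sketch_free_healthy(const bool *poisoned, unsigned long base_pfn,
				unsigned long nr_pages)
{
	unsigned long i = 0;

	while (i < nr_pages) {
		unsigned long run_start;

		if (poisoned[i]) {
			i++;
			continue;
		}

		run_start = i;
		while (i < nr_pages && !poisoned[i])
			i++;

		sketch_free_range(base_pfn + run_start, base_pfn + i);
	}
}
```

With a 16-page folio at pfn 0 and only page 5 poisoned, the sketch
issues four free calls: (pfn 0, order 2), (pfn 4, order 0),
(pfn 6, order 1) and (pfn 8, order 3). That covers the 15 healthy pages
and leaves page 5 out, while the scan itself still visits all 16 pages.
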
I tested with some test-only code [4] and hugetlb-mfr [5], by
checking the status of pcplist and freelist immediately after
dissolve_free_hugetlb_folio dissolves a free hugetlb page that
contains 3 HWPoison raw pages:

* HWPoison pages are excluded by free_has_hwpoison_pages.

* Some healthy pages can be in zone->per_cpu_pageset (pcplist)
  because pcp_count is not high enough.

* Many healthy pages are already in some order's
  zone->free_area[order].free_list (freelist).

* In rare cases, some healthy pages are in neither pcplist
  nor freelist. My best guess is that they are allocated before
  the test checks.

[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net/
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net/
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
[4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
[5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@google.com

Jiaqi Yan (3):
  mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio
  mm/page_alloc: only free healthy pages in high-order HWPoison folio
  mm/memory-failure: simplify __page_handle_poison

 include/linux/page-flags.h |   2 +-
 mm/memory-failure.c        |  32 +++---------
 mm/page_alloc.c            | 101 +++++++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 27 deletions(-)

--
2.52.0.322.g1dd061c0dc-goog

On Fri, Dec 19, 2025 at 10:33 AM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> At the end of dissolve_free_hugetlb_folio, when a free HugeTLB
> folio becomes non-HugeTLB, it is released to the buddy allocator
> as a high-order folio, e.g. a folio that contains 262144 pages
> if the folio was a 1G HugeTLB hugepage.
>
> This is problematic if the HugeTLB hugepage contained HWPoison
> subpages. In that case, since the buddy allocator does not check
> HWPoison for non-zero-order folios, the raw HWPoison page can
> be given out with its buddy page and be re-used by either the
> kernel or userspace.
>
> Memory failure recovery (MFR) in the kernel does attempt to take
> the raw HWPoison page off the buddy allocator after
> dissolve_free_hugetlb_folio. However, there is always a time
> window between dissolve_free_hugetlb_folio freeing a HWPoison
> high-order folio to the buddy allocator and MFR taking the
> HWPoison raw page off the buddy allocator.
>
> One obvious way to avoid this problem is to add page sanity
> checks in the page allocation or free path. However, that goes
> against the past efforts to reduce sanity check overhead [1,2,3].
>
> Introduce free_has_hwpoison_pages to free only the healthy
> pages and exclude the HWPoison ones in the high-order folio.
> The idea is to iterate through the sub-pages of the folio to
> identify contiguous ranges of healthy pages. Instead of freeing
> pages one by one, decompose healthy ranges into the largest
> possible blocks. Each block meets the requirements to be freed
> to the buddy allocator by calling __free_frozen_pages directly.
>
> free_has_hwpoison_pages has linear time complexity O(N) wrt the
> number of pages in the folio. While the power-of-two decomposition
> ensures that the number of calls to the buddy allocator is
> logarithmic for each contiguous healthy range, the mandatory
> linear scan of pages to identify PageHWPoison defines the
> overall time complexity.
>
> I tested with some test-only code [4] and hugetlb-mfr [5], by
> checking the status of pcplist and freelist immediately after
> dissolve_free_hugetlb_folio dissolves a free hugetlb page that
> contains 3 HWPoison raw pages:
>
> * HWPoison pages are excluded by free_has_hwpoison_pages.
>
> * Some healthy pages can be in zone->per_cpu_pageset (pcplist)
>   because pcp_count is not high enough.
>
> * Many healthy pages are already in some order's
>   zone->free_area[order].free_list (freelist).
>
> * In rare cases, some healthy pages are in neither pcplist
>   nor freelist. My best guess is that they are allocated before
>   the test checks.

Sorry, I just realized the changelog is missing. Appending it here:

Changelog

v1 [6] => v2:
- Total reimplementation based on discussions with Matthew Wilcox,
  Harry Yoo, Zi Yan etc.
- hugetlb_free_hwpoison_folio => free_has_hwpoison_pages.
- Utilize the has_hwpoisoned flag to tell the buddy allocator that a
  high-order folio contains HWPoison pages.
- Simplify __page_handle_poison given that the HWPoison page won't be
  freed within the high-order folio.

>
> [1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net/
> [2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net/
> [3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz
> [4] https://drive.google.com/file/d/1CzJn1Cc4wCCm183Y77h244fyZIkTLzCt/view?usp=sharing
> [5] https://lore.kernel.org/linux-mm/20251116013223.1557158-3-jiaqiyan@google.com

[6] https://lore.kernel.org/linux-mm/20251116014721.1561456-1-jiaqiyan@google.com

>
> Jiaqi Yan (3):
>   mm/memory-failure: set has_hwpoisoned flags on HugeTLB folio
>   mm/page_alloc: only free healthy pages in high-order HWPoison folio
>   mm/memory-failure: simplify __page_handle_poison
>
>  include/linux/page-flags.h |   2 +-
>  mm/memory-failure.c        |  32 +++---------
>  mm/page_alloc.c            | 101 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 108 insertions(+), 27 deletions(-)
>
> --
> 2.52.0.322.g1dd061c0dc-goog
>