[PATCH] mm/memory-failure: Use zone_pcp_disable() for poison handling

Posted by Kaitao Cheng 4 weeks, 1 day ago
From: Kaitao Cheng <chengkaitao@kylinos.cn>

__page_handle_poison() used drain_all_pages() instead of
zone_pcp_disable() because dissolve_free_hugetlb_folio() could restore
HVO vmemmap pages and decrement hugetlb_optimize_vmemmap_key. That static
key update took cpu_hotplug_lock through static_key_slow_dec(), while
zone_pcp_disable() holds pcp_batch_high_lock. CPU hotplug takes the locks
in the opposite order through page_alloc_cpu_online/dead(), so the
combination could deadlock.

That dependency no longer exists. Commit da3e2d1ca43d ("mm/hugetlb:
remove hugetlb_optimize_vmemmap_key static key") removed the HVO static
key and the static_branch_dec() from hugetlb_vmemmap_restore_folio().
The dissolve_free_hugetlb_folio() path no longer reaches
static_key_slow_dec().

Use zone_pcp_disable() again while dissolving the hugetlb folio and
taking the target page off the buddy allocator. This prevents the drained
PCP lists from being refilled before take_page_off_buddy() runs, making
the page isolation deterministic.

Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
---
 mm/memory-failure.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 866c4428ac7e..b9619d43173b 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -172,23 +172,11 @@ static int __page_handle_poison(struct page *page)
 {
 	int ret;
 
-	/*
-	 * zone_pcp_disable() can't be used here. It will
-	 * hold pcp_batch_high_lock and dissolve_free_hugetlb_folio() might hold
-	 * cpu_hotplug_lock via static_key_slow_dec() when hugetlb vmemmap
-	 * optimization is enabled. This will break current lock dependency
-	 * chain and leads to deadlock.
-	 * Disabling pcp before dissolving the page was a deterministic
-	 * approach because we made sure that those pages cannot end up in any
-	 * PCP list. Draining PCP lists expels those pages to the buddy system,
-	 * but nothing guarantees that those pages do not get back to a PCP
-	 * queue if we need to refill those.
-	 */
+	zone_pcp_disable(page_zone(page));
 	ret = dissolve_free_hugetlb_folio(page_folio(page));
-	if (!ret) {
-		drain_all_pages(page_zone(page));
+	if (!ret)
 		ret = take_page_off_buddy(page);
-	}
+	zone_pcp_enable(page_zone(page));
 
 	return ret;
 }
-- 
2.50.1 (Apple Git-155)
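
For readers following the changelog, here is a minimal userspace sketch of why a one-shot drain is weaker than disabling PCP around the whole sequence. It is a toy model and not kernel code: every toy_* name is hypothetical, a "page" is just an array slot, the racing refill is passed in as a flag instead of running on another CPU, and the real free and refill paths are far more involved. It only illustrates the ordering argument made in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption): one zone, one PCP list, pages tracked by state. */
enum toy_where { TOY_IN_USE, TOY_ON_PCP, TOY_ON_BUDDY };

#define TOY_NR_PAGES 8

struct toy_zone {
	enum toy_where where[TOY_NR_PAGES];
	bool pcp_enabled;
};

void toy_zone_init(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		z->where[i] = TOY_IN_USE;
	z->pcp_enabled = true;
}

/* A freed page lands on the PCP list unless PCP is disabled. */
static void toy_free_page(struct toy_zone *z, size_t pfn)
{
	z->where[pfn] = z->pcp_enabled ? TOY_ON_PCP : TOY_ON_BUDDY;
}

/* One-shot drain: everything currently on PCP goes back to buddy. */
static void toy_drain(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		if (z->where[i] == TOY_ON_PCP)
			z->where[i] = TOY_ON_BUDDY;
}

static void toy_pcp_disable(struct toy_zone *z)
{
	z->pcp_enabled = false;
	toy_drain(z);
}

static void toy_pcp_enable(struct toy_zone *z)
{
	z->pcp_enabled = true;
}

/* Stand-in for an allocation refilling PCP from buddy; no-op when disabled. */
static void toy_pcp_refill(struct toy_zone *z, size_t pfn)
{
	if (z->pcp_enabled && z->where[pfn] == TOY_ON_BUDDY)
		z->where[pfn] = TOY_ON_PCP;
}

/* Succeeds only if the page is still sitting in the buddy allocator. */
static int toy_take_off_buddy(struct toy_zone *z, size_t pfn)
{
	if (z->where[pfn] != TOY_ON_BUDDY)
		return -1;
	z->where[pfn] = TOY_IN_USE;	/* isolated */
	return 0;
}

/* Drain-based variant: a refill can slip in between drain and take. */
int toy_poison_with_drain(struct toy_zone *z, size_t pfn, bool racing_refill)
{
	toy_free_page(z, pfn);		/* stand-in for dissolving the folio */
	toy_drain(z);
	if (racing_refill)
		toy_pcp_refill(z, pfn);
	return toy_take_off_buddy(z, pfn);
}

/* Disable-based variant: PCP stays off until the page has been taken. */
int toy_poison_with_disable(struct toy_zone *z, size_t pfn, bool racing_refill)
{
	int ret;

	toy_pcp_disable(z);
	toy_free_page(z, pfn);		/* goes straight to buddy */
	if (racing_refill)
		toy_pcp_refill(z, pfn);	/* cannot move the page */
	ret = toy_take_off_buddy(z, pfn);
	toy_pcp_enable(z);
	return ret;
}
```

In this model the drain-based variant fails only when the refill happens in the window after the drain, while the disable-based variant succeeds either way, because the refill has nothing to act on while PCP is off.
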
Re: [PATCH] mm/memory-failure: Use zone_pcp_disable() for poison handling
Posted by Miaohe Lin 4 weeks ago
On 2026/5/14 16:57, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
> 
> __page_handle_poison() used drain_all_pages() instead of
> zone_pcp_disable() because dissolve_free_hugetlb_folio() could restore
> HVO vmemmap pages and decrement hugetlb_optimize_vmemmap_key. That static
> key update took cpu_hotplug_lock through static_key_slow_dec(), while
> zone_pcp_disable() holds pcp_batch_high_lock. CPU hotplug takes the locks
> in the opposite order through page_alloc_cpu_online/dead(), so the
> combination could deadlock.
> 
> That dependency no longer exists. Commit da3e2d1ca43d ("mm/hugetlb:
> remove hugetlb_optimize_vmemmap_key static key") removed the HVO static
> key and the static_branch_dec() from hugetlb_vmemmap_restore_folio().
> The dissolve_free_hugetlb_folio() path no longer reaches
> static_key_slow_dec().
> 
> Use zone_pcp_disable() again while dissolving the hugetlb folio and
> taking the target page off the buddy allocator. This prevents the drained
> PCP lists from being refilled before take_page_off_buddy() runs, making
> the page isolation deterministic.
> 
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>

Acked-by: Miaohe Lin <linmiaohe@huawei.com>

Thanks.
Re: [PATCH] mm/memory-failure: Use zone_pcp_disable() for poison handling
Posted by Oscar Salvador 4 weeks, 1 day ago
On Thu, May 14, 2026 at 04:57:54PM +0800, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
> 
> __page_handle_poison() used drain_all_pages() instead of
> zone_pcp_disable() because dissolve_free_hugetlb_folio() could restore
> HVO vmemmap pages and decrement hugetlb_optimize_vmemmap_key. That static
> key update took cpu_hotplug_lock through static_key_slow_dec(), while
> zone_pcp_disable() holds pcp_batch_high_lock. CPU hotplug takes the locks
> in the opposite order through page_alloc_cpu_online/dead(), so the
> combination could deadlock.
> 
> That dependency no longer exists. Commit da3e2d1ca43d ("mm/hugetlb:
> remove hugetlb_optimize_vmemmap_key static key") removed the HVO static
> key and the static_branch_dec() from hugetlb_vmemmap_restore_folio().
> The dissolve_free_hugetlb_folio() path no longer reaches
> static_key_slow_dec().
> 
> Use zone_pcp_disable() again while dissolving the hugetlb folio and
> taking the target page off the buddy allocator. This prevents the drained
> PCP lists from being refilled before take_page_off_buddy() runs, making
> the page isolation deterministic.
> 
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

 

-- 
Oscar Salvador
SUSE Labs