[PATCH v2 3/3] mm/memory-failure: simplify __page_handle_poison

Posted by Jiaqi Yan 1 month, 3 weeks ago
Now that no HWPoison page will be given away to buddy allocator
at the end of dissolve_free_hugetlb_folio, there is no need to
drain_all_pages and take_page_off_buddy anymore, so remove them.

Also make __page_handle_poison return either 0 for success or
negative for failure, following the convention for functions
that perform an action.

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 mm/memory-failure.c | 31 +++++--------------------------
 1 file changed, 5 insertions(+), 26 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d204de6c9792a..54ea840ded162 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -164,33 +164,12 @@ static DEFINE_MUTEX(pfn_space_lock);
 
 /*
  * Return values:
- *   1:   the page is dissolved (if needed) and taken off from buddy,
- *   0:   the page is dissolved (if needed) and not taken off from buddy,
+ *   = 0: the page is dissolved (if needed)
  *   < 0: failed to dissolve.
  */
 static int __page_handle_poison(struct page *page)
 {
-	int ret;
-
-	/*
-	 * zone_pcp_disable() can't be used here. It will
-	 * hold pcp_batch_high_lock and dissolve_free_hugetlb_folio() might hold
-	 * cpu_hotplug_lock via static_key_slow_dec() when hugetlb vmemmap
-	 * optimization is enabled. This will break current lock dependency
-	 * chain and leads to deadlock.
-	 * Disabling pcp before dissolving the page was a deterministic
-	 * approach because we made sure that those pages cannot end up in any
-	 * PCP list. Draining PCP lists expels those pages to the buddy system,
-	 * but nothing guarantees that those pages do not get back to a PCP
-	 * queue if we need to refill those.
-	 */
-	ret = dissolve_free_hugetlb_folio(page_folio(page));
-	if (!ret) {
-		drain_all_pages(page_zone(page));
-		ret = take_page_off_buddy(page);
-	}
-
-	return ret;
+	return dissolve_free_hugetlb_folio(page_folio(page));
 }
 
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
@@ -200,7 +179,7 @@ static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, boo
 		 * Doing this check for free pages is also fine since
 		 * dissolve_free_hugetlb_folio() returns 0 for non-hugetlb folios as well.
 		 */
-		if (__page_handle_poison(page) <= 0)
+		if (__page_handle_poison(page) < 0)
 			/*
 			 * We could fail to take off the target page from buddy
 			 * for example due to racy page allocation, but that's
@@ -1174,7 +1153,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * subpages.
 		 */
 		folio_put(folio);
-		if (__page_handle_poison(p) > 0) {
+		if (!__page_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		} else {
@@ -2067,7 +2046,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	 */
 	if (res == 0) {
 		folio_unlock(folio);
-		if (__page_handle_poison(p) > 0) {
+		if (!__page_handle_poison(p)) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		} else {
-- 
2.52.0.322.g1dd061c0dc-goog
Re: [PATCH v2 3/3] mm/memory-failure: simplify __page_handle_poison
Posted by Matthew Wilcox 1 month, 2 weeks ago
On Fri, Dec 19, 2025 at 06:33:46PM +0000, Jiaqi Yan wrote:
> Now that no HWPoison page will be given away to buddy allocator
> at the end of dissolve_free_hugetlb_folio, there is no need to
> drain_all_pages and take_page_off_buddy anymore, so remove them.

What if we discover a hardware error in a page which is currently
in the buddy system?  Why don't we need to remove that page from the
buddy system?
Re: [PATCH v2 3/3] mm/memory-failure: simplify __page_handle_poison
Posted by Jiaqi Yan 1 month, 2 weeks ago
On Mon, Dec 22, 2025 at 2:12 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Dec 19, 2025 at 06:33:46PM +0000, Jiaqi Yan wrote:
> > Now that no HWPoison page will be given away to buddy allocator
> > at the end of dissolve_free_hugetlb_folio, there is no need to
> > drain_all_pages and take_page_off_buddy anymore, so remove them.
>
> What if we discover a hardware error in a page which is currently
> in the buddy system?  Why don't we need to remove that page from the
> buddy system?

Thanks for your comment.

memory_failure() explicitly handles the is_free_buddy_page() case and
removes the page from the buddy allocator directly with
take_page_off_buddy().

However, soft_offline_page() handles a free page through
page_handle_poison(), and with this patch that path no longer removes
the page from the buddy allocator.

I will fix this in v3. One way is to split hugepage_or_freepage and
call take_page_off_buddy() only for a non-huge free page.
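
For illustration only, the split could look roughly like the userspace
model below. This is a sketch under stated assumptions, not the v3 code:
struct mock_page, enum mock_kind, the mock_* helpers and
sketch_handle_poison() are hypothetical stand-ins for the kernel
objects, and the -EBUSY error values are assumed. It keeps the
0-for-success, negative-for-failure convention from this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the page state; not struct page. */
struct mock_page {
	bool hugetlb;     /* page belongs to a free hugetlb folio */
	bool in_buddy;    /* page currently sits on a buddy free list */
	bool dissolve_ok; /* whether the mocked dissolve succeeds */
};

/* Hypothetical replacement for the single hugepage_or_freepage flag. */
enum mock_kind { MOCK_HUGEPAGE, MOCK_FREEPAGE, MOCK_OTHER };

/*
 * Mock of dissolve_free_hugetlb_folio(): 0 on success (including for
 * non-hugetlb pages), negative on failure. Per the commit message, the
 * poisoned page is not handed to the buddy allocator afterwards, so
 * in_buddy is left untouched. The -EBUSY value is an assumption.
 */
static int mock_dissolve_free_hugetlb_folio(struct mock_page *p)
{
	if (!p->hugetlb)
		return 0;
	if (!p->dissolve_ok)
		return -EBUSY;
	p->hugetlb = false;
	return 0;
}

/* Mock of take_page_off_buddy(): true if the page was found and removed. */
static bool mock_take_page_off_buddy(struct mock_page *p)
{
	if (!p->in_buddy)
		return false;
	p->in_buddy = false;
	return true;
}

/*
 * Sketch of the split: dissolve only for hugepages, take the page off
 * the buddy allocator only for non-huge free pages.
 * Returns 0 on success, negative on failure.
 */
static int sketch_handle_poison(struct mock_page *p, enum mock_kind kind)
{
	switch (kind) {
	case MOCK_HUGEPAGE:
		return mock_dissolve_free_hugetlb_folio(p);
	case MOCK_FREEPAGE:
		/* Can fail, e.g. if the page was allocated in the meantime. */
		return mock_take_page_off_buddy(p) ? 0 : -EBUSY;
	default:
		return 0;
	}
}
```

In this model a free buddy page that is still on a free list is removed
and reported as success, while one that raced with an allocation
reports a negative value, so the soft-offline caller can treat it as a
failure.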