From nobody Sun Feb 8 09:16:42 2026
Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8E5E716A956
	for ; Sun, 16 Nov 2025 01:47:26 +0000 (UTC)
Date: Sun, 16 Nov 2025 01:47:20 +0000
In-Reply-To: <20251116014721.1561456-1-jiaqiyan@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20251116014721.1561456-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.52.0.rc1.455.g30608eb744-goog
Message-ID: <20251116014721.1561456-2-jiaqiyan@google.com>
Subject: [PATCH v1 1/2] mm/huge_memory: introduce uniform_split_unmapped_folio_to_zero_order
From: Jiaqi Yan
To: nao.horiguchi@gmail.com, linmiaohe@huawei.com, ziy@nvidia.com
Cc: david@redhat.com, lorenzo.stoakes@oracle.com, william.roche@oracle.com,
	harry.yoo@oracle.com, tony.luck@intel.com, wangkefeng.wang@huawei.com,
	willy@infradead.org, jane.chu@oracle.com, akpm@linux-foundation.org,
	osalvador@suse.de, muchun.song@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Jiaqi Yan
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

When freeing a high-order folio that contains HWPoison pages, we must
ensure that these HWPoison pages are not added to the buddy allocator.
To do so, first uniformly split the free and unmapped high-order folio
into 0-order folios, then add only the non-HWPoison folios to the buddy
allocator and exclude the HWPoison ones.

Introduce uniform_split_unmapped_folio_to_zero_order(), a wrapper around
the existing __split_unmapped_folio(). Callers can use it to uniformly
split an unmapped high-order folio into 0-order folios.

No functional change. It will be used in a subsequent commit.

Signed-off-by: Jiaqi Yan
---
 include/linux/huge_mm.h | 6 ++++++
 mm/huge_memory.c        | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 71ac78b9f834f..ef6a84973e157 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -365,6 +365,7 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
 		vm_flags_t vm_flags);
 
 bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
+int uniform_split_unmapped_folio_to_zero_order(struct folio *folio);
 int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order);
 int min_order_for_split(struct folio *folio);
@@ -569,6 +570,11 @@ can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
 {
 	return false;
 }
+static inline int uniform_split_unmapped_folio_to_zero_order(struct folio *folio)
+{
+	VM_WARN_ON_ONCE_FOLIO(1, folio);
+	return -EINVAL;
+}
 static inline int
 split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		unsigned int new_order)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 323654fb4f8cf..c7b6c1c75a18e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3515,6 +3515,14 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	return ret;
 }
 
+int uniform_split_unmapped_folio_to_zero_order(struct folio *folio)
+{
+	return __split_unmapped_folio(folio, /*new_order=*/0,
+				      /*split_at=*/&folio->page,
+				      /*xas=*/NULL, /*mapping=*/NULL,
+				      /*uniform_split=*/true);
+}
+
 bool non_uniform_split_supported(struct folio *folio, unsigned int new_order,
 		bool warns)
 {
-- 
2.52.0.rc1.455.g30608eb744-goog

From nobody Sun Feb 8 09:16:42 2026
Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1B26C218596
	for ; Sun, 16 Nov 2025 01:47:27 +0000 (UTC)
Date: Sun, 16 Nov 2025 01:47:21 +0000
In-Reply-To: <20251116014721.1561456-1-jiaqiyan@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20251116014721.1561456-1-jiaqiyan@google.com>
X-Mailer: git-send-email 2.52.0.rc1.455.g30608eb744-goog
Message-ID:
 <20251116014721.1561456-3-jiaqiyan@google.com>
Subject: [PATCH v1 2/2] mm/memory-failure: avoid free HWPoison high-order folio
From: Jiaqi Yan
To: nao.horiguchi@gmail.com, linmiaohe@huawei.com, ziy@nvidia.com
Cc: david@redhat.com, lorenzo.stoakes@oracle.com, william.roche@oracle.com,
	harry.yoo@oracle.com, tony.luck@intel.com, wangkefeng.wang@huawei.com,
	willy@infradead.org, jane.chu@oracle.com, akpm@linux-foundation.org,
	osalvador@suse.de, muchun.song@linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Jiaqi Yan
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

At the end of dissolve_free_hugetlb_folio(), when a free HugeTLB folio
becomes non-HugeTLB, it is released to the buddy allocator as a
high-order folio, e.g. a folio that contains 262144 pages (with 4K base
pages) if the folio was a 1G HugeTLB hugepage.

This is problematic if the HugeTLB hugepage contained HWPoison
subpages. Because the buddy allocator does not check HWPoison for
non-zero-order folios, a raw HWPoison page can be handed out together
with its buddy pages and be reused by either the kernel or userspace.

Memory failure recovery (MFR) in the kernel does attempt to take the
raw HWPoison page off the buddy allocator after
dissolve_free_hugetlb_folio(). However, there is always a time window
between the page being freed to the buddy allocator and being taken
off it again.

One obvious way to avoid this problem is to add page sanity checks in
the page allocation or free path. However, that goes against past
efforts to reduce sanity check overhead [1,2,3].

Introduce hugetlb_free_hwpoison_folio() to solve this problem. The idea
is that when a HugeTLB folio is known to contain HWPoison page(s), the
now non-HugeTLB high-order folio is first split uniformly into 0-order
folios; the healthy pages are then released to the buddy allocator,
while the HWPoison ones are kept out of it.

[1] https://lore.kernel.org/linux-mm/1460711275-1130-15-git-send-email-mgorman@techsingularity.net/
[2] https://lore.kernel.org/linux-mm/1460711275-1130-16-git-send-email-mgorman@techsingularity.net/
[3] https://lore.kernel.org/all/20230216095131.17336-1-vbabka@suse.cz

Signed-off-by: Jiaqi Yan
---
 include/linux/hugetlb.h |  4 ++++
 mm/hugetlb.c            |  8 ++++++--
 mm/memory-failure.c     | 43 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8e63e46b8e1f0..e1c334a7db2fe 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -870,8 +870,12 @@ int dissolve_free_hugetlb_folios(unsigned long start_pfn,
 				    unsigned long end_pfn);
 
 #ifdef CONFIG_MEMORY_FAILURE
+extern void hugetlb_free_hwpoison_folio(struct folio *folio);
 extern void folio_clear_hugetlb_hwpoison(struct folio *folio);
 #else
+static inline void hugetlb_free_hwpoison_folio(struct folio *folio)
+{
+}
 static inline void folio_clear_hugetlb_hwpoison(struct folio *folio)
 {
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0455119716ec0..801ca1a14c0f0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1596,6 +1596,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 						struct folio *folio)
 {
 	bool clear_flag = folio_test_hugetlb_vmemmap_optimized(folio);
+	bool has_hwpoison = folio_test_hwpoison(folio);
 
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
@@ -1638,12 +1639,15 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 	 * Move PageHWPoison flag from head page to the raw error pages,
 	 * which makes any healthy subpages reusable.
 	 */
-	if (unlikely(folio_test_hwpoison(folio)))
+	if (unlikely(has_hwpoison))
 		folio_clear_hugetlb_hwpoison(folio);
 
 	folio_ref_unfreeze(folio, 1);
 
-	hugetlb_free_folio(folio);
+	if (unlikely(has_hwpoison))
+		hugetlb_free_hwpoison_folio(folio);
+	else
+		hugetlb_free_folio(folio);
 }
 
 /*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3edebb0cda30b..e6a9deba6292a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2002,6 +2002,49 @@ int __get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 	return ret;
 }
 
+void hugetlb_free_hwpoison_folio(struct folio *folio)
+{
+	struct folio *curr, *next;
+	struct folio *end_folio = folio_next(folio);
+	int ret;
+
+	VM_WARN_ON_FOLIO(folio_ref_count(folio) != 1, folio);
+
+	ret = uniform_split_unmapped_folio_to_zero_order(folio);
+	if (ret) {
+		/*
+		 * In case of split failure, none of the pages in folio
+		 * will be freed to buddy allocator.
+		 */
+		pr_err("%#lx: failed to split free %d-order folio with HWPoison page(s): %d\n",
+		       folio_pfn(folio), folio_order(folio), ret);
+		return;
+	}
+
+	/* Expect 1st folio's refcount==1, and other's refcount==0. */
+	for (curr = folio; curr != end_folio; curr = next) {
+		next = folio_next(curr);
+
+		VM_WARN_ON_FOLIO(folio_order(curr), curr);
+
+		if (PageHWPoison(&curr->page)) {
+			if (curr != folio)
+				folio_ref_inc(curr);
+
+			VM_WARN_ON_FOLIO(folio_ref_count(curr) != 1, curr);
+			pr_warn("%#lx: prevented freeing HWPoison page\n",
+				folio_pfn(curr));
+			continue;
+		}
+
+		if (curr == folio)
+			folio_ref_dec(curr);
+
+		VM_WARN_ON_FOLIO(folio_ref_count(curr), curr);
+		free_frozen_pages(&curr->page, folio_order(curr));
+	}
+}
+
 /*
  * Taking refcount of hugetlb pages needs extra care about race conditions
  * with basic operations like hugepage allocation/free/demotion.
-- 
2.52.0.rc1.455.g30608eb744-goog
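
For readers following the refcount handling in
hugetlb_free_hwpoison_folio(), a small userspace sketch of the loop may
help. It is not kernel code and not part of the series: struct
model_page, model_split() and model_free_hwpoison() are hypothetical
names made up for illustration, the split is assumed to always succeed,
and "freeing" is modeled as setting a flag rather than calling
free_frozen_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one order-0 folio after the split. */
struct model_page {
	bool hwpoison;	/* models PageHWPoison() */
	int refcount;	/* models folio_ref_count() */
	bool in_buddy;	/* set once the page is "freed" to the allocator */
};

/*
 * Model of the state right after a successful uniform split to order 0:
 * the first page keeps the single reference the caller held, every
 * other page starts with a refcount of zero.
 */
static void model_split(struct model_page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		pages[i].refcount = (i == 0) ? 1 : 0;
		pages[i].in_buddy = false;
	}
}

/*
 * Mirror of the free loop in the patch: poisoned pages end up holding
 * exactly one reference and never reach the allocator; healthy pages
 * are released with a refcount of zero. Returns the number of pages
 * handed to the (modeled) allocator.
 */
static size_t model_free_hwpoison(struct model_page *pages, size_t nr)
{
	size_t freed = 0;

	model_split(pages, nr);
	for (size_t i = 0; i < nr; i++) {
		if (pages[i].hwpoison) {
			if (i != 0)
				pages[i].refcount++;	/* pin the poisoned page */
			assert(pages[i].refcount == 1);
			continue;
		}
		if (i == 0)
			pages[i].refcount--;	/* drop the caller's reference */
		assert(pages[i].refcount == 0);
		pages[i].in_buddy = true;
		freed++;
	}
	return freed;
}
```

With an 8-page model in which pages 0 and 5 are marked poisoned,
model_free_hwpoison() reports 6 pages freed, and both poisoned pages
stay out of the allocator with a refcount of 1. This matches the
"1st folio's refcount==1, and other's refcount==0" expectation noted in
the patch.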