From nobody Wed Jun 10 22:49:18 2026 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EEE813F58EB; Tue, 9 Jun 2026 10:57:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002641; cv=none; b=BuJFSlyeWkZ61vgrkd5MgdDtkdsRKUThYgKp3R5Nkaq4ijB4qNKbGxm+G+XDZc6KlEDyoKul3m1gURGfTygjPIGYZKKvUzMljzCJ6ZDlZQZxT321msQZ289IXWL0NYl4MAKcayhrjOXd0BsY4PRYgaXGqhrMm8EyQAr+0H/bO68= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002641; c=relaxed/simple; bh=S/LBPsjK30OHelXFeIEq2OoL51toNBeqnKCcHXf3YZ8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=DCxn8MepUMx0v7znuTsjZYlCnFgyXjnS0RH6+nPaW94PjEV6c9MNvQA349VvZh8t+0zdz7y4NTn2iMZiC4K8Nytn+jtnqyP4LlfxLoRkEtXP4X5LADBU2O58uySeXUbZJi+VhjG6Ds0UtWV0uH0D4gVIe42fVPTA/I/7EqTsVEI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=R2Ebasz7; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="R2Ebasz7" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; 
bh=fa8zz6R4p8oQR9YfNRRWZX08Hmu4ZhsxlCE5B6JMwdY=; b=R2Ebasz7sqx+PQI4Hl+rYq/Yah KTpeniLLengUedpR0JZxa5kHQkS/sXBigQJVRtOblpTfnfdBT7PlkvhZL2VTR5zqvINsMgKVKrZ46 qzxqWewlD80vhGHWzUC+QnReOX533BW5H8jSGhy+0WZfadKWzi0vZNbJNPhCY2GVozO5KpZNAQa+4 DTSWN/XhKQe06Y5LHdmR14pTdR5qTILMr5lEHJ0HJk/FdI+Z2sj866hO11kM7Dp7GbfK1Y9mD0CnH f1piZ1PerfnSuwVedlK4tkEItLxaeSBwIi/QUS+x5E3XjsOtDokiuWKgkeQx9RG//0zC3fU5qqxSL wTS/Cw1w==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wWu8w-008LtA-2j; Tue, 09 Jun 2026 10:57:15 +0000 From: Breno Leitao Date: Tue, 09 Jun 2026 03:56:55 -0700 Subject: [PATCH v9 1/6] mm/memory-failure: drop dead error_states[] entry for reserved pages Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260609-ecc_panic-v9-1-432a74002e74@debian.org> References: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> In-Reply-To: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Jonathan Corbet , Shuah Khan , "Liam R. Howlett" , lance.yang@linux.dev, Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , "Liam R. 
Howlett" Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=3199; i=leitao@debian.org; h=from:subject:message-id; bh=S/LBPsjK30OHelXFeIEq2OoL51toNBeqnKCcHXf3YZ8=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqJ/F+RwM7FNvMEmtMWZ1Zg7fCTG+bxJiZsvZCe jH49HaDs1+JAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCaifxfgAKCRA1o5Of/Hh3 bfWQD/99pUoutynRjvnbUr1KG39G7HbZBPVQotVsVqujJ9kMtd9WYO/9Mq3ApucmJ6rxrDOdobr yPsAWi5L8Wm73ds8YU2RpvtWBfvLX/YKGlouKI4VCMFSfMXgSpIVfvqToBnSnXnle1Pm6uN4aHG Su/eUPNTnkRnSCNIQZ3NoOJqX9xlrjiw87zP9aNgGsevb8IL5aye9meJZT6lwaJEa4WGQTryZVK 1EWkgs3iLPgAy8Nf5jktGpIypnon4nRmgrDPra9MiuftTp02o5z9W7InLcCTU8iHtkWuwiUGyw6 8+z+8GWgLrWP7kOSwnAsTpCKXoBjWaK0KMEEjgJm5q9GAMX3oI71isW9u6MfmWF5aOGBBpWAmw6 Fwj3o7IdL5b8hVzrjX9Yr+p+YmZ6bqBm7RPZu/JCnD5v2xHTCaL5LlSq9sfXGbcb5hl0y8zOyCN VfgEju0mpzTSYmG1UVCfqTfEoyYpaA3Yayl/HpNQYWnRS31qopEgKxfK8bUxEoA2V7QUBJLs9kb 9jeKdSDWaKGK11XjAbCoFgOXq4D3kBZY19n9Yw9Y8+XCFaazzzLLgWLIHKZUvhhKduqc1nzU9yX yfXU1OZKQWCSVA1cl/ULE1LKeZX7FwWiNh+J68MQDCHCvREc08UbvgpdecFdOpdsOyllqWBpmZL xRzlYtE7rZewwOw== X-Developer-Key: i=leitao@debian.org; a=openpgp; fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D X-Debian-User: leitao The first entry of error_states[], { reserved, reserved, MF_MSG_KERNEL, me_kernel }, is unreachable. identify_page_state() has two callers, and neither one can dispatch a PG_reserved page to me_kernel(): * memory_failure() reaches identify_page_state() only after get_hwpoison_page() returned 1. get_any_page() reaches that return only via __get_hwpoison_page(), which only takes a refcount when the page is HWPoisonHandlable(). HWPoisonHandlable() is an allowlist for LRU, free-buddy, and (for soft-offline) movable_ops pages -- PG_reserved pages do not satisfy any of these, so they fail with -EBUSY/-EIO long before identify_page_state() runs. 
* try_memory_failure_hugetlb() reaches identify_page_state() only via
  the MF_HUGETLB_IN_USED branch, where the page is necessarily a
  hugetlb folio. hugetlb folios don't carry PG_reserved at that point:
  hugetlb_folio_init_vmemmap() calls __folio_clear_reserved() during
  init, so the reserved entry would not match even if it were still
  present.

me_kernel() never executes and the entry exists only to be matched
against by code that cannot see it. Drop the entry, the me_kernel()
helper, and the now-unused "reserved" macro.

Leave the MF_MSG_KERNEL enum value in place: it remains part of the
tracepoint and pr_err() string tables, and follow-on work to classify
unrecoverable kernel pages can reuse it without churning the
user-visible enum.

No functional change.

Suggested-by: David Hildenbrand
Acked-by: David Hildenbrand (Arm)
Reviewed-by: Lance Yang
Acked-by: Miaohe Lin
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 51508a55c405..f4d3e6e20e13 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -980,17 +980,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,
 	return false;
 }
 
-/*
- * Error hit kernel page.
- * Do nothing, try to be lucky and not touch this instead. For a few cases we
- * could be more sophisticated.
- */
-static int me_kernel(struct page_state *ps, struct page *p)
-{
-	unlock_page(p);
-	return MF_IGNORED;
-}
-
 /*
  * Page in unknown state. Do nothing.
  * This is a catch-all in case we fail to make sense of the page state.
@@ -1199,10 +1188,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 #define mlock		(1UL << PG_mlocked)
 #define lru		(1UL << PG_lru)
 #define head		(1UL << PG_head)
-#define reserved	(1UL << PG_reserved)
 
 static struct page_state error_states[] = {
-	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
 	/*
 	 * free pages are specially detected outside this table:
 	 * PG_buddy pages only make a small fraction of all free pages.
@@ -1234,7 +1221,6 @@ static struct page_state error_states[] = {
 #undef mlock
 #undef lru
 #undef head
-#undef reserved
 
 static void update_per_node_mf_stats(unsigned long pfn,
 				     enum mf_result result)
-- 
2.53.0-Meta

From nobody Wed Jun 10 22:49:18 2026
Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D9EF13F58EB; Tue, 9 Jun 2026 10:57:24 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002646; cv=none; b=D+e8KDyXhgwmvrsYzXY+oFUAQpFGQvcNJkvb7BjBjg2V6Pc/Gy+6F6tGNPDS5Q09gOaKlTs6TTWIBLQehTvwM2xOUb/rLcDgAS7IM4a2P9RKkE5el11M2Gr52E0j0N8S+yT7xsWriy6gDmVGMPQz0LnhpUQtpagqfn9XMDCyV8s=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002646; c=relaxed/simple; bh=CUYyoKyV+ajjuf7tCF/I88MYnnmBVEA/RgWm86IO5uw=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=tU7kG6ULKEoY3Hl6gid8MlcjhuJIb9NZ5urCtKQPai7jAypLxfZ0VFKWZdoBzDETR8a0dd0g8HgBmmRZZXz+09Z8Xl4ELfx93ts1qWdE2mWu5HQkja0fW7eWErpWa5kyWt9i9z2tu5/DFvoKLT5k4ndb5CM8RZOQwqUOSBpPQws=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=H35lLF9x;
arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="H35lLF9x" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; bh=Gu/Cl4HXUPWLQ14somURAK/E71MHGU9VroHeZQlQXXU=; b=H35lLF9xEPajDqFU4+VwB37xTc Xgat1VC7Jj2yqNryAUasB3nFF+z1di3CWKHvTGOCvRP9p7Z/z8+m5bNvALYxKRGq8ZJlTI98R0hi0 7BHOnkaQlvbxuIWUfDhKhsiwGxonKB+GXFVhfqrjoi57KIAMMH2i+X6IjNNQ5TVcEUJjA5j+lIwvk sXT3S8A5Osdkm2p+2fqPT9xFK30mFz2y1mlNJJuMJNDXPnBfIBPIJHenvhj5LsMZ/A8gxVoysxz2W hau7FSlh2TC2ce1YP5qpP9dQzuQwWLERtcIkL4wt/0d/aY1RxzL9/cjLmoFkCUZ4UxuiM73+W5/fk mn6L/dLw==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wWu92-008LtC-2T; Tue, 09 Jun 2026 10:57:21 +0000 From: Breno Leitao Date: Tue, 09 Jun 2026 03:56:56 -0700 Subject: [PATCH v9 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260609-ecc_panic-v9-2-432a74002e74@debian.org> References: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> In-Reply-To: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Jonathan Corbet , Shuah Khan , 
"Liam R. Howlett" , lance.yang@linux.dev, Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , "Liam R. Howlett" Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=7083; i=leitao@debian.org; h=from:subject:message-id; bh=CUYyoKyV+ajjuf7tCF/I88MYnnmBVEA/RgWm86IO5uw=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqJ/F/MwDyE/qXxp85Rs8ajHfuMkT4dYImzyuTA 6+g+maicm2JAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCaifxfwAKCRA1o5Of/Hh3 bc/uD/9tc94sUHlvAtFvm6Cj+R3myfcwXShRskHU2E4luB9pG7qWI88aosNYa4uQLzJ2Wa5RjZb sFyLYrmE/QR8Cc2Nhd5wk2l4vN4nhfCdUVYQ5kG6xdMjjG+2Xh3Pojw8efEccTaSGZOEmewpA38 HzUE4ELNcij+Rc9uLB93CRld2/XIU3SajLSmy3FF4d9rUZ9U41SLIuO3xQfPe1iuMkwCUAeYIsC nTdAuPxBTVpJz+ZKHBFs3mwIsXEoooBycl5GPUC8Ss1PSLYd6Yk7cmxvX11Owm9UutRc54Ej1RP sn6mXpFu9wbByf54DDlbIqVOVEp7bjpZePGx5IN/b8eDOGHVkXr4SUVPEO/CfzBQdtgmWNtiuZ9 pgmPilNY0x7rPq7UkstS+lZTMewLq+5UJBUgTkHO3CWT9Mo7zs8JrvyrOPCVwbqCYDKhRwjT68+ G6SCT7PCeJ97s2n6VZQbWv282+OUz3dGxFZ9EL/BtzxrUvtsFIzPgldaauX/blaqlgJExUQYJlw OwD30kUi3j0197fAOrHbHjYveMfLGitHKNkbTdg3D/iYmNUno1VIpnfE7L2t0bhSGhFcv49GBJM u1JtvwqsbuR1yhXBmFg7bTrCxQOaA6ubFzeQozM698lmtGQnXq0mQ4YrhgOlDR2JxFUzfjDCzg/ Poj6q6wL/Pm8mHg== X-Developer-Key: i=leitao@debian.org; a=openpgp; fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D X-Debian-User: leitao get_any_page() collapses every HWPoisonHandlable() rejection into a single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page() -> retry path. That is correct for the transient case (a userspace folio briefly off LRU during migration or compaction, which a later shake can drag back), but wrong for stable kernel-owned pages: slab, page-table, large-kmalloc and PG_reserved pages will never become HWPoisonHandlable(), so the retry loop is wasted work and the final -EIO loses the "this is structurally unrecoverable" information. 
memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
panic-on-unrecoverable sysctl deliberately does not act on.

Introduce HWPoisonKernelOwned(), a small predicate that positively
identifies pages the hwpoison handler cannot recover from:

    HWPoisonKernelOwned(p, flags) :=
        !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) &&
        (PageReserved(p) || PageSlab(head) || PageTable(head) ||
         PageLargeKmalloc(head))

where head = compound_head(p).

PG_reserved is a per-page flag (PF_NO_COMPOUND) and is tested on the
page directly. The slab, page-table and large-kmalloc page-type bits
are only stored on the head page, so those tests resolve the compound
head first, then re-read compound_head(page) afterwards: a concurrent
split or compound free that moves head invalidates the just-read flags
and the loop retries. The lookup still takes no refcount, mirroring
the rest of get_any_page(); the recheck closes the common split race,
and a residual free->alloc->free in the same window can only mis-tag a
genuinely poisoned page, never reclassify a handlable one.

The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the same
exception in HWPoisonHandlable(): soft-offline is allowed to migrate
movable_ops pages even though they are not on the LRU, and we must not
pre-empt that with an unrecoverable verdict.

The list is intentionally not exhaustive. vmalloc and kernel-stack
pages, for example, do not carry a page_type bit and would need a
different oracle; they keep going through the existing retry path
unchanged. This is the smallest set we can identify with certainty by
page type.

Wire the helper into the top of get_any_page() to short-circuit those
pages before the retry loop runs. On a hit, drop the caller's
MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE
straight away. Pages outside the helper's positive list still take the
existing retry path and return -EIO, leaving operator-visible
behaviour for those cases unchanged.
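
As an illustration only, the predicate and the early return described
above can be modelled in userspace C. This is a sketch under stated
assumptions, not the kernel code: struct model_page,
MODEL_MF_SOFT_OFFLINE, model_kernel_owned(), model_short_circuit() and
model_kernel_owned_demo() are hypothetical names invented here, the
page flags are plain booleans, and the head recheck is trivially true
because nothing runs concurrently in the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the MF_SOFT_OFFLINE flag bit. */
#define MODEL_MF_SOFT_OFFLINE 0x1u

/* Hypothetical userspace model of a page; not the kernel's struct page. */
struct model_page {
	struct model_page *head;  /* compound head; points to itself if not a tail */
	bool reserved;            /* models PG_reserved, a per-page flag */
	bool slab;                /* page-type bits, meaningful on the head only */
	bool table;
	bool large_kmalloc;
	bool movable_ops;         /* models page_has_movable_ops() */
};

/* Mirrors the shape of the predicate given in the commit message. */
static bool model_kernel_owned(const struct model_page *p, unsigned int flags)
{
	const struct model_page *head;

	/* Soft-offline may still migrate movable_ops pages. */
	if ((flags & MODEL_MF_SOFT_OFFLINE) && p->movable_ops)
		return false;

	/* The reserved flag is tested on the page itself. */
	if (p->reserved)
		return true;

	/*
	 * Type bits are read from the head, then the head is re-read.
	 * In this single-threaded model the recheck never fails.
	 */
	do {
		head = p->head;
		if (!(head->slab || head->table || head->large_kmalloc))
			return false;
	} while (head != p->head);

	return true;
}

/*
 * Models only the new early return: -ENOTRECOVERABLE on a hit,
 * 0 meaning "continue into the existing retry path".
 */
static int model_short_circuit(const struct model_page *p, unsigned int flags)
{
	return model_kernel_owned(p, flags) ? -ENOTRECOVERABLE : 0;
}

/* Small self-check showing how a tail page inherits its head's type. */
static void model_kernel_owned_demo(void)
{
	struct model_page slab_head = { .head = &slab_head, .slab = true };
	struct model_page slab_tail = { .head = &slab_head };
	struct model_page lru_page = { .head = &lru_page };

	assert(model_short_circuit(&slab_tail, 0) == -ENOTRECOVERABLE);
	assert(model_short_circuit(&lru_page, 0) == 0);
}
```

The model keeps the two lookups separate on purpose: the reserved test
never touches the head, while the page-type tests never look at the
tail, which is the distinction the paragraph on PF_NO_COMPOUND draws.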

Extend the unhandlable-page pr_err() to fire for either errno and
update the get_hwpoison_page() kerneldoc to document the new return.

memory_failure() still folds every negative return into
MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so
this patch on its own only changes the errno that soft_offline_page()
can propagate to its callers. A follow-up wires -ENOTRECOVERABLE
through memory_failure() and reports MF_MSG_KERNEL for the
unrecoverable cases, which is what the
panic_on_unrecoverable_memory_failure sysctl observes.

Suggested-by: David Hildenbrand
Suggested-by: Lance Yang
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f4d3e6e20e13..eed9de387694 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1325,6 +1325,46 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 	return PageLRU(page) || is_free_buddy_page(page);
 }
 
+/*
+ * Positive identification of pages the hwpoison handler cannot recover.
+ * These page types are owned by kernel internals (no userspace mapping
+ * to unmap, no file mapping to invalidate, no migration target), so the
+ * shake_page() / retry loop in get_any_page() can never turn them into
+ * something HWPoisonHandlable() will accept. Short-circuit them to
+ * -ENOTRECOVERABLE so callers can panic on operator request instead of
+ * spinning through retries that exit as a transient-looking -EIO.
+ *
+ * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors
+ * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops
+ * pages even though they are not on the LRU.
+ */
+static inline bool HWPoisonKernelOwned(struct page *page, unsigned long flags)
+{
+	struct page *head;
+
+	if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page))
+		return false;
+
+	/* PG_reserved is a per-page flag, never set on a compound page. */
+	if (PageReserved(page))
+		return true;
+
+	/*
+	 * Page-type bits live only on the head page, so resolve any tail
+	 * first. The check takes no refcount; recheck the head afterwards
+	 * so a concurrent split or compound free cannot leave us trusting
+	 * a stale view. A free->alloc->free in the same window is still
+	 * possible but closing it would require taking a reference here.
+	 */
+retry:
+	head = compound_head(page);
+	if (!(PageSlab(head) || PageTable(head) || PageLargeKmalloc(head)))
+		return false;
+	if (head != compound_head(page))
+		goto retry;
+	return true;
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct folio *folio = page_folio(page);
@@ -1371,6 +1411,19 @@ static int get_any_page(struct page *p, unsigned long flags)
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
+	/*
+	 * Page types we know are kernel-owned and cannot be recovered.
+	 * Short-circuit before the shake_page() / retry loop, which
+	 * cannot turn any of these into something HWPoisonHandlable().
+	 * Drop the caller's reference if MF_COUNT_INCREASED took one.
+	 */
+	if (HWPoisonKernelOwned(p, flags)) {
+		if (count_increased)
+			put_page(p);
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 try_again:
 	if (!count_increased) {
 		ret = __get_hwpoison_page(p, flags);
@@ -1418,7 +1471,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -1475,7 +1528,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         HWPoisonKernelOwned() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
-- 
2.53.0-Meta

From nobody Wed Jun 10 22:49:18 2026
Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E0652BEC3F; Tue, 9 Jun 2026 10:57:30 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002651; cv=none; b=pHpJWsVOvvnJZuGiZ/sIZHOdcKZZt/3JlV4ScCiAwc/INKax2UaElfMtFT2spcFr/KxaXE/c5L1eseAJ0AD8Ma7KK8iGxSrqBMDNextzS+n6ylkkkZ0XwzNq72/eqFrWjPz6JXSiRSxWFywWwEPR1bs7tUmVf4AOLqKkqukvblw=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002651; c=relaxed/simple; bh=Oc/n/XwBBXtwbj6X9XE1i3pEwNRXj+wVbkUgEoDgdsE=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
In-Reply-To:To:Cc; b=eJRDQVxxXEG0LjMRMui0DF3IY1XY24C5ftMswLJA0u2Wd63BJ+pDh32fSdBeBeQvV8uobsDFtNZDsNdBfPhi4xLt3NteHVLKxLB6F2kJVRgPv1cylPdWIyOdVIrBLNb/8woFyPpvtLi8CPVKaoqziVogP2eUCvNl7F4UYbJh3JY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=Lpx7hj9G; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="Lpx7hj9G" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; bh=+nohVdFqLh53p7CXnhNyUtPR5DDNhEHlR6gaOs/wF1M=; b=Lpx7hj9G3s2TQTtB/xam5NsxlC y+WwN0BvnZBo543Eh7xDClDaakrH+ZmRXr2aiv7yjHHbwJclE6ePbiAc2H5WdRwT/SHi21ZbErK5g MJxu++r1QV0RILCD5OruTlkld+JYAp28rUdrXlv7r2ptbUB5XKmOsnhvAT9epcRyjb5+PlIWEXnho c+AyhQDWoetpWJA67YIMl96bVfDOkaH/e8Cn+EYS8K/ta9Mt5ACfoA7z1NvqxyymRovS4C2MOTcPM TGN2GDM+JrfhaZoaLqBFdTqhqCONTdSACimyc7w7wlYrGCJeCFOvQpZV+kgE8lu2PL8l5BzbFZfdP VdtDaaqQ==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wWu98-008LtT-1n; Tue, 09 Jun 2026 10:57:26 +0000 From: Breno Leitao Date: Tue, 09 Jun 2026 03:56:57 -0700 Subject: [PATCH v9 3/6] mm/memory-failure: report MF_MSG_KERNEL for unrecoverable kernel pages Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" 
Content-Transfer-Encoding: quoted-printable Message-Id: <20260609-ecc_panic-v9-3-432a74002e74@debian.org> References: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> In-Reply-To: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Jonathan Corbet , Shuah Khan , "Liam R. Howlett" , lance.yang@linux.dev, Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , "Liam R. Howlett" Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=2453; i=leitao@debian.org; h=from:subject:message-id; bh=Oc/n/XwBBXtwbj6X9XE1i3pEwNRXj+wVbkUgEoDgdsE=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqJ/F/kF2/XCBiL5ksWydZps0h/w0J3/rHD35aY 2p6eZM0TB6JAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCaifxfwAKCRA1o5Of/Hh3 bUotD/4kz5OLP+m2jSn6O6pSdolh2/muiWL7jwAJoIvQHPL9hMW15HTEPNeOJgT5BGvvsGw5tMH c6I4yqQuN7qu/HIlsjn7r+eIRt8e3RUi3FseWx0v4zAD8W7t4/gy32ASUAZDh2HQostVe6jODsI nHxByOIw8FvQmcG1EryDDz/zVlXGeU96DB/1gEdrEEKmLZc4LzDtz59Q6/UZVVwfk6Wb8S6v5y2 XFJPV8ZtuA0ARqrghH8HfJHtErdgKvEDjOWAwgUKfoM5+7Xb+cd83pai3fY9Jel840DP1DX6Qj8 vCdCONy8NtQrqEC5w8rMQlpxSpWcvYax9eBaXZM1v1NoMiKhAOgJ82VZFQCEsGP8Ce6HoDDN7/5 4cAN7QqGpLi0ntBoxiAqwO1kMirrciC+I918RJH4hS/VaFSM3krN1rplSDRktJtZWwoubTYrHgG 47EkJbJecMzwO6GiBlyV3+8MRCLugLAM5iWWUM5QAaZerKVrMfpoO3te5Gd0C7oqVDzoii2h7Tz c4qDqWrMImt0aCAfar6u2IaLjLnELIaoZS4CyOWUSoXIo7mIDuNXv/yvxy4mRVM/Q+6hKd/Pm+3 lwp1lVmZ7Y+rok9jWEi2fEy4FO/4U/EU7Dp7X2K+4r3fdOQoHhIm56F05PzJlFO7tkXuy5uvy3G SJpfI1qHxJBB3+Q== X-Developer-Key: i=leitao@debian.org; a=openpgp; fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D X-Debian-User: leitao The previous patch teaches get_any_page() to return -ENOTRECOVERABLE for stable unhandlable kernel pages (PG_reserved, 
slab, page tables, large-kmalloc). memory_failure() still folds every
negative return into MF_MSG_GET_HWPOISON, so callers that want to
react to the unrecoverable cases (a panic option, smarter logging)
cannot tell them apart from transient page-allocator races.

Turn the post-call branch into a switch over the get_hwpoison_page()
return code: map -ENOTRECOVERABLE to MF_MSG_KERNEL and any other
negative return to MF_MSG_GET_HWPOISON. case 0 keeps the existing
free-buddy / kernel-high-order handling and case 1 falls through to
the rest of memory_failure() unchanged.

The MF_MSG_KERNEL label and tracepoint string are kept as "reserved
kernel page" to avoid breaking userspace tools that match on those
literals; the enum value still adequately tags the failure even though
it now also covers slab, page tables and large-kmalloc pages.

Suggested-by: David Hildenbrand
Acked-by: David Hildenbrand (Arm)
Acked-by: Miaohe Lin
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eed9de387694..35f2b5d89fbe 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2444,7 +2444,8 @@ int memory_failure(unsigned long pfn, int flags)
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
 	res = get_hwpoison_page(p, flags);
-	if (!res) {
+	switch (res) {
+	case 0:
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
 				page_ref_inc(p);
@@ -2463,7 +2464,19 @@ int memory_failure(unsigned long pfn, int flags)
 			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 		}
 		goto unlock_mutex;
-	} else if (res < 0) {
+	case 1:
+		/* Got a refcount on a handlable page. */
+		break;
+	case -ENOTRECOVERABLE:
+		/*
+		 * Stable unhandlable kernel-owned page (PG_reserved,
+		 * slab, page tables, large-kmalloc).
+		 * No recovery possible.
+		 */
+		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		goto unlock_mutex;
+	default:
+		/* Transient lifecycle race with the page allocator. */
 		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 		goto unlock_mutex;
 	}
-- 
2.53.0-Meta

From nobody Wed Jun 10 22:49:18 2026
Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 035B93FBEAB; Tue, 9 Jun 2026 10:57:35 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002657; cv=none; b=bL4KEqKz2ebavX/jVmCKK0wSA1ZEo00QEw1l5691Haw5iGrhgbJzfH3VPWFhZV499vzNx0mW/up9yN/8illpp4Ct0IlFVstFzjCxhyynnLUcXG58di5biIrm2yHfqh6HDzyt4mKgN/qXS9WtJAGSZFaeUOJt+tinWWc0h9bSXFU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002657; c=relaxed/simple; bh=pptrxwgjoBdTEzfy0Zc6OZLCiUUl50m92rZCrLci1Ug=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=K4CRCU+qpzhGjTKXdBfDoGzOqzaU1JfxCXnUZihNJwNASySjyjbP6dUYjQCAFXVRq8nggqEdYkCi66dceaOx3OS8kajrYt8CmOO8ACjOpjO2CBH/NqlcmzUJF/HKD60LtVwQj51ml3v7XkXb/mJxAzLH9PBWOOIrN0F40/Z86ZE=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=PPguWdWz; arc=none smtp.client-ip=82.195.75.108
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="PPguWdWz"
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt;
c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; bh=BtvYFqoloPPtAp1pIQqyXcufUFMrAyAb/WE85RRt7eI=; b=PPguWdWz7YUz61OS+TRNLJgm6N 1aYxKyk2lPR2q4ICiU3HYfikJz1uv70/ogItEDn9gmsp+Oat4KqxF/NqEoAteLwN4GUMqOvDTYG1C w/w0kWaqhxYBKUiejwGfXJ04DM7P5xPHR1qRyfJEFEzIoGxWkX6gEL09fjIQWs4sRdQpnub5bXEg3 nZKWjrXdXI6Lihmj+S37+MSZbzq23Ih9z8c69msxUwjzUpDEy4NjhXZhuHjqB3mGkPEW2vczxI+Fu orXfRXPzQa8W7yq3DVfGLbKK3as3qZ9DWFB7iDp1ABfNScx0pnop689ILvS2RxYDVbBe6Acg1mlON wvKfwVgQ==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wWu9E-008Ltr-0F; Tue, 09 Jun 2026 10:57:32 +0000 From: Breno Leitao Date: Tue, 09 Jun 2026 03:56:58 -0700 Subject: [PATCH v9 4/6] mm/memory-failure: add panic option for unrecoverable pages Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260609-ecc_panic-v9-4-432a74002e74@debian.org> References: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> In-Reply-To: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Jonathan Corbet , Shuah Khan , "Liam R. Howlett" , lance.yang@linux.dev, Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , "Liam R. 
Howlett"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com
X-Mailer: b4 0.14.3
X-Developer-Signature: v=1; a=openpgp-sha256; l=3152; i=leitao@debian.org; h=from:subject:message-id; bh=pptrxwgjoBdTEzfy0Zc6OZLCiUUl50m92rZCrLci1Ug=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqJ/F/uyEq1JIbPVqIFYZev6obTOmTvOQvfMd3s zL4LNHIrhOJAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCaifxfwAKCRA1o5Of/Hh3 bQCJD/9DiZd5c3g/931cj1m5gW6orYI3GnfV9eUGkKD0VFG3Jz7S+pA6eCWImC6gc2AVh6hhX63 p4KROpQub0l5Dhc2jNMdjr+jfYqmykBfADbdWk5RWZ/Z3mYPCQeDmFM8aA8fKE+Keoingbv5uAn +rL2SRe2YrBIGE11QHluVBU+B7EgRV8bWn/UDhwjMIjE63XA8ZmEbg2NUWdNyqqs0cOTRzgAntu AixH87/fyd7qfLrAtat670Lw52FvxGSiUJ6GVGS2WQH0z0qVQ6kjZvsH5/nA/Mkgoiccs6HISK1 IOei3GNsPLiEyxfZzUM1Jtp9u5xqr+MO4aclGMI+BAuP6SQV4Y14Fo8yNpl7SvnMxry7gM1eBK9 UlwNX0d+thLKRbqiv+o58OK82lVbKeHPNZ4TyH6fvHY/kbmyvU5oUwXoD7lLhXFg7Wrj7dN2UJO UhV9m4jx/DpmKu9zaGeYK/ffepGvKN1gh6GKU4kxdzXVs7O28j6YAC4BKLvob3V7WQ7+xd1455f Vupd5VI4DZWurmPRFQnookUOXJshHRiH9ADMHn2x7asuaRAOF2hVVNr0y9+rK05uFzD49ckvxjK Br6PEI94VQa0oA15IGfFrrKwVn6rAoAHFDy7v0HR9fyDUsGNDhRKCkMfgADwO6/QnSqLfTc6tVm 07+h3LVb5zfyi4g==
X-Developer-Key: i=leitao@debian.org; a=openpgp; fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D
X-Debian-User: leitao

Add a sysctl panic_on_unrecoverable_memory_failure (disabled by
default) that triggers a kernel panic when memory_failure() encounters
pages that cannot be recovered. This provides a clean crash with
useful debug information rather than allowing silent data corruption
or a delayed crash at an unrelated code path.

Panic eligibility is intentionally narrow: only MF_MSG_KERNEL with
result == MF_IGNORED panics. After the previous patch, MF_MSG_KERNEL
covers PG_reserved pages and the kernel-owned pages promoted from
get_hwpoison_page() via -ENOTRECOVERABLE (slab, page tables,
large-kmalloc).
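
The eligibility rule described above, combined with the errno mapping
from the previous patch, can be modelled as a small userspace sketch.
Everything named model_* or MODEL_* below is hypothetical and invented
for illustration; the enums only mirror the handful of values this
series discusses, and the sysctl is passed in as a plain boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of the action page types discussed in this series. */
enum model_msg {
	MODEL_MSG_KERNEL,
	MODEL_MSG_KERNEL_HIGH_ORDER,
	MODEL_MSG_GET_HWPOISON,
	MODEL_MSG_UNKNOWN,
};

/* Hypothetical subset of the recovery results. */
enum model_result {
	MODEL_IGNORED,
	MODEL_FAILED,
	MODEL_DELAYED,
	MODEL_RECOVERED,
};

/*
 * Models how a negative return from the page lookup is classified:
 * -ENOTRECOVERABLE is tagged as a kernel page, every other negative
 * value is treated as a possibly transient lookup failure.
 */
static enum model_msg model_classify_error(int res)
{
	return res == -ENOTRECOVERABLE ? MODEL_MSG_KERNEL
				       : MODEL_MSG_GET_HWPOISON;
}

/* Models the narrow panic rule: knob on, kernel page, result ignored. */
static bool model_should_panic(bool knob_enabled, enum model_msg type,
			       enum model_result result)
{
	if (!knob_enabled)
		return false;

	return type == MODEL_MSG_KERNEL && result == MODEL_IGNORED;
}

/* Small self-check covering the three interesting combinations. */
static void model_panic_demo(void)
{
	enum model_msg hard = model_classify_error(-ENOTRECOVERABLE);
	enum model_msg soft = model_classify_error(-EIO);

	assert(model_should_panic(true, hard, MODEL_IGNORED));
	assert(!model_should_panic(true, soft, MODEL_IGNORED));
	assert(!model_should_panic(false, hard, MODEL_IGNORED));
}
```

Keeping the classification and the panic decision as two separate
steps in the model reflects the split between the two patches: one
decides what kind of failure was seen, the other decides whether that
kind is safe to panic on.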

All other action types are excluded:

  - MF_MSG_GET_HWPOISON and MF_MSG_KERNEL_HIGH_ORDER can be reached by
    transient refcount races with the page allocator (an in-flight
    buddy allocation has refcount 0 and is no longer on the buddy free
    list, briefly), and panicking on them would risk killing the box
    for what is actually a recoverable userspace page.

  - MF_MSG_UNKNOWN means identify_page_state() could not classify the
    page; that is precisely the wrong basis for a panic decision.

Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 35f2b5d89fbe..a8b466a48b02 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -1255,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+				      enum mf_result result)
+{
+	if (!sysctl_panic_on_unrecoverable_mf)
+		return false;
+
+	return type == MF_MSG_KERNEL && result == MF_IGNORED;
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock.
See also comment above set_page_dir= ty(). @@ -1272,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_a= ction_page_type type, pr_err("%#lx: recovery action for %s: %s\n", pfn, action_page_types[type], action_name[result]); =20 + if (panic_on_unrecoverable_mf(type, result)) + panic("Memory failure: %#lx: unrecoverable page", pfn); + return (result =3D=3D MF_RECOVERED || result =3D=3D MF_DELAYED) ? 0 : -EB= USY; } =20 --=20 2.53.0-Meta From nobody Wed Jun 10 22:49:18 2026 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8FD352E7BD3; Tue, 9 Jun 2026 10:57:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002663; cv=none; b=ZGmFhUpC2Ah6kEZ7jHOzxlR8avOi2YHYIMw1hM2nKMebjUFcMt1L+UaRQogXGkmgDrzOvQ7xTAioSHdvJQ+B4TPbiJarPx9QDJMwYiEs67SfbP3wxYgdw7q/YzdeziknsHdobcHQuUOra784kE3hqZt9a4MKlJqRfrrDEeoGdFk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002663; c=relaxed/simple; bh=SB9bQGwm6bnhG584f4s6nsshLf8IzQMWwkO39s0ginA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=FMTghEwm5xWnB8TVFyjJN8U4ipokvb291IXlsIZmmRKmc8sVPjGzg1kwbVgulmydmVpfAfX/TehtB3J3pU8isI7sfSEZ+vaw8BWfzE/eThr/NFcbfAYRfMIwFS43BgVQU/AKtr3+/cMB0qpRnCT8728ydLw30xIualE8J4cC2+Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=noH0pTtG; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass 
smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="noH0pTtG" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; bh=q9kUwYzm1ZRZIOnRUwI8Icw/M3d0mEYD6AM8mlorlt4=; b=noH0pTtGLkGxSnvbe6nc6FcdOl FQz/U7cQK21/M/zL9YFmjk81sl5LgUk8acLtfIUCiCU8leH4f0y0pxes3tgNJe9mhOROAgUTDuXON JdFPqaunWHGuN8I2ipyKmjESyfHlItL5i/QGPJmvONrApGFovih4pkp0dOW+aLyNi6tQGoTxY79i5 dooQUs+q80P9xMv9JSExhF1H44b70sgBc0rwBm0jsg6OAlmu0HQwtrZRAOHYRk3u+KSZKZzZNM1hW LyxycfEz/92NAHGwlddVMgA80GvcFxvV2Sjs83/eAl1EFZDHr8G8ZMw4257Usi3IIiuWBZfwU/lAE cnAJfniA==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wWu9J-008Lu6-27; Tue, 09 Jun 2026 10:57:38 +0000 From: Breno Leitao Date: Tue, 09 Jun 2026 03:56:59 -0700 Subject: [PATCH v9 5/6] Documentation: document panic_on_unrecoverable_memory_failure sysctl Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260609-ecc_panic-v9-5-432a74002e74@debian.org> References: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> In-Reply-To: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Jonathan Corbet , Shuah Khan , "Liam R. Howlett" , lance.yang@linux.dev, Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , "Liam R. 
Howlett" Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=5056; i=leitao@debian.org; h=from:subject:message-id; bh=SB9bQGwm6bnhG584f4s6nsshLf8IzQMWwkO39s0ginA=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqJ/F/9OMrId6ol3yxlXgQ2vVelCiaG8GLdcHPK 78grchQEgCJAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCaifxfwAKCRA1o5Of/Hh3 bZQwD/47vdGYgiYrq5oQdl1keqjiYpozEvjnvZLgHex8ev/dhiUC3NE2MZNkfs9vNx1LLPJ9tNT Imf054T26zg5f7GM7RUpHknCbZrCSiW1aWfWDF1MHDKqF7D6JrLVLM0mH5X1qOGxxD22kSBmxOV GJDRJUidqBB6gossbpeKaUTYzpCVKelTnP7x10Af9FIDxA/Dw04KVWqUm5PR7b7EJYAFR9sVPD3 YQsqlEJBNxBpMfQdVtpeI0iSITS6OCZ4dzX/hhgkWO+RxQiKCr7fIGxvY9UtcwLWgxe4DOJIXnL nrehAluVN3npEaFZsfy/7/KOGfyVinjf7V0HeqVsba0L3IN8AoHy4PMDd1sljshsd/EvJg7EzlA 7llj/dawYOQ9vuF3SYAvWa8araUrTcq0BonvXo45Kw7yaWFKA/uY3NQX0dS2643lqGc4ZIjFE5J +fLpWxYYoGz0chV6ltjKuTJaW3sASILPTnZ5ghHR2qy/qpmCabhi179I0S1Hd5nX0sLrIovl7/b PPtcPMK9ZQySavGAQZfJA5G07SXZbQiaE7cVcIl+JKAzfI7in2GGq3J0nRNKwy2TrAO4rstpdKd bJAlH6xAHJsURUnBKR5CphsxjnHFzCrwkjR+wY+lgxcsHYCpS8ujPw5tSagTWrHOzkfhAt4nIVF OFiBV5utx0kgmRQ== X-Developer-Key: i=leitao@debian.org; a=openpgp; fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D X-Debian-User: leitao Add documentation for the new vm.panic_on_unrecoverable_memory_failure sysctl, describing which failures trigger a panic (kernel-owned pages the handler cannot recover) and which are intentionally left out (transient allocator races and unclassified pages). 
Signed-off-by: Breno Leitao --- Documentation/admin-guide/sysctl/vm.rst | 85 +++++++++++++++++++++++++++++= ++++ 1 file changed, 85 insertions(+) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-= guide/sysctl/vm.rst index 97e12359775c..f71d87039904 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm: - page-cluster - page_lock_unfairness - panic_on_oom +- panic_on_unrecoverable_memory_failure - percpu_pagelist_high_fraction - stat_interval - stat_refresh @@ -925,6 +926,90 @@ panic_on_oom=3D2+kdump gives you very strong tool to i= nvestigate why oom happens. You can get snapshot. =20 =20 +panic_on_unrecoverable_memory_failure +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +When a hardware memory error (e.g. multi-bit ECC) hits a kernel page +that cannot be recovered by the memory failure handler, the default +behaviour is to ignore the error and continue operation. This is +dangerous because the corrupted data remains accessible to the kernel, +risking silent data corruption or a delayed crash when the poisoned +memory is next accessed. + +When enabled, this sysctl triggers a panic on memory failure events +hitting kernel-owned pages that the handler cannot recover: +``PageReserved`` (firmware reservations, kernel image, vDSO, zero +page, and similar memblock-reserved regions), ``PageSlab``, +``PageTable``, and ``PageLargeKmalloc``. These are owned by the +kernel and the memory failure handler cannot reliably evict their +contents. + +For soft offline (``madvise(MADV_SOFT_OFFLINE)``, +``/sys/devices/system/memory/soft_offline_page``), pages owned by +``movable_ops`` are exempted, since soft offline is allowed to +migrate them even though they are not on the LRU. + +Other unrecoverable kernel-owned populations (vmalloc allocations, +kernel stack pages, ...) 
are not currently covered because the +handler has no page-type signal that distinguishes them from a +userspace folio temporarily off the LRU during migration or +compaction. Such pages still go through the standard +MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a +delayed crash on the next access remains possible. Coverage may +grow as the handler gains stronger kernel-ownership signals. + +Recoverable failure paths are also intentionally left out: in-flight +buddy allocations and other transient races with the page allocator +can reach the same diagnostic, and panicking on them would risk +killing the box for a page destined for userspace where the standard +SIGBUS recovery path applies. Pages whose state could not be +classified at all are not covered either, since an unknown state is +not a sound basis for a panic decision. + +For many environments it is preferable to panic immediately with a clean +crash dump that captures the original error context, rather than to +continue and face a random crash later whose cause is difficult to +diagnose. + +Use cases +--------- + +This option is most useful in environments where unattributed crashes +are expensive to debug or where data integrity must take precedence +over availability: + +* Large fleets, where multi-bit ECC errors on kernel pages are observed + regularly and post-mortem analysis of an unrelated downstream crash + (often seconds to minutes after the original error) consumes + significant engineering effort. + +* Systems configured with kdump, where panicking at the moment of the + hardware error produces a vmcore that still contains the faulting + address, the affected page state, and the originating MCE/GHES + record =E2=80=94 context that is typically lost by the time a delayed cr= ash + occurs. 
+ +* High-availability clusters that rely on fast, deterministic node + failure for failover, and prefer an immediate panic over silent data + corruption propagating to replicas or persistent storage. + +* Kernel and platform developers reproducing hwpoison issues with + tools such as ``mce-inject`` or error-injection debugfs interfaces, + where panicking on the unrecoverable path makes regressions + immediately visible instead of surfacing as later, unrelated + failures. + +=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +0 Try to continue operation (default). +1 Panic immediately. If the ``panic`` sysctl is also non-zero then the + machine will be rebooted. +=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Example:: + + echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure + + percpu_pagelist_high_fraction =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D =20 --=20 2.53.0-Meta From nobody Wed Jun 10 22:49:18 2026 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D893A3F58CE; Tue, 9 Jun 2026 10:57:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781002669; cv=none; b=TpKlfruCs1QSoHgUoxsoHRQXh4dtQKruUcaDpVLeyiJoLtjkHoka8Rb2V9k9tt9NMaO5H2+hn0l//96gJTRyYzXRjMz68GEtnkb186Qx/PM+fbozeNNPzfEw4lDnVsX5D3biZ2rkB1kWkPv479Opc12OP2bdvP9JB4fEcFC39f0= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1781002669; c=relaxed/simple; bh=w+Nvwa+39pceerx+IWvAftelt8mEObSt1TNFaahZHeA=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=U+sB8RzmT34+tz9Iz6mkXKvGZUWTvJWAcCQ2GEIV8zRdcuSd7RowDPx77cv3AY3ahD2X/pk2Or9ktvo6yAsgRIYpSKtecIvUIjh9jBvKe1fmnacFjvf4S4YPKKIGz0KB/lu84kQuItI/0sSCxFzN5qaK7LaARacBBa2B4m/1OEs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=LLyynDcB; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="LLyynDcB" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; bh=Q7TvAD/W7qEPja1WO+XR8ryHtf6pLEekuUp0XItZmbE=; b=LLyynDcBn/kej7ZxnzjmwaPCHF 6bWymESW5D00Zk9lIGIusRX8r56AntU7fW/D2lOsflbzPMRuUpCq47JGQmIzq1bpa89zkSsihk5D3 ExxR7aGaHhWS+rKJKUEIaKxZAQIn6rbDdBxhrPdyOFpaO4ZtfnkubRVEzQs51XmIxKKwPUUfXFrnl c0tX1ITTP06d4/Mhw11CZcYVUmhOfMMGF8S/C3A2OTBkeZMRLnPGcIofpjFqhXwm6LAm7QRycAb3z 9Y3+SQuB9sb/+uuobQIcJFjP455CfEPusrjfXEYRdl8gbm6oJd14+MNE4eA1Rq6dGie0eQrZFOjF8 VeZvKYDg==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wWu9P-008LuW-1E; Tue, 09 Jun 2026 10:57:43 +0000 From: Breno Leitao Date: Tue, 09 Jun 2026 03:57:00 -0700 Subject: [PATCH v9 6/6] selftests/mm: add hwpoison-panic destructive test 
Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260609-ecc_panic-v9-6-432a74002e74@debian.org> References: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> In-Reply-To: <20260609-ecc_panic-v9-0-432a74002e74@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Jonathan Corbet , Shuah Khan , "Liam R. Howlett" , lance.yang@linux.dev, Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , "Liam R. Howlett" Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=11158; i=leitao@debian.org; h=from:subject:message-id; bh=w+Nvwa+39pceerx+IWvAftelt8mEObSt1TNFaahZHeA=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqJ/F/4facG8uiJur6CzzOpPX/GRX+YLanDZXuW LFuS6LbEMeJAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCaifxfwAKCRA1o5Of/Hh3 bWIWD/9l0sAxnjYROoYKH7YnQAWnduaSHh20QHhSnT9vFtaDgZJBwhxs4x7DWP40xO4SF1l3cRU /+6dHi93ehFWfxf+DS+gA/iTmtFod0YEnlFCHYOGAdtdJ4M+JeeWMuI+RthKeIoIaQdZYduecbs Pb88PrLGruNxboOJrgAwJ4EsndxXGt6b6/8O2093RhJFr0dc3oBecdfQWrbgDJaW6pM35fyNok7 DhjeR0XYTEsv/SyyjNDeg0Z8FAJtl8gMfADHQVz2Oy8QkHk/k82rFIJj177l3YlAa6tgA7Xn03d 7cgarOQJfjdM2uypBRd8bzt3XNw7ReQK8+koCXRVSvIwtfq92ZUQ3X6qPzhgU/4Dbeo7a84F1hP r40LIl2CSFRoAw/1EXuSD/2lHr3lQ8n/VUlpN5E8/qIUf6Vc6XKMXWlwZtG9Hmm8mrk2SRyKU24 l36HHB6aYaJyMdFCyrEUDYGP418gKGWyFuJQRATHMXaN+eIK7ZsQ6jFgYQWBjAkAG9v1RNBdWyA 9hPI4x6tF8rD2rRznss5zN/sbFYx83L3CwWJHupxLqQRPJdmeeehPrrO3qhGdxvGrEkvcrLdZ5D 02h8X7f7r+2IJSKfhsO66PHw9aaXdaJS8ZHj4l6Dnctk7EXCRinO5MoICakLtmcgcLGTVGM7o9P LcMfvp3ak6LxMwQ== X-Developer-Key: i=leitao@debian.org; a=openpgp; 
fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D
X-Debian-User: leitao

Add a destructive selftest that verifies
vm.panic_on_unrecoverable_memory_failure actually panics when a hwpoison
error hits a kernel-owned page.

Three "kinds" of kernel-owned page can be targeted, selectable via the
script's first positional argument (default: rodata):

  rodata  - a PG_reserved page in the kernel rodata range, sourced from
            the "Kernel rodata" sub-resource of "System RAM" in
            /proc/iomem. That entry is reported on every major
            architecture and guarantees the chosen PFN is backed by
            struct page (an online System RAM range, not a firmware
            hole), is PG_reserved, and is read-only -- so even if the
            panic fails to fire for some reason, the resulting
            PG_hwpoison marker on rodata does not corrupt writable
            kernel state.

  slab    - a slab page found by walking /proc/kpageflags for the first
            PFN with KPF_SLAB set (and KPF_HWPOISON / KPF_NOPAGE /
            KPF_COMPOUND_TAIL clear). Exercises the get_any_page() path
            on a non-PG_reserved kernel-owned page and so catches
            regressions where get_any_page() collapses kernel-owned
            pages into a transient -EIO instead of -ENOTRECOVERABLE.

  pgtable - same as slab, but the PFN is selected via KPF_PGTABLE.

PageLargeKmalloc, the fourth page type matched by HWPoisonKernelOwned(),
is intentionally not covered: it is a PAGE_TYPE_OPS flag with no
/proc/kpageflags bit, so selecting such a PFN from userspace is not
feasible. The slab and pgtable variants already exercise the same
get_any_page() positive-check branch.

The script enables the sysctl and writes the selected physical address
to /sys/devices/system/memory/hard_offline_page. A successful run
crashes the kernel with

  Memory failure: : unrecoverable page

A return from the inject means the panic did not fire and the test
fails. Test outcome is therefore observed externally (serial console,
kdump) rather than from the script's own exit code.
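Since the verdict is read from the console and not from the exit code,
a harness needs its own check. The following is only a sketch, not part
of this series: it assumes the harness saves the serial console to a
file, the helper name console_shows_mf_panic is hypothetical, and the
pattern is derived from the panic() format string in mm/memory-failure.c
("Memory failure: %#lx: unrecoverable page").

```shell
#!/bin/sh
# Hypothetical harness-side helper, not part of the patch.
# Succeeds when the captured console log contains the panic message
# built from "Memory failure: %#lx: unrecoverable page".
console_shows_mf_panic() {
	local log=$1
	grep -Eq 'Memory failure: 0x[0-9a-f]+: unrecoverable page' "$log"
}

# Demo on a scratch file holding a made-up console line.
demo_log=$(mktemp)
echo 'Kernel panic - not syncing: Memory failure: 0x2500: unrecoverable page' \
	> "$demo_log"
if console_shows_mf_panic "$demo_log"; then
	echo "ok 1 panic observed"
else
	echo "not ok 1 panic not observed"
fi
rm -f "$demo_log"
```

A harness would point the helper at the real console capture once the VM
has exited; matching on the message text keeps the check independent of
the PFN that the script happened to select.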
The script is intentionally NOT wired into run_vmtests.sh: every successful run panics the kernel, which is incompatible with the sequential "run each category in the same VM" model that run_vmtests.sh assumes. It is also not registered as a TEST_PROGS / ksft_* wrapper so a default kselftest run does not opt itself into a panic. The script is meant to be executed manually inside a disposable VM (e.g. virtme-ng), one variant per VM boot, and requires RUN_DESTRUCTIVE=3D1 in the environment as a safety net. Signed-off-by: Breno Leitao --- tools/testing/selftests/mm/Makefile | 4 + tools/testing/selftests/mm/hwpoison-panic.sh | 208 +++++++++++++++++++++++= ++++ 2 files changed, 212 insertions(+) diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/= mm/Makefile index e6df968f0971..ed321ae709da 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -174,6 +174,10 @@ TEST_PROGS +=3D ksft_userfaultfd.sh TEST_PROGS +=3D ksft_vma_merge.sh TEST_PROGS +=3D ksft_vmalloc.sh =20 +# Destructive: every successful run panics the kernel. Installed and +# kept executable, but not run from a default kselftest invocation. +TEST_PROGS_EXTENDED +=3D hwpoison-panic.sh + TEST_FILES :=3D test_vmalloc.sh TEST_FILES +=3D test_hmm.sh TEST_FILES +=3D va_high_addr_switch.sh diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/s= elftests/mm/hwpoison-panic.sh new file mode 100755 index 000000000000..fe58e7638a8b --- /dev/null +++ b/tools/testing/selftests/mm/hwpoison-panic.sh @@ -0,0 +1,208 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison +# error on a kernel-owned page and confirming the kernel panics. +# +# Three "kinds" of kernel-owned page can be targeted, selectable via the +# first positional argument (default: rodata): +# +# rodata - a PG_reserved page in the kernel rodata range +# (sourced from /proc/iomem "Kernel rodata"). 
Exercises +# memory_failure() -> get_any_page() on a PageReserved page. +# +# slab - a slab page found via /proc/kpageflags (KPF_SLAB). +# Exercises memory_failure() -> get_any_page() on a non +# PG_reserved kernel-owned page. This path is what catches +# regressions where get_any_page() collapses kernel-owned +# pages into a transient -EIO instead of -ENOTRECOVERABLE. +# +# pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE). +# Same path as slab, different page type. +# +# This test is DESTRUCTIVE: a successful run crashes the kernel. It is +# meant to be executed inside a disposable VM (e.g. virtme-ng) with a +# serial console captured by the harness. It is skipped unless the +# caller opts in via RUN_DESTRUCTIVE=3D1. +# +# Test passes externally: the kernel must panic with +# "Memory failure: : unrecoverable page" +# A return from the inject means the panic did not fire and the test +# fails. +# +# Author: Breno Leitao + +set -u + +ksft_skip=3D4 +sysctl_path=3D/proc/sys/vm/panic_on_unrecoverable_memory_failure +inject_path=3D/sys/devices/system/memory/hard_offline_page +kpageflags_path=3D/proc/kpageflags + +# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags= .h) +KPF_SLAB=3D7 +KPF_COMPOUND_TAIL=3D16 +KPF_HWPOISON=3D19 +KPF_NOPAGE=3D20 +KPF_PGTABLE=3D26 + +kind=3D${1:-rodata} + +ksft_print() { echo "# $*"; } +ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; } +ksft_exit_fail() { echo "not ok 1 $*"; exit 1; } + +if [ "$(id -u)" -ne 0 ]; then + ksft_exit_skip "must run as root" +fi + +if [ ! -w "$sysctl_path" ]; then + ksft_exit_skip "$sysctl_path not present (kernel without the sysctl?)" +fi + +if [ ! -w "$inject_path" ]; then + ksft_exit_skip "$inject_path not present (no MEMORY_HOTPLUG?)" +fi + +if [ "${RUN_DESTRUCTIVE:-0}" !=3D "1" ]; then + ksft_exit_skip "destructive test; re-run with RUN_DESTRUCTIVE=3D1 inside = a disposable VM" +fi + +# Pick a PFN inside the kernel image rodata region of /proc/iomem. 
+# This is preferred over a top-level "Reserved" entry because top-level +# Reserved ranges are often firmware holes that have no backing struct +# page; pfn_to_online_page() returns NULL on those and memory_failure() +# bails out with -ENXIO before reaching the panic path. +# +# "Kernel rodata" is reported as a sub-resource of "System RAM" on every +# major architecture, which guarantees: +# - the PFN is backed by struct page (within an online memory range); +# - PG_reserved is set on the page (kernel image area); +# - the memory is read-only, so setting PG_hwpoison on it does not +# corrupt writable kernel state if the panic somehow does not fire. +# +# /proc/iomem entries look like (indented for sub-resources): +# " 02500000-02ffffff : Kernel rodata" +pick_rodata_phys_addr() { + awk -v pagesize=3D"$(getconf PAGE_SIZE)" ' + # Convert a hex string to a number without relying on the gawk-only + # strtonum(). mawk lacks it and would otherwise spuriously skip + # this test on distros that ship mawk as /usr/bin/awk. + function hex2num(s, n, i, c, v) { + n =3D 0 + for (i =3D 1; i <=3D length(s); i++) { + c =3D tolower(substr(s, i, 1)) + v =3D index("0123456789abcdef", c) - 1 + if (v < 0) + return -1 + n =3D n * 16 + v + } + return n + } + /: Kernel rodata[[:space:]]*$/ { + sub(/^[[:space:]]+/, "") + n =3D split($0, a, /[- ]/) + start =3D hex2num(a[1]) + end =3D hex2num(a[2]) + if (end <=3D start) + next + # Page-align upward and emit the first byte of that page. + pfn =3D int((start + pagesize - 1) / pagesize) + printf "0x%x\n", pfn * pagesize + exit 0 + } + ' /proc/iomem +} + +# Walk /proc/kpageflags and return the phys addr of the first PFN that +# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL +# all clear (so we attack a real, non-tail, not-already-poisoned page). +# +# We skip the first 16 MiB of PFNs to step past low-memory special +# ranges (BIOS/EFI/ACPI/etc.) 
that often are PG_reserved and would not +# exhibit the slab/pgtable type we are looking for. +pick_kpageflags_phys_addr() { + local want_bit=3D$1 + local pagesize skip_pfn + + [ -r "$kpageflags_path" ] || return + + pagesize=3D$(getconf PAGE_SIZE) + skip_pfn=3D$(((16 * 1024 * 1024) / pagesize)) + + od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null = | \ + awk -v want_bit=3D"$want_bit" \ + -v hwp_bit=3D"$KPF_HWPOISON" \ + -v nopage_bit=3D"$KPF_NOPAGE" \ + -v tail_bit=3D"$KPF_COMPOUND_TAIL" \ + -v base_pfn=3D"$skip_pfn" \ + -v pagesize=3D"$pagesize" ' + # Test whether bit "b" is set in the 16-hex-digit value "hex". + # Done with substring + per-digit lookup so we never rely on awk + # bitwise operators (mawk lacks them), 64-bit FP precision or the + # gawk-only strtonum(). + function bit_set(hex, b, di, bi, c, v) { + di =3D int(b / 4) + bi =3D b - di * 4 + c =3D substr(hex, length(hex) - di, 1) + v =3D index("0123456789abcdef", tolower(c)) - 1 + if (bi =3D=3D 0) return (v % 2) =3D=3D 1 + if (bi =3D=3D 1) return int(v / 2) % 2 =3D=3D 1 + if (bi =3D=3D 2) return int(v / 4) % 2 =3D=3D 1 + return int(v / 8) % 2 =3D=3D 1 + } + { + gsub(/^[[:space:]]+/, "") + h =3D $1 + if (bit_set(h, want_bit) && + !bit_set(h, hwp_bit) && + !bit_set(h, nopage_bit) && + !bit_set(h, tail_bit)) { + pfn =3D base_pfn + NR - 1 + printf "0x%x\n", pfn * pagesize + exit 0 + } + } + ' +} + +case "$kind" in +rodata) + phys_addr=3D$(pick_rodata_phys_addr) + missing_msg=3D'no "Kernel rodata" entry in /proc/iomem' + ;; +slab) + phys_addr=3D$(pick_kpageflags_phys_addr "$KPF_SLAB") + missing_msg=3D"no usable slab PFN found in $kpageflags_path" + ;; +pgtable) + phys_addr=3D$(pick_kpageflags_phys_addr "$KPF_PGTABLE") + missing_msg=3D"no usable page-table PFN found in $kpageflags_path" + ;; +*) + ksft_exit_fail "unknown kind '$kind' (expected: rodata|slab|pgtable)" + ;; +esac + +if [ -z "$phys_addr" ]; then + ksft_exit_skip "$missing_msg" +fi + +ksft_print "enabling $sysctl_path" 
+prior=3D$(cat "$sysctl_path")
+echo 1 > "$sysctl_path" || ksft_exit_fail "failed to enable sysctl"
+
+ksft_print "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (kind=
=3D$kind)"
+ksft_print "expecting kernel panic: 'Memory failure: : unrecoverable =
page'"
+
+# If this returns, the kernel did not panic =E2=86=92 test failed. Restor=
e the
+# sysctl before reporting so the system is left as we found it.
+if echo "$phys_addr" > "$inject_path"; then
+	echo "$prior" > "$sysctl_path"
+	ksft_exit_fail "inject returned without panic; sysctl ineffective"
+fi
+
+# Write failed (e.g. -EINVAL on offlining a non-online region): also a
+# failure for this test, since we expected the panic path.
+echo "$prior" > "$sysctl_path"
+ksft_exit_fail "inject failed before reaching the panic path"
--=20
2.53.0-Meta
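
The per-digit bit_set() in pick_kpageflags_phys_addr() exists because
mawk lacks bitwise operators and strtonum(). For spot-checking a single
/proc/kpageflags word by hand, plain shell arithmetic can express the
same filter more directly. This is a minimal sketch and not part of the
series: the helper names kpf_bit_set and kpf_usable and the sample words
are hypothetical, while the bit positions are the ones the script
defines.

```shell
#!/bin/sh
# Bit positions as defined in hwpoison-panic.sh.
KPF_SLAB=7
KPF_COMPOUND_TAIL=16
KPF_HWPOISON=19
KPF_NOPAGE=20
KPF_PGTABLE=26

# Hypothetical helper: succeeds if bit $2 is set in hex word $1
# (given without a 0x prefix, as od -tx8 prints it).
kpf_bit_set() {
	[ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Hypothetical helper mirroring the script's filter: the wanted bit is
# set and the page is not hwpoisoned, not a hole, not a compound tail.
kpf_usable() {
	local word=$1 want=$2
	kpf_bit_set "$word" "$want" &&
		! kpf_bit_set "$word" "$KPF_HWPOISON" &&
		! kpf_bit_set "$word" "$KPF_NOPAGE" &&
		! kpf_bit_set "$word" "$KPF_COMPOUND_TAIL"
}

# Made-up sample words: slab, slab that is already poisoned, page table.
for word in 0000000000000080 0000000000080080 0000000004000000; do
	if kpf_usable "$word" "$KPF_SLAB"; then
		echo "$word usable as slab"
	fi
	if kpf_usable "$word" "$KPF_PGTABLE"; then
		echo "$word usable as pgtable"
	fi
done
```

The awk version in the script remains the portable choice for walking
the whole file; the arithmetic form is only convenient for checking
individual words while debugging PFN selection.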