From nobody Sun Jun 7 03:02:01 2026 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 203E73242BD; Wed, 27 May 2026 14:06:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779890798; cv=none; b=fN907a/ydraGfNj0VmdF566mAdD6u8X6LYwM7RQmqtnZgw3iHasoM6lsX17fTmIHpBqf17/XnTVxi/tJbvI/x1VFgQ0L1+ijSLWY5AGi/nbA6WcM0H+WOZLX3NCD4Q0zvqL5oa1sZPLfNq9Hs4EhuYVDqiXodeOOaKn7qy1BST0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779890798; c=relaxed/simple; bh=dRzYu8tKgN6EjEgu5H7QOpWd9i5LbDMZJK2X6dDuigk=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Phvn+nHztQaJ2DOtcb6ip3zOG/1nmfaeluFzAklimmoHH4tayMZMi4wKXmPWuqR4WiXfAiHFkYPOa5WD0JSM302befB1pf2rDKYCWQkyjssrJ0wZqEqhcihUOmBI1bnjBRW3micGKqCIRM79OFhQHDigYWbBnH1P6wXtvaLiCw8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=t5XLJFW+; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="t5XLJFW+" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; 
bh=7Eg1c66LFMSDZiSV4TVxCFeAUFW0duGKcjtTfzIRAmY=; b=t5XLJFW+uun1qucxrWkkyytvAC HKxTRIK0pdxK6gln5TMBVKA9j8oKw0A41XdPIb0bG2gARxsyUdy4Np5sACSgpck9Ek8n79JBgraJc n77fanFsSs/29HEH9MVJzDahLk1p1vICqVjuO3HtRbpSvExxqg4GYbfvv4h3sYHhf/dAFGAFyUtL3 3cDuqpj+SbZly5GYogh3UZ2XWATlPaMRPSYFijQ4IfKJnZKOvaoEfZp5f5NCFfwQq/+0aAoN7G3wI zSc+GuXpj9nBXeOgMBSOO/98xpgP023IS2gL2aiB2DuS1HUWp8+Elxa/Rjr0Z4dsxELYz4GwKLLM5 36h+Usqg==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wSEtz-003DSd-1i; Wed, 27 May 2026 14:06:31 +0000 From: Breno Leitao Date: Wed, 27 May 2026 07:06:14 -0700 Subject: [PATCH v8 1/6] mm/memory-failure: drop dead error_states[] entry for reserved pages Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260527-ecc_panic-v8-1-9ea0cfa16bb0@debian.org> References: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org> In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Shuah Khan , "Liam R. Howlett" , "Liam R. 
Howlett"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao,
 linux-trace-kernel@vger.kernel.org, kernel-team@meta.com, Lance Yang
X-Mailer: b4 0.16-dev-d5d98
X-Developer-Signature: v=1; a=openpgp-sha256; l=3194; i=leitao@debian.org;
 h=from:subject:message-id; bh=dRzYu8tKgN6EjEgu5H7QOpWd9i5LbDMZJK2X6dDuigk=;
 b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqFvpb1t1yv21tSoXdpd9OeX7lVUqVxBfmBWhRO
 7QMfvqcuUqJAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCahb6WwAKCRA1o5Of/Hh3
 bRLSD/9hra+31QED97u48QP55twrygQyNcmxlYKWygJmoyU4xa7l6enHBboEvlIJi7I/r2D8812
 ZM1O1RL4Rl0bxPeX4dQDLD21S+aNapyrEP/X1Nc+z7O5xECZK1gAiTXa1ZDG/jna6TbAIT8iUKu
 3c5LPqKasQzx6QwGZuf+PfZRicIkAxftwTNEdUB4+M74p8Jic0eGsydOMR23W0dzm75MsjGe39B
 wna0s/IXdcK1m7Qw4lpXQNdmbiWapHcmEUOqMezfMugMkUCiz8bgSfRRG7gLPG4EX1D5o2eNaMu
 itOnCL/dX/AUwCSE2eKiIMNZwf3Aey9zc6Msb3yVwKdRDxu0yLIHvUFQUI9jvIWRMjQu9fjNWOh
 yUUzXaDncvCoJA1LL1oVJzi/mEHe5YrHkLKOC6DMhYYMts7Cgj8s8OiXiGmngwT43XyL64x8VmO
 EHmlfnY9FVJY57gVvOP06yMEq6HJKpcHUD9mjcpdba4L24qncka7JHwbu0x1O7DOQSKi0M0IgDO
 +FZqA2sNAf4etbyVSm9aV6mCYLbt2jO3o0ItIeV3SpI+A40KNmuo1a8DcjrltK9R9NV+p3+dXvT
 WwZtY5lCZVJU/v+gePAT8u/k+VN1IZFPgWbIO4BcMcIw/zU4l1HWhqIFCsikew7x7FiAyA1/QKY
 1Xr7zhi5EBlO9Cg==
X-Developer-Key: i=leitao@debian.org; a=openpgp;
 fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D
X-Debian-User: leitao

The first entry of error_states[], { reserved, reserved, MF_MSG_KERNEL,
me_kernel }, is unreachable. identify_page_state() has two callers, and
neither one can dispatch a PG_reserved page to me_kernel():

* memory_failure() reaches identify_page_state() only after
  get_hwpoison_page() returned 1. get_any_page() reaches that return
  only via __get_hwpoison_page(), which only takes a refcount when the
  page is HWPoisonHandlable(). HWPoisonHandlable() is an allowlist for
  LRU, free-buddy, and (for soft-offline) movable_ops pages --
  PG_reserved pages do not satisfy any of these, so they fail with
  -EBUSY/-EIO long before identify_page_state() runs.

* try_memory_failure_hugetlb() reaches identify_page_state() only via
  the MF_HUGETLB_IN_USED branch, where the page is necessarily a hugetlb
  folio. hugetlb folios don't carry PG_reserved at that point:
  hugetlb_folio_init_vmemmap() calls __folio_clear_reserved() during
  init, so the reserved entry would not match even if it were still
  present.

me_kernel() never executes and the entry exists only to be matched
against by code that cannot see it.

Drop the entry, the me_kernel() helper, and the now-unused "reserved"
macro. Leave the MF_MSG_KERNEL enum value in place: it remains part of
the tracepoint and pr_err() string tables, and follow-on work to
classify unrecoverable kernel pages can reuse it without churning the
user-visible enum.

No functional change.

Suggested-by: David Hildenbrand
Acked-by: David Hildenbrand (Arm)
Reviewed-by: Lance Yang
Acked-by: Miaohe Lin
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 51508a55c405..f4d3e6e20e13 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -980,17 +980,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,
 	return false;
 }
 
-/*
- * Error hit kernel page.
- * Do nothing, try to be lucky and not touch this instead. For a few cases we
- * could be more sophisticated.
- */
-static int me_kernel(struct page_state *ps, struct page *p)
-{
-	unlock_page(p);
-	return MF_IGNORED;
-}
-
 /*
  * Page in unknown state. Do nothing.
  * This is a catch-all in case we fail to make sense of the page state.
@@ -1199,10 +1188,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 #define mlock		(1UL << PG_mlocked)
 #define lru		(1UL << PG_lru)
 #define head		(1UL << PG_head)
-#define reserved	(1UL << PG_reserved)
 
 static struct page_state error_states[] = {
-	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
 	/*
 	 * free pages are specially detected outside this table:
 	 * PG_buddy pages only make a small fraction of all free pages.
@@ -1234,7 +1221,6 @@ static struct page_state error_states[] = {
 #undef mlock
 #undef lru
 #undef head
-#undef reserved
 
 static void update_per_node_mf_stats(unsigned long pfn,
 				     enum mf_result result)
-- 
2.54.0

From nobody Sun Jun 7 03:02:01 2026
From: Breno Leitao
Date: Wed, 27 May 2026 07:06:15 -0700
Subject: [PATCH v8 2/6] mm/memory-failure: surface unhandlable kernel pages as -ENOTRECOVERABLE
Message-Id: <20260527-ecc_panic-v8-2-9ea0cfa16bb0@debian.org>
References: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, "Liam R. Howlett",
 "Liam R. Howlett"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao,
 linux-trace-kernel@vger.kernel.org, kernel-team@meta.com, Lance Yang

get_any_page() collapses every HWPoisonHandlable() rejection into a
single -EIO via the __get_hwpoison_page() -> -EBUSY -> shake_page() ->
retry path. That is correct for the transient case (a userspace folio
briefly off LRU during migration or compaction, which a later shake can
drag back), but wrong for stable kernel-owned pages: slab, page-table,
large-kmalloc and PG_reserved pages will never become
HWPoisonHandlable(), so the retry loop is wasted work and the final -EIO
loses the "this is structurally unrecoverable" information.
memory_failure() then maps -EIO into MF_MSG_GET_HWPOISON, which the
panic-on-unrecoverable sysctl deliberately does not act on.

Introduce HWPoisonKernelOwned(), a small predicate that positively
identifies pages the hwpoison handler cannot recover from:

    HWPoisonKernelOwned(p, flags) :=
        !(MF_SOFT_OFFLINE && page_has_movable_ops(p)) &&
        (PageReserved(p) || PageSlab(p) ||
         PageTable(p) || PageLargeKmalloc(p))

The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors the same
exception in HWPoisonHandlable(): soft-offline is allowed to migrate
movable_ops pages even though they are not on the LRU, and we must not
pre-empt that with an unrecoverable verdict.

The list is intentionally not exhaustive. vmalloc and kernel-stack
pages, for example, do not carry a page_type bit and would need a
different oracle; they keep going through the existing retry path
unchanged. This is the smallest set we can identify with certainty by
page type.

Wire the helper into the top of get_any_page() to short-circuit those
pages before the retry loop runs. On a hit, drop the caller's
MF_COUNT_INCREASED reference (if any) and return -ENOTRECOVERABLE
straight away. Pages outside the helper's positive list still take the
existing retry path and return -EIO, leaving operator-visible behaviour
for those cases unchanged.

Extend the unhandlable-page pr_err() to fire for either errno and update
the get_hwpoison_page() kerneldoc to document the new return.

memory_failure() still folds every negative return into
MF_MSG_GET_HWPOISON via its existing "else if (res < 0)" branch, so this
patch on its own only changes the errno that soft_offline_page() can
propagate to its callers. A follow-up wires -ENOTRECOVERABLE through
memory_failure() and reports MF_MSG_KERNEL for the unrecoverable cases,
which is what the panic_on_unrecoverable_memory_failure sysctl observes.
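
As an illustration only (not part of the patch), the predicate and the
short-circuit described above can be modelled in userspace C. Everything
named model_* or MODEL_* is a hypothetical stand-in: the real kernel
tests page flags and page types on struct page, its MF_* flag values
differ, and the shake_page() retry loop and free-buddy case are left out
of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef ENOTRECOVERABLE
#define ENOTRECOVERABLE 131	/* Linux value; fallback for other libcs */
#endif

/* Hypothetical page properties standing in for kernel page flags/types. */
enum {
	MODEL_RESERVED      = 1u << 0,
	MODEL_SLAB          = 1u << 1,
	MODEL_TABLE         = 1u << 2,
	MODEL_LARGE_KMALLOC = 1u << 3,
	MODEL_MOVABLE_OPS   = 1u << 4,
	MODEL_LRU           = 1u << 5,
};

/* Hypothetical flag bits; not the kernel's MF_* values. */
#define MODEL_MF_COUNT_INCREASED (1u << 0)
#define MODEL_MF_SOFT_OFFLINE    (1u << 1)

struct model_page {
	unsigned int state;	/* bitmask of MODEL_* properties */
	int refcount;
};

/* Allowlist model: LRU pages, plus movable_ops pages under soft-offline. */
static bool model_handlable(const struct model_page *p, unsigned int flags)
{
	if ((flags & MODEL_MF_SOFT_OFFLINE) && (p->state & MODEL_MOVABLE_OPS))
		return true;
	return (p->state & MODEL_LRU) != 0;
}

/* Positive identification of pages that can never become handlable. */
static bool model_kernel_owned(const struct model_page *p, unsigned int flags)
{
	if ((flags & MODEL_MF_SOFT_OFFLINE) && (p->state & MODEL_MOVABLE_OPS))
		return false;
	return (p->state & (MODEL_RESERVED | MODEL_SLAB |
			    MODEL_TABLE | MODEL_LARGE_KMALLOC)) != 0;
}

/* Returns 1 with a reference held, -EIO, or -ENOTRECOVERABLE. */
static int model_get_any_page(struct model_page *p, unsigned int flags)
{
	bool count_increased = flags & MODEL_MF_COUNT_INCREASED;

	if (model_kernel_owned(p, flags)) {
		if (count_increased)
			p->refcount--;	/* drop the caller's reference */
		return -ENOTRECOVERABLE;
	}

	/* Retry loop elided: one attempt decides the outcome here. */
	if (!model_handlable(p, flags))
		return -EIO;
	if (!count_increased)
		p->refcount++;
	return 1;
}

/* Small usage walk-through of the three outcomes. */
static void model_self_check(void)
{
	struct model_page slab = { MODEL_SLAB, 1 };
	struct model_page lru = { MODEL_LRU, 0 };
	struct model_page movable = { MODEL_MOVABLE_OPS, 0 };

	/* Kernel-owned: immediate verdict, caller's reference dropped. */
	assert(model_get_any_page(&slab, MODEL_MF_COUNT_INCREASED) ==
	       -ENOTRECOVERABLE);
	assert(slab.refcount == 0);

	/* Handlable: a reference is taken. */
	assert(model_get_any_page(&lru, 0) == 1);
	assert(lru.refcount == 1);

	/* movable_ops: -EIO normally, handlable under soft-offline. */
	assert(model_get_any_page(&movable, 0) == -EIO);
	assert(model_get_any_page(&movable, MODEL_MF_SOFT_OFFLINE) == 1);
}
```

A page with none of the modelled properties (the vmalloc or kernel-stack
case above) falls through to -EIO in this sketch, matching the statement
that such pages keep their existing behaviour.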

Suggested-by: David Hildenbrand
Suggested-by: Lance Yang
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f4d3e6e20e13..8f63bdfeff8f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1325,6 +1325,28 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 	return PageLRU(page) || is_free_buddy_page(page);
 }
 
+/*
+ * Positive identification of pages the hwpoison handler cannot recover.
+ * These page types are owned by kernel internals (no userspace mapping
+ * to unmap, no file mapping to invalidate, no migration target), so the
+ * shake_page() / retry loop in get_any_page() can never turn them into
+ * something HWPoisonHandlable() will accept. Short-circuit them to
+ * -ENOTRECOVERABLE so callers can panic on operator request instead of
+ * spinning through retries that exit as a transient-looking -EIO.
+ *
+ * The MF_SOFT_OFFLINE / page_has_movable_ops() opt-out mirrors
+ * HWPoisonHandlable(): soft-offline is allowed to migrate movable_ops
+ * pages even though they are not on the LRU.
+ */
+static inline bool HWPoisonKernelOwned(struct page *page, unsigned long flags)
+{
+	if ((flags & MF_SOFT_OFFLINE) && page_has_movable_ops(page))
+		return false;
+
+	return PageReserved(page) || PageSlab(page) ||
+	       PageTable(page) || PageLargeKmalloc(page);
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct folio *folio = page_folio(page);
@@ -1371,6 +1393,19 @@ static int get_any_page(struct page *p, unsigned long flags)
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
+	/*
+	 * Page types we know are kernel-owned and cannot be recovered.
+	 * Short-circuit before the shake_page() / retry loop, which
+	 * cannot turn any of these into something HWPoisonHandlable().
+	 * Drop the caller's reference if MF_COUNT_INCREASED took one.
+	 */
+	if (HWPoisonKernelOwned(p, flags)) {
+		if (count_increased)
+			put_page(p);
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 try_again:
 	if (!count_increased) {
 		ret = __get_hwpoison_page(p, flags);
@@ -1418,7 +1453,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -1475,7 +1510,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         HWPoisonKernelOwned() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
-- 
2.54.0

From nobody Sun Jun 7 03:02:01 2026
From: Breno Leitao
Date: Wed, 27 May 2026 07:06:16 -0700
Subject: [PATCH v8 3/6] mm/memory-failure: report MF_MSG_KERNEL for unrecoverable kernel pages
Message-Id: <20260527-ecc_panic-v8-3-9ea0cfa16bb0@debian.org>
References: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, "Liam R. Howlett",
 "Liam R. Howlett"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao,
 linux-trace-kernel@vger.kernel.org, kernel-team@meta.com

The previous patch teaches get_any_page() to return -ENOTRECOVERABLE for
stable unhandlable kernel pages (PG_reserved, slab, page tables,
large-kmalloc). memory_failure() still folds every negative return into
MF_MSG_GET_HWPOISON, so callers that want to react to the unrecoverable
cases (a panic option, smarter logging) cannot tell them apart from
transient page-allocator races.

Turn the post-call branch into a switch over the get_hwpoison_page()
return code: map -ENOTRECOVERABLE to MF_MSG_KERNEL and any other
negative return to MF_MSG_GET_HWPOISON.
case 0 keeps the existing free-buddy / kernel-high-order handling and
case 1 falls through to the rest of memory_failure() unchanged.

The MF_MSG_KERNEL label and tracepoint string are kept as "reserved
kernel page" to avoid breaking userspace tools that match on those
literals; the enum value still adequately tags the failure even though
it now also covers slab, page tables and large-kmalloc pages.

Suggested-by: David Hildenbrand
Signed-off-by: Breno Leitao
Acked-by: David Hildenbrand (Arm)
Acked-by: Miaohe Lin
---
 mm/memory-failure.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 8f63bdfeff8f..14c0a958638c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2426,7 +2426,8 @@ int memory_failure(unsigned long pfn, int flags)
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
 	res = get_hwpoison_page(p, flags);
-	if (!res) {
+	switch (res) {
+	case 0:
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
 				page_ref_inc(p);
@@ -2445,7 +2446,19 @@ int memory_failure(unsigned long pfn, int flags)
 			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 		}
 		goto unlock_mutex;
-	} else if (res < 0) {
+	case 1:
+		/* Got a refcount on a handlable page. */
+		break;
+	case -ENOTRECOVERABLE:
+		/*
+		 * Stable unhandlable kernel-owned page (PG_reserved,
+		 * slab, page tables, large-kmalloc).
+		 * No recovery possible.
+		 */
+		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		goto unlock_mutex;
+	default:
+		/* Transient lifecycle race with the page allocator. */
 		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 		goto unlock_mutex;
 	}
-- 
2.54.0

From nobody Sun Jun 7 03:02:01 2026
From: Breno Leitao
Date: Wed, 27 May 2026 07:06:17 -0700
Subject: [PATCH v8 4/6] mm/memory-failure: add panic option for unrecoverable pages
Message-Id: <20260527-ecc_panic-v8-4-9ea0cfa16bb0@debian.org>
References: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, "Liam R. Howlett",
 "Liam R. Howlett"
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao,
 linux-trace-kernel@vger.kernel.org, kernel-team@meta.com

Add a sysctl panic_on_unrecoverable_memory_failure (disabled by default)
that triggers a kernel panic when memory_failure() encounters pages that
cannot be recovered. This provides a clean crash with useful debug
information rather than allowing silent data corruption or a delayed
crash at an unrelated code path.

Panic eligibility is intentionally narrow: only MF_MSG_KERNEL with
result == MF_IGNORED panics. After the previous patch, MF_MSG_KERNEL
covers PG_reserved pages and the kernel-owned pages promoted from
get_hwpoison_page() via -ENOTRECOVERABLE (slab, page tables,
large-kmalloc).
All other action types are excluded:

- MF_MSG_GET_HWPOISON and MF_MSG_KERNEL_HIGH_ORDER can be reached by
  transient refcount races with the page allocator (an in-flight buddy
  allocation has refcount 0 and is no longer on the buddy free list,
  briefly), and panicking on them would risk killing the box for what
  is actually a recoverable userspace page.

- MF_MSG_UNKNOWN means identify_page_state() could not classify the
  page; that is precisely the wrong basis for a panic decision.

Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 14c0a958638c..dcd53dbc6aec 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -1255,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+				      enum mf_result result)
+{
+	if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
+		return false;
+
+	return type == MF_MSG_KERNEL;
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1272,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 
+	if (panic_on_unrecoverable_mf(type, result))
+		panic("Memory failure: %#lx: unrecoverable page", pfn);
+
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
-- 
2.54.0

From nobody Sun Jun 7 03:02:01 2026
From: Breno Leitao
Date: Wed, 27 May 2026 07:06:18 -0700
Subject: [PATCH v8 5/6] Documentation: document panic_on_unrecoverable_memory_failure sysctl
Message-Id: <20260527-ecc_panic-v8-5-9ea0cfa16bb0@debian.org>
References: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org>
To: Miaohe Lin, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
 Shuah Khan, Naoya Horiguchi, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, "Liam R. Howlett", "Liam R. 
Howlett" Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com X-Mailer: b4 0.16-dev-d5d98 X-Developer-Signature: v=1; a=openpgp-sha256; l=5051; i=leitao@debian.org; h=from:subject:message-id; bh=B5KdTD7ZhP/7hHR5QHSstU2DTYx/6InnYGWHrgT3KG0=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqFvpbWFZ5tW+B7rNgwndjLhTSZaqZDyApt28GJ Z+2hgOzjLKJAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCahb6WwAKCRA1o5Of/Hh3 bcF1D/96nl3pHZvVsIHDOBSzXfwQzAgISt292m7aKGeL5wOoHLfNVNRDP/Qay3Zw8ihSmZ3SKhq eCtGDDDFkRK8DyONcMBfMNzcU10qbErfG4WjyHM6o8G0vrPAwDC0PD/7lHc2+JmsEadaFvww+XM /a8NP520hvDMj5SFWLtm6SLqzLvIoDLfy5Xuq29WYmniC/kGgBFhZ/AOHulYeBwUD/9rzcvmIwj RFNtYgjZ9sFtDVZ4qCM3BWaxuNtoZId2vtOhOK4Uz4q4+D+5kBItasimM6zOdGS5mea++lWcj0f baLqPbPYDyM1mV/Ll8+3kqjZSZmh9DyW//0FUVQAGKFkH6lqVXx+C3t88ItZWnc/ULK+PvMqAa8 qnRQxkc4RXfAjoDeIZrqro8Qk/HgJWt37VUlbMM2rziXTB70YS7yZNWFf1gn5um8U2YPuOqeMVH O+UxsbKgQA7K0qX4ZQH9HwI2+MG25WFZMXd0PUj7dEkmnKfTo79y4dsVIgDzuaLD3KLCz2hETY9 Gdj6pWYpUvRsPFL8trBz7mnqNo3wY05zjlGAGglp3LZG+rJ5igsTHRWRuc6IpfmMSA/RXwxIC74 k9cAj6ATDQ8kYFXomnrW7iWMIrt5eKqdCs1apuDfCNcqV25ZjgZ6cT5XDp0P/28x8DWsecAWbUn GG9y0sD0/WbbUEA== X-Developer-Key: i=leitao@debian.org; a=openpgp; fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D X-Debian-User: leitao Add documentation for the new vm.panic_on_unrecoverable_memory_failure sysctl, describing which failures trigger a panic (kernel-owned pages the handler cannot recover) and which are intentionally left out (transient allocator races and unclassified pages). 
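For administrators who want the setting to survive a reboot, a minimal
sketch of a sysctl.d drop-in follows. The helper name, the file name
90-mf-panic.conf and the 10-second reboot delay are illustrative
assumptions and are not part of this series; only the two sysctl keys
come from the documentation being added.

```shell
#!/bin/bash
# Sketch only: write a sysctl.d drop-in that enables the new sysctl.
# The function name, file name and reboot delay are hypothetical.
write_mf_panic_dropin() {
    local dir=${1:-/etc/sysctl.d}
    local file="$dir/90-mf-panic.conf"

    {
        echo "# Panic when memory_failure() hits an unrecoverable kernel page"
        echo "vm.panic_on_unrecoverable_memory_failure = 1"
        echo "# Reboot 10 seconds after any panic (assumed value)"
        echo "kernel.panic = 10"
    } > "$file" && echo "$file"
}
```

Calling write_mf_panic_dropin with no argument and then running
"sysctl --system" as root would apply the drop-in on a running system.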
Signed-off-by: Breno Leitao --- Documentation/admin-guide/sysctl/vm.rst | 85 +++++++++++++++++++++++++++++= ++++ 1 file changed, 85 insertions(+) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-= guide/sysctl/vm.rst index 97e12359775c..f71d87039904 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm: - page-cluster - page_lock_unfairness - panic_on_oom +- panic_on_unrecoverable_memory_failure - percpu_pagelist_high_fraction - stat_interval - stat_refresh @@ -925,6 +926,90 @@ panic_on_oom=3D2+kdump gives you very strong tool to i= nvestigate why oom happens. You can get snapshot. =20 =20 +panic_on_unrecoverable_memory_failure +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +When a hardware memory error (e.g. multi-bit ECC) hits a kernel page +that cannot be recovered by the memory failure handler, the default +behaviour is to ignore the error and continue operation. This is +dangerous because the corrupted data remains accessible to the kernel, +risking silent data corruption or a delayed crash when the poisoned +memory is next accessed. + +When enabled, this sysctl triggers a panic on memory failure events +hitting kernel-owned pages that the handler cannot recover: +``PageReserved`` (firmware reservations, kernel image, vDSO, zero +page, and similar memblock-reserved regions), ``PageSlab``, +``PageTable``, and ``PageLargeKmalloc``. These are owned by the +kernel and the memory failure handler cannot reliably evict their +contents. + +For soft offline (``madvise(MADV_SOFT_OFFLINE)``, +``/sys/devices/system/memory/soft_offline_page``), pages owned by +``movable_ops`` are exempted, since soft offline is allowed to +migrate them even though they are not on the LRU. + +Other unrecoverable kernel-owned populations (vmalloc allocations, +kernel stack pages, ...) 
are not currently covered because the +handler has no page-type signal that distinguishes them from a +userspace folio temporarily off the LRU during migration or +compaction. Such pages still go through the standard +MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a +delayed crash on the next access remains possible. Coverage may +grow as the handler gains stronger kernel-ownership signals. + +Recoverable failure paths are also intentionally left out: in-flight +buddy allocations and other transient races with the page allocator +can reach the same diagnostic, and panicking on them would risk +killing the box for a page destined for userspace where the standard +SIGBUS recovery path applies. Pages whose state could not be +classified at all are not covered either, since an unknown state is +not a sound basis for a panic decision. + +For many environments it is preferable to panic immediately with a clean +crash dump that captures the original error context, rather than to +continue and face a random crash later whose cause is difficult to +diagnose. + +Use cases +--------- + +This option is most useful in environments where unattributed crashes +are expensive to debug or where data integrity must take precedence +over availability: + +* Large fleets, where multi-bit ECC errors on kernel pages are observed + regularly and post-mortem analysis of an unrelated downstream crash + (often seconds to minutes after the original error) consumes + significant engineering effort. + +* Systems configured with kdump, where panicking at the moment of the + hardware error produces a vmcore that still contains the faulting + address, the affected page state, and the originating MCE/GHES + record =E2=80=94 context that is typically lost by the time a delayed cr= ash + occurs. 
+ +* High-availability clusters that rely on fast, deterministic node + failure for failover, and prefer an immediate panic over silent data + corruption propagating to replicas or persistent storage. + +* Kernel and platform developers reproducing hwpoison issues with + tools such as ``mce-inject`` or error-injection debugfs interfaces, + where panicking on the unrecoverable path makes regressions + immediately visible instead of surfacing as later, unrelated + failures. + +=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +0 Try to continue operation (default). +1 Panic immediately. If the ``panic`` sysctl is also non-zero then the + machine will be rebooted. +=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Example:: + + echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure + + percpu_pagelist_high_fraction =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D =20 --=20 2.54.0 From nobody Sun Jun 7 03:02:01 2026 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED7A833557D; Wed, 27 May 2026 14:07:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779890824; cv=none; b=pvmNujK2UW5RKOovsEGC5LY0LGpQGDe/3ucPIdJhKhTtDbzs1wjF5Uctw6ehPlZTq7XH+ZaQ4ZxiKKgr5/9R/hlcfErBBWj/LkwSqG1N+H0ZOj3/KaZhNdRgu+DfGUd0TKzakOLFqkJY946S7upJjDeYS2SvhYygEIpLMln5Voo= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1779890824; c=relaxed/simple; bh=SDfzb/1Kmc5Trou98JEPROybLRDzX5OpaKP4EzW9bos=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=ATb6uNgRnLlwHy/6H7c9ef5TtQ2oNQpCWDvmeJtrOoVscW3Epu+NOs0aSvOQPbE8p0/9AlHE+byhUGqhY2Vq+UaIQL3ZPkMEPMzXtIAmgbyW7oT21ep1FKY2SxlsDiRBdH2NNzGmba7d+eCvDDecxnyr8nCC2gV7qiOsvKn6OeU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=pass smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=GHjxepgV; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="GHjxepgV" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; bh=0Qlr84M3AeZWVAoevmGDLh4IZ1AZxtRMBdJ6NJSNbEM=; b=GHjxepgVgGnp6Wf+o29m2Ra++S HFNuFpXEwMWMHjCwolxdH0TgvMvXleWGAmx5CPU5vDtEjhmwWU9K01Ot55hcZZ+Q8GVMYh5JJoYlU 7+tcNX0fW04IhfaPS18QsbYuu+iwgFrPeQxpV1tTu49ywF8ssNv9hDQ69nuhj6ZPf1XsEpbiuvsb9 klvs+2Rpwk0Wb7r8qU0XF6/1mFP8iZyan4JFw+s7/s1bkzd6itDmX+bbtbk+eHBKSVqH5H+baS3AC nAZZrn9qt6TF6bKNOE4huToK+9laoek5rYNPBaT++oIYKWEppfJl+weFUCdsSiOsMFWvhFDALjEVF rmZFus2Q==; Received: from authenticated-user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wSEuR-003DUA-1w; Wed, 27 May 2026 14:07:00 +0000 From: Breno Leitao Date: Wed, 27 May 2026 07:06:19 -0700 Subject: [PATCH v8 6/6] selftests/mm: add hwpoison-panic destructive test 
Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260527-ecc_panic-v8-6-9ea0cfa16bb0@debian.org> References: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org> In-Reply-To: <20260527-ecc_panic-v8-0-9ea0cfa16bb0@debian.org> To: Miaohe Lin , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan , Naoya Horiguchi , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Shuah Khan , "Liam R. Howlett" , "Liam R. Howlett" Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , linux-trace-kernel@vger.kernel.org, kernel-team@meta.com X-Mailer: b4 0.16-dev-d5d98 X-Developer-Signature: v=1; a=openpgp-sha256; l=10551; i=leitao@debian.org; h=from:subject:message-id; bh=SDfzb/1Kmc5Trou98JEPROybLRDzX5OpaKP4EzW9bos=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqFvpb7o3XvcRbKPyXz7XX5qBYI9bJQToApahEY 327G7A4xtOJAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCahb6WwAKCRA1o5Of/Hh3 bRBLD/4h24VZsHSQb9dnBv2gc5/3ym90bQI7rQpxuprsupXISYHUL9HYiIqBe28ACS97PYPO91+ nCIj1wqExeImfriJIlsiGUhcrLTBcFwvwmbMctZ19cBlwVv97HPsYnqzUyQfy5K0T5wzNQaRSjG sopEnb9S3U3ZWCTj+YCZ+P8Zca7fy0KDL+4edk19jBTjVweXcXG4pnZIQXeT/Y22sC6C/1tiDcM m7bUpFsrsG0GgHeRSQhyvD88B/QCEYcZ9f12FRVS877nxMcY9J11i2BC/YjrCCAE26Pe2ei+KNZ uKONOsCzPURI5ubYxB9cs1027cLbuj0b5wFWPLKQHh3XlR1LO+okzSu/nU8yZQRl5CkKAaETkDb mtvj8tzqQYIPORHn3YXtrbrcZj+FCyWe3A7/EovZ8Qh0dtuqKiaNCJSXeDYpZnpHUCezHgHAIuO be+D8v7rDeMJ3efLJWxgsIfYQdkbvUsRXl2c4so0acS+hkUPe+oZPj/gwAvkoRadtesOFRImiqz SIZp7Li/3WAybQkjarOu9L6W/2MaIpyzNc5VX/yuB9SbkZHDsdAc5H1wCAwDpmnGizvmYABx/Z5 TVLgb0XTikNOOLSTD8bucvWJKb+8JWRmYmSm7AYun+x6CUn78004D8oxOqwwkQhtSLU08TgIu3j HH4G+3Q8TZ4RnuA== X-Developer-Key: i=leitao@debian.org; a=openpgp; 
fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D X-Debian-User: leitao Add a destructive selftest that verifies vm.panic_on_unrecoverable_memory_failure actually panics when a hwpoison error hits a kernel-owned page. Three "kinds" of kernel-owned page can be targeted, selectable via the script's first positional argument (default: rodata): rodata - a PG_reserved page in the kernel rodata range, sourced from the "Kernel rodata" sub-resource of "System RAM" in /proc/iomem. That entry is reported on every major architecture and guarantees the chosen PFN is backed by struct page (an online System RAM range, not a firmware hole), is PG_reserved, and is read-only -- so even if the panic fails to fire for some reason, the resulting PG_hwpoison marker on rodata does not corrupt writable kernel state. slab - a slab page found by walking /proc/kpageflags for the first PFN with KPF_SLAB set (and KPF_HWPOISON / KPF_NOPAGE / KPF_COMPOUND_TAIL clear). Exercises the get_any_page() path on a non PG_reserved kernel-owned page and so catches regressions where get_any_page() collapses kernel-owned pages into a transient -EIO instead of -ENOTRECOVERABLE. pgtable - same as slab, but the PFN is selected via KPF_PGTABLE. PageLargeKmalloc, the fourth page type matched by HWPoisonKernelOwned(), is intentionally not covered: it is a PAGE_TYPE_OPS flag with no /proc/kpageflags bit, so selecting such a PFN from userspace is not feasible. The slab and pgtable variants already exercise the same get_any_page() positive-check branch. The script enables the sysctl and writes the selected physical address to /sys/devices/system/memory/hard_offline_page. A successful run crashes the kernel with Memory failure: : unrecoverable page A return from the inject means the panic did not fire and the test fails. Test outcome is therefore observed externally (serial console, kdump) rather than from the script's own exit code. 
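Because the verdict is read from outside the guest, a harness can check
the captured console log for the panic line. The following is a minimal
sketch, assuming the harness saves the serial console to a file and that
the pfn is printed as lowercase hex with a 0x prefix, as %#lx produces.
The function name and the return-code convention are hypothetical and
not part of the script added here.

```shell
#!/bin/bash
# Sketch only: external pass/fail check on a captured serial console log.
# Returns 0 if the expected panic line is present, 1 if it is absent,
# and 2 if the log file cannot be read.
check_panic_in_log() {
    local log=$1

    [ -r "$log" ] || return 2
    grep -Eq 'Memory failure: 0x[0-9a-f]+: unrecoverable page' "$log"
}
```

A harness would run this on the host after the VM exits, for example
"check_panic_in_log console.log && echo ok", where console.log is
whatever file the VM's serial output was redirected to.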
The script is intentionally NOT wired into run_vmtests.sh: every successful run panics the kernel, which is incompatible with the sequential "run each category in the same VM" model that run_vmtests.sh assumes. It is also not registered as a TEST_PROGS / ksft_* wrapper so a default kselftest run does not opt itself into a panic. The script is meant to be executed manually inside a disposable VM (e.g. virtme-ng), one variant per VM boot, and requires RUN_DESTRUCTIVE=3D1 in the environment as a safety net. Signed-off-by: Breno Leitao --- tools/testing/selftests/mm/Makefile | 1 + tools/testing/selftests/mm/hwpoison-panic.sh | 193 +++++++++++++++++++++++= ++++ 2 files changed, 194 insertions(+) diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/= mm/Makefile index e6df968f0971..170e376c97b4 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -181,6 +181,7 @@ TEST_FILES +=3D charge_reserved_hugetlb.sh TEST_FILES +=3D hugetlb_reparenting_test.sh TEST_FILES +=3D test_page_frag.sh TEST_FILES +=3D run_vmtests.sh +TEST_FILES +=3D hwpoison-panic.sh =20 # required by charge_reserved_hugetlb.sh TEST_FILES +=3D write_hugetlb_memory.sh diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/s= elftests/mm/hwpoison-panic.sh new file mode 100755 index 000000000000..43fc379f8761 --- /dev/null +++ b/tools/testing/selftests/mm/hwpoison-panic.sh @@ -0,0 +1,193 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison +# error on a kernel-owned page and confirming the kernel panics. +# +# Three "kinds" of kernel-owned page can be targeted, selectable via the +# first positional argument (default: rodata): +# +# rodata - a PG_reserved page in the kernel rodata range +# (sourced from /proc/iomem "Kernel rodata"). Exercises +# memory_failure() -> get_any_page() on a PageReserved page. 
+# +# slab - a slab page found via /proc/kpageflags (KPF_SLAB). +# Exercises memory_failure() -> get_any_page() on a non +# PG_reserved kernel-owned page. This path is what catches +# regressions where get_any_page() collapses kernel-owned +# pages into a transient -EIO instead of -ENOTRECOVERABLE. +# +# pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE). +# Same path as slab, different page type. +# +# This test is DESTRUCTIVE: a successful run crashes the kernel. It is +# meant to be executed inside a disposable VM (e.g. virtme-ng) with a +# serial console captured by the harness. It is skipped unless the +# caller opts in via RUN_DESTRUCTIVE=3D1. +# +# Test passes externally: the kernel must panic with +# "Memory failure: : unrecoverable page" +# A return from the inject means the panic did not fire and the test +# fails. +# +# Author: Breno Leitao + +set -u + +ksft_skip=3D4 +sysctl_path=3D/proc/sys/vm/panic_on_unrecoverable_memory_failure +inject_path=3D/sys/devices/system/memory/hard_offline_page +kpageflags_path=3D/proc/kpageflags + +# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags= .h) +KPF_SLAB=3D7 +KPF_COMPOUND_TAIL=3D16 +KPF_HWPOISON=3D19 +KPF_NOPAGE=3D20 +KPF_PGTABLE=3D26 + +kind=3D${1:-rodata} + +ksft_print() { echo "# $*"; } +ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; } +ksft_exit_fail() { echo "not ok 1 $*"; exit 1; } + +if [ "$(id -u)" -ne 0 ]; then + ksft_exit_skip "must run as root" +fi + +if [ ! -w "$sysctl_path" ]; then + ksft_exit_skip "$sysctl_path not present (kernel without the sysctl?)" +fi + +if [ ! -w "$inject_path" ]; then + ksft_exit_skip "$inject_path not present (no MEMORY_HOTPLUG?)" +fi + +if [ "${RUN_DESTRUCTIVE:-0}" !=3D "1" ]; then + ksft_exit_skip "destructive test; re-run with RUN_DESTRUCTIVE=3D1 inside = a disposable VM" +fi + +# Pick a PFN inside the kernel image rodata region of /proc/iomem. 
+# This is preferred over a top-level "Reserved" entry because top-level +# Reserved ranges are often firmware holes that have no backing struct +# page; pfn_to_online_page() returns NULL on those and memory_failure() +# bails out with -ENXIO before reaching the panic path. +# +# "Kernel rodata" is reported as a sub-resource of "System RAM" on every +# major architecture, which guarantees: +# - the PFN is backed by struct page (within an online memory range); +# - PG_reserved is set on the page (kernel image area); +# - the memory is read-only, so setting PG_hwpoison on it does not +# corrupt writable kernel state if the panic somehow does not fire. +# +# /proc/iomem entries look like (indented for sub-resources): +# " 02500000-02ffffff : Kernel rodata" +pick_rodata_phys_addr() { + awk -v pagesize=3D"$(getconf PAGE_SIZE)" ' + /: Kernel rodata[[:space:]]*$/ { + sub(/^[[:space:]]+/, "") + n =3D split($0, a, /[- ]/) + start =3D strtonum("0x" a[1]) + end =3D strtonum("0x" a[2]) + if (end <=3D start) + next + # Page-align upward and emit the first byte of that page. + pfn =3D int((start + pagesize - 1) / pagesize) + printf "0x%x\n", pfn * pagesize + exit 0 + } + ' /proc/iomem +} + +# Walk /proc/kpageflags and return the phys addr of the first PFN that +# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL +# all clear (so we attack a real, non-tail, not-already-poisoned page). +# +# We skip the first 16 MiB of PFNs to step past low-memory special +# ranges (BIOS/EFI/ACPI/etc.) that often are PG_reserved and would not +# exhibit the slab/pgtable type we are looking for. 
+pick_kpageflags_phys_addr() { + local want_bit=3D$1 + local pagesize skip_pfn + + [ -r "$kpageflags_path" ] || return + + pagesize=3D$(getconf PAGE_SIZE) + skip_pfn=3D$(((16 * 1024 * 1024) / pagesize)) + + od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null = | \ + awk -v want_bit=3D"$want_bit" \ + -v hwp_bit=3D"$KPF_HWPOISON" \ + -v nopage_bit=3D"$KPF_NOPAGE" \ + -v tail_bit=3D"$KPF_COMPOUND_TAIL" \ + -v base_pfn=3D"$skip_pfn" \ + -v pagesize=3D"$pagesize" ' + # Test whether bit "b" is set in the 16-hex-digit value "hex". + # Done with substring + per-digit lookup so we never rely on awk + # bitwise operators (mawk lacks them) or 64-bit FP precision. + function bit_set(hex, b, di, bi, c, v) { + di =3D int(b / 4) + bi =3D b - di * 4 + c =3D substr(hex, length(hex) - di, 1) + v =3D strtonum("0x" c) + if (bi =3D=3D 0) return (v % 2) =3D=3D 1 + if (bi =3D=3D 1) return int(v / 2) % 2 =3D=3D 1 + if (bi =3D=3D 2) return int(v / 4) % 2 =3D=3D 1 + return int(v / 8) % 2 =3D=3D 1 + } + { + gsub(/^[[:space:]]+/, "") + h =3D $1 + if (bit_set(h, want_bit) && + !bit_set(h, hwp_bit) && + !bit_set(h, nopage_bit) && + !bit_set(h, tail_bit)) { + pfn =3D base_pfn + NR - 1 + printf "0x%x\n", pfn * pagesize + exit 0 + } + } + ' +} + +case "$kind" in +rodata) + phys_addr=3D$(pick_rodata_phys_addr) + missing_msg=3D'no "Kernel rodata" entry in /proc/iomem' + ;; +slab) + phys_addr=3D$(pick_kpageflags_phys_addr "$KPF_SLAB") + missing_msg=3D"no usable slab PFN found in $kpageflags_path" + ;; +pgtable) + phys_addr=3D$(pick_kpageflags_phys_addr "$KPF_PGTABLE") + missing_msg=3D"no usable page-table PFN found in $kpageflags_path" + ;; +*) + ksft_exit_fail "unknown kind '$kind' (expected: rodata|slab|pgtable)" + ;; +esac + +if [ -z "$phys_addr" ]; then + ksft_exit_skip "$missing_msg" +fi + +ksft_print "enabling $sysctl_path" +prior=3D$(cat "$sysctl_path") +echo 1 > "$sysctl_path" || ksft_exit_fail "failed to enable sysctl" + +ksft_print "injecting hwpoison at phys 0x$(printf 
'%x' "$phys_addr") (kind= =3D$kind)" +ksft_print "expecting kernel panic: 'Memory failure: : unrecoverable = page'" + +# If this returns, the kernel did not panic =E2=86=92 test failed. Restor= e the +# sysctl before reporting so the system is left as we found it. +if echo "$phys_addr" > "$inject_path"; then + echo "$prior" > "$sysctl_path" + ksft_exit_fail "inject returned without panic; sysctl ineffective" +fi + +# Write failed (e.g. -EINVAL on offlining a non-online region): also a +# failure for this test, since we expected the panic path. +echo "$prior" > "$sysctl_path" +ksft_exit_fail "inject failed before reaching the panic path" --=20 2.54.0
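The per-digit bit test used by the awk helper in hwpoison-panic.sh can
also be sketched in plain bash, which may help when inspecting a single
/proc/kpageflags word by hand. The function name kpf_bit_set is
hypothetical and not part of the patch. Like the awk version, it looks
only at the hex digit that holds the requested bit, so it never depends
on how a full 64-bit value is parsed.

```shell
#!/bin/bash
# Sketch only: succeed if bit BIT is set in the hex string HEX
# (as printed by "od -tx8", without a 0x prefix).
kpf_bit_set() {
    local hex=$1 bit=$2
    local di=$(( bit / 4 ))     # index of the hex digit, counted from the right
    local bi=$(( bit % 4 ))     # bit position inside that digit
    local len=${#hex}
    local c

    (( di < len )) || return 1  # bit lies beyond the supplied digits
    c=${hex:len-1-di:1}
    (( (16#$c >> bi) & 1 ))
}
```

For example, "kpf_bit_set 0000000000000080 7" succeeds, since bit 7 is
KPF_SLAB in the bit positions the script lists.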