From nobody Fri Jun 19 09:05:27 2026 Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4AADE3BADAA; Fri, 24 Apr 2026 12:24:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=82.195.75.108 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777033472; cv=none; b=eHEcLrT+yqFddTNsnxXFlTiYURPug0Rb0+gXUQwON5yED5D4p0Q8Tsu5J5dbpnE0zyb2P9IvQea5u2cNCXgOXd2bDN2CTYn3oUxzMcfCjUFQvKnlMERJ0mowT2yD8ENhF9+o5Jfnv90qfH8bzytMQLLpl3dUMgJac55k/XrAtdw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777033472; c=relaxed/simple; bh=7lm0oK+mnNa7lcfmncY/+Y6+e7vu6VbtV0bCxg9C0n0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=sXVGcVa7be2fvPkR2m8HlXB8ftrIInNLo5/U0tU/zh9ZI4fn/JCkMk+Xjw+wsiA5juWwTwaO2bhyyWp5kQqP+X67x9YIDKpWjLhfDIlr3Afq/bZVyk+q1R7sPhKhsc+35IPpta/3KhR1R3R+IK2FzTDvN/3Y1kZVtFAjJLCfk84= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org; spf=none smtp.mailfrom=debian.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b=bSrD78zk; arc=none smtp.client-ip=82.195.75.108 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=debian.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=debian.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org header.b="bSrD78zk" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org; s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References: Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date: From:Reply-To:Content-ID:Content-Description; 
bh=p7gOUZBYoVRPaxWe4o+yvogsDm8mw2e6s7QwNm3stTo=; b=bSrD78zksL+9G/yXC8/A6BkcUW O9wMV2oEqTSUawDnArmaHfr2L31kSmTyNxwOYLPCRuMVZPhgCzLz69mYCYAJE84CzN6g+szd1Squq TlejHmQnXQSmD+CIf5e5bUrdQDpmrogK+pctPFEcvwWZp/U5FHwl/agrfBTC4q6Soqv/IsNi2tqgL DD88AQZN2k1eIz/oDSSQnTrijfgtp6dJl0DGuIRRMp+ClR4L81HcM2ybCCzFZLp2sMB+MznJi9nOh h0GTf9xsVfNIZsrR8Yx8MGuvLUWmRAQZXKlnO5ybUt3pmytDTqvBMJY7jy1PtVn6ot7i/8T93Lgq6 rVrbig7Q==; Received: from authenticated user by stravinsky.debian.org with esmtpsa (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.96) (envelope-from ) id 1wGFa5-003APa-1U; Fri, 24 Apr 2026 12:24:25 +0000 From: Breno Leitao Date: Fri, 24 Apr 2026 05:23:59 -0700 Subject: [PATCH v5 1/4] mm/memory-failure: report MF_MSG_KERNEL for reserved pages Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260424-ecc_panic-v5-1-a35f4b50425c@debian.org> References: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org> In-Reply-To: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org> To: Miaohe Lin , Naoya Horiguchi , Andrew Morton , Jonathan Corbet , Shuah Khan , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Shuah Khan Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Breno Leitao , kernel-team@meta.com X-Mailer: b4 0.16-dev-453a6 X-Developer-Signature: v=1; a=openpgp-sha256; l=1179; i=leitao@debian.org; h=from:subject:message-id; bh=7lm0oK+mnNa7lcfmncY/+Y6+e7vu6VbtV0bCxg9C0n0=; b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBp62DtcfGBW7vwyPonjrr9fSNlpZBREuvDdKhph ZviCP0L3tuJAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCaetg7QAKCRA1o5Of/Hh3 bUCXEACJqvZ/QZYA5IPrGQOS+grYAiPtDz9AJF6o/NUfAJR6p6eBswB10uwO/tIh+Q56AJMhTlP Z07kLsY1FJaAZya2IJHIIUOF00JLN3evkX1vPWmR36hK/Jw3Ucn3L+pEsOXp3n285jXpEZvXZpg 7kbL5An2Jzfm1oIePV3Z3it3pvrgaeKpdR+98VWzyVQHrzZ3d+SXFilMV0Km43Kr4HcG67fkpDa Ez/zdvSQ5K9OxhTd9BZE9iXQbHCfQew+2ZodvitIVXvWcE706C9LMVWzcPOpidnoUtj4ko2GG2M owqkxFMb/d+dZgCqEA8+HiOdrCD9qvlF2/0IXFLCJxZ88L8/du67Dy4S0kHMvStPI5tPzDmOQNk ERdni2NuoAZTn6Crw/B0sewKYhJJA1Hjz7mYdVLyLQ+T35/DCeMNXuSOlx9yhH8/e08X11dMAGl 6HxbDlBPCXiE31rQCDKP3JOl5EHloCKhe9JbyTRYl3VhE6zHU5b8Ii/96TZfLlOuVQrSd70DIR0 8VH3PUgUlkppx6ZzhXgG8AJyMzcQ//7VOKAdOd7TB7TVhjjLrWRAOPzkzmRx15HEfJwo3Vovcr3 cp7Wu7rkd/HmOgFNE5KaeiOBt2MG0Vv3EvApuhK0erFvjWknkg2wiX/sraFP8XdS9dyZGWt1CWL DwdKnTzxxZ53SYA== X-Developer-Key: i=leitao@debian.org; a=openpgp; fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D X-Debian-User: leitao When get_hwpoison_page() returns a negative value, distinguish reserved pages from other failure cases by reporting MF_MSG_KERNEL instead of MF_MSG_GET_HWPOISON. Reserved pages belong to the kernel and should be classified accordingly for proper handling. 
Acked-by: Miaohe Lin
Signed-off-by: Breno Leitao
Reviewed-by: Lance Yang
---
 mm/memory-failure.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ee42d43613097..7b67e43dafbd1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2432,7 +2432,16 @@ int memory_failure(unsigned long pfn, int flags)
 			}
 			goto unlock_mutex;
 		} else if (res < 0) {
-			res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+			/*
+			 * PageReserved is stable here: reserved pages have
+			 * PG_reserved set at boot or by drivers and are never
+			 * freed through the page allocator.
+			 */
+			if (PageReserved(p))
+				res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+			else
+				res = action_result(pfn, MF_MSG_GET_HWPOISON,
+						    MF_IGNORED);
 			goto unlock_mutex;
 		}
 
-- 
2.52.0

From nobody Fri Jun 19 09:05:27 2026
From: Breno Leitao
Date: Fri, 24 Apr 2026 05:24:00 -0700
Subject: [PATCH v5 2/4] mm/memory-failure: add panic option for unrecoverable pages
Message-Id: <20260424-ecc_panic-v5-2-a35f4b50425c@debian.org>
References: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org>
In-Reply-To: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org>

Add a sysctl, panic_on_unrecoverable_memory_failure, that triggers a
kernel panic when memory_failure() encounters pages that cannot be
recovered. This provides a clean crash with useful debug information
rather than allowing silent data corruption or a delayed crash at an
unrelated code path.

The panic is triggered for three categories of unrecoverable failures,
all requiring result == MF_IGNORED:

- MF_MSG_KERNEL: reserved pages identified via PageReserved.

- MF_MSG_KERNEL_HIGH_ORDER: pages that get_hwpoison_page() observed
  with refcount 0 but that are not in the buddy allocator (e.g. tail
  pages of a high-order kernel allocation).

  A buddy page being concurrently allocated to userspace can briefly
  land on this branch too (its refcount is 0 inside the allocator and
  it is no longer on the buddy free list), and panicking on such a page
  would defeat the standard SIGBUS recovery path. The page allocator
  cannot reject hwpoisoned buddy pages reliably either:
  check_new_pages() is gated by is_check_pages_enabled() and is a no-op
  when CONFIG_DEBUG_VM=n.

  Rule out the race inside panic_on_unrecoverable_mf(): yield with
  cpu_relax() so a concurrent allocator on another CPU can finish
  prep_new_page() and have its writes become visible, then re-check. A
  genuine high-order kernel tail page stays unowned (refcount 0, no
  LRU, no mapping, not in buddy); an in-flight allocation will have
  bumped the refcount, attached a mapping, or placed the page on an LRU
  by then. Only panic if the recheck still observes a fully unowned
  page. The window is narrowed, not eliminated, but is far below any
  allocator path's cost.

- MF_MSG_UNKNOWN: pages that do not match any known recoverable state
  in error_states[]. A theoretical false positive from concurrent LRU
  isolation is mitigated by identify_page_state()'s two-pass design,
  which rechecks using the saved page_flags.

MF_MSG_GET_HWPOISON is intentionally excluded: it covers both
non-reserved kernel memory (SLAB/SLUB, vmalloc, kernel stacks, page
tables) and transient refcount races, so panicking would risk false
positives.
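
For illustration only, the decision described above can be modelled in
user space. This is a sketch under stated assumptions, not kernel code:
the enum values, struct page_snapshot, page_is_unowned() and
should_panic() are hypothetical stand-ins for the kernel types and for
the page state sampled by the recheck, and a NULL snapshot stands for a
pfn that is no longer online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel enums (illustrative values). */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };
enum mf_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
	MF_MSG_UNKNOWN,
};

/* Hypothetical snapshot of the page state resampled by the recheck. */
struct page_snapshot {
	int refcount;
	bool on_lru;
	bool mapped;
	bool has_mapping;
	bool in_buddy;
};

/* A page is "fully unowned" when nothing has claimed it since the first look. */
static bool page_is_unowned(const struct page_snapshot *s)
{
	assert(s != NULL);
	return s->refcount == 0 && !s->on_lru && !s->mapped &&
	       !s->has_mapping && !s->in_buddy;
}

/*
 * Model of the panic decision: only MF_IGNORED results qualify, and the
 * high-order case panics only if the rechecked page is still unowned.
 * A NULL snapshot models a pfn that is no longer online.
 */
static bool should_panic(bool sysctl_enabled, enum mf_type type,
			 enum mf_result result,
			 const struct page_snapshot *recheck)
{
	if (!sysctl_enabled || result != MF_IGNORED)
		return false;

	switch (type) {
	case MF_MSG_KERNEL:
	case MF_MSG_UNKNOWN:
		return true;
	case MF_MSG_KERNEL_HIGH_ORDER:
		if (!recheck)
			return true;
		return page_is_unowned(recheck);
	default:
		return false;
	}
}
```

In this model a snapshot with refcount 1 (an allocation that completed
during the yield) does not panic, while an all-zero snapshot does.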
Signed-off-by: Breno Leitao
---
 mm/memory-failure.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 91 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7b67e43dafbd1..fd1aed1af94a1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -1281,6 +1292,75 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+/*
+ * Determine whether to panic on an unrecoverable memory failure.
+ *
+ * Panics on three categories of failures (all requiring result == MF_IGNORED):
+ *
+ * - MF_MSG_KERNEL: Reserved pages (PageReserved) that belong to the kernel.
+ *
+ * - MF_MSG_KERNEL_HIGH_ORDER: Pages that get_hwpoison_page() observed with
+ *   refcount 0 but that are not in the buddy allocator (e.g. tail pages of
+ *   a high-order kernel allocation). A buddy page being concurrently
+ *   allocated could also reach this branch (its refcount is briefly 0
+ *   inside the allocator and it is no longer on the buddy free list), and
+ *   such a page may be destined for userspace, where the standard hwpoison
+ *   path would recover it via SIGBUS. The page allocator cannot reject
+ *   hwpoisoned buddy pages reliably either: check_new_pages() is gated by
+ *   is_check_pages_enabled() and is a no-op when CONFIG_DEBUG_VM=n. The
+ *   recheck below rules out this race before panicking.
+ *
+ * - MF_MSG_UNKNOWN: Pages that reached identify_page_state() but matched no
+ *   recoverable state in error_states[]. A theoretical false positive from
+ *   concurrent LRU isolation is mitigated by identify_page_state()'s
+ *   two-pass design, which rechecks using the saved page_flags.
+ *
+ * MF_MSG_GET_HWPOISON is intentionally excluded: it covers dynamically
+ * allocated kernel memory (SLAB/SLUB, vmalloc, kernel stacks, page tables)
+ * which shares the return path with transient refcount races, so panicking
+ * would risk false positives.
+ */
+static bool panic_on_unrecoverable_mf(unsigned long pfn,
+				      enum mf_action_page_type type,
+				      enum mf_result result)
+{
+	struct page *p;
+
+	if (!sysctl_panic_on_unrecoverable_mf || result != MF_IGNORED)
+		return false;
+
+	switch (type) {
+	case MF_MSG_KERNEL:
+	case MF_MSG_UNKNOWN:
+		return true;
+	case MF_MSG_KERNEL_HIGH_ORDER:
+		/*
+		 * Rule out a concurrent buddy allocation: give the
+		 * allocator a moment to finish prep_new_page() and
+		 * re-check. A genuine high-order kernel tail page stays
+		 * unowned; an in-flight allocation will have bumped the
+		 * refcount, attached a mapping, or placed the page on
+		 * an LRU by now.
+		 */
+		p = pfn_to_online_page(pfn);
+		if (!p)
+			return true;
+		/*
+		 * Yield so a concurrent allocator on another CPU can
+		 * finish prep_new_page() and have its writes become
+		 * visible before we resample the page state.
+		 */
+		cpu_relax();
+		return page_count(p) == 0 &&
+		       !PageLRU(p) &&
+		       !page_mapped(p) &&
+		       !page_folio(p)->mapping &&
+		       !is_free_buddy_page(p);
+	default:
+		return false;
+	}
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1298,6 +1378,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 
+	if (panic_on_unrecoverable_mf(pfn, type, result))
+		panic("Memory failure: %#lx: unrecoverable page", pfn);
+
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
@@ -2428,6 +2511,14 @@ int memory_failure(unsigned long pfn, int flags)
 				}
 				res = action_result(pfn, MF_MSG_BUDDY, res);
 			} else {
+				/*
+				 * The page has refcount 0 but is not in the buddy
+				 * allocator, typically a tail page of a high-order
+				 * kernel allocation. A buddy page being concurrently
+				 * allocated to userspace can also briefly land here;
+				 * panic_on_unrecoverable_mf() rechecks to rule that
+				 * out before triggering a panic.
+				 */
 				res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 			}
 			goto unlock_mutex;
-- 
2.52.0

From nobody Fri Jun 19 09:05:27 2026
From: Breno Leitao
Date: Fri, 24 Apr 2026 05:24:01 -0700
Subject: [PATCH v5 3/4] Documentation: document panic_on_unrecoverable_memory_failure sysctl
Message-Id: <20260424-ecc_panic-v5-3-a35f4b50425c@debian.org>
References: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org>
In-Reply-To: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org>

Add documentation for the new vm.panic_on_unrecoverable_memory_failure
sysctl, describing the three categories of failures that trigger a
panic and noting which kernel page types are not yet covered.
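
As a side note for tooling that wants to check the setting, the sketch
below validates the text read back from the sysctl file against the
documented values 0 and 1. It is an illustration only:
parse_panic_sysctl() is a hypothetical helper name, and the assumption
is that procfs returns the integer followed by a single newline.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse the text read from the (assumed) sysctl file.
 * Returns 0 and stores the value in *out on success, or -1 if the text
 * is empty, malformed, or outside the documented range 0..1.
 */
static int parse_panic_sysctl(const char *text, int *out)
{
	char *end;
	long val;

	assert(text != NULL && out != NULL);

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;

	/* Accept one trailing newline, then require end of string. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	if (val < 0 || val > 1)
		return -1;

	*out = (int)val;
	return 0;
}
```

Using strtol() with an end pointer, rather than atoi(), lets the caller
tell a genuine 0 apart from unparsable input.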
Signed-off-by: Breno Leitao
---
 Documentation/admin-guide/sysctl/vm.rst | 65 +++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 97e12359775c9..f118ec5cd1fad 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,70 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+=====================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
+that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation. This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+When enabled, this sysctl triggers a panic on three categories of
+unrecoverable failures: reserved kernel pages, non-buddy kernel pages
+with zero refcount (e.g. tail pages of high-order allocations), and
+pages whose state cannot be classified as recoverable.
+
+Note that some kernel page types (such as slab objects, vmalloc
+allocations, kernel stacks, and page tables) share a failure path
+with transient refcount races and are not currently covered by this
+option: the kernel does not panic when the page state is uncertain.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+Use cases
+---------
+
+This option is most useful in environments where unattributed crashes
+are expensive to debug or where data integrity must take precedence
+over availability:
+
+* Large fleets, where multi-bit ECC errors on kernel pages are observed
+  regularly and post-mortem analysis of an unrelated downstream crash
+  (often seconds to minutes after the original error) consumes
+  significant engineering effort.
+
+* Systems configured with kdump, where panicking at the moment of the
+  hardware error produces a vmcore that still contains the faulting
+  address, the affected page state, and the originating MCE/GHES
+  record (context that is typically lost by the time a delayed crash
+  occurs).
+
+* High-availability clusters that rely on fast, deterministic node
+  failure for failover, and prefer an immediate panic over silent data
+  corruption propagating to replicas or persistent storage.
+
+* Kernel and platform developers reproducing hwpoison issues with
+  tools such as ``mce-inject`` or error-injection debugfs interfaces,
+  where panicking on the unrecoverable path makes regressions
+  immediately visible instead of surfacing as later, unrelated
+  failures.
+
+= ====================================================================
+0 Try to continue operation (default).
+1 Panic immediately. If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= ====================================================================
+
+Example::
+
+  echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 
-- 
2.52.0

From nobody Fri Jun 19 09:05:27 2026
From: Breno Leitao
Date: Fri, 24 Apr 2026 05:24:02 -0700
Subject: [PATCH v5 4/4] selftests/mm: regression test for panic_on_unrecoverable_memory_failure
Message-Id: <20260424-ecc_panic-v5-4-a35f4b50425c@debian.org>
References: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org>
In-Reply-To: <20260424-ecc_panic-v5-0-a35f4b50425c@debian.org>

Add a test that enables vm.panic_on_unrecoverable_memory_failure and
injects MADV_HWPOISON on a userspace anonymous page. The page must
still be recovered via SIGBUS; it must not trigger a kernel panic.
This is the regression test for the panic_on_unrecoverable_mf()
recheck: a buddy page being concurrently allocated to userspace can
briefly land on the MF_MSG_KERNEL_HIGH_ORDER branch (refcount 0, not in
buddy), and without the recheck the kernel would panic on what is
actually a recoverable userspace page.
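
The parent-side check can be tried without root or hwpoison support.
The sketch below is an illustration under stated assumptions, not the
selftest itself: run_in_child(), killed_by_signal() and
fake_poisoned_access() are hypothetical names, and the child raises
SIGBUS directly instead of touching a poisoned page.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* True if a waitpid() status says the child was killed by signal sig. */
static bool killed_by_signal(int status, int sig)
{
	return WIFSIGNALED(status) && WTERMSIG(status) == sig;
}

/*
 * Fork a child that runs fn() and store its wait status in *status.
 * Returns 0 on success, -1 if fork() or waitpid() failed.
 */
static int run_in_child(void (*fn)(void), int *status)
{
	pid_t pid;

	assert(fn != NULL && status != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		fn();
		_exit(0);	/* reached only if fn() returns */
	}
	return waitpid(pid, status, 0) == pid ? 0 : -1;
}

/* Stand-in for the poisoned access: deliver SIGBUS to ourselves. */
static void fake_poisoned_access(void)
{
	signal(SIGBUS, SIG_DFL);	/* make sure the default action applies */
	raise(SIGBUS);
}

/* Control case: a child that does nothing and exits normally. */
static void clean_exit(void)
{
}
```

Keeping the faulting access in a child means the parent survives the
signal and can still restore any state it changed before reporting.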

Run in a forked child so the SIGBUS path is fully exercised; if the
kernel ever regresses and panics, the host VM dies and the harness
reports the binary as never returning, which is itself a clear failure
signal.

The test is skipped when the sysctl is not present (feature not built
in) or when the test cannot write to it (insufficient privilege). It
saves and restores the original sysctl value.

Signed-off-by: Breno Leitao
---
 tools/testing/selftests/mm/memory-failure.c | 84 +++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/tools/testing/selftests/mm/memory-failure.c b/tools/testing/selftests/mm/memory-failure.c
index 032ed952057c6..9cb8d694aee94 100644
--- a/tools/testing/selftests/mm/memory-failure.c
+++ b/tools/testing/selftests/mm/memory-failure.c
@@ -17,9 +17,13 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "vm_util.h"
 
+#define PANIC_SYSCTL "/proc/sys/vm/panic_on_unrecoverable_memory_failure"
+
 enum inject_type {
 	MADV_HARD,
 	MADV_SOFT,
@@ -355,4 +359,84 @@ TEST_F(memory_failure, dirty_pagecache)
 	ASSERT_EQ(close(fd), 0);
 }
 
+static int read_sysctl_int(const char *path, int *out)
+{
+	char buf[16];
+	int fd, n;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return -1;
+	n = read(fd, buf, sizeof(buf) - 1);
+	close(fd);
+	if (n <= 0)
+		return -1;
+	buf[n] = '\0';
+	*out = atoi(buf);
+	return 0;
+}
+
+static int write_sysctl_int(const char *path, int val)
+{
+	char buf[16];
+	int fd, len, ret = 0;
+
+	fd = open(path, O_WRONLY);
+	if (fd < 0)
+		return -1;
+	len = snprintf(buf, sizeof(buf), "%d\n", val);
+	if (write(fd, buf, len) != len)
+		ret = -1;
+	close(fd);
+	return ret;
+}
+
+/*
+ * Regression test for vm.panic_on_unrecoverable_memory_failure.
+ *
+ * With the sysctl on, hwpoison injection on a userspace anonymous page
+ * must still be recovered via SIGBUS; it must not trigger a kernel
+ * panic. This guards the panic_on_unrecoverable_mf() recheck that rules
+ * out concurrent buddy allocations being misclassified as unrecoverable
+ * kernel pages (MF_MSG_KERNEL_HIGH_ORDER).
+ *
+ * If the kernel regresses and panics, the host VM dies and the test
+ * harness will report the binary as never having returned, which is
+ * itself a clear failure signal.
+ */
+TEST(panic_on_unrecoverable_user_page)
+{
+	unsigned long page_size;
+	int saved, status;
+	void *addr;
+	pid_t pid;
+
+	if (read_sysctl_int(PANIC_SYSCTL, &saved))
+		SKIP(return, "%s not available\n", PANIC_SYSCTL);
+	if (write_sysctl_int(PANIC_SYSCTL, 1))
+		SKIP(return, "cannot enable %s (need root?)\n", PANIC_SYSCTL);
+
+	page_size = sysconf(_SC_PAGESIZE);
+
+	pid = fork();
+	ASSERT_NE(pid, -1);
+	if (pid == 0) {
+		addr = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
+			    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+		if (addr == MAP_FAILED)
+			_exit(1);
+		*(volatile char *)addr = 1;
+		if (madvise(addr, page_size, MADV_HWPOISON))
+			_exit(2);
+		FORCE_READ(*(volatile char *)addr);
+		_exit(0); /* unreachable: SIGBUS expected */
+	}
+
+	ASSERT_EQ(waitpid(pid, &status, 0), pid);
+	write_sysctl_int(PANIC_SYSCTL, saved);
+
+	ASSERT_TRUE(WIFSIGNALED(status));
+	ASSERT_EQ(WTERMSIG(status), SIGBUS);
+}
+
 TEST_HARNESS_MAIN
-- 
2.52.0