Changes from previouis version
* Replaced the late, action-time recheck for MF_MSG_KERNEL_HIGH_ORDER
with early failure classification inside get_any_page() (new patch
2/4).
* David Hildenbrand pointed out that the recheck inspected refcount and
folio mapping without holding a folio reference, which is unsafe
(concurrent split can trigger VM_WARN_ON_FOLIO).
* Lance Yang suggested moving the disambiguation to the call site that
still knows *why* the page reference could not be taken, which is what
this version does via a new enum mf_get_page_status (MF_GET_PAGE_OK
/ RACE / UNHANDLABLE) plumbed out through get_hwpoison_page().
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v6:
- Dropped the selftest given the value was not clear
- Get the status of the failure from get_any_page()
- Small nits from different people/AIs.
- Link to v5: https://patch.msgid.link/20260424-ecc_panic-v5-0-a35f4b50425c@debian.org
Changes in v5:
- Add vm.panic_on_unrecoverable_memory_failure sysctl to panic on
unrecoverable kernel page hwpoison events (reserved pages, refcount-0
non-buddy pages, unknown state), with a recheck to avoid racing with
concurrent buddy allocations. (Miaohe)
- Distinguish reserved pages as MF_MSG_KERNEL in memory_failure(),
document the new sysctl in Documentation/admin-guide/sysctl/vm.rst,
and add a selftest verifying SIGBUS recovery on userspace pages still
works when the sysctl is enabled. (Miaohe)
- Added a selftest
- Link to v4:
https://patch.msgid.link/20260415-ecc_panic-v4-0-2d0277f8f601@debian.org
Changes in v4:
- Drop CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option.
- Split the reserved page classification (MF_MSG_KERNEL) into its own
patch, separate from the panic mechanism.
- Document why the buddy allocator TOCTOU race (between
get_hwpoison_page() and is_free_buddy_page()) cannot cause false
positives: PG_hwpoison is set beforehand and check_new_page() in the
page allocator rejects hwpoisoned pages.
- Document the narrow LRU isolation race window for MF_MSG_UNKNOWN and
its mitigation via identify_page_state()'s two-pass design.
- Explicitly document why MF_MSG_GET_HWPOISON is excluded from the
panic conditions (shared path with transient races and non-reserved
kernel memory).
- Link to v3: https://patch.msgid.link/20260413-ecc_panic-v3-0-1dcbb2f12bc4@debian.org
Changes in v3:
- Rename is_unrecoverable_memory_failure() to panic_on_unrecoverable_mf()
as suggested by maintainer.
- Add CONFIG_BOOTPARAM_MEMORY_FAILURE_PANIC kernel configuration option,
similar to CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
- Add documentation for the sysctl and CONFIG option.
- Add code comments documenting the panic condition design rationale and
how the retry mechanism mitigates false positives from buddy allocator
races.
- Link to v2: https://patch.msgid.link/20260331-ecc_panic-v2-0-9e40d0f64f7a@debian.org
Changes in v2:
- Panic on MF_MSG_KERNEL, MF_MSG_KERNEL_HIGH_ORDER and MF_MSG_UNKNOWN
instead of MF_MSG_GET_HWPOISON.
- Report MF_MSG_KERNEL for reserved pages when get_hwpoison_page() fails
instead of MF_MSG_GET_HWPOISON.
- Link to v1: https://patch.msgid.link/20260323-ecc_panic-v1-0-72a1921726c5@debian.org
To: Miaohe Lin <linmiaohe@huawei.com>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Steven Rostedt <rostedt@goodmis.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
---
Breno Leitao (4):
mm/memory-failure: report MF_MSG_KERNEL for reserved pages
mm/memory-failure: classify get_any_page() failures by reason
mm/memory-failure: add panic option for unrecoverable pages
Documentation: document panic_on_unrecoverable_memory_failure sysctl
Documentation/admin-guide/sysctl/vm.rst | 70 ++++++++++++++++++++++++++
include/trace/events/memory-failure.h | 2 +-
mm/memory-failure.c | 89 ++++++++++++++++++++++++++++++---
3 files changed, 152 insertions(+), 9 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260323-ecc_panic-4e473b83087c
Best regards,
--
Breno Leitao <leitao@debian.org>