Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect access violations and content corruption, but none of them can detect silent corruption in the page allocator's own metadata. If a hardware bit flip corrupts an allocation bitmap, the allocator hands out a page that is already in use (or fails to hand out a free one), and nothing in the kernel notices.

This series adds a dual-bitmap consistency checker that maintains the invariant primary == ~secondary across two independently allocated bitmaps, so that any single-bit corruption in either bitmap is immediately detectable. The approach is based on NVIDIA safety research.

Field studies consistently show that DRAM errors at scale are far more common than textbook assumptions suggest, even with ECC. Schroeder et al. (SIGMETRICS 2009) found 8% of DIMMs experienced errors per year in Google's fleet; Sridharan and Liberty (SC 2012) reported similar rates at LANL; Meta's 2021-2022 work documented silent data corruption at scale, including memory-related faults.

The critical property of allocator metadata corruption is that it doesn't trigger an invalid memory access - the corrupted data is structurally valid, just wrong. KASAN instruments accesses, not metadata integrity, so it cannot see this class of fault.

Functional safety is a different discipline from security that aims to reduce the risk of hardware and software misbehaving to an acceptable level. Security hardens against adversaries; safety hardens against random hardware failures (cosmic rays, cell wear-out, thermal noise) and systematic software failures (bugs).

ISO 26262 (automotive functional safety) defines four Automotive Safety Integrity Levels, ASIL A through D. ASIL-D, the most stringent, is derived from the severity of the hazard in case of failure. IEC 61508 defines similar levels (SIL-1 through SIL-4) for industrial systems, and there are equivalent standards for avionics and medical devices.

ISO 26262 requires Freedom From Interference (FFI): a safety element must not be corrupted by faults in other elements. For an OS kernel, this means the memory allocator's metadata must either be immune to corruption or corruption must be detected before it propagates. The dual-bitmap implements a way to protect from corruption coming from hardware or software - two complementary representations of page allocation state, allocated independently via memblock, where any single-bit fault in either bitmap is immediately detectable.

Performance is secondary to correctness in this context. A safety mechanism must be simple enough to audit and certify, must fail deterministically (panic, not log-and-hope), and its correctness matters more than its throughput. The dual-bitmap adds two atomic bitops per alloc/free, but for safety-critical deployments this cost is acceptable because the alternative - undetected corruption propagating silently - violates the system's safety case. The static key ensures zero cost for kernels that don't need it.

The natural question is why not use page_ext. The key objection from a safety perspective is that page_ext stores per-page metadata in memory that is itself subject to the same hardware faults we're trying to detect. The dual-bitmap approach works because the two bitmaps are independent allocations - corruption in one is caught by comparison with the other. Embedding both in page_ext means a single fault could corrupt both the tracking data and its redundant copy in the same allocation region. ISO 26262 recommends this approach for protecting against hardware faults, but it also helps against software faults - co-locating both bitmaps in page_ext violates this principle.

Beyond the safety argument, there are practical issues: page_ext adds 8-100+ bytes per page depending on enabled features while the dual-bitmap uses 2 bits per page total, and page_ext initializes after the buddy allocator while the checker must be active before memblock_free_all() hands pages to buddy.

Sasha Levin (7):
  mm: add generic dual-bitmap consistency primitives
  mm: add page consistency checker header
  mm: add Kconfig options for page consistency checker
  mm: add page consistency checker implementation
  mm/page_alloc: integrate page consistency hooks
  Documentation/mm: add page consistency checker documentation
  mm/page_consistency: add KUnit tests for dual-bitmap primitives

 Documentation/mm/index.rst            |   1 +
 Documentation/mm/page_consistency.rst | 211 +++++++++++++++
 MAINTAINERS                           |  10 +
 include/linux/dual_bitmap.h           | 216 ++++++++++++++++
 include/linux/page_consistency.h      |  84 ++++++
 mm/Kconfig.debug                      |  59 +++++
 mm/Makefile                           |   2 +
 mm/mm_init.c                          |   9 +
 mm/page_alloc.c                       |   4 +
 mm/page_consistency.c                 | 360 ++++++++++++++++++++++++++
 mm/page_consistency_test.c            | 274 ++++++++++++++++++++
 11 files changed, 1230 insertions(+)
 create mode 100644 Documentation/mm/page_consistency.rst
 create mode 100644 include/linux/dual_bitmap.h
 create mode 100644 include/linux/page_consistency.h
 create mode 100644 mm/page_consistency.c
 create mode 100644 mm/page_consistency_test.c

--
2.53.0

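The complement invariant described in the cover letter can be illustrated with a small userspace sketch. This is not the code from the series: the names (struct dual_bitmap_sketch, db_init, db_set, db_first_bad_word) are hypothetical, plain arrays stand in for the two memblock allocations, and ordinary non-atomic operations stand in for the kernel's atomic bitops.

```c
#include <limits.h>
#include <stddef.h>

#define DB_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the series' dual bitmap: two separate arrays. */
struct dual_bitmap_sketch {
	unsigned long *primary;   /* bit set   => page allocated */
	unsigned long *secondary; /* bit clear => page allocated (complement) */
	size_t nwords;
};

/* Start with every page free: primary all zeroes, secondary all ones. */
static void db_init(struct dual_bitmap_sketch *db, unsigned long *primary,
		    unsigned long *secondary, size_t nwords)
{
	db->primary = primary;
	db->secondary = secondary;
	db->nwords = nwords;
	for (size_t i = 0; i < nwords; i++) {
		primary[i] = 0UL;
		secondary[i] = ~0UL;
	}
}

/* Mark one bit in both copies so that primary == ~secondary still holds. */
static void db_set(struct dual_bitmap_sketch *db, size_t bit)
{
	size_t w = bit / DB_BITS_PER_WORD;
	unsigned long mask = 1UL << (bit % DB_BITS_PER_WORD);

	db->primary[w] |= mask;
	db->secondary[w] &= ~mask;
}

/*
 * Full scan of the invariant. Returns the index of the first word where
 * primary != ~secondary, or -1 if the two copies agree everywhere.
 */
static long db_first_bad_word(const struct dual_bitmap_sketch *db)
{
	for (size_t i = 0; i < db->nwords; i++)
		if (db->primary[i] != ~db->secondary[i])
			return (long)i;
	return -1;
}
```

Flipping any single bit in either array after db_init() makes db_first_bad_word() return that word's index. The sketch only reports the mismatch; the cover letter states that the real checker fails deterministically by panicking.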
On 4/24/26 16:00, Sasha Levin wrote:
> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
> access violations and content corruption, but none of them can detect
> silent corruption in the page allocator's own metadata. If a hardware
> bit flip corrupts an allocation bitmap, the allocator hands out a page

An allocation what? The page allocator is a buddy allocator, it has no
bitmap to track free/allocated state of pages?

On Fri, Apr 24, 2026 at 05:42:53PM +0200, Vlastimil Babka (SUSE) wrote:
>On 4/24/26 16:00, Sasha Levin wrote:
>> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
>> access violations and content corruption, but none of them can detect
>> silent corruption in the page allocator's own metadata. If a hardware
>> bit flip corrupts an allocation bitmap, the allocator hands out a page
>
>An allocation what? The page allocator is a buddy allocator, it has no
>bitmap to track free/allocated state of pages?

You're right, the cover letter is misleading there. Buddy doesn't use a bitmap:
PageBuddy lives in page_type, the free list is a list, and page->private holds
the order. The dual-bitmap is new metadata the feature adds, maintained from
the alloc/free hooks.

What it actually catches is the same PFN being handed out twice before it's freed, or a PFN being freed without having been allocated. Not every kind of buddy corruption shows up that way, but the common bad ones do. Corruption of the bitmap itself shows up through the complement invariant.

I'll fix the wording in v2.

--
Thanks,
Sasha

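The three failure classes named in this reply (double allocation, free without allocation, corruption of the bitmap itself) can be sketched as a per-PFN state transition. Everything here is an assumption made for illustration: the names (pc_state, pc_alloc_hook, pc_free_hook, enum pc_result) are hypothetical, the bitmaps are fixed-size userspace arrays, the updates are not atomic, and errors are returned instead of triggering a panic.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define PC_NPAGES 256
#define PC_BITS   (sizeof(unsigned long) * CHAR_BIT)
#define PC_NWORDS ((PC_NPAGES + PC_BITS - 1) / PC_BITS)

enum pc_result { PC_OK, PC_DOUBLE_ALLOC, PC_BAD_FREE, PC_CORRUPT };

/* Hypothetical checker state: primary bit set <=> secondary bit clear. */
struct pc_state {
	unsigned long primary[PC_NWORDS];
	unsigned long secondary[PC_NWORDS];
};

/* All pages start free: primary = 0, secondary = ~0. */
static void pc_init(struct pc_state *st)
{
	for (size_t i = 0; i < PC_NWORDS; i++) {
		st->primary[i] = 0UL;
		st->secondary[i] = ~0UL;
	}
}

/* Move one PFN (caller guarantees pfn < PC_NPAGES) to the requested state. */
static enum pc_result pc_transition(struct pc_state *st, size_t pfn,
				    bool to_allocated)
{
	size_t w = pfn / PC_BITS;
	unsigned long mask = 1UL << (pfn % PC_BITS);
	bool p = (st->primary[w] & mask) != 0;
	bool s = (st->secondary[w] & mask) != 0;

	if (p == s)		/* complement invariant broken for this bit */
		return PC_CORRUPT;
	if (p == to_allocated)	/* already in the requested state */
		return to_allocated ? PC_DOUBLE_ALLOC : PC_BAD_FREE;

	/* Toggle both copies together so they stay complementary. */
	st->primary[w] ^= mask;
	st->secondary[w] ^= mask;
	return PC_OK;
}

static enum pc_result pc_alloc_hook(struct pc_state *st, size_t pfn)
{
	return pc_transition(st, pfn, true);
}

static enum pc_result pc_free_hook(struct pc_state *st, size_t pfn)
{
	return pc_transition(st, pfn, false);
}
```

Calling pc_alloc_hook() twice on the same PFN without an intervening pc_free_hook() yields PC_DOUBLE_ALLOC, and freeing a PFN that was never allocated yields PC_BAD_FREE. A flipped bit in either array surfaces as PC_CORRUPT the next time that PFN is touched.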
On 4/24/26 18:25, Sasha Levin wrote:
> On Fri, Apr 24, 2026 at 05:42:53PM +0200, Vlastimil Babka (SUSE) wrote:
>> On 4/24/26 16:00, Sasha Levin wrote:
>>> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
>>> access violations and content corruption, but none of them can detect
>>> silent corruption in the page allocator's own metadata. If a hardware
>>> bit flip corrupts an allocation bitmap, the allocator hands out a page
>>
>> An allocation what? The page allocator is a buddy allocator, it has no
>> bitmap to track free/allocated state of pages?
>
> You're right, the cover letter is misleading there. Buddy doesn't use a bitmap:
> PageBuddy lives in page_type, the free list is a list, and page->private holds
> the order. The dual-bitmap is new metadata the feature adds, maintained from
> the alloc/free hooks.

Given that you have PageBuddy (first "bit"), could we use a second bit in page_ext?

--
Cheers,

David

On Sat, Apr 25, 2026 at 07:51:10AM +0200, David Hildenbrand (Arm) wrote:
>On 4/24/26 18:25, Sasha Levin wrote:
>> On Fri, Apr 24, 2026 at 05:42:53PM +0200, Vlastimil Babka (SUSE) wrote:
>>> On 4/24/26 16:00, Sasha Levin wrote:
>>>> Existing memory debugging tools - KASAN, KFENCE, page_poisoning - detect
>>>> access violations and content corruption, but none of them can detect
>>>> silent corruption in the page allocator's own metadata. If a hardware
>>>> bit flip corrupts an allocation bitmap, the allocator hands out a page
>>>
>>> An allocation what? The page allocator is a buddy allocator, it has no
>>> bitmap to track free/allocated state of pages?
>>
>> You're right, the cover letter is misleading there. Buddy doesn't use a bitmap:
>> PageBuddy lives in page_type, the free list is a list, and page->private holds
>> the order. The dual-bitmap is new metadata the feature adds, maintained from
>> the alloc/free hooks.
>
>Given that you have PageBuddy (first "bit"), could we use a second bit in page_ext?

Hmm... That's an interesting idea. I can see two concerns with something like this:

1. The checker has to be live before memblock_free_all() hands pages to buddy. page_ext isn't fully up that early I think.

2. page_type encodes buddy, offline, slab tags, etc... and a page that isn't PageBuddy isn't necessarily allocated through alloc_pages. The invariant gets case-y.

But let me think about it a bit more.

--
Thanks,
Sasha

On Fri, Apr 24, 2026 at 10:00:49AM -0400, Sasha Levin wrote:
> corruption must be detected before it propagates. The dual-bitmap
> implements a way to protect from corruption coming from hardware or
> software - two complementary representations of page allocation state,
> allocated independently via memblock, where any single-bit fault in
> either bitmap is immediately detectable. Performance is secondary to
> correctness in this context. A safety mechanism must be simple enough
> to audit and certify, must fail deterministically (panic, not
> log-and-hope), and its correctness matters more than its throughput.
> The dual-bitmap adds two atomic bitops per alloc/free, but for
> safety-critical deployments this cost is acceptable because the
> alternative - undetected corruption propagating silently - violates
> the system's safety case. The static key ensures zero cost for kernels
> that don't need it.

But doubling the storage requirement in order to achieve merely detection
is significantly worse than state-of-the-art in 1950 (when Richard
Hamming invented Hamming codes). If we used a (7,3) code, we'd have
SECDED at a lower cost. Of course, there are far better codes available
than that today.

On Fri, Apr 24, 2026 at 04:34:12PM +0100, Matthew Wilcox wrote:
>On Fri, Apr 24, 2026 at 10:00:49AM -0400, Sasha Levin wrote:
>> corruption must be detected before it propagates. The dual-bitmap
>> implements a way to protect from corruption coming from hardware or
>> software - two complementary representations of page allocation state,
>> allocated independently via memblock, where any single-bit fault in
>> either bitmap is immediately detectable. Performance is secondary to
>> correctness in this context. A safety mechanism must be simple enough
>> to audit and certify, must fail deterministically (panic, not
>> log-and-hope), and its correctness matters more than its throughput.
>> The dual-bitmap adds two atomic bitops per alloc/free, but for
>> safety-critical deployments this cost is acceptable because the
>> alternative - undetected corruption propagating silently - violates
>> the system's safety case. The static key ensures zero cost for kernels
>> that don't need it.
>
>But doubling the storage requirement in order to achieve merely detection
>is significantly worse than state-of-the-art in 1950 (when Richard
>Hamming invented Hamming codes). If we used a (7,3) code, we'd have
>SECDED at a lower cost. Of course, there are far better codes available
>than that today.

I agree with the density concern, but I have two reasons for the current design:

1. Update cost. On the alloc/free hot path the dual-bitmap update is two independent test_and_set_bit() calls. A Hamming/SECDED codeword needs a read-modify-write of the whole word with locking on every state change.

2. Correlated faults. The two copies need to sit in different physical memory so a multi-bit fault (row, column, bank, row-hammer) can only hit one of them.

See this paper, which has some numbers:

https://dl.acm.org/doi/epdf/10.1145/2786763.2694348

About 21% of DRAM faults span more than one bit, plain SECDED can leave up to 20 FIT per device of undetected errors from those, and it only helps at all if data and parity bits are spread across physically separate cells. Two memblock_alloc'd bitmaps give that separation for free.

You could interleave a code across two independent regions instead, but then the invariant check stops being a one-line complement check, which is what I was trying to keep simple for the audit side.

--
Thanks,
Sasha
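To make the update-cost point concrete, here is a minimal sketch of a codeword-based alternative. It uses the textbook Hamming(7,4) layout (parity at positions 1, 2 and 4), which is an assumption chosen for illustration and not the "(7,3)" code mentioned in the thread. The function names are hypothetical and nothing here comes from the series.

```c
#include <stdint.h>

/*
 * Textbook Hamming(7,4): codeword bit i (0-based) is position i + 1.
 * Positions 1, 2, 4 hold parity; positions 3, 5, 6, 7 hold data d0..d3.
 */
static uint8_t hamming74_encode(uint8_t data)
{
	uint8_t d0 = data & 1, d1 = (data >> 1) & 1;
	uint8_t d2 = (data >> 2) & 1, d3 = (data >> 3) & 1;
	uint8_t p1 = d0 ^ d1 ^ d3;	/* covers positions 1, 3, 5, 7 */
	uint8_t p2 = d0 ^ d2 ^ d3;	/* covers positions 2, 3, 6, 7 */
	uint8_t p4 = d1 ^ d2 ^ d3;	/* covers positions 4, 5, 6, 7 */

	return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
			 (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* XOR of the positions of all set bits: 0 for a valid codeword. */
static unsigned int hamming74_syndrome(uint8_t cw)
{
	unsigned int s = 0;

	for (unsigned int pos = 1; pos <= 7; pos++)
		if (cw & (1u << (pos - 1)))
			s ^= pos;
	return s;
}

/* Extract the four data bits from a codeword. */
static uint8_t hamming74_data(uint8_t cw)
{
	return (uint8_t)(((cw >> 2) & 1) | (((cw >> 4) & 1) << 1) |
			 (((cw >> 5) & 1) << 2) | (((cw >> 6) & 1) << 3));
}

/*
 * Changing one data bit is a read-modify-write of the whole codeword:
 * decode, modify, recompute every parity bit, store.
 */
static uint8_t hamming74_update(uint8_t cw, unsigned int idx, unsigned int val)
{
	uint8_t data = hamming74_data(cw);

	data = (uint8_t)((data & ~(1u << idx)) | ((val & 1u) << idx));
	return hamming74_encode(data);
}
```

A single flipped bit produces a syndrome equal to its position, so it can be corrected. Two flipped bits produce a nonzero syndrome that names a third, innocent position, so plain Hamming(7,4) would "correct" the word into a different valid codeword; SECDED variants add an overall parity bit to detect that case. In a concurrent setting hamming74_update() would also need the codeword protected against racing updates, which is the locking cost mentioned in point 1.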