Earlier this year, I debugged a stack corruption panic that revealed the
limitations of existing debugging tools. The bug persisted for 739 days
before being fixed (CVE-2025-22036), and my reproduction scenario
differed from the CVE report—highlighting how unpredictably these bugs
manifest.
The panic call trace:
<4>[89318.486564] <TASK>
<4>[89318.486570] dump_stack_lvl+0x48/0x70
<4>[89318.486580] dump_stack+0x10/0x20
<4>[89318.486586] panic+0x345/0x3a0
<4>[89318.486596] ? __blk_flush_plug+0x121/0x130
<4>[89318.486603] __stack_chk_fail+0x14/0x20
<4>[89318.486612] __blk_flush_plug+0x121/0x130
...27 other frames omitted
<4>[89318.486824] ksys_read+0x6b/0xf0
<4>[89318.486829] __x64_sys_read+0x19/0x30
<4>[89318.486834] x64_sys_call+0x1ada/0x25c0
<4>[89318.486840] do_syscall_64+0x7f/0x180
<4>[89318.486847] ? exc_page_fault+0x94/0x1b0
<4>[89318.486855] entry_SYSCALL_64_after_hwframe+0x73/0x7b
<4>[89318.486866] </TASK>
Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
code in __blk_flush_plug(), I found it difficult to trace all logic
paths due to indirect function calls through function pointers.
I added canary-locating code to obtain the canary address and value,
then inserted extensive debugging code to track canary modifications. I
observed the canary being corrupted between two unrelated assignments,
indicating corruption by another thread—a silent stack corruption bug.
I then added hardware breakpoint (hwbp) code, but still failed to catch
the corruption. After adding PID filters, function parameter filters,
and depth filters, I discovered the corruption occurred in
end_buffer_read_sync() via atomic_dec(&bh->b_count), where bh->b_count
overlapped with __blk_flush_plug()'s canary address. Tracing the bh
lifecycle revealed the root cause in exfat_get_block()—a function not
even present in the panic call trace.
This bug was later assigned CVE-2025-22036
(https://lore.kernel.org/all/2025041658-CVE-2025-22036-6469@gregkh/).
The vulnerability was introduced in commit 11a347fb6cef (March 13, 2023)
and fixed in commit 1bb7ff4204b6 (March 21, 2025)—persisting for 739
days. Notably, my reproduction scenario differed significantly from that
described in the CVE report, highlighting how these bugs manifest
unpredictably across different workloads.
This experience revealed how notoriously difficult stack corruption bugs
are to debug: KASAN cannot reproduce them, call traces are misleading,
and the actual culprit often lies outside the visible call chain. Manual
instrumentation with hardware breakpoints and filters was effective but
extremely time-consuming.
This motivated KStackWatch: automating the debugging workflow I manually
performed, making hardware breakpoint-based stack monitoring readily
available to all kernel developers facing similar issues.
KStackWatch is a lightweight debugging tool to detect kernel stack
corruption in real time. It installs a hardware breakpoint (watchpoint)
at a function's specified offset using kprobe.post_handler and removes
it in fprobe.exit_handler. This covers the full execution window and
reports corruption immediately with time, location, and a call stack.
Beyond automating proven debugging workflows, KStackWatch incorporates
robust engineering to handle complex scenarios like context switches,
recursion, and concurrent execution, making it suitable for broad
debugging use cases.
## Key Features
* Immediate and precise stack corruption detection
* Support for multiple concurrent watchpoints with configurable limits
* Lockless design, usable in any context
* Depth filter for recursive calls
* Low memory and CPU overhead
* Flexible debugfs configuration with key=val syntax
* Architecture support: x86_64 and arm64
* Auto-canary detection to simplify configuration
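As a concrete illustration of the key=val style, a configuration session might look like this. This is hypothetical: the debugfs path and the function= key are assumptions; only auto_canary, watch_len, and panic_hit are named in this cover letter, and the authoritative syntax lives in the documentation patch (patch 26):

```shell
# HYPOTHETICAL example of the key=val configuration style.
# The debugfs file path and the "function=" key are assumed here;
# the documentation patch defines the real syntax.
echo "function=__blk_flush_plug+0x10 auto_canary=1 watch_len=8 panic_hit=0" \
	> /sys/kernel/debug/kstackwatch/config
```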
## Architecture Support
KStackWatch currently supports x86_64 and arm64. The design is
architecture-agnostic, requiring only:
* Hardware breakpoint modification in atomic context
Arm64 support required only ~20 lines of code (patches 18 and 19). Future ports
to other architectures (e.g., riscv) should be straightforward for
developers familiar with their hardware breakpoint implementations.
## Performance Impact
Runtime overhead was measured on Intel Core Ultra 5 125H @ 3 GHz running
kernel 6.17, using test4 from patch 24:
Type                | Time (ns) | Cycles
--------------------+-----------+-------
entry with watch    |     10892 |  32620
entry without watch |       159 |    466
exit with watch     |     12541 |  37556
exit without watch  |       124 |    369
Comparison with other scenarios:
Mode                       | CPU overhead (added) | Memory overhead (added)
---------------------------+----------------------+------------------------
Compiled but not enabled   | None                 | ~20 B per task
Enabled, no function hit   | None                 | ~few hundred B
Func hit, HWBP not toggled | ~140 ns per call     | None
Func hit, HWBP toggled     | ~11–12 µs per call   | None
The overhead is negligible when no watchpoint is armed and confined to the
watched function otherwise, making KStackWatch suitable for production
environments where stack corruption is suspected but kernel rebuilds are not
feasible.
## Validation
To validate the approach, this series includes a self-contained test module and
a companion shell script. The module provides several test cases covering
scenarios such as canary overflow, recursive depth tracking, multi-threaded
silent corruption, and return address overwrite. A detailed workflow example and usage
guide are provided in the documentation (patch 26).
While KStackWatch itself is a new tool and has not yet discovered production
bugs, it automates the exact methodology that I used to manually uncover
CVE-2025-22036. The tool is designed to make this powerful debugging technique
readily available to kernel developers, enabling them to efficiently detect and
diagnose similar stack corruption issues in the future.
---
Patches 1–3 of this series are also used in the wprobe work proposed by
Masami Hiramatsu, so there may be some overlap between our patches.
Patch 3 comes directly from Masami Hiramatsu (thanks).
---
Changelog:
v8:
* Add arm64 support
  * Implement hwbp_reinstall() for arm64.
  * Use single-step mode as default in ksw_watch_handler().
* Add latency measurements for probe handlers.
* Update configuration options
  * Introduce explicit auto_canary parameter.
  * Default watch_len to sizeof(unsigned long) when zero.
  * Replace panic_on_catch with panic_hit ksw_config option.
* Enable KStackWatch in non-debug builds.
* Limit canary search range to the current stack frame when possible.
* Add automatic architecture detection for test parameters.
* Move kstackwatch.h to include/linux/.
* Relocate Kconfig fragments to the kstackwatch/ directory.
v7:
https://lore.kernel.org/all/20251009105650.168917-1-wangjinchao600@gmail.com/
* Fix maintainer entry to alphabetical position
v6:
https://lore.kernel.org/all/20250930024402.1043776-1-wangjinchao600@gmail.com/
* Replace procfs with debugfs interface
* Fix typos
v5:
https://lore.kernel.org/all/20250924115124.194940-1-wangjinchao600@gmail.com/
* Support key=value input format
* Support multiple watchpoints
* Support watching instruction inside loop
* Support recursion depth tracking with generation
* Ignore triggers from fprobe trampoline
* Split watch_on into watch_get and watch_on to fail fast
* Handle ksw_stack_prepare_watch error
* Rewrite silent corruption test
* Add multiple watchpoints test
* Add an example in documentation
v4:
https://lore.kernel.org/all/20250912101145.465708-1-wangjinchao600@gmail.com/
* Solve the lockdep issues with:
* per-task KStackWatch context to track depth
* atomic flag to protect watched_addr
* Use refactored version of arch_reinstall_hw_breakpoint
v3:
https://lore.kernel.org/all/20250910052335.1151048-1-wangjinchao600@gmail.com/
* Use modify_wide_hw_breakpoint_local() (from Masami)
* Add atomic flag to restrict /proc/kstackwatch to a single opener
* Protect stack probe with an atomic PID flag
* Handle CPU hotplug for watchpoints
* Add preempt_disable/enable in ksw_watch_on_local_cpu()
* Introduce const struct ksw_config *ksw_get_config(void) and use it
* Switch to global watch_attr, remove struct watch_info
* Validate local_var_len in parser()
* Handle case when canary is not found
* Use dump_stack() instead of show_regs() to allow module build
* Reduce logging and comments
* Format logs with KBUILD_MODNAME
* Remove unused headers
* Add new document
v2:
https://lore.kernel.org/all/20250904002126.1514566-1-wangjinchao600@gmail.com/
* Make hardware breakpoint and stack operations
architecture-independent.
v1:
https://lore.kernel.org/all/20250828073311.1116593-1-wangjinchao600@gmail.com/
* Replaced kretprobe with fprobe for function exit hooking, as
suggested by Masami Hiramatsu
* Introduced per-task depth logic to track recursion across scheduling
* Removed the use of workqueue for a more efficient corruption check
* Reordered patches for better logical flow
* Simplified and improved commit messages throughout the series
* Removed initial archcheck which should be improved later
* Replaced the multiple-thread test with silent corruption test
* Split self-tests into a separate patch to improve clarity.
* Added a new entry for KStackWatch to the MAINTAINERS file.
---
Jinchao Wang (26):
x86/hw_breakpoint: Unify breakpoint install/uninstall
x86/hw_breakpoint: Add arch_reinstall_hw_breakpoint
mm/ksw: add build system support
mm/ksw: add ksw_config struct and parser
mm/ksw: add singleton debugfs interface
mm/ksw: add HWBP pre-allocation
mm/ksw: Add atomic watchpoint management api
mm/ksw: ignore false positives from exit trampolines
mm/ksw: support CPU hotplug
sched/ksw: add per-task context
mm/ksw: add entry kprobe and exit fprobe management
mm/ksw: add per-task ctx tracking
mm/ksw: resolve stack watch addr and len
mm/ksw: limit canary search to current stack frame
mm/ksw: manage probe and HWBP lifecycle via procfs
mm/ksw: add KSTACKWATCH_PROFILING to measure probe cost
arm64/hw_breakpoint: Add arch_reinstall_hw_breakpoint
arm64/hwbp/ksw: integrate KStackWatch handler support
mm/ksw: add self-debug helpers
mm/ksw: add test module
mm/ksw: add stack overflow test
mm/ksw: add recursive depth test
mm/ksw: add multi-thread corruption test cases
tools/ksw: add arch-specific test script
docs: add KStackWatch document
MAINTAINERS: add entry for KStackWatch
Masami Hiramatsu (Google) (1):
HWBP: Add modify_wide_hw_breakpoint_local() API
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kstackwatch.rst | 377 +++++++++++++++++++++
MAINTAINERS | 9 +
arch/Kconfig | 10 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/hw_breakpoint.h | 1 +
arch/arm64/kernel/hw_breakpoint.c | 12 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/hw_breakpoint.h | 8 +
arch/x86/kernel/hw_breakpoint.c | 148 +++++----
include/linux/hw_breakpoint.h | 6 +
include/linux/kstackwatch.h | 68 ++++
include/linux/kstackwatch_types.h | 14 +
include/linux/sched.h | 5 +
kernel/events/hw_breakpoint.c | 37 +++
mm/Kconfig | 1 +
mm/Makefile | 1 +
mm/kstackwatch/Kconfig | 34 ++
mm/kstackwatch/Makefile | 8 +
mm/kstackwatch/kernel.c | 295 +++++++++++++++++
mm/kstackwatch/stack.c | 416 ++++++++++++++++++++++++
mm/kstackwatch/test.c | 345 ++++++++++++++++++++
mm/kstackwatch/watch.c | 309 ++++++++++++++++++
tools/kstackwatch/kstackwatch_test.sh | 85 +++++
24 files changed, 2130 insertions(+), 62 deletions(-)
create mode 100644 Documentation/dev-tools/kstackwatch.rst
create mode 100644 include/linux/kstackwatch.h
create mode 100644 include/linux/kstackwatch_types.h
create mode 100644 mm/kstackwatch/Kconfig
create mode 100644 mm/kstackwatch/Makefile
create mode 100644 mm/kstackwatch/kernel.c
create mode 100644 mm/kstackwatch/stack.c
create mode 100644 mm/kstackwatch/test.c
create mode 100644 mm/kstackwatch/watch.c
create mode 100755 tools/kstackwatch/kstackwatch_test.sh
--
2.43.0
On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> Earlier this year, I debugged a stack corruption panic that revealed the
> limitations of existing debugging tools. The bug persisted for 739 days
> before being fixed (CVE-2025-22036), and my reproduction scenario
> differed from the CVE report—highlighting how unpredictably these bugs
> manifest.
Well, this demonstrates the dangers of keeping this problem siloed
within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
It was fixed properly in 7375f22495e7 which lists its Fixes: as
Linux-2.6.12-rc2, but that's simply the beginning of git history.
It's actually been there since v2.4.6.4 where it's documented as simply:
- some subtle fs/buffer.c race conditions (Andrew Morton, me)
As far as I can tell the changes made in 1bb7ff4204b6 should be
reverted.
> Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> code in __blk_flush_plug(), I found it difficult to trace all logic
> paths due to indirect function calls through function pointers.
So why is the solution here not simply to fix KASAN instead of this
giant patch series?
On Mon, Nov 10, 2025 at 05:33:22PM +0000, Matthew Wilcox wrote:
> On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> > Earlier this year, I debugged a stack corruption panic that revealed the
> > limitations of existing debugging tools. The bug persisted for 739 days
> > before being fixed (CVE-2025-22036), and my reproduction scenario
> > differed from the CVE report—highlighting how unpredictably these bugs
> > manifest.
>
> Well, this demonstrates the dangers of keeping this problem siloed
> within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
> It was fixed properly in 7375f22495e7 which lists its Fixes: as
> Linux-2.6.12-rc2, but that's simply the beginning of git history.
> It's actually been there since v2.4.6.4 where it's documented as simply:
>
> - some subtle fs/buffer.c race conditions (Andrew Morton, me)
>
> As far as I can tell the changes made in 1bb7ff4204b6 should be
> reverted.

Thank you for the correction and the detailed history. I wasn't aware
this dated back to v2.4.6.4. I'm not part of the exfat group; I simply
encountered a bug that 1bb7ff4204b6 happened to resolve in my scenario.

The timeline actually illustrates the exact problem KStackWatch
addresses: a bug introduced in 2001, partially addressed in 2025, then
properly fixed months later. The 24-year gap suggests these silent
stack corruptions are extremely difficult to locate.

> > Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> > code in __blk_flush_plug(), I found it difficult to trace all logic
> > paths due to indirect function calls through function pointers.
>
> So why is the solution here not simply to fix KASAN instead of this
> giant patch series?

KASAN caught 7375f22495e7 because put_bh() accessed bh->b_count after
wait_on_buffer() of another thread returned—the stack was invalid.
In 1bb7ff4204b6 and my case, corruption occurred before the victim
function of another thread returned. The stack remained valid to KASAN,
so no warning triggered. This is timing-dependent, not a KASAN
deficiency.

Making KASAN treat parts of an active stack frame as invalid would be
complex and add significant overhead, likely making reproduction even
harder. KASAN's overhead already prevented reproduction in my
environment.

KStackWatch takes a different approach: it watches stack frames
regardless of whether KASAN considers them valid, and with much lower
overhead, thereby preserving reproduction scenarios.

The value proposition: finding where corruption occurs is the
bottleneck. Once located, subsystem experts can analyze the root cause.
Without that location, even experts are stuck. If KStackWatch had
existed earlier, this 24-year-old bug might have been found sooner when
someone hit a similar corruption. The same applies to other stack
corruption bugs.

I'd appreciate your thoughts on whether this addresses your concerns.

Best regards,
Jinchao
[dropping all the individual email addresses; leaving only the
mailing lists]
On Wed, Nov 12, 2025 at 10:14:29AM +0800, Jinchao Wang wrote:
> On Mon, Nov 10, 2025 at 05:33:22PM +0000, Matthew Wilcox wrote:
> > On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> > > Earlier this year, I debugged a stack corruption panic that revealed the
> > > limitations of existing debugging tools. The bug persisted for 739 days
> > > before being fixed (CVE-2025-22036), and my reproduction scenario
> > > differed from the CVE report—highlighting how unpredictably these bugs
> > > manifest.
> >
> > Well, this demonstrates the dangers of keeping this problem siloed
> > within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
> > It was fixed properly in 7375f22495e7 which lists its Fixes: as
> > Linux-2.6.12-rc2, but that's simply the beginning of git history.
> > It's actually been there since v2.4.6.4 where it's documented as simply:
> >
> > - some subtle fs/buffer.c race conditions (Andrew Morton, me)
> >
> > As far as I can tell the changes made in 1bb7ff4204b6 should be
> > reverted.
>
> Thank you for the correction and the detailed history. I wasn't aware this
> dated back to v2.4.6.4. I'm not part of the exfat group; I simply
> encountered a bug that 1bb7ff4204b6 happened to resolve in my scenario.
> The timeline actually illustrates the exact problem KStackWatch addresses:
> a bug introduced in 2001, partially addressed in 2025, then properly fixed
> months later. The 24-year gap suggests these silent stack corruptions are
> extremely difficult to locate.
I think that's a misdiagnosis caused by not understanding the limited
circumstances in which the problem occurs. To hit this problem, you
have to have a buffer_head allocated on the stack. That doesn't happen
in many places:
fs/buffer.c: struct buffer_head tmp = {
fs/direct-io.c: struct buffer_head map_bh = { 0, };
fs/ext2/super.c: struct buffer_head tmp_bh;
fs/ext2/super.c: struct buffer_head tmp_bh;
fs/ext4/mballoc-test.c: struct buffer_head bitmap_bh;
fs/ext4/mballoc-test.c: struct buffer_head gd_bh;
fs/gfs2/bmap.c: struct buffer_head bh;
fs/gfs2/bmap.c: struct buffer_head bh;
fs/isofs/inode.c: struct buffer_head dummy;
fs/jfs/super.c: struct buffer_head tmp_bh;
fs/jfs/super.c: struct buffer_head tmp_bh;
fs/mpage.c: struct buffer_head map_bh;
fs/mpage.c: struct buffer_head map_bh;
It's far more common for buffer_heads to be allocated from slab and
attached to folios. The other necessary condition to hit this problem
is that get_block() has to actually read the data from disk. That's
not normal either! Most filesystems just fill in the metadata about
the block and defer the actual read to when the data is wanted. That's
the high-performance way to do it.
So our opportunity to catch this bug was highly limited by the fact that
we just don't run the codepaths that would allow it to trigger.
> > > Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> > > code in __blk_flush_plug(), I found it difficult to trace all logic
> > > paths due to indirect function calls through function pointers.
> >
> > So why is the solution here not simply to fix KASAN instead of this
> > giant patch series?
>
> KASAN caught 7375f22495e7 because put_bh() accessed bh->b_count after
> wait_on_buffer() of another thread returned—the stack was invalid.
> In 1bb7ff4204b6 and my case, corruption occurred before the victim
> function of another thread returned. The stack remained valid to KASAN,
> so no warning triggered. This is timing-dependent, not a KASAN deficiency.
I agree that it's a narrow race window, but nevertheless KASAN did catch
it with ntfs and not with exfat. The KASAN documentation states that
it can catch this kind of bug:
Generic KASAN supports finding bugs in all of slab, page_alloc, vmap, vmalloc,
stack, and global memory.
Software Tag-Based KASAN supports slab, page_alloc, vmalloc, and stack memory.
Hardware Tag-Based KASAN supports slab, page_alloc, and non-executable vmalloc
memory.
(hm, were you using hwkasan instead of swkasan, and that's why you
couldn't see it?)
On Wed, Nov 12, 2025 at 08:36:33PM +0000, Matthew Wilcox wrote:
> [dropping all the individual email addresses; leaving only the
> mailing lists]
>
> On Wed, Nov 12, 2025 at 10:14:29AM +0800, Jinchao Wang wrote:
> > On Mon, Nov 10, 2025 at 05:33:22PM +0000, Matthew Wilcox wrote:
> > > On Tue, Nov 11, 2025 at 12:35:55AM +0800, Jinchao Wang wrote:
> > > > Earlier this year, I debugged a stack corruption panic that revealed the
> > > > limitations of existing debugging tools. The bug persisted for 739 days
> > > > before being fixed (CVE-2025-22036), and my reproduction scenario
> > > > differed from the CVE report—highlighting how unpredictably these bugs
> > > > manifest.
> > >
> > > Well, this demonstrates the dangers of keeping this problem siloed
> > > within your own exfat group. The fix made in 1bb7ff4204b6 is wrong!
> > > It was fixed properly in 7375f22495e7 which lists its Fixes: as
> > > Linux-2.6.12-rc2, but that's simply the beginning of git history.
> > > It's actually been there since v2.4.6.4 where it's documented as simply:
> > >
> > > - some subtle fs/buffer.c race conditions (Andrew Morton, me)
> > >
> > > As far as I can tell the changes made in 1bb7ff4204b6 should be
> > > reverted.
> >
> > Thank you for the correction and the detailed history. I wasn't aware this
> > dated back to v2.4.6.4. I'm not part of the exfat group; I simply
> > encountered a bug that 1bb7ff4204b6 happened to resolve in my scenario.
> > The timeline actually illustrates the exact problem KStackWatch addresses:
> > a bug introduced in 2001, partially addressed in 2025, then properly fixed
> > months later. The 24-year gap suggests these silent stack corruptions are
> > extremely difficult to locate.
>
> I think that's a misdiagnosis caused by not understanding the limited
> circumstances in which the problem occurs. To hit this problem, you
> have to have a buffer_head allocated on the stack. That doesn't happen
> in many places:
>
> fs/buffer.c: struct buffer_head tmp = {
> fs/direct-io.c: struct buffer_head map_bh = { 0, };
> fs/ext2/super.c: struct buffer_head tmp_bh;
> fs/ext2/super.c: struct buffer_head tmp_bh;
> fs/ext4/mballoc-test.c: struct buffer_head bitmap_bh;
> fs/ext4/mballoc-test.c: struct buffer_head gd_bh;
> fs/gfs2/bmap.c: struct buffer_head bh;
> fs/gfs2/bmap.c: struct buffer_head bh;
> fs/isofs/inode.c: struct buffer_head dummy;
> fs/jfs/super.c: struct buffer_head tmp_bh;
> fs/jfs/super.c: struct buffer_head tmp_bh;
> fs/mpage.c: struct buffer_head map_bh;
> fs/mpage.c: struct buffer_head map_bh;
>
> It's far more common for buffer_heads to be allocated from slab and
> attached to folios. The other necessary condition to hit this problem
> is that get_block() has to actually read the data from disk. That's
> not normal either! Most filesystems just fill in the metadata about
> the block and defer the actual read to when the data is wanted. That's
> the high-performance way to do it.
>
> So our opportunity to catch this bug was highly limited by the fact that
> we just don't run the codepaths that would allow it to trigger.
>
> > > > Initially, I enabled KASAN, but the bug did not reproduce. Reviewing the
> > > > code in __blk_flush_plug(), I found it difficult to trace all logic
> > > > paths due to indirect function calls through function pointers.
> > >
> > > So why is the solution here not simply to fix KASAN instead of this
> > > giant patch series?
> >
> > KASAN caught 7375f22495e7 because put_bh() accessed bh->b_count after
> > wait_on_buffer() of another thread returned—the stack was invalid.
> > In 1bb7ff4204b6 and my case, corruption occurred before the victim
> > function of another thread returned. The stack remained valid to KASAN,
> > so no warning triggered. This is timing-dependent, not a KASAN deficiency.
>
> I agree that it's a narrow race window, but nevertheless KASAN did catch
> it with ntfs and not with exfat. The KASAN documentation states that
> it can catch this kind of bug:
>
> Generic KASAN supports finding bugs in all of slab, page_alloc, vmap, vmalloc,
> stack, and global memory.
>
> Software Tag-Based KASAN supports slab, page_alloc, vmalloc, and stack memory.
>
> Hardware Tag-Based KASAN supports slab, page_alloc, and non-executable vmalloc
> memory.
>
> (hm, were you using hwkasan instead of swkasan, and that's why you
> couldn't see it?)
>
You're right that these conditions are narrow. However, when these bugs
hit, they're severe and extremely difficult to debug. This year alone,
this specific buffer_head bug was hit at least twice: 1bb7ff4204b6 and my
case. Over 24 years, others likely encountered it but lacked tools to
pinpoint the root cause.
I used software KASAN for the exfat case, but the bug didn't reproduce,
likely due to timing changes from the overhead. More fundamentally, the
corruption was in-bounds within active stack frames, which KASAN cannot
detect by design.
Beyond buffer_head, I encountered another stack corruption bug in network
drivers this year. Without KStackWatch, I had to manually instrument the
code to locate where corruption occurred.
These issues may be more common than they appear. Given Linux's massive
user base combined with the kernel's huge codebase and the large volume of
driver code, both in-tree and out-of-tree, even narrow conditions will be
hit.
Since posting earlier versions, several developers have contacted me about
using KStackWatch for their own issues. KStackWatch fills a gap: it can
pinpoint in-bounds stack corruption with much lower overhead than KASAN.