[PATCH v2 0/3] arm64: perf: Skip device memory during user callchain unwinding

Fredrik Markstrom posted 3 patches 1 month, 2 weeks ago
Posted by Fredrik Markstrom 1 month, 2 weeks ago
Perf callchain unwinding follows userspace frame pointers via
copy_from_user. A corrupted or malicious frame pointer can point
into device I/O memory mapped into the process (e.g. via UIO or
/dev/mem), causing the kernel to read from MMIO regions in PMU
interrupt context. Such reads can have side effects on hardware
(clearing status registers, advancing FIFOs, triggering DMA) and
on arm64 can produce a synchronous external abort that panics the
kernel.

This series adds a guard that detects device memory before each
frame pointer read and skips the frame.

Patch 1: Lockless page table walk checking the MAIR attribute index
          in the leaf PTE to identify device memory types
          (MT_DEVICE_nGnRnE, MT_DEVICE_nGnRE). Follows the same
          pattern as perf_get_pgtable_size() in kernel/events/core.c.

Patch 2: (DO NOT MERGE) Module parameter to disable the guard at
          runtime for regression testing.

Patch 3: (DO NOT MERGE) kselftest that exercises the attack vector:
          maps /dev/mem, points FP into it, and verifies the kernel
          survives perf sampling.
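
As a rough illustration of the attribute check described for patch 1 (this
is not the patch's code), the userspace model below decodes the MAIR
attribute index from a leaf descriptor. The bit positions and MT_* index
values are assumptions modelled on the arm64 headers and should be checked
against the target tree; make_pte() and frame_read_ok() are hypothetical
helpers that exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 0 = valid, bits [4:2] = MAIR attribute index. */
#define PTE_VALID          (UINT64_C(1) << 0)
#define PTE_ATTRINDX_SHIFT 2
#define PTE_ATTRINDX_MASK  (UINT64_C(7) << PTE_ATTRINDX_SHIFT)

/* Assumed MAIR index assignments (verify against asm/memory.h). */
#define MT_NORMAL          0u
#define MT_NORMAL_TAGGED   1u
#define MT_NORMAL_NC       2u
#define MT_DEVICE_nGnRnE   3u
#define MT_DEVICE_nGnRE    4u

/* Hypothetical helper: build a leaf descriptor for the examples. */
static inline uint64_t make_pte(unsigned int attr_idx, bool valid)
{
	assert(attr_idx <= 7);
	return ((uint64_t)attr_idx << PTE_ATTRINDX_SHIFT) |
	       (valid ? PTE_VALID : 0);
}

/* True if the descriptor is valid and maps a device memory type. */
static inline bool pte_is_device(uint64_t pte)
{
	unsigned int idx;

	if (!(pte & PTE_VALID))
		return false;

	idx = (unsigned int)((pte & PTE_ATTRINDX_MASK) >> PTE_ATTRINDX_SHIFT);
	return idx == MT_DEVICE_nGnRnE || idx == MT_DEVICE_nGnRE;
}

/*
 * Hypothetical policy helper: the frame is read only when the page is
 * present and is not device memory. A non-present page is skipped too.
 */
static inline bool frame_read_ok(uint64_t pte)
{
	return (pte & PTE_VALID) && !pte_is_device(pte);
}
```

For example, frame_read_ok(make_pte(MT_NORMAL, true)) allows the read, while
a descriptor built with MT_DEVICE_nGnRE, or one with the valid bit clear,
causes the frame to be skipped.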

Alternatives considered:

 - VMA lookup (mmap_read_trylock + vma_lookup checking VM_IO):
   requires the mmap lock on every frame.
 - RCU maple tree lookup: lock-free but still a tree traversal
   per frame.
 - lock_vma_under_rcu: sleeping lock, unusable from IRQ context.

The page table walk requires no locks and costs only 4 pointer
dereferences per frame.
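
To make the per-frame cost concrete, the sketch below splits a virtual
address into one table index per level. It assumes a 4 KiB granule with
48-bit virtual addresses and four levels, which is only one of the
configurations arm64 supports; table_index() is a hypothetical name and is
not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 4 KiB pages, 9 index bits per level, 4 levels. */
#define PAGE_SHIFT     12
#define BITS_PER_LEVEL 9
#define LEVELS         4

/*
 * Index into the table at `level` (0 = top level, 3 = leaf level) for
 * virtual address `va`. A walk performs one lookup per level, so four
 * lookups in this configuration.
 */
static inline unsigned int table_index(uint64_t va, int level)
{
	assert(level >= 0 && level < LEVELS);

	int shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level);

	return (unsigned int)((va >> shift) & ((1u << BITS_PER_LEVEL) - 1));
}
```

With these assumptions the shifts for levels 0 to 3 are 39, 30, 21 and 12,
and the low 12 bits of the address are the offset within the page.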

Limitations:

 - The MAIR attribute check is arm64-specific. Other architectures
   use different mechanisms to identify device memory and would need
   their own PTE inspection logic.
 - The walk only detects memory types encoded in the PTE. If a page
   is not present, the guard skips the frame. Nothing is lost by this:
   __copy_from_user_inatomic() cannot fault in pages either, so
   unwinding would stop at the same point regardless.

A QEMU-based reproducer is available at:
https://gitlab.com/frma71/qemu-kernel-tests/-/tree/vmio_perf_test?ref_type=tags

Signed-off-by: Fredrik Markstrom <fredrik.markstrom@est.tech>

---
Changes in v2:
- Added range_is_device_mem() to check both ends of the frame read
- Used module_param_unsafe with 0600 permissions
- Documented TOCTOU race in commit message
- Fixed selftest: O_CLOEXEC, mkdtemp, page size from sysconf
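
The reason for checking both ends of the frame read is that a frame record
can straddle a page boundary, so its first and last bytes may live in pages
with different memory types. The userspace sketch below shows that idea
only; it is not the patch's range_is_device_mem(). The names
range_touches_device() and demo_is_device(), the 4 KiB page size, and the
example "device" window are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE UINT64_C(4096)	/* assumed page size */

/* Per-page predicate; in the kernel this would be the page table walk. */
typedef bool (*page_check_fn)(uint64_t addr);

/* Hypothetical stand-in: treat [0x10000, 0x11000) as device memory. */
static inline bool demo_is_device(uint64_t addr)
{
	return addr >= UINT64_C(0x10000) && addr < UINT64_C(0x11000);
}

/*
 * True if any byte of [start, start + len) lies in a device page.
 * Since len never exceeds one page, the range spans at most two pages,
 * so checking the first and last byte covers every page it touches.
 */
static inline bool range_touches_device(uint64_t start, size_t len,
					page_check_fn is_device)
{
	assert(len > 0 && len <= PAGE_SIZE);

	uint64_t last = start + len - 1;

	assert(last >= start);	/* caller must rule out wrap-around */

	if (is_device(start))
		return true;

	/* A second lookup is needed only when the read crosses a page. */
	if (start / PAGE_SIZE != last / PAGE_SIZE)
		return is_device(last);

	return false;
}
```

A 16-byte read starting at 0xfff8 begins in an ordinary page but ends at
0x10007, inside the example device window, so it is rejected; the same read
starting at 0xfff0 stays within one ordinary page and is allowed.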

---
Fredrik Markstrom (3):
      arm64: perf: Skip device memory during user callchain unwinding
      DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard
      DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test

 MAINTAINERS                                        |   1 +
 arch/arm64/kernel/stacktrace.c                     | 116 ++++++++++++++++++
 tools/testing/selftests/perf_events/Makefile       |   2 +-
 .../testing/selftests/perf_events/test_perf_vmio.c | 131 +++++++++++++++++++++
 4 files changed, 249 insertions(+), 1 deletion(-)
---
base-commit: dca922e019dd758b4c1b4bec8f1d509efddeaab4
change-id: 20260427-master-with-pfix-v3-ae7173f538ca

Best regards,
-- 
Fredrik Markstrom <fredrik.markstrom@est.tech>
Re: [PATCH v2 0/3] arm64: perf: Skip device memory during user callchain unwinding
Posted by Will Deacon 4 weeks, 1 day ago
On Thu, Apr 30, 2026 at 12:55:12PM +0200, Fredrik Markstrom wrote:
> Perf callchain unwinding follows userspace frame pointers via
> copy_from_user. A corrupted or malicious frame pointer can point
> into device I/O memory mapped into the process (e.g. via UIO or
> /dev/mem), causing the kernel to read from MMIO regions in PMU
> interrupt context. Such reads can have side effects on hardware
> (clearing status registers, advancing FIFOs, triggering DMA) and
> on arm64 can produce a synchronous external abort that panics the
> kernel.

Hmm, but why is unwinding special in this case? If userspace has access
to sensitive MMIO/device mappings, it can presumably pass them to
syscalls and trigger crashes all over the place?

Will
Re: [PATCH v2 0/3] arm64: perf: Skip device memory during user callchain unwinding
Posted by Fredrik Markstrom 4 weeks ago
On Mon, May 18, 2026 at 04:06:11PM +0100, Will Deacon wrote:
> On Thu, Apr 30, 2026 at 12:55:12PM +0200, Fredrik Markstrom wrote:
> > Perf callchain unwinding follows userspace frame pointers via
> > copy_from_user. A corrupted or malicious frame pointer can point
> > into device I/O memory mapped into the process (e.g. via UIO or
> > /dev/mem), causing the kernel to read from MMIO regions in PMU
> > interrupt context. Such reads can have side effects on hardware
> > (clearing status registers, advancing FIFOs, triggering DMA) and
> > on arm64 can produce a synchronous external abort that panics the
> > kernel.
> 
> Hmm, but why is unwinding special in this case? If userspace has access
> to sensitive MMIO/device mappings, it can presumably pass them to
> syscalls and trigger crashes all over the place?

You’re totally right, a broken app with access to hardware like this can
already cause chaos by passing bad pointers to syscalls etc. But the big
difference here is who is to blame when things crash.
 
If an app passes a bad pointer to a syscall, it’s self-inflicted.

Unwinding here is asynchronous and unrelated to the application.
Perf interrupts a perfectly healthy app at a random moment. If that app
is using the frame pointer as a normal register (totally legal in
optimized code), it might hold a junk value that points to MMIO memory.
 
If the kernel blindly follows that junk pointer during an unwind, perf
causes the crash. I think it's acceptable that an app (with hardware
access) causes a crash if buggy, but I don't think it's acceptable that
a profiling tool is causing a crash just by looking at it.

Fredrik


Re: [PATCH v2 0/3] arm64: perf: Skip device memory during user callchain unwinding
Posted by Fredrik Markstrom 5 days, 5 hours ago
Hello, is there anything I can do at this point to address this issue?

/Fredrik
