[PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer

Mostafa Saleh posted 4 patches 1 month ago
.../admin-guide/kernel-parameters.txt         |   9 +
drivers/iommu/Kconfig                         |  19 ++
drivers/iommu/Makefile                        |   1 +
drivers/iommu/iommu-debug-pagealloc.c         | 174 ++++++++++++++++++
drivers/iommu/iommu-priv.h                    |  58 ++++++
drivers/iommu/iommu.c                         |  11 +-
include/linux/iommu-debug-pagealloc.h         |  32 ++++
include/linux/mm.h                            |   5 +
mm/page_ext.c                                 |   4 +
9 files changed, 311 insertions(+), 2 deletions(-)
create mode 100644 drivers/iommu/iommu-debug-pagealloc.c
create mode 100644 include/linux/iommu-debug-pagealloc.h
Posted by Mostafa Saleh 1 month ago
Overview
--------
This patch series introduces a new debugging feature,
IOMMU_DEBUG_PAGEALLOC, designed to catch DMA use-after-free bugs
and IOMMU mapping leaks from buggy drivers.

The kernel has powerful sanitizers like KASAN and DEBUG_PAGEALLOC
for catching CPU-side memory corruption. However, there is limited
runtime sanitization for DMA mappings managed by the IOMMU. A buggy
driver can free a page while it is still mapped for DMA, leading to
memory corruption or use-after-free vulnerabilities when that page is
reallocated and used for a different purpose.

Inspired by DEBUG_PAGEALLOC, this sanitizer tracks IOMMU mappings on a
per-page basis. Unlike DEBUG_PAGEALLOC, it cannot simply unmap pages on
free, as that would require locking and walking all domains on every
kernel free. Instead, it relies on page_ext to add an IOMMU-specific
mapping reference count for each page.
On every page allocation and free, the kernel checks this count and
WARNs if it is not zero, dumping page owner information if enabled.
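A minimal user-space sketch of the idea, assuming a fixed pool of simulated pages. The names (`struct iommu_page_ext_sim`, `sim_iommu_map_page()`, `sim_check_page()`, etc.) are hypothetical stand-ins, not the series' actual page_ext code or any kernel API, and `fprintf()` stands in for WARN() plus dump_page_owner():

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: number of simulated page frames. */
#define SIM_NR_PAGES 16

/* Hypothetical stand-in for the per-page page_ext data. */
struct iommu_page_ext_sim {
	atomic_int map_count;	/* number of live IOMMU mappings of this page */
};

/* Static storage: zero-initialized, which is a valid atomic state. */
static struct iommu_page_ext_sim sim_page_ext[SIM_NR_PAGES];

/* Conceptually called from the map path for each page being mapped. */
static void sim_iommu_map_page(unsigned long pfn)
{
	atomic_fetch_add_explicit(&sim_page_ext[pfn].map_count, 1,
				  memory_order_relaxed);
}

/* Conceptually called from the unmap path for each page being unmapped. */
static void sim_iommu_unmap_page(unsigned long pfn)
{
	atomic_fetch_sub_explicit(&sim_page_ext[pfn].map_count, 1,
				  memory_order_relaxed);
}

/*
 * Conceptually called on every page allocation/free. Returns true if the
 * page has no IOMMU mappings; otherwise prints a warning (standing in for
 * WARN() plus dump_page_owner()) and returns false.
 */
static bool sim_check_page(unsigned long pfn, const char *event)
{
	int count = atomic_load_explicit(&sim_page_ext[pfn].map_count,
					 memory_order_relaxed);

	if (count != 0) {
		fprintf(stderr, "WARNING: pfn %lu %s while IOMMU-mapped (count=%d)\n",
			pfn, event, count);
		return false;
	}
	return true;
}
```

For example, `sim_iommu_map_page(3)` followed by `sim_check_page(3, "freed")` prints a warning and returns false; after `sim_iommu_unmap_page(3)` the same check passes.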

Concurrency
-----------
By design, this check is racy: a caller can map pages just after the
check, which can lead to false negatives.
In my opinion, this is acceptable for sanitizers (KCSAN, for example,
has the same property).
The alternative would be to add locking to iommu_map/unmap for all
domains, which is not favourable even for a debug feature.
The sanitizer only guarantees that the refcount itself does not get
corrupted (by using atomics), and it reports no false positives.
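A user-space sketch (C11 atomics plus POSIX threads, not kernel code; `map_count`, `mapper()` and `run_mappers()` are hypothetical) of what the atomics do and do not guarantee: concurrent relaxed increments and decrements never lose an update, so the final count is exact, but a check that reads the count while mappers are still running may observe an intermediate value. That window is where the accepted false negatives come from. Build with `-pthread`.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_MAPPERS 4		/* hypothetical number of concurrent mappers */
#define MAP_UNMAP_ITERS 100000

/* Hypothetical stand-in for a single page's IOMMU map refcount. */
static atomic_long map_count;

/* Repeatedly map/unmap the same page, then leave one mapping behind. */
static void *mapper(void *arg)
{
	(void)arg;
	for (int i = 0; i < MAP_UNMAP_ITERS; i++) {
		atomic_fetch_add_explicit(&map_count, 1, memory_order_relaxed);
		atomic_fetch_sub_explicit(&map_count, 1, memory_order_relaxed);
	}
	atomic_fetch_add_explicit(&map_count, 1, memory_order_relaxed);
	return NULL;
}

/*
 * Run NR_MAPPERS threads concurrently and return the final count. A check
 * racing with the threads may see an intermediate value, but once all
 * threads are joined no update has been lost.
 */
static long run_mappers(void)
{
	pthread_t threads[NR_MAPPERS];

	for (int i = 0; i < NR_MAPPERS; i++) {
		if (pthread_create(&threads[i], NULL, mapper, NULL) != 0) {
			fprintf(stderr, "pthread_create failed\n");
			exit(EXIT_FAILURE);
		}
	}
	for (int i = 0; i < NR_MAPPERS; i++)
		pthread_join(threads[i], NULL);

	return atomic_load_explicit(&map_count, memory_order_relaxed);
}
```

Relaxed ordering is enough here because only the count itself must stay consistent; no other data is published through it.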

CPU vs IOMMU Page Size
----------------------
IOMMUs can use page sizes that differ from the CPU page size, and the
setup can be non-homogeneous: not all IOMMUs in a system necessarily use
the same page size.

To solve this, the refcount is always incremented and decremented in
units of the smallest page size supported by the IOMMU domain. This
keeps the accounting consistent regardless of the size of each map or
unmap operation; otherwise, double counting could happen.
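A sketch of smallest-granule accounting, assuming 16K CPU pages, a domain whose smallest IOMMU page size is 4K, and physical addresses and sizes that are multiples of that granule. All `sim_*` names are hypothetical, not the series' code:

```c
#include <stdint.h>

/* Assumptions for this sketch: 16K CPU pages and a small simulated pool. */
#define SIM_CPU_PAGE_SIZE 16384u
#define SIM_NR_PAGES 8

static int sim_refcount[SIM_NR_PAGES];	/* per-CPU-page IOMMU map count */

/* Smallest page size in the domain: the lowest set bit of the bitmap. */
static uint64_t sim_min_granule(uint64_t pgsize_bitmap)
{
	return pgsize_bitmap & -pgsize_bitmap;
}

/*
 * Add @delta (+1 on map, -1 on unmap) once per smallest-granule chunk of
 * [phys, phys + size), to every CPU page that chunk touches. Assumes phys
 * and size are multiples of the smallest granule.
 */
static void sim_account(uint64_t phys, uint64_t size,
			uint64_t pgsize_bitmap, int delta)
{
	uint64_t granule = sim_min_granule(pgsize_bitmap);

	for (uint64_t off = 0; off < size; off += granule) {
		uint64_t start = phys + off;
		uint64_t end = start + granule;	/* exclusive */

		for (uint64_t pfn = start / SIM_CPU_PAGE_SIZE;
		     pfn * SIM_CPU_PAGE_SIZE < end; pfn++)
			sim_refcount[pfn] += delta;
	}
}
```

Because every operation is broken down into the same 4K units, a 16K map (count 4 on the single 16K CPU page) followed by two 8K unmaps brings the count back to zero; counting one unit per call instead would end at -1.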

Testing & Performance
---------------------
This was tested on Morello with Arm64 + SMMUv3.
I also did some testing on a Lenovo IdeaCentre X Gen 10 Snapdragon, and
on QEMU, including different SMMUv3/CPU page sizes (arm64).

I also ran dma_map_benchmark on Morello:

echo dma_map_benchmark > /sys/bus/pci/devices/0000\:06\:00.0/driver_override
echo 0000:06:00.0 > /sys/bus/pci/devices/0000\:06\:00.0/driver/unbind
echo 0000:06:00.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind
./dma_map_benchmark -t $threads -g $nr_pages

CONFIG refers to "CONFIG_IOMMU_DEBUG_PAGEALLOC"
cmdline refers to "iommu.debug_pagealloc"
Numbers are average (map latency)/(unmap latency) in microseconds, as
reported by dma_map_benchmark; lower is better.

			CONFIG=n    CONFIG=y    CONFIG=y
			            cmdline=0   cmdline=1
4K - 1 thread		0.1/0.6     0.1/0.6     0.1/0.7
4K - 4 threads		0.1/1.1     0.1/1.0     0.2/1.1
1M - 1 thread		0.8/21.2    0.7/21.2    5.4/42.3
1M - 4 threads		1.1/45.9    1.1/46.0    5.9/45.1

Changes in v5:
v4: https://lore.kernel.org/all/20251211125928.3258905-1-smostafa@google.com/
- Fix typo in comment
- Collect Baolu's Reviewed-by tags

Main changes in v4:
v3: https://lore.kernel.org/all/20251124200811.2942432-1-smostafa@google.com/
- Update the kernel parameter format in docs based on Randy's feedback
- Update commit subjects
- Add IOMMU-only functions to iommu-priv.h based on Baolu's feedback

Main changes in v3 (most of them addressing Will's comments):
v2: https://lore.kernel.org/linux-iommu/20251106163953.1971067-1-smostafa@google.com/
- Reword the Kconfig help
- Use unmap_begin/end instead of unmap/remap
- Use relaxed accessors when refcounting
- Fix a bug with checking the returned address from iova_to_phys
- Add more hardening checks (overflow)
- Add more debug info on assertions (dump_page_owner())
- Handle cases where unmap returns a larger size than requested, as the
  core code seems to tolerate that.
- Drop Tested-by tags from Qinxin as the code logic changed

Main changes in v2:
v1: https://lore.kernel.org/linux-iommu/20251003173229.1533640-1-smostafa@google.com/
- Address Jörg's comments about #ifdefs and static keys
- Reword the Kconfig help
- Drop RFC
- Collect Tested-by tags from Qinxin
- Minor cleanups

Mostafa Saleh (4):
  iommu: Add page_ext for IOMMU_DEBUG_PAGEALLOC
  iommu: Add calls for IOMMU_DEBUG_PAGEALLOC
  iommu: debug-pagealloc: Track IOMMU pages
  iommu: debug-pagealloc: Check mapped/unmapped kernel memory

-- 
2.52.0.351.gbe84eed79e-goog
Re: [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer
Posted by Pranjal Shrivastava 1 month ago
On Tue, Jan 06, 2026 at 04:21:56PM +0000, Mostafa Saleh wrote:
> Overview
> --------
> This patch series introduces a new debugging feature,
> IOMMU_DEBUG_PAGEALLOC, designed to catch DMA use-after-free bugs
> and IOMMU mapping leaks from buggy drivers.
> 
> The kernel has powerful sanitizers like KASAN and DEBUG_PAGEALLOC
> for catching CPU-side memory corruption. However, there is limited
> runtime sanitization for DMA mappings managed by the IOMMU. A buggy
> driver can free a page while it is still mapped for DMA, leading to
> memory corruption or use-after-free vulnerabilities when that page is
> reallocated and used for a different purpose.
> 

Thanks for this series! This is really helpful!

> CONFIG refers to "CONFIG_IOMMU_DEBUG_PAGEALLOC"
> cmdline refers to "iommu.debug_pagealloc"
> Numbers are (map latency)/(unmap latency), lower is better.
> 
> 			CONFIG=n    CONFIG=y    CONFIG=y
> 			            cmdline=0   cmdline=1
> 4K - 1 thread		0.1/0.6     0.1/0.6     0.1/0.7
> 4K - 4 threads		0.1/1.1     0.1/1.0     0.2/1.1
> 1M - 1 thread		0.8/21.2    0.7/21.2    5.4/42.3
> 1M - 4 threads		1.1/45.9    1.1/46.0    5.9/45.1
> 

Just curious to know if we've also measured the latency for larger
mappings? e.g. 1G mapping backed by `n` 4K mappings?

Re: [PATCH v5 0/4] iommu: Add IOMMU_DEBUG_PAGEALLOC sanitizer
Posted by Mostafa Saleh 4 weeks, 1 day ago
On Wed, Jan 07, 2026 at 03:24:59PM +0000, Pranjal Shrivastava wrote:
> On Tue, Jan 06, 2026 at 04:21:56PM +0000, Mostafa Saleh wrote:
> > CONFIG refers to "CONFIG_IOMMU_DEBUG_PAGEALLOC"
> > cmdline refers to "iommu.debug_pagealloc"
> > Numbers are (map latency)/(unmap latency), lower is better.
> > 
> > 			CONFIG=n    CONFIG=y    CONFIG=y
> > 			            cmdline=0   cmdline=1
> > 4K - 1 thread		0.1/0.6     0.1/0.6     0.1/0.7
> > 4K - 4 threads		0.1/1.1     0.1/1.0     0.2/1.1
> > 1M - 1 thread		0.8/21.2    0.7/21.2    5.4/42.3
> > 1M - 4 threads		1.1/45.9    1.1/46.0    5.9/45.1
> > 
> 
> Just curious to know if we've also measured the latency for larger
> mappings? e.g. 1G mapping backed by `n` 4K mappings?

No, the maximum granule supported by dma_map_benchmark is 1024 pages,
which is 4M for 4K kernels.
I thought 1M would be better for my setup: as I am using SMMUv3, a 1M
mapping is made up of many PTEs, whereas a 4M one can be covered by a
few block entries. The 4K test covers the single-PTE case, so together
we get more coverage.
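The PTE-count argument can be illustrated with a simplified greedy page-size picker, loosely modelled on the IOMMU core's page-size selection (a sketch, not the kernel's iommu_pgsize()). It assumes a 4K | 2M | 1G pgsize_bitmap, as with a 4K-granule SMMUv3, and a physically contiguous buffer; the `sim_*` names are hypothetical:

```c
#include <stdint.h>

/*
 * Pick the largest page size in @pgsize_bitmap that fits the remaining
 * @size and to which both @iova and @paddr are aligned. Returns 0 if even
 * the smallest page size does not fit.
 */
static uint64_t sim_pick_pgsize(uint64_t pgsize_bitmap, uint64_t iova,
				uint64_t paddr, uint64_t size)
{
	uint64_t addr_merge = iova | paddr;
	uint64_t best = 0;

	for (int bit = 0; bit < 64; bit++) {
		uint64_t pgsize = UINT64_C(1) << bit;

		if (!(pgsize_bitmap & pgsize))
			continue;
		/* Any larger size is also too big or also misaligned. */
		if (pgsize > size || (addr_merge & (pgsize - 1)))
			break;
		best = pgsize;
	}
	return best;
}

/* Count the leaf entries a greedy mapping of [iova, iova + size) would use. */
static unsigned long sim_count_leaf_entries(uint64_t pgsize_bitmap,
					    uint64_t iova, uint64_t paddr,
					    uint64_t size)
{
	unsigned long entries = 0;

	while (size) {
		uint64_t pgsize = sim_pick_pgsize(pgsize_bitmap, iova, paddr, size);

		if (!pgsize)
			return 0;	/* misaligned or too small: not mappable */
		iova += pgsize;
		paddr += pgsize;
		size -= pgsize;
		entries++;
	}
	return entries;
}
```

Under these assumptions, a 2M-aligned 1M mapping needs 256 leaf entries, a 2M-aligned 4M mapping only 2, and a 4M mapping starting 4K past a 2M boundary 513, consistent with 1M exercising the per-PTE path more heavily than 4M.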

Thanks,
Mostafa
