ARM64 currently falls through to IOMMU_DEFAULT_DMA_STRICT, while
X86 defaults to IOMMU_DEFAULT_DMA_LAZY. On ARM64 bare-metal
systems with the ARM SMMU, strict mode causes synchronous TLBI
+ CMD_SYNC on every DMA unmap, resulting in significant
throughput degradation for network-intensive workloads.

Benchmarked on an ARM64 bare-metal system (AWS m8g.metal-24xl)
running Debian 13 with kernel 6.12.74, using iperf3:

  STRICT (default): 14.9 Gbps
  LAZY:             39.8 Gbps

This is a 2.67x throughput improvement simply by switching the
IOMMU default domain mode.

Distributions that do not explicitly override this Kconfig
choice (e.g., Debian, SLES) silently get STRICT on ARM64,
causing this regression on bare-metal systems. Changing the
upstream default avoids the need for each distribution to
independently carry this override.

Add ARM64 to the LAZY default to align with X86 behavior.

Signed-off-by: Nafees Ahmed Abdul <nafeabd@amazon.com>
---
drivers/iommu/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f86262b11..2822aba75 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -96,7 +96,7 @@ config IOMMU_DEBUGFS
 choice
 	prompt "IOMMU default domain type"
 	depends on IOMMU_API
-	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
+	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390 || ARM64
 	default IOMMU_DEFAULT_DMA_STRICT
 	help
 	  Choose the type of IOMMU domain used to manage DMA API usage by
--
2.47.3
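
The one-line change above relies on how a Kconfig choice resolves its default: the first `default` entry whose condition holds is used, and the unconditional entry serves as the fallback. The following is a minimal Python sketch of that selection, not the kernel's Kconfig implementation; the name `pick_default` and the set-of-symbols representation are assumptions made purely for illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

# Each entry: (symbol to select, arch symbols that enable it).
# A condition of None models an unconditional "default FOO" line.
Default = Tuple[str, Optional[FrozenSet[str]]]

# Defaults as they appear before the patch.
BEFORE: List[Default] = [
    ("IOMMU_DEFAULT_DMA_LAZY", frozenset({"X86", "S390"})),
    ("IOMMU_DEFAULT_DMA_STRICT", None),
]

# Defaults as they would appear with the patch applied.
AFTER: List[Default] = [
    ("IOMMU_DEFAULT_DMA_LAZY", frozenset({"X86", "S390", "ARM64"})),
    ("IOMMU_DEFAULT_DMA_STRICT", None),
]


def pick_default(defaults: List[Default], enabled: FrozenSet[str]) -> str:
    """Return the first default whose condition is met.

    The condition is modelled as an OR over architecture symbols,
    matching the "if X86 || S390" form used in the hunk.
    """
    for symbol, condition in defaults:
        if condition is None or condition & enabled:
            return symbol
    raise ValueError("no default applies")


if __name__ == "__main__":
    for arch in ("X86", "S390", "ARM64"):
        enabled = frozenset({arch})
        print(arch, pick_default(BEFORE, enabled), "->", pick_default(AFTER, enabled))
```

In this model only ARM64 changes outcome between `BEFORE` and `AFTER`; any architecture not named in the condition still falls through to the strict entry.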
On 02/04/2026 8:59 pm, Nafees Ahmed Abdul wrote:
> ARM64 currently falls through to IOMMU_DEFAULT_DMA_STRICT, while
> X86 defaults to IOMMU_DEFAULT_DMA_LAZY. On ARM64 bare-metal
> systems with the ARM SMMU, strict mode causes synchronous TLBI
> + CMD_SYNC on every DMA unmap, resulting in significant
> throughput degradation for network-intensive workloads.
>
> Benchmarked on an ARM64 bare-metal system (AWS m8g.metal-24xl)
> running Debian 13 with kernel 6.12.74, using iperf3:
>
>   STRICT (default): 14.9 Gbps
>   LAZY:             39.8 Gbps
>
> This is a 2.67x throughput improvement simply by switching the
> IOMMU default domain mode.
>
> Distributions that do not explicitly override this Kconfig
> choice (e.g., Debian, SLES) silently get STRICT on ARM64,
> causing this regression on bare-metal systems.

It is not a "regression", it has always been this way since the
beginning of IOMMU support on arm64. For many years, we didn't even
have such a thing as lazy mode.

> Changing the
> upstream default avoids the need for each distribution to
> independently carry this override.

...while equally *creating* that need for all the distros/users who do
value security/robustness above performance. Who's to say what matters
most?

Besides, defconfig is never meant to be a distro config; distros
*should* maintain their own configs, and if they're not delivering the
options that the majority of their users want, that's between the
distros and their users.

The numbers game goes both ways too - the sheer quantity of arm64
systems where strict vs. lazy makes no noticeable performance
difference, but does offer that small robustness benefit (i.e.
embedded/mobile) is many orders of magnitude more than the number of
arm64 systems capable of 50GbE. Even your own data are suggesting this
is actually a pretty niche case, if even 10GbE systems would still have
plenty of headroom to keep up in strict mode - if anything that's
actually pretty impressive!

> Add ARM64 to the LAZY default to align with X86 behavior.

But the other side of that is that the x86 (and S390) behaviour is a
20-year-old legacy which arguably only looks more and more
anachronistic in today's post-Spectre/etc. security-conscious world.
Wouldn't an even better alignment argument be to start cleaning up such
legacy, rather than spread it further onto more modern architectures
which never even had it?

Thanks,
Robin.

> Signed-off-by: Nafees Ahmed Abdul <nafeabd@amazon.com>
> ---
>  drivers/iommu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f86262b11..2822aba75 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -96,7 +96,7 @@ config IOMMU_DEBUGFS
>  choice
>  	prompt "IOMMU default domain type"
>  	depends on IOMMU_API
> -	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
> +	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390 || ARM64
>  	default IOMMU_DEFAULT_DMA_STRICT
>  	help
>  	  Choose the type of IOMMU domain used to manage DMA API usage by
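
The headroom argument in the reply above can be checked with a little arithmetic on the two iperf3 figures from the patch. A rough Python sketch follows. It assumes, purely for illustration, that the single measured strict-mode figure acts as a throughput ceiling on any link; the thread makes no such general claim. The helper names are hypothetical.

```python
# Figures quoted in the patch description (iperf3 on one m8g.metal-24xl host).
STRICT_GBPS = 14.9
LAZY_GBPS = 39.8


def speedup(lazy: float, strict: float) -> float:
    """Ratio of lazy-mode to strict-mode throughput."""
    return lazy / strict


def strict_is_bottleneck(link_gbps: float, strict_ceiling: float = STRICT_GBPS) -> bool:
    """True if the link can carry more than the measured strict-mode throughput.

    Assumption: the one measured figure is treated as a ceiling for any link,
    which is a simplification made only for this illustration.
    """
    return link_gbps > strict_ceiling


if __name__ == "__main__":
    print(f"speedup: {speedup(LAZY_GBPS, STRICT_GBPS):.2f}x")
    # Link speeds named in the reply: 10GbE and 50GbE.
    for link in (10.0, 50.0):
        print(f"{link:g} GbE limited by strict mode: {strict_is_bottleneck(link)}")
```

Under that assumption a 10GbE link stays below the strict-mode figure, while a 50GbE link exceeds both measured figures, which is consistent with the point that the gain only shows up on very fast links.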
Hi Nafees,

On Thu, Apr 02, 2026 at 07:59:13PM +0000, Nafees Ahmed Abdul wrote:
> ARM64 currently falls through to IOMMU_DEFAULT_DMA_STRICT, while
> X86 defaults to IOMMU_DEFAULT_DMA_LAZY. On ARM64 bare-metal
> systems with the ARM SMMU, strict mode causes synchronous TLBI
> + CMD_SYNC on every DMA unmap, resulting in significant
> throughput degradation for network-intensive workloads.
>
> Benchmarked on an ARM64 bare-metal system (AWS m8g.metal-24xl)
> running Debian 13 with kernel 6.12.74, using iperf3:
>
>   STRICT (default): 14.9 Gbps
>   LAZY:             39.8 Gbps
>
> This is a 2.67x throughput improvement simply by switching the
> IOMMU default domain mode.
>
> Distributions that do not explicitly override this Kconfig
> choice (e.g., Debian, SLES) silently get STRICT on ARM64,
> causing this regression on bare-metal systems. Changing the
> upstream default avoids the need for each distribution to
> independently carry this override.
>

Thanks for the patch and the benchmarks.

However, I'm not sure why should we change the compile-time default for
all ARM64 systems? Currently, users can already achieve this behavior by
using the `iommu.strict=0` boot parameter.

Since IOMMU_DEFAULT_DMA_STRICT provides a higher security guarantee
(preventing sub-page aliasing and potential "use-after-unmap" attacks),
keeping it as the default and allowing users to opt-in via the kernel
cmd line seems like the safer path, in my opinion.

Additionally, distributions like Debian can also set this via their
GRUB configurations for performance.

> Add ARM64 to the LAZY default to align with X86 behavior.
>
> Signed-off-by: Nafees Ahmed Abdul <nafeabd@amazon.com>
> ---
>  drivers/iommu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f86262b11..2822aba75 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -96,7 +96,7 @@ config IOMMU_DEBUGFS
>  choice
>  	prompt "IOMMU default domain type"
>  	depends on IOMMU_API
> -	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
> +	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390 || ARM64
>  	default IOMMU_DEFAULT_DMA_STRICT
>  	help
>  	  Choose the type of IOMMU domain used to manage DMA API usage by

Thanks,
Praan
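
For readers who want to see whether a running system was booted with the `iommu.strict=` parameter mentioned above, here is a small Python sketch that inspects the kernel command line. It only covers the `0` and `1` spellings used in this thread, assumes the last occurrence on the command line takes effect, and returns `None` when the parameter is absent (in which case the compile-time default applies). The function names are hypothetical.

```python
from typing import Optional


def iommu_strict_from_cmdline(cmdline: str) -> Optional[bool]:
    """Return True for iommu.strict=1, False for iommu.strict=0, None if unset.

    Assumptions: only "0" and "1" are recognised here, and a later
    occurrence overrides an earlier one.
    """
    result: Optional[bool] = None
    for token in cmdline.split():
        if token.startswith("iommu.strict="):
            value = token.split("=", 1)[1]
            if value == "1":
                result = True
            elif value == "0":
                result = False
    return result


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line; returns an empty string if unavailable."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    setting = iommu_strict_from_cmdline(read_cmdline())
    if setting is None:
        print("iommu.strict not set on the command line; Kconfig default applies")
    else:
        print("strict mode requested" if setting else "lazy mode requested")
```

This reports only what was requested at boot; it does not confirm which domain type a given device actually ended up with.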
On Fri, Apr 03, 2026 at 02:28:17AM +0000, Pranjal Shrivastava wrote:

> Thanks for the patch and the benchmarks.
>
> However, I'm not sure why should we change the compile-time default for
> all ARM64 systems? Currently, users can already achieve this behavior by
> using the `iommu.strict=0` boot parameter.

Personally I really dislike these rando arch specific things. What
justification is there for any arch to be unique here?

I'd expect a single kconfig 'try to be strict by default' and that's
it. No arch override.

Jason