[RFC PATCH] iommu: Default to lazy DMA mode on ARM64

Posted by Nafees Ahmed Abdul 2 months, 1 week ago
ARM64 currently falls through to IOMMU_DEFAULT_DMA_STRICT, while
X86 defaults to IOMMU_DEFAULT_DMA_LAZY. On ARM64 bare-metal
systems with the ARM SMMU, strict mode causes synchronous TLBI
+ CMD_SYNC on every DMA unmap, resulting in significant
throughput degradation for network-intensive workloads.

Benchmarked on an ARM64 bare-metal system (AWS m8g.metal-24xl)
running Debian 13 with kernel 6.12.74, using iperf3:

  STRICT (default): 14.9 Gbps
  LAZY:             39.8 Gbps

This is a 2.67x throughput improvement simply by switching the
IOMMU default domain mode.

Distributions that do not explicitly override this Kconfig
choice (e.g., Debian, SLES) silently get STRICT on ARM64,
causing this regression on bare-metal systems. Changing the
upstream default avoids the need for each distribution to
independently carry this override.

Add ARM64 to the LAZY default to align with X86 behavior.

Signed-off-by: Nafees Ahmed Abdul <nafeabd@amazon.com>
---
 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f86262b11..2822aba75 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -96,7 +96,7 @@ config IOMMU_DEBUGFS
 choice
 	prompt "IOMMU default domain type"
 	depends on IOMMU_API
-	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
+	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390 || ARM64
 	default IOMMU_DEFAULT_DMA_STRICT
 	help
 	  Choose the type of IOMMU domain used to manage DMA API usage by
-- 
2.47.3
Re: [RFC PATCH] iommu: Default to lazy DMA mode on ARM64
Posted by Robin Murphy 2 months, 1 week ago
On 02/04/2026 8:59 pm, Nafees Ahmed Abdul wrote:
> ARM64 currently falls through to IOMMU_DEFAULT_DMA_STRICT, while
> X86 defaults to IOMMU_DEFAULT_DMA_LAZY. On ARM64 bare-metal
> systems with the ARM SMMU, strict mode causes synchronous TLBI
> + CMD_SYNC on every DMA unmap, resulting in significant
> throughput degradation for network-intensive workloads.
> 
> Benchmarked on an ARM64 bare-metal system (AWS m8g.metal-24xl)
> running Debian 13 with kernel 6.12.74, using iperf3:
> 
>    STRICT (default): 14.9 Gbps
>    LAZY:             39.8 Gbps
> 
> This is a 2.67x throughput improvement simply by switching the
> IOMMU default domain mode.
> 
> Distributions that do not explicitly override this Kconfig
> choice (e.g., Debian, SLES) silently get STRICT on ARM64,
> causing this regression on bare-metal systems.

It is not a "regression", it has always been this way since the 
beginning of IOMMU support on arm64. For many years, we didn't even have 
such a thing as lazy mode.

> Changing the
> upstream default avoids the need for each distribution to
> independently carry this override.

...while equally *creating* that need for all the distros/users who do 
value security/robustness above performance. Who's to say what matters 
most? Besides, defconfig is never meant to be a distro config; distros 
*should* maintain their own configs, and if they're not delivering the 
options that the majority of their users want, that's between the 
distros and their users.

The numbers game goes both ways too - the sheer quantity of arm64 
systems where strict vs. lazy makes no noticeable performance 
difference, but does offer that small robustness benefit (i.e. 
embedded/mobile) is many orders of magnitude more than the number of
arm64 systems capable of 50GbE. Even your own data are suggesting this is
actually a pretty niche case, if even 10GbE systems would still have 
plenty of headroom to keep up in strict mode - if anything that's 
actually pretty impressive!

> Add ARM64 to the LAZY default to align with X86 behavior.

But the other side of that is that the x86 (and S390) behaviour is a 
20-year-old legacy which arguably only looks more and more anachronistic 
in today's post-Spectre/etc. security-conscious world. Wouldn't an even 
better alignment argument be to start cleaning up such legacy, rather 
than spread it further onto more modern architectures which never even 
had it?

Thanks,
Robin.

> Signed-off-by: Nafees Ahmed Abdul <nafeabd@amazon.com>
> ---
>   drivers/iommu/Kconfig | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f86262b11..2822aba75 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -96,7 +96,7 @@ config IOMMU_DEBUGFS
>   choice
>   	prompt "IOMMU default domain type"
>   	depends on IOMMU_API
> -	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
> +	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390 || ARM64
>   	default IOMMU_DEFAULT_DMA_STRICT
>   	help
>   	  Choose the type of IOMMU domain used to manage DMA API usage by
Re: [RFC PATCH] iommu: Default to lazy DMA mode on ARM64
Posted by Pranjal Shrivastava 2 months, 1 week ago
Hi Nafees,

On Thu, Apr 02, 2026 at 07:59:13PM +0000, Nafees Ahmed Abdul wrote:
> ARM64 currently falls through to IOMMU_DEFAULT_DMA_STRICT, while
> X86 defaults to IOMMU_DEFAULT_DMA_LAZY. On ARM64 bare-metal
> systems with the ARM SMMU, strict mode causes synchronous TLBI
> + CMD_SYNC on every DMA unmap, resulting in significant
> throughput degradation for network-intensive workloads.
> 
> Benchmarked on an ARM64 bare-metal system (AWS m8g.metal-24xl)
> running Debian 13 with kernel 6.12.74, using iperf3:
> 
>   STRICT (default): 14.9 Gbps
>   LAZY:             39.8 Gbps
> 
> This is a 2.67x throughput improvement simply by switching the
> IOMMU default domain mode.
> 
> Distributions that do not explicitly override this Kconfig
> choice (e.g., Debian, SLES) silently get STRICT on ARM64,
> causing this regression on bare-metal systems. Changing the
> upstream default avoids the need for each distribution to
> independently carry this override.
> 

Thanks for the patch and the benchmarks.

However, I'm not sure why we should change the compile-time default for
all ARM64 systems. Currently, users can already achieve this behavior by
using the `iommu.strict=0` boot parameter.

Since IOMMU_DEFAULT_DMA_STRICT provides a higher security guarantee 
(preventing sub-page aliasing and potential "use-after-unmap" attacks),
keeping it as the default and allowing users to opt in via the kernel
command line seems like the safer path, in my opinion.

Additionally, distributions like Debian can also set this via their 
GRUB configurations for performance.

> Add ARM64 to the LAZY default to align with X86 behavior.
> 
> Signed-off-by: Nafees Ahmed Abdul <nafeabd@amazon.com>
> ---
>  drivers/iommu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f86262b11..2822aba75 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -96,7 +96,7 @@ config IOMMU_DEBUGFS
>  choice
>  	prompt "IOMMU default domain type"
>  	depends on IOMMU_API
> -	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
> +	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390 || ARM64
>  	default IOMMU_DEFAULT_DMA_STRICT
>  	help
>  	  Choose the type of IOMMU domain used to manage DMA API usage by
 
Thanks,
Praan
Re: [RFC PATCH] iommu: Default to lazy DMA mode on ARM64
Posted by Jason Gunthorpe 2 months, 1 week ago
On Fri, Apr 03, 2026 at 02:28:17AM +0000, Pranjal Shrivastava wrote:

> Thanks for the patch and the benchmarks.
> 
> However, I'm not sure why we should change the compile-time default for
> all ARM64 systems. Currently, users can already achieve this behavior by
> using the `iommu.strict=0` boot parameter.

Personally I really dislike these rando arch specific things.

What justification is there for any arch to be unique here?

I'd expect a single kconfig 'try to be strict by default' and that's
it. No arch override.
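
As a rough illustration only (not a posted or tested patch), a single
arch-independent default could amount to dropping the conditional line
from the choice block quoted in the original patch, so that every
architecture falls through to strict. The hunk below is handwritten
against the context shown in that patch; the hunk header is omitted
because the exact line offsets are an assumption.

```diff
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
 choice
 	prompt "IOMMU default domain type"
 	depends on IOMMU_API
-	default IOMMU_DEFAULT_DMA_LAZY if X86 || S390
 	default IOMMU_DEFAULT_DMA_STRICT
 	help
 	  Choose the type of IOMMU domain used to manage DMA API usage by
```

Under this sketch, lazy mode would still be selectable through the same
choice at build time, or at boot with `iommu.strict=0` as noted earlier
in the thread.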

Jason