[PATCH rc v4 3/4] iommu/arm-smmu-v3: Mark STE EATS safe when computing the update sequence

Nicolin Chen posted 4 patches 1 month, 3 weeks ago
There is a newer version of this series
[PATCH rc v4 3/4] iommu/arm-smmu-v3: Mark STE EATS safe when computing the update sequence
Posted by Nicolin Chen 1 month, 3 weeks ago
From: Jason Gunthorpe <jgg@nvidia.com>

If a VM wants to toggle EATS off at the same time as changing the CFG, the
hypervisor will see EATS change to 0 and insert a V=0 breaking update into
the STE even though the VM did not ask for that.

In bare metal, EATS is ignored by CFG=ABORT/BYPASS, which is why this does
not cause a problem until we have nested where CFG is always a variation of
S2 trans that does use EATS.

Relax the rules for EATS sequencing, we don't need it to be exact because
the enclosing code will always disable ATS at the PCI device if we are
changing EATS. This ensures there are no ATS transactions that can race
with an EATS change so we don't need to carefully sequence these bits.

Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 12a9669bcc83..a3b29ad20a82 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
 	 *  fault records even when MEV == 0.
 	 */
 	safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
+
+	/*
+	 * EATS is used to reject and control the ATS behavior of the device. If
+	 * we are changing it away from 0 then we already trust the device to
+	 * use ATS properly and we have sequenced the device's ATS enable in PCI
+	 * config space to prevent it from issuing ATS while we are changing
+	 * EATS.
+	 */
+	safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
 }
 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_update_safe);
 
-- 
2.43.0
Re: [PATCH rc v4 3/4] iommu/arm-smmu-v3: Mark STE EATS safe when computing the update sequence
Posted by Mostafa Saleh 1 month, 2 weeks ago
On Tue, Dec 16, 2025 at 08:26:01PM -0800, Nicolin Chen wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> If a VM wants to toggle EATS off at the same time as changing the CFG, the
> hypervisor will see EATS change to 0 and insert a V=0 breaking update into
> the STE even though the VM did not ask for that.
> 
> In bare metal, EATS is ignored by CFG=ABORT/BYPASS, which is why this does
> not cause a problem until we have nested where CFG is always a variation of
> S2 trans that does use EATS.
> 
> Relax the rules for EATS sequencing, we don't need it to be exact because
> the enclosing code will always disable ATS at the PCI device if we are
> changing EATS. This ensures there are no ATS transactions that can race
> with an EATS change so we don't need to carefully sequence these bits.
> 
> Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 12a9669bcc83..a3b29ad20a82 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
>  	 *  fault records even when MEV == 0.
>  	 */
>  	safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
> +
> +	/*
> +	 * EATS is used to reject and control the ATS behavior of the device. If
> +	 * we are changing it away from 0 then we already trust the device to
> +	 * use ATS properly and we have sequenced the device's ATS enable in PCI
> +	 * config space to prevent it from issuing ATS while we are changing
> +	 * EATS.
> +	 */

I am not sure about this one, Is it only about trusting the device?

I’d be worried about cases where we switch domains, that means that
briefly the HW observers EATS=1 while it was not intended, especially
that EATS is in a different DWORD from S2TTB and CDptr. With all the
IOMMUFD/VFIO stuff it makes it harder to reason about. But I can’t
come up with an example to break this.

Thanks,
Mostafa

> +	safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
>  }
>  EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_update_safe);
>  
> -- 
> 2.43.0
> 
Re: [PATCH rc v4 3/4] iommu/arm-smmu-v3: Mark STE EATS safe when computing the update sequence
Posted by Nicolin Chen 1 month, 2 weeks ago
On Thu, Dec 18, 2025 at 04:42:40PM +0000, Mostafa Saleh wrote:
> On Tue, Dec 16, 2025 at 08:26:01PM -0800, Nicolin Chen wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > If a VM wants to toggle EATS off at the same time as changing the CFG, the
> > hypervisor will see EATS change to 0 and insert a V=0 breaking update into
> > the STE even though the VM did not ask for that.
> > 
> > In bare metal, EATS is ignored by CFG=ABORT/BYPASS, which is why this does
> > not cause a problem until we have nested where CFG is always a variation of
> > S2 trans that does use EATS.
> > 
> > Relax the rules for EATS sequencing, we don't need it to be exact because
> > the enclosing code will always disable ATS at the PCI device if we are
> > changing EATS. This ensures there are no ATS transactions that can race
> > with an EATS change so we don't need to carefully sequence these bits.
> > 
> > Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 12a9669bcc83..a3b29ad20a82 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
> >  	 *  fault records even when MEV == 0.
> >  	 */
> >  	safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
> > +
> > +	/*
> > +	 * EATS is used to reject and control the ATS behavior of the device. If
> > +	 * we are changing it away from 0 then we already trust the device to
> > +	 * use ATS properly and we have sequenced the device's ATS enable in PCI
> > +	 * config space to prevent it from issuing ATS while we are changing
> > +	 * EATS.
> > +	 */
> 
> I am not sure about this one, Is it only about trusting the device?
>
> I’d be worried about cases where we switch domains, that means that
> briefly the HW observers EATS=1 while it was not intended, especially
> that EATS is in a different DWORD from S2TTB and CDptr. With all the
> IOMMUFD/VFIO stuff it makes it harder to reason about. But I can’t
> come up with an example to break this.

Hmm..

I think the last line that driver controls pci_enable/disable_ats()
should justify the whole thing? Are you worried about device still
doing ATS after pci_disable_ats()?

Thanks
Nicolin
Re: [PATCH rc v4 3/4] iommu/arm-smmu-v3: Mark STE EATS safe when computing the update sequence
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Thu, Dec 18, 2025 at 09:32:43AM -0800, Nicolin Chen wrote:
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 12a9669bcc83..a3b29ad20a82 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
> > >  	 *  fault records even when MEV == 0.
> > >  	 */
> > >  	safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
> > > +
> > > +	/*
> > > +	 * EATS is used to reject and control the ATS behavior of the device. If
> > > +	 * we are changing it away from 0 then we already trust the device to
> > > +	 * use ATS properly and we have sequenced the device's ATS enable in PCI
> > > +	 * config space to prevent it from issuing ATS while we are changing
> > > +	 * EATS.
> > > +	 */
> > 
> > I am not sure about this one, Is it only about trusting the device?

Yes. The purpose of EATS=0 is to prevent the device from using ATS at
all - critically including using translated TLPs eg because it is an
untrusted device and the OS wants to prevent it from attacking the
system with direct access to physical memory.

If the device is trusted then once we disable ATS it must stop issuing
ATS, so the EATS=0 should never trigger a fault.

> > I’d be worried about cases where we switch domains, that means that
> > briefly the HW observers EATS=1 while it was not intended, especially
> > that EATS is in a different DWORD from S2TTB and CDptr. 

Well, no, it means EATS is enabled a little bit earlier or disabled a
little bit later, it doesn't mean it was not intended.

The point is our rules for ATS say that the ATC is empty at this
moment and the device is not permitted to do any ATS fetches because
we won't issue any flushes.

Thus there can be no concurrent ATS traffic and we don't need to
exactly sequence EATS with the translation.

With virtualization the hypervisor is still the exclusive owner of ATS
and guarentees that EATS enable/disable is sequences correctly with
ATC invalidation.

> I think the last line that driver controls pci_enable/disable_ats()
> should justify the whole thing? Are you worried about device still
> doing ATS after pci_disable_ats()?

Exactly right, and we can't worry about that because it says the whole
ATC coherency system is broken.

Jason
Re: [PATCH rc v4 3/4] iommu/arm-smmu-v3: Mark STE EATS safe when computing the update sequence
Posted by Mostafa Saleh 1 month ago
On Thu, Dec 18, 2025 at 02:01:29PM -0400, Jason Gunthorpe wrote:
> On Thu, Dec 18, 2025 at 09:32:43AM -0800, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > > index 12a9669bcc83..a3b29ad20a82 100644
> > > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > > @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
> > > >  	 *  fault records even when MEV == 0.
> > > >  	 */
> > > >  	safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
> > > > +
> > > > +	/*
> > > > +	 * EATS is used to reject and control the ATS behavior of the device. If
> > > > +	 * we are changing it away from 0 then we already trust the device to
> > > > +	 * use ATS properly and we have sequenced the device's ATS enable in PCI
> > > > +	 * config space to prevent it from issuing ATS while we are changing
> > > > +	 * EATS.
> > > > +	 */
> > > 
> > > I am not sure about this one, Is it only about trusting the device?
> 
> Yes. The purpose of EATS=0 is to prevent the device from using ATS at
> all - critically including using translated TLPs eg because it is an
> untrusted device and the OS wants to prevent it from attacking the
> system with direct access to physical memory.
> 
> If the device is trusted then once we disable ATS it must stop issuing
> ATS, so the EATS=0 should never trigger a fault.
> 
> > > I’d be worried about cases where we switch domains, that means that
> > > briefly the HW observers EATS=1 while it was not intended, especially
> > > that EATS is in a different DWORD from S2TTB and CDptr. 
> 
> Well, no, it means EATS is enabled a little bit earlier or disabled a
> little bit later, it doesn't mean it was not intended.
> 
> The point is our rules for ATS say that the ATC is empty at this
> moment and the device is not permitted to do any ATS fetches because
> we won't issue any flushes.
> 
> Thus there can be no concurrent ATS traffic and we don't need to
> exactly sequence EATS with the translation.
> 
> With virtualization the hypervisor is still the exclusive owner of ATS
> and guarentees that EATS enable/disable is sequences correctly with
> ATC invalidation.

I see, thanks a lot for the explanation!
I was mainly worried about the virtualization case as the VMM can
influence ATS enable from the vSTE.
Looking at the code in arm-smmu-v3-iommufd and iommufd/device.c, it
seems ATS enable will only change at attach.
Looking into arm_smmu_attach_commit(), I see the ATC is invalidated
after the STE is installed, which means that userspace can transiently
observe the old domain ATC. I think that's fine at the moment because
when binding to a VFIO driver, it will attach to an empty domain.

Thanks,
Mostafa

> 
> > I think the last line that driver controls pci_enable/disable_ats()
> > should justify the whole thing? Are you worried about device still
> > doing ATS after pci_disable_ats()?
> 
> Exactly right, and we can't worry about that because it says the whole
> ATC coherency system is broken.
> 
> Jason
Re: [PATCH rc v4 3/4] iommu/arm-smmu-v3: Mark STE EATS safe when computing the update sequence
Posted by Jason Gunthorpe 1 month ago
On Fri, Jan 02, 2026 at 06:22:45PM +0000, Mostafa Saleh wrote:

> Looking at the code in arm-smmu-v3-iommufd and iommufd/device.c, it
> seems ATS enable will only change at attach.
> Looking into arm_smmu_attach_commit(), I see the ATC is invalidated
> after the STE is installed, which means that userspace can transiently
> observe the old domain ATC. I think that's fine at the moment because
> when binding to a VFIO driver, it will attach to an empty domain.

Yeah, it is a bit confusing..

The PCI ATS flag in the config space must exclusively be controlled by
the IOMMU driver. We must not allow the PCI device to turn on ATS
while the IOMMU driver is not synchronized and able to issue flushes,
otherwise we can have cache inconsistencies.

This goes both ways, we can't allow the hypervisor to turn on ATS
unless the VM is aware and is issuing flushes for its S1, and we can't
have the VM turn on ATS unless the hypervisor is aware and issuing
flushes for its S2.

Thus, for a VM the vPCI config space for ATS is ignored by HW, and the
VM controls ATS ONLY through the vSTE EATS bit.

Only once the VM enables EATS do we get to turn on the PCI config
space ATS.

All the detail I explains before holds because all the ATS switching
logic in the drvier is designed to ensure no invalidations are lost,
regardless of what two domains are being switched between. At worst
you get a transient moment during switching when the ATC hold a page
that is either new/old domain - but that is resolved before the domain
switch completes.

AMD folks, this whole explanation in this thread applies to the AMD
driver too, it currently has wrong ATS for virtualization, and fixing
it means make it work like ARM.. It will need to be fixed to complete
the VIOMMU work.

Jason