From: Jason Gunthorpe <jgg@nvidia.com>
If a VM wants to toggle EATS off at the same time as changing the CFG, the
hypervisor will see EATS change to 0 and insert a V=0 breaking update into
the STE even though the VM did not ask for that.
In bare metal, EATS is ignored by CFG=ABORT/BYPASS, which is why this does
not cause a problem until we have nested where CFG is always a variation of
S2 trans that does use EATS.
Relax the rules for EATS sequencing, we don't need it to be exact because
the enclosing code will always disable ATS at the PCI device if we are
changing EATS. This ensures there are no ATS transactions that can race
with an EATS change so we don't need to carefully sequence these bits.
Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 12a9669bcc83..a3b29ad20a82 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
* fault records even when MEV == 0.
*/
safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
+
+ /*
+ * EATS is used to reject and control the ATS behavior of the device. If
+ * we are changing it away from 0 then we already trust the device to
+ * use ATS properly and we have sequenced the device's ATS enable in PCI
+ * config space to prevent it from issuing ATS while we are changing
+ * EATS.
+ */
+ safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
}
EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_update_safe);
--
2.43.0
On Tue, Dec 16, 2025 at 08:26:01PM -0800, Nicolin Chen wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> If a VM wants to toggle EATS off at the same time as changing the CFG, the
> hypervisor will see EATS change to 0 and insert a V=0 breaking update into
> the STE even though the VM did not ask for that.
>
> In bare metal, EATS is ignored by CFG=ABORT/BYPASS, which is why this does
> not cause a problem until we have nested where CFG is always a variation of
> S2 trans that does use EATS.
>
> Relax the rules for EATS sequencing, we don't need it to be exact because
> the enclosing code will always disable ATS at the PCI device if we are
> changing EATS. This ensures there are no ATS transactions that can race
> with an EATS change so we don't need to carefully sequence these bits.
>
> Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 12a9669bcc83..a3b29ad20a82 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
> * fault records even when MEV == 0.
> */
> safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
> +
> + /*
> + * EATS is used to reject and control the ATS behavior of the device. If
> + * we are changing it away from 0 then we already trust the device to
> + * use ATS properly and we have sequenced the device's ATS enable in PCI
> + * config space to prevent it from issuing ATS while we are changing
> + * EATS.
> + */
I am not sure about this one, Is it only about trusting the device?
I’d be worried about cases where we switch domains, that means that
briefly the HW observers EATS=1 while it was not intended, especially
that EATS is in a different DWORD from S2TTB and CDptr. With all the
IOMMUFD/VFIO stuff it makes it harder to reason about. But I can’t
come up with an example to break this.
Thanks,
Mostafa
> + safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
> }
> EXPORT_SYMBOL_IF_KUNIT(arm_smmu_get_ste_update_safe);
>
> --
> 2.43.0
>
On Thu, Dec 18, 2025 at 04:42:40PM +0000, Mostafa Saleh wrote:
> On Tue, Dec 16, 2025 at 08:26:01PM -0800, Nicolin Chen wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> >
> > If a VM wants to toggle EATS off at the same time as changing the CFG, the
> > hypervisor will see EATS change to 0 and insert a V=0 breaking update into
> > the STE even though the VM did not ask for that.
> >
> > In bare metal, EATS is ignored by CFG=ABORT/BYPASS, which is why this does
> > not cause a problem until we have nested where CFG is always a variation of
> > S2 trans that does use EATS.
> >
> > Relax the rules for EATS sequencing, we don't need it to be exact because
> > the enclosing code will always disable ATS at the PCI device if we are
> > changing EATS. This ensures there are no ATS transactions that can race
> > with an EATS change so we don't need to carefully sequence these bits.
> >
> > Fixes: 1e8be08d1c91 ("iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 12a9669bcc83..a3b29ad20a82 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits)
> > * fault records even when MEV == 0.
> > */
> > safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV);
> > +
> > + /*
> > + * EATS is used to reject and control the ATS behavior of the device. If
> > + * we are changing it away from 0 then we already trust the device to
> > + * use ATS properly and we have sequenced the device's ATS enable in PCI
> > + * config space to prevent it from issuing ATS while we are changing
> > + * EATS.
> > + */
>
> I am not sure about this one, Is it only about trusting the device?
>
> I’d be worried about cases where we switch domains, that means that
> briefly the HW observers EATS=1 while it was not intended, especially
> that EATS is in a different DWORD from S2TTB and CDptr. With all the
> IOMMUFD/VFIO stuff it makes it harder to reason about. But I can’t
> come up with an example to break this.
Hmm..
I think the last line that driver controls pci_enable/disable_ats()
should justify the whole thing? Are you worried about device still
doing ATS after pci_disable_ats()?
Thanks
Nicolin
On Thu, Dec 18, 2025 at 09:32:43AM -0800, Nicolin Chen wrote: > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > > index 12a9669bcc83..a3b29ad20a82 100644 > > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > > @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits) > > > * fault records even when MEV == 0. > > > */ > > > safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV); > > > + > > > + /* > > > + * EATS is used to reject and control the ATS behavior of the device. If > > > + * we are changing it away from 0 then we already trust the device to > > > + * use ATS properly and we have sequenced the device's ATS enable in PCI > > > + * config space to prevent it from issuing ATS while we are changing > > > + * EATS. > > > + */ > > > > I am not sure about this one, Is it only about trusting the device? Yes. The purpose of EATS=0 is to prevent the device from using ATS at all - critically including using translated TLPs eg because it is an untrusted device and the OS wants to prevent it from attacking the system with direct access to physical memory. If the device is trusted then once we disable ATS it must stop issuing ATS, so the EATS=0 should never trigger a fault. > > I’d be worried about cases where we switch domains, that means that > > briefly the HW observers EATS=1 while it was not intended, especially > > that EATS is in a different DWORD from S2TTB and CDptr. Well, no, it means EATS is enabled a little bit earlier or disabled a little bit later, it doesn't mean it was not intended. The point is our rules for ATS say that the ATC is empty at this moment and the device is not permitted to do any ATS fetches because we won't issue any flushes. Thus there can be no concurrent ATS traffic and we don't need to exactly sequence EATS with the translation. With virtualization the hypervisor is still the exclusive owner of ATS and guarentees that EATS enable/disable is sequences correctly with ATC invalidation. > I think the last line that driver controls pci_enable/disable_ats() > should justify the whole thing? Are you worried about device still > doing ATS after pci_disable_ats()? Exactly right, and we can't worry about that because it says the whole ATC coherency system is broken. Jason
On Thu, Dec 18, 2025 at 02:01:29PM -0400, Jason Gunthorpe wrote: > On Thu, Dec 18, 2025 at 09:32:43AM -0800, Nicolin Chen wrote: > > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > > > index 12a9669bcc83..a3b29ad20a82 100644 > > > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > > > @@ -1095,6 +1095,15 @@ void arm_smmu_get_ste_update_safe(__le64 *safe_bits) > > > > * fault records even when MEV == 0. > > > > */ > > > > safe_bits[1] |= cpu_to_le64(STRTAB_STE_1_MEV); > > > > + > > > > + /* > > > > + * EATS is used to reject and control the ATS behavior of the device. If > > > > + * we are changing it away from 0 then we already trust the device to > > > > + * use ATS properly and we have sequenced the device's ATS enable in PCI > > > > + * config space to prevent it from issuing ATS while we are changing > > > > + * EATS. > > > > + */ > > > > > > I am not sure about this one, Is it only about trusting the device? > > Yes. The purpose of EATS=0 is to prevent the device from using ATS at > all - critically including using translated TLPs eg because it is an > untrusted device and the OS wants to prevent it from attacking the > system with direct access to physical memory. > > If the device is trusted then once we disable ATS it must stop issuing > ATS, so the EATS=0 should never trigger a fault. > > > > I’d be worried about cases where we switch domains, that means that > > > briefly the HW observers EATS=1 while it was not intended, especially > > > that EATS is in a different DWORD from S2TTB and CDptr. > > Well, no, it means EATS is enabled a little bit earlier or disabled a > little bit later, it doesn't mean it was not intended. > > The point is our rules for ATS say that the ATC is empty at this > moment and the device is not permitted to do any ATS fetches because > we won't issue any flushes. > > Thus there can be no concurrent ATS traffic and we don't need to > exactly sequence EATS with the translation. > > With virtualization the hypervisor is still the exclusive owner of ATS > and guarentees that EATS enable/disable is sequences correctly with > ATC invalidation. I see, thanks a lot for the explanation! I was mainly worried about the virtualization case as the VMM can influence ATS enable from the vSTE. Looking at the code in arm-smmu-v3-iommufd and iommufd/device.c, it seems ATS enable will only change at attach. Looking into arm_smmu_attach_commit(), I see the ATC is invalidated after the STE is installed, which means that userspace can transiently observe the old domain ATC. I think that's fine at the moment because when binding to a VFIO driver, it will attach to an empty domain. Thanks, Mostafa > > > I think the last line that driver controls pci_enable/disable_ats() > > should justify the whole thing? Are you worried about device still > > doing ATS after pci_disable_ats()? > > Exactly right, and we can't worry about that because it says the whole > ATC coherency system is broken. > > Jason
On Fri, Jan 02, 2026 at 06:22:45PM +0000, Mostafa Saleh wrote: > Looking at the code in arm-smmu-v3-iommufd and iommufd/device.c, it > seems ATS enable will only change at attach. > Looking into arm_smmu_attach_commit(), I see the ATC is invalidated > after the STE is installed, which means that userspace can transiently > observe the old domain ATC. I think that's fine at the moment because > when binding to a VFIO driver, it will attach to an empty domain. Yeah, it is a bit confusing.. The PCI ATS flag in the config space must exclusively be controlled by the IOMMU driver. We must not allow the PCI device to turn on ATS while the IOMMU driver is not synchronized and able to issue flushes, otherwise we can have cache inconsistencies. This goes both ways, we can't allow the hypervisor to turn on ATS unless the VM is aware and is issuing flushes for its S1, and we can't have the VM turn on ATS unless the hypervisor is aware and issuing flushes for its S2. Thus, for a VM the vPCI config space for ATS is ignored by HW, and the VM controls ATS ONLY through the vSTE EATS bit. Only once the VM enables EATS do we get to turn on the PCI config space ATS. All the detail I explains before holds because all the ATS switching logic in the drvier is designed to ensure no invalidations are lost, regardless of what two domains are being switched between. At worst you get a transient moment during switching when the ATC hold a page that is either new/old domain - but that is resolved before the domain switch completes. AMD folks, this whole explanation in this thread applies to the AMD driver too, it currently has wrong ATS for virtualization, and fixing it means make it work like ARM.. It will need to be fixed to complete the VIOMMU work. Jason
© 2016 - 2026 Red Hat, Inc.