Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
survive an invalidation that races with concurrent traffic targeting
the same entry. The hardware-recommended software workaround is to
issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
second issue must execute only after the first issue's CMD_SYNC has
completed, giving the sequence:
TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
ATC_INV is not affected and must not be doubled.
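As a rough illustration of the ordering described above, here is a toy model that only records the order in which commands would reach the hardware. Everything in it (the model_* names, the enum, the fixed-size log) is hypothetical and is not taken from the driver; it assumes a synced submission has completed by the time the helper returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command classes; these are not the driver's opcodes. */
enum model_op { MODEL_TLBI, MODEL_CFGI, MODEL_ATC_INV, MODEL_SYNC };

#define MODEL_LOG_MAX 64

/* Toy queue: records the order in which commands would be issued. */
struct model_cmdq {
	enum model_op log[MODEL_LOG_MAX];
	size_t len;
};

static void model_push(struct model_cmdq *q, enum model_op op)
{
	assert(q->len < MODEL_LOG_MAX);
	q->log[q->len++] = op;
}

/*
 * Submit the list followed by a sync. Assumption of this model: the sync
 * has "completed" once this function returns.
 */
static void model_issue_synced(struct model_cmdq *q,
			       const enum model_op *cmds, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_push(q, cmds[i]);
	model_push(q, MODEL_SYNC);
}

static bool model_needs_twice(enum model_op op)
{
	return op == MODEL_TLBI || op == MODEL_CFGI;
}

/*
 * Issue once; on an affected instance, re-issue the same list only after
 * the first sync, and only when the first command is TLBI/CFGI.
 */
static void model_issue(struct model_cmdq *q, const enum model_op *cmds,
			size_t n, bool erratum)
{
	model_issue_synced(q, cmds, n);
	if (erratum && n && model_needs_twice(cmds[0]))
		model_issue_synced(q, cmds, n);
}
```

With the erratum flag set, a list of two TLBI entries is logged as TLBI TLBI SYNC TLBI TLBI SYNC, while a list that starts with ATC_INV is logged once.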
This series implements the workaround by hooking the duplication into
arm_smmu_cmdq_issue_cmdlist(), the single chokepoint that every
synchronous CMDQ submission flows through.
Patch 1 is a preparatory refactor that factors the existing batch
force-sync conditions out of arm_smmu_cmdq_batch_add_cmd_p() into a
new arm_smmu_cmdq_batch_force_sync() helper. No functional change.
Patch 2 detects affected instances using the existing
"nvidia,tegra264-smmu" compatible string, exposes the condition via a
new ARM_SMMU_OPT_TLBI_TWICE option bit, and adds a static-inline
arm_smmu_cmd_needs_tlbi_twice() classifier in arm-smmu-v3.h so that
both the in-tree CMDQ path and the iommufd VSMMU path can share a
single predicate.
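A minimal sketch of what such a shared classifier could look like, assuming the opcode sits in the low byte of the first command word. The demo_* names, the struct layouts and the opcode values are placeholders for illustration only; the real definitions live in arm-smmu-v3.h and in the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bit standing in for the new quirk flag. */
#define DEMO_OPT_TLBI_TWICE (1u << 0)

/* Placeholder opcode values; check the driver header for the real ones. */
enum demo_opcode {
	DEMO_OP_CFGI    = 0x04,
	DEMO_OP_TLBI    = 0x12,
	DEMO_OP_ATC_INV = 0x40,
};

struct demo_smmu { unsigned int options; };
struct demo_cmd  { uint64_t data[2]; };

/*
 * Returns true only when the instance carries the quirk AND the command
 * is a CFGI/TLBI. Folding the option check in here keeps callers simple.
 */
static inline bool demo_cmd_needs_tlbi_twice(const struct demo_smmu *smmu,
					     const struct demo_cmd *cmd)
{
	if (!(smmu->options & DEMO_OPT_TLBI_TWICE))
		return false;

	switch (cmd->data[0] & 0xff) {
	case DEMO_OP_CFGI:
	case DEMO_OP_TLBI:
		return true;
	default:
		return false;	/* ATC_INV and everything else: issue once */
	}
}
```

Because the option check is part of the predicate, both callers can ask a single question per command without testing the quirk separately.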
Patch 3 wires the workaround in. arm_smmu_cmdq_issue_cmdlist() becomes
a thin wrapper that re-issues a synced cmdlist a second time when the
first command needs doubling. The Tegra264 condition is added to
arm_smmu_cmdq_batch_force_sync() so a full batch carrying CFGI/TLBI
commands flushes with sync=true and is then doubled. The iommufd
VSMMU path (arm_vsmmu_cache_invalidate()) is also taught to split the
user-supplied batch at every "needs doubling" / "doesn't need
doubling" transition via a new arm_vsmmu_can_batch_cmd() predicate,
since that path can otherwise mix CFGI/TLBI with ATC_INV in a single
submission.
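The batch splitting described above could be pictured roughly as follows. This is a sketch under assumptions, not the code from patch 3: each command is reduced to a "needs doubling" flag, and the vs_* helpers are hypothetical names that only compute where the batch would be cut.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate: a command may join the current batch only if it
 * is in the same class as the batch's first command.
 */
static bool vs_can_batch_cmd(bool first_needs_twice, bool cmd_needs_twice)
{
	return first_needs_twice == cmd_needs_twice;
}

/*
 * Split n commands into maximal runs of the same class. Writes each run's
 * length to run_len[] (at most max_runs entries) and returns the number
 * of runs written. Each run would be submitted as its own cmdlist.
 */
static size_t vs_split_runs(const bool *needs_twice, size_t n,
			    size_t *run_len, size_t max_runs)
{
	size_t runs = 0, start = 0;

	for (size_t i = 1; i <= n; i++) {
		if (i < n && vs_can_batch_cmd(needs_twice[start], needs_twice[i]))
			continue;	/* same class: keep extending the run */
		if (runs == max_runs)
			break;
		run_len[runs++] = i - start;
		start = i;
	}
	return runs;
}
```

For a user batch such as TLBI, TLBI, ATC_INV, ATC_INV, CFGI this yields three runs of lengths 2, 2 and 1, so only the first and last run would be candidates for doubling.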
The series is based on Jason Gunthorpe's "Remove SMMUv3
struct arm_smmu_cmdq_ent" series [1], specifically commit 13428b0bf794
("iommu/arm-smmu-v3: Directly encode TLBI commands") which is the
final patch of that series in linux-next.
[1] https://lore.kernel.org/all/177919957385.1012282.14787407041669291032.b4-ty@kernel.org/
Changes since v2:
- Add new prep patch 1/3 that factors the existing force-sync
conditions into arm_smmu_cmdq_batch_force_sync() (from Nicolin).
- Move arm_smmu_cmd_needs_tlbi_twice() to arm-smmu-v3.h as static
inline taking (smmu, cmd*) and folding in the option check.
- Plug the Tegra264 condition into arm_smmu_cmdq_batch_force_sync()
instead of carrying a separate need_sync in batch_add_cmd_p().
- Fix iommufd batching: arm_vsmmu_cache_invalidate() can mix
CFGI/TLBI with ATC_INV in one batch. Split at the boundary via a
new arm_vsmmu_can_batch_cmd() predicate.
- Patch 2 wording: "next patch wires" -> "a subsequent change will
wire".
v2: https://lore.kernel.org/all/20260529140830.629738-1-amhetre@nvidia.com/
Nicolin Chen (1):
iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
Ashish Mhetre (2):
iommu/arm-smmu-v3: Detect Tegra264 erratum
iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 23 ++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 66 +++++++++++++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 40 +++++++++++
3 files changed, 117 insertions(+), 12 deletions(-)
base-commit: 13428b0bf7947098daf9a1db14a74d33eb1b5079
--
2.50.1
On Mon, Jun 01, 2026 at 10:48:42AM +0000, Ashish Mhetre wrote:
> Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
> survive an invalidation that races with concurrent traffic targeting
> the same entry. The hardware-recommended software workaround is to
> issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
> second issue must execute only after the first issue's CMD_SYNC has
> completed, giving the sequence:
This seems quite intrusive, will the TLB entry survive if you push a
full invalidation instead?
Thanks,
Mostafa
>
> TLBI/CFGI ... CMD_SYNC TLBI/CFGI ... CMD_SYNC
>
> ATC_INV is not affected and must not be doubled.
>
> This series implements the workaround by hooking the duplication into
> arm_smmu_cmdq_issue_cmdlist(), the single chokepoint that every
> synchronous CMDQ submission flows through.
>
> Patch 1 is a preparatory refactor that factors the existing batch
> force-sync conditions out of arm_smmu_cmdq_batch_add_cmd_p() into a
> new arm_smmu_cmdq_batch_force_sync() helper. No functional change.
>
> Patch 2 detects affected instances using the existing
> "nvidia,tegra264-smmu" compatible string, exposes the condition via a
> new ARM_SMMU_OPT_TLBI_TWICE option bit, and adds a static-inline
> arm_smmu_cmd_needs_tlbi_twice() classifier in arm-smmu-v3.h so that
> both the in-tree CMDQ path and the iommufd VSMMU path can share a
> single predicate.
>
> Patch 3 wires the workaround in. arm_smmu_cmdq_issue_cmdlist() becomes
> a thin wrapper that re-issues a synced cmdlist a second time when the
> first command needs doubling. The Tegra264 condition is added to
> arm_smmu_cmdq_batch_force_sync() so a full batch carrying CFGI/TLBI
> commands flushes with sync=true and is then doubled. The iommufd
> VSMMU path (arm_vsmmu_cache_invalidate()) is also taught to split the
> user-supplied batch at every "needs doubling" / "doesn't need
> doubling" transition via a new arm_vsmmu_can_batch_cmd() predicate,
> since that path can otherwise mix CFGI/TLBI with ATC_INV in a single
> submission.
>
> The series is based on Jason Gunthorpe's "Remove SMMUv3
> struct arm_smmu_cmdq_ent" series [1], specifically commit 13428b0bf794
> ("iommu/arm-smmu-v3: Directly encode TLBI commands") which is the
> final patch of that series in linux-next.
>
> [1] https://lore.kernel.org/all/177919957385.1012282.14787407041669291032.b4-ty@kernel.org/
>
> Changes since v2:
> - Add new prep patch 1/3 that factors the existing force-sync
> conditions into arm_smmu_cmdq_batch_force_sync() (from Nicolin).
> - Move arm_smmu_cmd_needs_tlbi_twice() to arm-smmu-v3.h as static
> inline taking (smmu, cmd*) and folding in the option check.
> - Plug the Tegra264 condition into arm_smmu_cmdq_batch_force_sync()
> instead of carrying a separate need_sync in batch_add_cmd_p().
> - Fix iommufd batching: arm_vsmmu_cache_invalidate() can mix
> CFGI/TLBI with ATC_INV in one batch. Split at the boundary via a
> new arm_vsmmu_can_batch_cmd() predicate.
> - Patch 2 wording: "next patch wires" -> "a subsequent change will
> wire".
>
> v2: https://lore.kernel.org/all/20260529140830.629738-1-amhetre@nvidia.com/
>
>
> Nicolin Chen (1):
> iommu/arm-smmu-v3: Factor out CMDQ batch force-sync conditions
>
> Ashish Mhetre (2):
> iommu/arm-smmu-v3: Detect Tegra264 erratum
> iommu/arm-smmu-v3: Issue CFGI/TLBI twice on Tegra264
>
> .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 23 ++++++-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 66 +++++++++++++++----
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 40 +++++++++++
> 3 files changed, 117 insertions(+), 12 deletions(-)
>
>
> base-commit: 13428b0bf7947098daf9a1db14a74d33eb1b5079
> --
> 2.50.1
>
>
On Tue, Jun 02, 2026 at 04:31:29PM +0000, Mostafa Saleh wrote:
> On Mon, Jun 01, 2026 at 10:48:42AM +0000, Ashish Mhetre wrote:
> > Nvidia Tegra264 SMMUs are affected by an erratum where a TLB entry can
> > survive an invalidation that races with concurrent traffic targeting
> > the same entry. The hardware-recommended software workaround is to
> > issue every CFGI/TLBI command (each followed by CMD_SYNC) twice. The
> > second issue must execute only after the first issue's CMD_SYNC has
> > completed, giving the sequence:
>
> This seems quite intrusive, will the TLB entry survive if you push a
> full invalidation instead?

It's 36 lines and completely contained to the insides of the command
submission code??

Stuff like this is why I was guiding you to use more of the code exactly
as is for pkvm. Historically there have been many invalidation related
errata, and invalidate twice seems to be a theme in fixing many of
them.

Jason