[PATCH] iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry

Michael Bommarito posted 1 patch 1 week, 4 days ago
drivers/iommu/intel/pasid.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
[PATCH] iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry
Posted by Michael Bommarito 1 week, 4 days ago
device_pasid_table_teardown() zeroes the 128-bit scalable-mode context
entry with context_clear_entry() while the Present bit is still set. This
creates a window where the hardware can fetch a torn entry, with some
fields already zeroed while Present is still set, leading to unpredictable
behavior or spurious faults. The context-cache invalidation is issued only
after the entry has been zeroed, and intel_pasid_free_table() then frees
the PASID directory pages, so the IOMMU can keep walking a stale Present=1
entry that points at freed memory.

While x86 provides strong write ordering, the compiler may reorder the two
64-bit writes to the entry, and the hardware fetch is not guaranteed to be
atomic with respect to multiple CPU writes.

Commit c1e4f1dccbe9d ("iommu/vt-d: Clear Present bit before tearing down
context entry") fixed this exact pattern in domain_context_clear_one() and
the copied-context path, but device_pasid_table_teardown() was not
converted.

Align it with the "Guidance to Software for Invalidations" in the VT-d
spec, Section 6.5.3.3, using the same ownership handshake as the sibling
fix: clear only the Present bit, flush it to the IOMMU, perform the
context-cache invalidation, and only then zero the rest of the entry.

Fixes: 81e921fd32161 ("iommu/vt-d: Fix NULL domain on device release")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Assisted-by: Claude:claude-opus-4-7
---
Found by static analysis while auditing the callers of context_clear_entry()
for the same teardown ordering that c1e4f1dccbe9d addressed. This site is
reachable only in scalable mode, so it does not manifest on the legacy-mode
hardware available to me; I could not trigger a runtime fault and the change
is verified by code inspection only, on the same basis as the sibling fix.
Compile-tested on x86_64 with CONFIG_INTEL_IOMMU; no new warnings.

 drivers/iommu/intel/pasid.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 89541b74ab8ca..40910dc7363b1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -748,10 +748,12 @@ static void device_pasid_table_teardown(struct device *dev, u8 bus, u8 devfn)
 	}
 
 	did = context_domain_id(context);
-	context_clear_entry(context);
+	context_clear_present(context);
 	__iommu_flush_cache(iommu, context, sizeof(*context));
 	spin_unlock(&iommu->lock);
 	intel_context_flush_no_pasid(info, context, did);
+	context_clear_entry(context);
+	__iommu_flush_cache(iommu, context, sizeof(*context));
 }
 
 static int pci_pasid_table_teardown(struct pci_dev *pdev, u16 alias, void *data)

base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
-- 
2.53.0
Re: [PATCH] iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry
Posted by Baolu Lu 1 week ago
On 5/28/26 10:55, Michael Bommarito wrote:
> device_pasid_table_teardown() zeroes the 128-bit scalable-mode context
> entry with context_clear_entry() while the Present bit is still set. This
> creates a window where the hardware can fetch a torn entry, with some
> fields already zeroed while Present is still set, leading to unpredictable
> behavior or spurious faults. The context-cache invalidation is issued only
> after the entry has been zeroed, and intel_pasid_free_table() then frees
> the PASID directory pages, so the IOMMU can keep walking a stale Present=1
> entry that points at freed memory.
> 
> While x86 provides strong write ordering, the compiler may reorder the two
> 64-bit writes to the entry, and the hardware fetch is not guaranteed to be
> atomic with respect to multiple CPU writes.
> 
> Commit c1e4f1dccbe9d ("iommu/vt-d: Clear Present bit before tearing down
> context entry") fixed this exact pattern in domain_context_clear_one() and
> the copied-context path, but device_pasid_table_teardown() was not
> converted.
> 
> Align it with the "Guidance to Software for Invalidations" in the VT-d
> spec, Section 6.5.3.3, using the same ownership handshake as the sibling
> fix: clear only the Present bit, flush it to the IOMMU, perform the
> context-cache invalidation, and only then zero the rest of the entry.
> 
> Fixes: 81e921fd32161 ("iommu/vt-d: Fix NULL domain on device release")
> Signed-off-by: Michael Bommarito<michael.bommarito@gmail.com>
> Assisted-by:Claude:claude-opus-4-7
> ---
> Found by static analysis while auditing the callers of context_clear_entry()
> for the same teardown ordering that c1e4f1dccbe9d addressed. This site is
> reachable only in scalable mode, so it does not manifest on the legacy-mode
> hardware available to me; I could not trigger a runtime fault and the change
> is verified by code inspection only, on the same basis as the sibling fix.
> Compile-tested on x86_64 with CONFIG_INTEL_IOMMU; no new warnings.
> 
>   drivers/iommu/intel/pasid.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)

Sashiko reported a pre-existing issue:

https://sashiko.dev/#/patchset/20260528025557.3209367-1-michael.bommarito%40gmail.com

I have added it to my task list and will follow up with a fix in a
separate patch.

[Severity: Critical]
This is a pre-existing issue, but does this context cache flush actually
invalidate the alias BDF?

It looks like intel_context_flush_no_pasid() issues a device-selective
invalidation that is hardcoded to use the primary device's BDF from info
(PCI_DEVID(info->bus, info->devfn)), completely ignoring the alias's bus
and devfn parameters passed into device_pasid_table_teardown().

If device_pasid_table_teardown() is called for a DMA alias, the IOMMU
context cache for the alias BDF might never be invalidated.

When the domain's page tables or scalable-mode PASID directories are
subsequently freed, could any DMA arriving under the alias BDF cause the
IOMMU hardware to walk the stale cache entry into freed system memory?

Thanks,
baolu
Re: [PATCH] iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry
Posted by Baolu Lu 1 week ago
On 5/28/26 10:55, Michael Bommarito wrote:
> device_pasid_table_teardown() zeroes the 128-bit scalable-mode context
> entry with context_clear_entry() while the Present bit is still set. This
> creates a window where the hardware can fetch a torn entry, with some
> fields already zeroed while Present is still set, leading to unpredictable
> behavior or spurious faults. The context-cache invalidation is issued only
> after the entry has been zeroed, and intel_pasid_free_table() then frees
> the PASID directory pages, so the IOMMU can keep walking a stale Present=1
> entry that points at freed memory.
> 
> While x86 provides strong write ordering, the compiler may reorder the two
> 64-bit writes to the entry, and the hardware fetch is not guaranteed to be
> atomic with respect to multiple CPU writes.
> 
> Commit c1e4f1dccbe9d ("iommu/vt-d: Clear Present bit before tearing down
> context entry") fixed this exact pattern in domain_context_clear_one() and
> the copied-context path, but device_pasid_table_teardown() was not
> converted.
> 
> Align it with the "Guidance to Software for Invalidations" in the VT-d
> spec, Section 6.5.3.3, using the same ownership handshake as the sibling
> fix: clear only the Present bit, flush it to the IOMMU, perform the
> context-cache invalidation, and only then zero the rest of the entry.
> 
> Fixes: 81e921fd32161 ("iommu/vt-d: Fix NULL domain on device release")
> Signed-off-by: Michael Bommarito<michael.bommarito@gmail.com>
> Assisted-by:Claude:claude-opus-4-7
> ---
> Found by static analysis while auditing the callers of context_clear_entry()
> for the same teardown ordering that c1e4f1dccbe9d addressed. This site is
> reachable only in scalable mode, so it does not manifest on the legacy-mode
> hardware available to me; I could not trigger a runtime fault and the change
> is verified by code inspection only, on the same basis as the sibling fix.
> Compile-tested on x86_64 with CONFIG_INTEL_IOMMU; no new warnings.
> 
>   drivers/iommu/intel/pasid.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)

Queued for linux-next. Thank you!