[PATCH v1] iommu/amd: IRT cache incoherency bug

Magnus Kalland posted 1 patch 4 days, 5 hours ago
drivers/iommu/amd/iommu.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
[PATCH v1] iommu/amd: IRT cache incoherency bug
Posted by Magnus Kalland 4 days, 5 hours ago
DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared
between multiple device IDs. See commit 3c124435e8dd
("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more
information on this. However, the AMD IOMMU driver currently invalidates
IRTE cache entries on a per-device basis whenever an IRTE is updated, not
for each alias.

This approach leaves stale IRTE cache entries when an IRTE is cached under
one DMA alias but later updated and invalidated through a different alias.
In such cases, the original device ID is never invalidated, since it is
programmed via aliasing.

This incoherency bug has been observed when IRTEs are cached for one
Non-Transparent Bridge (NTB) DMA alias, later updated via another.

Fix this by invalidating the interrupt remapping table cache for all DMA
aliases when updating an IRTE.

Link to original thread: https://lore.kernel.org/linux-iommu/20251215114952.190550-2-magnus@dolphinics.com/T/#raf80785292deb22aafe6a817424051ea0d1d28f4

Resending for visibility.

Changes since original thread:
 - Renamed struct pci_dev pointer parameter to unused
 - Rebased on latest master

Cc: Tore H. Larsen <torel@simula.no>
Co-developed-by: Lars B. Kristiansen <larsk@dolphinics.com>
Signed-off-by: Lars B. Kristiansen <larsk@dolphinics.com>
Co-developed-by: Jonas Markussen <jonas@dolphinics.com>
Signed-off-by: Jonas Markussen <jonas@dolphinics.com>
Signed-off-by: Magnus Kalland <magnus@dolphinics.com>

---
 drivers/iommu/amd/iommu.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7c12be1b247f..404afcaf4bc1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3103,22 +3103,44 @@ const struct iommu_ops amd_iommu_ops = {
 static struct irq_chip amd_ir_chip;
 static DEFINE_SPINLOCK(iommu_table_lock);
 
+static int iommu_flush_dev_irt(struct pci_dev *unused, u16 devid, void *data)
+{
+	int ret;
+	struct iommu_cmd cmd;
+	struct amd_iommu *iommu = data;
+
+	build_inv_irt(&cmd, devid);
+	ret = __iommu_queue_command_sync(iommu, &cmd, true);
+	return ret;
+}
+
 static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
 {
 	int ret;
 	u64 data;
+	int domain = iommu->pci_seg->id;
+	unsigned int bus = PCI_BUS_NUM(devid);
+	unsigned int devfn = devid & 0xff;
 	unsigned long flags;
 	struct iommu_cmd cmd, cmd2;
+	struct pci_dev *pdev = NULL;
 
 	if (iommu->irtcachedis_enabled)
 		return;
 
-	build_inv_irt(&cmd, devid);
 	data = atomic64_inc_return(&iommu->cmd_sem_val);
 	build_completion_wait(&cmd2, iommu, data);
 
-	raw_spin_lock_irqsave(&iommu->lock, flags);
-	ret = __iommu_queue_command_sync(iommu, &cmd, true);
+	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+	if (pdev) {
+		raw_spin_lock_irqsave(&iommu->lock, flags);
+		ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
+	} else {
+		build_inv_irt(&cmd, devid);
+		raw_spin_lock_irqsave(&iommu->lock, flags);
+		ret = __iommu_queue_command_sync(iommu, &cmd, true);
+	}
+
 	if (ret)
 		goto out;
 	ret = __iommu_queue_command_sync(iommu, &cmd2, false);
@@ -3127,6 +3149,8 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
 	wait_on_sem(iommu, data);
 out:
 	raw_spin_unlock_irqrestore(&iommu->lock, flags);
+	if (pdev)
+		pci_dev_put(pdev);
 }
 
 static inline u8 iommu_get_int_tablen(struct iommu_dev_data *dev_data)
-- 
2.43.0
Re: [PATCH v1] iommu/amd: IRT cache incoherency bug
Posted by Jörg Rödel 4 days, 3 hours ago
On Tue, Feb 03, 2026 at 12:32:10PM +0100, Magnus Kalland wrote:
> DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared
> between multiple device IDs. See commit 3c124435e8dd
> ("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more
> information on this. However, the AMD IOMMU driver currently invalidates
> IRTE cache entries on a per-device basis whenever an IRTE is updated, not
> for each alias.
> 
> This approach leaves stale IRTE cache entries when an IRTE is cached under
> one DMA alias but later updated and invalidated through a different alias.
> In such cases, the original device ID is never invalidated, since it is
> programmed via aliasing.
> 
> This incoherency bug has been observed when IRTEs are cached for one
> Non-Transparent Bridge (NTB) DMA alias, later updated via another.
> 
> Fix this by invalidating the interrupt remapping table cache for all DMA
> aliases when updating an IRTE.
> 
> Link to original thread: https://lore.kernel.org/linux-iommu/20251215114952.190550-2-magnus@dolphinics.com/T/#raf80785292deb22aafe6a817424051ea0d1d28f4
> 
> Resending for visibility.
> 
> Changes since original thread:
>  - Renamed struct pci_dev pointer parameter to unused
>  - Rebased on latest master
> 
> Cc: Tore H. Larsen <torel@simula.no>
> Co-developed-by: Lars B. Kristiansen <larsk@dolphinics.com>
> Signed-off-by: Lars B. Kristiansen <larsk@dolphinics.com>
> Co-developed-by: Jonas Markussen <jonas@dolphinics.com>
> Signed-off-by: Jonas Markussen <jonas@dolphinics.com>
> Signed-off-by: Magnus Kalland <magnus@dolphinics.com>
> 
> ---
>  drivers/iommu/amd/iommu.c | 30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 7c12be1b247f..404afcaf4bc1 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3103,22 +3103,44 @@ const struct iommu_ops amd_iommu_ops = {
>  static struct irq_chip amd_ir_chip;
>  static DEFINE_SPINLOCK(iommu_table_lock);
>  
> +static int iommu_flush_dev_irt(struct pci_dev *unused, u16 devid, void *data)
> +{
> +	int ret;
> +	struct iommu_cmd cmd;
> +	struct amd_iommu *iommu = data;
> +
> +	build_inv_irt(&cmd, devid);
> +	ret = __iommu_queue_command_sync(iommu, &cmd, true);
> +	return ret;
> +}
> +
>  static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
>  {
>  	int ret;
>  	u64 data;
> +	int domain = iommu->pci_seg->id;
> +	unsigned int bus = PCI_BUS_NUM(devid);
> +	unsigned int devfn = devid & 0xff;
>  	unsigned long flags;
>  	struct iommu_cmd cmd, cmd2;
> +	struct pci_dev *pdev = NULL;
>  
>  	if (iommu->irtcachedis_enabled)
>  		return;
>  
> -	build_inv_irt(&cmd, devid);
>  	data = atomic64_inc_return(&iommu->cmd_sem_val);
>  	build_completion_wait(&cmd2, iommu, data);
>  
> -	raw_spin_lock_irqsave(&iommu->lock, flags);
> -	ret = __iommu_queue_command_sync(iommu, &cmd, true);
> +	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> +	if (pdev) {
> +		raw_spin_lock_irqsave(&iommu->lock, flags);

Move the lock above the if () ...

> +		ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
> +	} else {
> +		build_inv_irt(&cmd, devid);
> +		raw_spin_lock_irqsave(&iommu->lock, flags);
> +		ret = __iommu_queue_command_sync(iommu, &cmd, true);

... and call iommu_flush_dev_irt(NULL, devid, iommu) here.

> +	}
> +
>  	if (ret)
>  		goto out;
>  	ret = __iommu_queue_command_sync(iommu, &cmd2, false);
> @@ -3127,6 +3149,8 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
>  	wait_on_sem(iommu, data);
>  out:
>  	raw_spin_unlock_irqrestore(&iommu->lock, flags);
> +	if (pdev)
> +		pci_dev_put(pdev);

This can also be moved to the respective if () branch above, no?

-Joerg
[PATCH v2] iommu/amd: Invalidate IRT cache for DMA aliases
Posted by Magnus Kalland 2 days, 3 hours ago
DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared
between multiple device IDs. See commit 3c124435e8dd
("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more
information on this. However, the AMD IOMMU driver currently invalidates
IRTE cache entries on a per-device basis whenever an IRTE is updated, not
for each alias.

This approach leaves stale IRTE cache entries when an IRTE is cached under
one DMA alias but later updated and invalidated through a different alias.
In such cases, the original device ID is never invalidated, since it is
programmed via aliasing.

This incoherency bug has been observed when IRTEs are cached for one
Non-Transparent Bridge (NTB) DMA alias, later updated via another.

Fix this by invalidating the interrupt remapping table cache for all DMA
aliases when updating an IRTE.

Link: https://lore.kernel.org/linux-iommu/fwtqfdk3m7qrazj4bfutl4grac46agtxztc3p2lqnejt2wyexu@lztyomxrm3pk/
Signed-off-by: Magnus Kalland <magnus@dolphinics.com>

---

v2:
 - Move the lock acquire before branching
 - Call iommu_flush_dev_irt() when pdev is null
 - Handle pdev refcount in correct branch

 drivers/iommu/amd/iommu.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2e1865daa1ce..b5256b28b0c8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3077,25 +3077,47 @@ const struct iommu_ops amd_iommu_ops = {
 static struct irq_chip amd_ir_chip;
 static DEFINE_SPINLOCK(iommu_table_lock);
 
+static int iommu_flush_dev_irt(struct pci_dev *unused, u16 devid, void *data)
+{
+	int ret;
+	struct iommu_cmd cmd;
+	struct amd_iommu *iommu = data;
+
+	build_inv_irt(&cmd, devid);
+	ret = __iommu_queue_command_sync(iommu, &cmd, true);
+	return ret;
+}
+
 static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
 {
 	int ret;
 	u64 data;
+	int domain = iommu->pci_seg->id;
+	unsigned int bus = PCI_BUS_NUM(devid);
+	unsigned int devfn = devid & 0xff;
 	unsigned long flags;
-	struct iommu_cmd cmd, cmd2;
+	struct iommu_cmd cmd;
+	struct pci_dev *pdev = NULL;
 
 	if (iommu->irtcachedis_enabled)
 		return;
 
-	build_inv_irt(&cmd, devid);
 	data = atomic64_inc_return(&iommu->cmd_sem_val);
-	build_completion_wait(&cmd2, iommu, data);
+	build_completion_wait(&cmd, iommu, data);
 
+	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
 	raw_spin_lock_irqsave(&iommu->lock, flags);
-	ret = __iommu_queue_command_sync(iommu, &cmd, true);
+	if (pdev) {
+		ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
+		pci_dev_put(pdev);
+	} else {
+		ret = iommu_flush_dev_irt(NULL, devid, iommu);
+	}
+
 	if (ret)
 		goto out;
-	ret = __iommu_queue_command_sync(iommu, &cmd2, false);
+
+	ret = __iommu_queue_command_sync(iommu, &cmd, false);
 	if (ret)
 		goto out;
 	wait_on_sem(iommu, data);
-- 
2.43.0
Re: [PATCH v2] iommu/amd: Invalidate IRT cache for DMA aliases
Posted by Jörg Rödel 1 day, 7 hours ago
On Thu, Feb 05, 2026 at 03:01:00PM +0100, Magnus Kalland wrote:
>  drivers/iommu/amd/iommu.c | 32 +++++++++++++++++++++++++++-----
>  1 file changed, 27 insertions(+), 5 deletions(-)

This does not apply cleanly on the iommu/next branch or the iommu/amd/amd-vi
branch. Please rebase to the next -rc1 when it comes out and re-send.

-Joerg
Re: [PATCH v2] iommu/amd: Invalidate IRT cache for DMA aliases
Posted by Vasant Hegde 1 day, 7 hours ago
Joerg,

On 2/6/2026 3:36 PM, Jörg Rödel wrote:
> On Thu, Feb 05, 2026 at 03:01:00PM +0100, Magnus Kalland wrote:
>>  drivers/iommu/amd/iommu.c | 32 +++++++++++++++++++++++++++-----
>>  1 file changed, 27 insertions(+), 5 deletions(-)
> 
> This does not apply cleanly on the iommu/next branch or the iommu/amd/amd-vi
> branch. Please rebase to the next -rc1 when it comes out and re-send.

Sorry. I had missed this one. Patch looks fine. Can you wait until Monday? We
want to test it internally and make sure it looks fine.

-Vasant

Re: [PATCH v2] iommu/amd: Invalidate IRT cache for DMA aliases
Posted by Jörg Rödel 1 day, 6 hours ago
Hi Vasant,

On Fri, Feb 06, 2026 at 03:40:49PM +0530, Vasant Hegde wrote:
> On 2/6/2026 3:36 PM, Jörg Rödel wrote:
> > On Thu, Feb 05, 2026 at 03:01:00PM +0100, Magnus Kalland wrote:
> >>  drivers/iommu/amd/iommu.c | 32 +++++++++++++++++++++++++++-----
> >>  1 file changed, 27 insertions(+), 5 deletions(-)
> > 
> > This does not apply cleanly on the iommu/next branch or the iommu/amd/amd-vi
> > branch. Please rebase to the next -rc1 when it comes out and re-send.
> 
> Sorry. I had missed this one. Patch looks fine. Can you wait until Monday? We
> want to test it internally and make sure it looks fine.

You have at least two more weeks, this will not make it into my tree until
after the next merge window.

-Joerg