[PATCH v2] iommu/vt-d: fix intel iommu iotlb sync hardlockup and retry

Guanghui Feng posted 1 patch 22 hours ago
drivers/iommu/intel/dmar.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
[PATCH v2] iommu/vt-d: fix intel iommu iotlb sync hardlockup and retry
Posted by Guanghui Feng 22 hours ago
Device-TLB Invalidation Response Time-out (ITE) handling was added in
commit: 6ba6c3a4cacf.

When an ITE occurs, iommu will sets the ITE (Invalidation Time-out
Error) field in the Fault Status Register. No new descriptors are
fetched from the Invalidation Queue until software clears the ITE field
in the Fault Status Register. Tail pointer Register updates by software
while the ITE field is Set does not cause descriptor fetches by
hardware. At the time ITE field is Set, hardware aborts any
inv_wait_dsc commands pending in hardware and does not increment
the Invalidation Queue Head register. When software clears the
ITE field in the Fault Status Register, hardware fetches
descriptor pointed by the Invalidation Queue Head register.

But in the qi_check_fault process, it is implemented by default
according to the 2009 commit: 6ba6c3a4cacf, that is, only one
struct qi_desc is submitted at a time. A qi_desc request is
immediately followed by a wait_desc/QI_IWD_TYPE for
synchronization. Therefore, the IOMMU driver implementation
considers invalid queue entries at odd positions to be
wait_desc. After ITE is set, hardware aborts any pending
inv_wait_dsc commands in hardware. Therefore, qi_check_fault
iterates through odd-position as wait_desc entries and sets
desc_status to QI_ABORT. However, the current implementation
allows multiple struct qi_desc to be submitted simultaneously,
followed by one wait_desc, so it's no longer guaranteed that
odd-position entries will be wait_desc. When the number of submitted
struct qi_desc is even, wait_desc's desc_status will not be set to QI_ABORT,
qi_check_fault will return 0, and qi_submit_sync will then
execute in an infinite loop and cause a hard lockup when
interrupts are disabled and the PCIe device does not respond to
Device-TLB Invalidation requests.

Additionally, if the device remains online and an IOMMU ITE
occurs, simply returning -EAGAIN is sufficient. When processing
the -EAGAIN result, qi_submit_sync will automatically reclaim
all submitted struct qi_desc and resubmit the requests.

Through this modification:
1. Correctly triggers the resubmission of struct qi_desc when
an ITE occurs.
2. Prevents the IOMMU driver from disabling interrupts and
executing in an infinite loop within qi_submit_sync when an
3. Correctly handling simultaneous requests from multiple CPUs
and multiple contexts that result in timeouts.

Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
---
 drivers/iommu/intel/dmar.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index ec975c73cfe6..6938800e9884 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1314,7 +1314,6 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
 	if (fault & DMA_FSTS_ITE) {
 		head = readl(iommu->reg + DMAR_IQH_REG);
 		head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
-		head |= 1;
 		tail = readl(iommu->reg + DMAR_IQT_REG);
 		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
 
@@ -1331,7 +1330,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
 		do {
 			if (qi->desc_status[head] == QI_IN_USE)
 				qi->desc_status[head] = QI_ABORT;
-			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
+			head = (head - 1 + QI_LENGTH) % QI_LENGTH;
 		} while (head != tail);
 
 		/*
-- 
2.43.7