iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout

[PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Nicolin Chen 2 weeks, 6 days ago

An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
aware of the timeout via a GERROR interrupt, the driver thread issuing the
commands lacks a direct mechanism to verify whether its specific batch was
the cause or not, as polling the CMD_SYNC status doesn't natively return a
failure code, making it very difficult to coordinate per-device recovery.

Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
gap. When the ISR detects an ATC timeout, set the bit corresponding to the
physical CMDQ index of the faulting CMD_SYNC command.

On the issuer side, after polling completes (or times out), test and clear
its dedicated bit. If set, override any generic timeout and return
-ETIMEDOUT to trigger device quarantine.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 +++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 36de2b0b2ebe6..3eb12a34b086a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -633,6 +633,7 @@ struct arm_smmu_cmdq {
 	atomic_long_t			*valid_map;
 	atomic_t			owner_prod;
 	atomic_t			lock;
+	unsigned long			*atc_sync_timeouts;
 	bool				(*supports_cmd)(struct arm_smmu_cmdq_ent *ent);
 };
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 01030ffd2fe23..9c8972ebc94f9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -445,7 +445,10 @@ void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
 		 * at the CMD_SYNC. Attempt to complete other pending commands
 		 * by repeating the CMD_SYNC, though we might well end up back
 		 * here since the ATC invalidation may still be pending.
+		 *
+		 * Mark the faulty batch in the bitmap for the issuer to match.
 		 */
+		set_bit(Q_IDX(&q->llq, cons), cmdq->atc_sync_timeouts);
 		return;
 	case CMDQ_ERR_CERROR_ILL_IDX:
 	default:
@@ -895,9 +898,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
 
 	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
 	if (sync) {
+		u32 sync_prod;
+
 		llq.prod = queue_inc_prod_n(&llq, n);
+		sync_prod = llq.prod;
+
 		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
-		if (ret) {
+		if (test_and_clear_bit(Q_IDX(&llq, sync_prod),
+				       cmdq->atc_sync_timeouts)) {
+			dev_err_ratelimited(smmu->dev,
+					    "CMD_SYNC for ATC_INV timeout at prod=0x%08x\n",
+					    sync_prod);
+			ret = -ETIMEDOUT;
+		} else if (ret) {
 			dev_err_ratelimited(smmu->dev,
 					    "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
 					    llq.prod,
@@ -4458,6 +4471,11 @@ int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
 	if (!cmdq->valid_map)
 		return -ENOMEM;
 
+	cmdq->atc_sync_timeouts =
+		devm_bitmap_zalloc(smmu->dev, nents, GFP_KERNEL);
+	if (!cmdq->atc_sync_timeouts)
+		return -ENOMEM;
+
 	return 0;
 }
 
-- 
2.43.0
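
For readers following along outside the kernel tree, the handshake in this
patch can be modelled in userspace. This is only a sketch under stated
assumptions: every model_* name and the 256-entry queue size are hypothetical,
and C11 atomics stand in for the kernel's set_bit() and test_and_clear_bit().
It shows the ISR side marking the slot of the faulting CMD_SYNC and the issuer
side consuming that mark exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue geometry: 256 entries, wrap flag in bit 8. */
#define MODEL_MAX_N_SHIFT	8
#define MODEL_NENTS		(1u << MODEL_MAX_N_SHIFT)
#define MODEL_WORD_BITS		64u

struct model_cmdq {
	/* One bit per queue slot, like atc_sync_timeouts in the patch. */
	_Atomic uint64_t atc_sync_timeouts[MODEL_NENTS / MODEL_WORD_BITS];
};

/* Drop the wrap flag and keep only the slot index, as Q_IDX() does. */
static inline uint32_t model_q_idx(uint32_t ptr)
{
	return ptr & (MODEL_NENTS - 1);
}

/* ISR side: record that the CMD_SYNC at 'cons' hit an ATC timeout. */
static inline void model_mark_atc_timeout(struct model_cmdq *q, uint32_t cons)
{
	uint32_t idx = model_q_idx(cons);

	assert(q);
	atomic_fetch_or(&q->atc_sync_timeouts[idx / MODEL_WORD_BITS],
			UINT64_C(1) << (idx % MODEL_WORD_BITS));
}

/* Issuer side: atomically read and clear the bit for our own CMD_SYNC. */
static inline bool model_test_and_clear(struct model_cmdq *q, uint32_t sync_prod)
{
	uint32_t idx = model_q_idx(sync_prod);
	uint64_t mask = UINT64_C(1) << (idx % MODEL_WORD_BITS);
	uint64_t old;

	assert(q);
	old = atomic_fetch_and(&q->atc_sync_timeouts[idx / MODEL_WORD_BITS],
			       ~mask);
	return (old & mask) != 0;
}
```

Because the bit is indexed by slot rather than by the full prod value, a mark
left behind by one lap of the queue is indistinguishable from a mark set for a
later lap, which is the stale-bit concern raised further down the thread.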

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Samiullah Khawaja 2 weeks, 4 days ago

Hi Nicolin,

On Tue, Mar 17, 2026 at 12:15:37PM -0700, Nicolin Chen wrote:
>An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
>aware of the timeout via a GERROR interrupt, the driver thread issuing the
>commands lacks a direct mechanism to verify whether its specific batch was
>the cause or not, as polling the CMD_SYNC status doesn't natively return a
>failure code, making it very difficult to coordinate per-device recovery.
>
>Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
>gap. When the ISR detects an ATC timeout, set the bit corresponding to the
>physical CMDQ index of the faulting CMD_SYNC command.
>
>On the issuer side, after polling completes (or times out), test and clear
>its dedicated bit. If set, override any generic timeout, return -ETIMEDOUT
>to trigger device quarantine.
>
>Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 20 +++++++++++++++++++-
> 2 files changed, 20 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>index 36de2b0b2ebe6..3eb12a34b086a 100644
>--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>@@ -633,6 +633,7 @@ struct arm_smmu_cmdq {
> 	atomic_long_t			*valid_map;
> 	atomic_t			owner_prod;
> 	atomic_t			lock;
>+	unsigned long			*atc_sync_timeouts;
> 	bool				(*supports_cmd)(struct arm_smmu_cmdq_ent *ent);
> };
>
>diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>index 01030ffd2fe23..9c8972ebc94f9 100644
>--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>@@ -445,7 +445,10 @@ void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu,
> 		 * at the CMD_SYNC. Attempt to complete other pending commands
> 		 * by repeating the CMD_SYNC, though we might well end up back
> 		 * here since the ATC invalidation may still be pending.
>+		 *
>+		 * Mark the faulty batch in the bitmap for the issuer to match.
> 		 */
>+		set_bit(Q_IDX(&q->llq, cons), cmdq->atc_sync_timeouts);
> 		return;
> 	case CMDQ_ERR_CERROR_ILL_IDX:
> 	default:
>@@ -895,9 +898,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>
> 	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> 	if (sync) {
>+		u32 sync_prod;
>+
> 		llq.prod = queue_inc_prod_n(&llq, n);
>+		sync_prod = llq.prod;
>+
> 		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
>-		if (ret) {
>+		if (test_and_clear_bit(Q_IDX(&llq, sync_prod),
>+				       cmdq->atc_sync_timeouts)) {

This will not be set if a software timeout (1 second) occurs. Do you
know if the ATC timeout of Arm SMMUv3 is less than the software timeout
in the driver?

If not, maybe we can handle the software timeout here as well, since the
cmdlist is already known?

Thanks,
Sami
>+			dev_err_ratelimited(smmu->dev,
>+					    "CMD_SYNC for ATC_INV timeout at prod=0x%08x\n",
>+					    sync_prod);
>+			ret = -ETIMEDOUT;
>+		} else if (ret) {
> 			dev_err_ratelimited(smmu->dev,
> 					    "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> 					    llq.prod,
>@@ -4458,6 +4471,11 @@ int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
> 	if (!cmdq->valid_map)
> 		return -ENOMEM;
>
>+	cmdq->atc_sync_timeouts =
>+		devm_bitmap_zalloc(smmu->dev, nents, GFP_KERNEL);
>+	if (!cmdq->atc_sync_timeouts)
>+		return -ENOMEM;
>+
> 	return 0;
> }
>
>-- 
>2.43.0
>
>

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Nicolin Chen 2 weeks, 4 days ago

Hi Sami,

On Wed, Mar 18, 2026 at 10:02:32PM +0000, Samiullah Khawaja wrote:
> On Tue, Mar 17, 2026 at 12:15:37PM -0700, Nicolin Chen wrote:
> > @@ -895,9 +898,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > 
> > 	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > 	if (sync) {
> > +		u32 sync_prod;
> > +
> > 		llq.prod = queue_inc_prod_n(&llq, n);
> > +		sync_prod = llq.prod;
> > +
> > 		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > -		if (ret) {
> > +		if (test_and_clear_bit(Q_IDX(&llq, sync_prod),
> > +				       cmdq->atc_sync_timeouts)) {
> 
> This will not be set if a software timeout (1 second) occurs. Do you
> know if the ATC timeout of Arm SMMUv3 is less than the software timeout
> in the driver?

You brought up a good point!

I think the ATC timeout follows the PCI Completion Timeout Value in
the "Device Control 2 Register", which is typically set to [50us, 50ms]
but can be set as high as [17s, 64s] according to the PCI Base spec.

> If not maybe we can handle the software timeout here also as the cmdlist
> is already known?

I think it's trickier.

If the software times out first at 1s, it means the CMDQ is still
pending, waiting for the completion of the ATC invalidation. Then, the
caller sees -ETIMEDOUT and tries to bisect the ATC batch or update
the STE directly, either of which involves the CMDQ. But the CMDQ has
not recovered yet.

Then, in the case of a batch, all the retries could time out again. So,
it will fail to identify which device is truly broken. This would
end badly by blindly disabling all the devices in the batch. Also,
the disabling calls require the CMDQ too, so they might fail as well.

Thus, to partially answer the question: in the case of a software
timeout, I am afraid that we can hardly do anything.. :-/

This means I need to set a different return code for ATC timeouts
vs. software timeouts.

Also, there is another problem: when the PCI CTO finally expires, the
GERROR ISR will set a bit in atc_sync_timeouts but nobody will clear
it. So, before calling arm_smmu_cmdq_issue_cmdlist(), we need to make
sure there is no dirty bit in the bitmap either.

Thanks!
Nicolin
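
The two follow-ups named above (a distinct result for ATC timeouts versus
software timeouts, and clearing a dirty slot before reuse) can be illustrated
with a small single-threaded model. This is a sketch, not the driver code: the
model_* names and the enum values are hypothetical, a plain bool array replaces
the atomic bitmap, and the thread does not say which errno each case would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NENTS	256u	/* hypothetical queue size */

/* Hypothetical result codes; the thread does not fix the real errnos. */
enum model_sync_status {
	MODEL_SYNC_OK = 0,
	MODEL_SYNC_ATC_TIMEOUT,	/* hardware reported an ATC timeout */
	MODEL_SYNC_SW_TIMEOUT,	/* software polling gave up first */
};

struct model_cmdq {
	/* One flag per slot; single-threaded stand-in for the bitmap. */
	bool atc_sync_timeouts[MODEL_NENTS];
};

/* ISR side: mark the slot of the faulting CMD_SYNC. */
static inline void model_isr_mark(struct model_cmdq *q, uint32_t slot)
{
	assert(q);
	q->atc_sync_timeouts[slot % MODEL_NENTS] = true;
}

/* Read and clear one slot; returns whether it was dirty. */
static inline bool model_clear_stale(struct model_cmdq *q, uint32_t slot)
{
	bool was_set;

	assert(q);
	was_set = q->atc_sync_timeouts[slot % MODEL_NENTS];
	q->atc_sync_timeouts[slot % MODEL_NENTS] = false;
	return was_set;
}

/* Issuer side, after polling: a hardware mark wins over a generic timeout. */
static inline enum model_sync_status
model_classify(struct model_cmdq *q, uint32_t slot, bool poll_timed_out)
{
	if (model_clear_stale(q, slot))
		return MODEL_SYNC_ATC_TIMEOUT;
	return poll_timed_out ? MODEL_SYNC_SW_TIMEOUT : MODEL_SYNC_OK;
}
```

In this model, an issuer that gives up in software sees
MODEL_SYNC_SW_TIMEOUT; if the ISR marks the same slot afterwards, the next
issuer of that slot has to call model_clear_stale() before issuing, otherwise
its healthy CMD_SYNC would be misreported as an ATC timeout.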

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Jason Gunthorpe 1 week, 6 days ago

On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:

> If the software times out first at 1s, it means the CMDQ is still
> pending on wait for the completion of ATC invalidation. Then, the
> caller sees -ETIMEDOUT and tries to bisect the ATC batch or update
> the STE directly, either of which involves CMDQ. But CMDQ has not
> recovered yet.

Yeah, I don't know if the SW timeout flow is really all that RASy here
right now. Without somehow recovering the CMDQ it is pointless to try
to continue after a timeout.

And we are really in trouble if things like normal IOTLB invalidation
start to fail.

I think the right thing is to somehow try to recover the cmdq and then
restart it on the commands that haven't been SYNC'd yet and just keep
trying, maybe with progressively longer timeouts.

Just ignoring the error and continuing doesn't seem safe.

But that's something else again, as long as ATC invalidation reliably
hits the HW timeout first we should be OK to ignore it in this
series..

Jason
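
The "progressively longer timeouts" idea can be sketched as a retry loop that
doubles its polling budget on each attempt. This is only an illustration of
the timeout progression: the model_* names, the callback signature and the
upper cap are assumptions, and the CMDQ recovery step that would have to run
between attempts is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical poll callback: returns true if the SYNC completed in time. */
typedef bool (*model_poll_fn)(void *ctx, uint64_t timeout_us);

/*
 * Retry the poll with a doubling timeout, capped at max_timeout_us.
 * Returns 0 on completion, -1 once the capped attempt has also failed.
 * The number of attempts made is stored in *attempts.
 */
static inline int model_sync_with_backoff(model_poll_fn poll, void *ctx,
					  uint64_t first_timeout_us,
					  uint64_t max_timeout_us,
					  unsigned int *attempts)
{
	uint64_t timeout_us = first_timeout_us;
	unsigned int n = 0;

	assert(poll && attempts);
	for (;;) {
		n++;
		if (poll(ctx, timeout_us)) {
			*attempts = n;
			return 0;
		}
		if (timeout_us >= max_timeout_us)
			break;
		/* Double the budget without overshooting the cap. */
		timeout_us = (timeout_us > max_timeout_us / 2) ?
			     max_timeout_us : timeout_us * 2;
	}
	*attempts = n;
	return -1;
}

/* Example stub: the "hardware" needs *ctx microseconds to complete. */
static inline bool model_poll_needs(void *ctx, uint64_t timeout_us)
{
	return timeout_us >= *(const uint64_t *)ctx;
}
```

With a 1s first attempt and a 64s cap, a SYNC that needs 4s completes on the
third attempt (1s, 2s, 4s). The cap is a choice made for this sketch; the
thread itself only says to keep trying.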

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Nicolin Chen 1 week, 6 days ago

On Mon, Mar 23, 2026 at 08:57:56PM -0300, Jason Gunthorpe wrote:
> On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:
> 
> > If the software times out first at 1s, it means the CMDQ is still
> > pending on wait for the completion of ATC invalidation. Then, the
> > caller sees -ETIMEDOUT and tries to bisect the ATC batch or update
> > the STE directly, either of which involves CMDQ. But CMDQ has not
> > recovered yet.
> 
> Yeah, I don't know if the SW timeout flow is really all that RASy here
> right now. Without somehow recovering the CMDQ it is pointless to try
> to continue after a timeout.
> 
> And we are really in trouble if things like normal IOTLB invalidation
> start to fail.
> 
> I think the right thing is to somehow try to recover the cmdq and then
> restart it on the commands that haven't been SYNC'd yet and just keep
> trying, maybe with progressively longer timeouts.
> 
> Just ignoring the error and continuing doesn't seem safe.
> 
> But that's something else again, as long as ATC invalidation reliably
> hits the HW timeout first we should be OK to ignore it in this
> series..

Yea. I will leave a FIXME inline for now.

Nicolin

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Samiullah Khawaja 2 weeks, 4 days ago

On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:
>Hi Sami,
>
>On Wed, Mar 18, 2026 at 10:02:32PM +0000, Samiullah Khawaja wrote:
>> On Tue, Mar 17, 2026 at 12:15:37PM -0700, Nicolin Chen wrote:
>> > @@ -895,9 +898,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>> >
>> > 	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
>> > 	if (sync) {
>> > +		u32 sync_prod;
>> > +
>> > 		llq.prod = queue_inc_prod_n(&llq, n);
>> > +		sync_prod = llq.prod;
>> > +
>> > 		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
>> > -		if (ret) {
>> > +		if (test_and_clear_bit(Q_IDX(&llq, sync_prod),
>> > +				       cmdq->atc_sync_timeouts)) {
>>
>> This will not be set if a software timeout (1 second) occurs. Do you
>> know if the ATC timeout of Arm SMMUv3 is less than the software timeout
>> in the driver?
>
>You brought up a good point!
>
>I think ATC timeout follows the PCI Completion Timeout Value in
>"Device Control 2 Register", which is typically set [50us, 50ms]
>but can be set up to [17s, 64s] according to PCI Base spec.

Agreed.
>
>> If not maybe we can handle the software timeout here also as the cmdlist
>> is already known?
>
>I think it's trickier.
>
>If the software times out first at 1s, it means the CMDQ is still
>pending on wait for the completion of ATC invalidation. Then, the
>caller sees -ETIMEDOUT and tries to bisect the ATC batch or update
>the STE directly, either of which involves CMDQ. But CMDQ has not
>recovered yet.
>
>Then, in case of a batch, all the retries could time out again. So,
>it will fail to identify which device is truly broken. This would
>end badly by blindly disabling all the devices in the batch. Also
>the disabling calls require CMDQ too, so they might fail as well.

Yes. I am looking at VT-d currently, where the queue length is 256, and
this spirals out of control quickly.
>
>Thus, partially to answer the question, in case software timeout,
>I am afraid that we can hardly do anything.. :-/

Agreed.

Do you think we can maybe document this somewhere? Maybe add to the
cover letter?
>
>This means I need to set a different return code for ATC timeouts
>v.s. software timeouts.
>
>Also, there is another problem: when PCI CTO finally reaches, the
>GERROR ISR will set atc_sync_timeouts but nobody will clear it..
>So, before calling arm_smmu_cmdq_issue_cmdlist(), we need to make
>sure there is no dirty bit on the bitmap too.

Yes. Just to confirm, do you think this needs to be handled regardless
of whether we handle the software timeout for the ATC invalidation?
Basically, to clean up the bit in the bitmap.
>
>Thanks!
>Nicolin

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Nicolin Chen 2 weeks, 4 days ago

On Thu, Mar 19, 2026 at 12:08:04AM +0000, Samiullah Khawaja wrote:
> On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:
> > If the software times out first at 1s, it means the CMDQ is still
> > pending on wait for the completion of ATC invalidation. Then, the
> > caller sees -ETIMEDOUT and tries to bisect the ATC batch or update
> > the STE directly, either of which involves CMDQ. But CMDQ has not
> > recovered yet.
> > 
> > Then, in case of a batch, all the retries could time out again. So,
> > it will fail to identify which device is truly broken. This would
> > end badly by blindly disabling all the devices in the batch. Also
> > the disabling calls require CMDQ too, so they might fail as well.
> 
> Yes, looking at VT-d currently and the queue length is 256 and this
> spirals out of control quickly.
> > 
> > Thus, partially to answer the question, in case software timeout,
> > I am afraid that we can hardly do anything.. :-/
> 
> Agreed.
> 
> Do you think we can maybe document this somewhere? Maybe add to the
> cover letter?

Yes. I will add a note inline as well where software times out.

> > This means I need to set a different return code for ATC timeouts
> > v.s. software timeouts.
> > 
> > Also, there is another problem: when PCI CTO finally reaches, the
> > GERROR ISR will set atc_sync_timeouts but nobody will clear it..
> > So, before calling arm_smmu_cmdq_issue_cmdlist(), we need to make
> > sure there is no dirty bit on the bitmap too.
> 
> Yes, Just to confirm, do you think this needs to be handled regardless
> whether we handle the software timeout for the ATC invalidation?
> Basically to cleanup the bit on bitmap.

I don't see a reason not to. I think the next issuer who sees a
dirty slot in the bitmap will not have any idea about that ATC
timeout (batch). Basically, the previous issuer has already returned
and the batch is gone. So, it can do nothing but clear the slot in
the bitmap and move forward.

Nicolin

RE: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Tian, Kevin 2 weeks, 5 days ago

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, March 18, 2026 3:16 AM
> 
> An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
> aware of the timeout via a GERROR interrupt, the driver thread issuing the
> commands lacks a direct mechanism to verify whether its specific batch was
> the cause or not, as polling the CMD_SYNC status doesn't natively return a
> failure code, making it very difficult to coordinate per-device recovery.
> 
> Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
> gap. When the ISR detects an ATC timeout, set the bit corresponding to the
> physical CMDQ index of the faulting CMD_SYNC command.
> 

It's nice to see the ability for sw to identify the faulting sync command
upon an ATC timeout! On VT-d it's not feasible when multiple wait descriptors
(similar to CMD_SYNC) are in flight... :/

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Nicolin Chen 2 weeks, 5 days ago

On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, March 18, 2026 3:16 AM
> > 
> > An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
> > aware of the timeout via a GERROR interrupt, the driver thread issuing the
> > commands lacks a direct mechanism to verify whether its specific batch was
> > the cause or not, as polling the CMD_SYNC status doesn't natively return a
> > failure code, making it very difficult to coordinate per-device recovery.
> > 
> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
> > gap. When the ISR detects an ATC timeout, set the bit corresponding to the
> > physical CMDQ index of the faulting CMD_SYNC command.
> > 
> 
> It's nice to see the ability of allowing sw to identify the faulting sync command
> upon an ATC timeout! On VT-d it's not feasible when multiple wait descriptors
> (similar to CMD_SYNC) are in flight... :/

Actually, the SMMU doesn't know which device is faulting when a CMD_SYNC
follows ATC_INV commands for multiple devices. The commit message
in PATCH-7 describes this at the end. So Jason suggested retrying
those ATC_INV commands by bisecting them per-device, which allows
us to pinpoint the faulty device.

Could VT-d do the same?

Nicolin
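
The per-device bisection mentioned above can be sketched as a recursive split
of the batch. This is a hedged illustration, not what PATCH-7 implements: all
model_* names are hypothetical, the callback stands in for "issue ATC_INV for
these stream IDs, then CMD_SYNC, and report whether the hardware flagged a
timeout", and the CMDQ is assumed to be usable again between probes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe: true if invalidating these SIDs hit an ATC timeout. */
typedef bool (*model_inv_fn)(void *ctx, const uint32_t *sids, size_t n);

/*
 * Re-issue the batch in halves until each failing probe covers a single
 * device. Faulty SIDs are written to 'bad' (up to bad_cap entries) in
 * batch order; the number written is returned.
 */
static inline size_t model_bisect(model_inv_fn inv, void *ctx,
				  const uint32_t *sids, size_t n,
				  uint32_t *bad, size_t bad_cap)
{
	size_t half, found;

	if (n == 0 || !inv(ctx, sids, n))
		return 0;		/* this sub-batch is healthy */
	if (n == 1) {
		if (bad_cap == 0)
			return 0;	/* no room left to report it */
		bad[0] = sids[0];
		return 1;
	}
	half = n / 2;
	found = model_bisect(inv, ctx, sids, half, bad, bad_cap);
	found += model_bisect(inv, ctx, sids + half, n - half,
			      bad + found, bad_cap - found);
	return found;
}

/* Example stub: a fixed set of SIDs never answers its invalidation. */
struct model_fault {
	const uint32_t *broken;
	size_t n_broken;
};

static inline bool model_inv_stub(void *ctx, const uint32_t *sids, size_t n)
{
	const struct model_fault *f = ctx;

	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < f->n_broken; j++)
			if (sids[i] == f->broken[j])
				return true;
	return false;
}
```

Each probe in this model costs one hardware timeout when it fails, so the
time to isolate a device grows with the batch depth; that cost is not
addressed by the sketch.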

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Samiullah Khawaja 2 weeks, 4 days ago

Hi Nicolin,

On Wed, Mar 18, 2026 at 12:26:33PM -0700, Nicolin Chen wrote:
>On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
>> > From: Nicolin Chen <nicolinc@nvidia.com>
>> > Sent: Wednesday, March 18, 2026 3:16 AM
>> >
>> > An ATC invalidation timeout is a fatal error. While the SMMUv3 hardware is
>> > aware of the timeout via a GERROR interrupt, the driver thread issuing the
>> > commands lacks a direct mechanism to verify whether its specific batch was
>> > the cause or not, as polling the CMD_SYNC status doesn't natively return a
>> > failure code, making it very difficult to coordinate per-device recovery.
>> >
>> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge this
>> > gap. When the ISR detects an ATC timeout, set the bit corresponding to the
>> > physical CMDQ index of the faulting CMD_SYNC command.
>> >
>>
>> It's nice to see the ability of allowing sw to identify the faulting sync command
>> upon an ATC timeout! On VT-d it's not feasible when multiple wait descriptors
>> (similar to CMD_SYNC) are in flight... :/
>
>Actually SMMU doesn't know which device is faulting when CMD_SYNC

VT-d is able to find out the SID of the device for which the device TLB
invalidation timeout occurred by using the SID reported in the
"Invalidation Queue Error Record Register" (VT-d Specs 11.4.9.9).
>follows ATC_INV commands for multiple devices. The commit message
>in PATCH-7 describes this in the end. So Jason suggested to retry
>those ATC_INV commands by bisecting them per-device, which allows
>us to pinpoint which device.

But for a software timeout, something like this would be needed.
>
>Could VT-d do the same?
>
>Nicolin
>

Thanks,
Sami

RE: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Tian, Kevin 2 weeks, 4 days ago

> From: Samiullah Khawaja <skhawaja@google.com>
> Sent: Thursday, March 19, 2026 6:07 AM
> 
> Hi Nicolin,
> 
> On Wed, Mar 18, 2026 at 12:26:33PM -0700, Nicolin Chen wrote:
> >On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
> >> > From: Nicolin Chen <nicolinc@nvidia.com>
> >> > Sent: Wednesday, March 18, 2026 3:16 AM
> >> >
> >> > An ATC invalidation timeout is a fatal error. While the SMMUv3
> hardware is
> >> > aware of the timeout via a GERROR interrupt, the driver thread issuing
> the
> >> > commands lacks a direct mechanism to verify whether its specific batch
> was
> >> > the cause or not, as polling the CMD_SYNC status doesn't natively return
> a
> >> > failure code, making it very difficult to coordinate per-device recovery.
> >> >
> >> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge
> this
> >> > gap. When the ISR detects an ATC timeout, set the bit corresponding to
> the
> >> > physical CMDQ index of the faulting CMD_SYNC command.
> >> >
> >>
> >> It's nice to see the ability of allowing sw to identify the faulting sync
> command
> >> upon an ATC timeout! On VT-d it's not feasible when multiple wait
> descriptors
> >> (similar to CMD_SYNC) are in flight... :/
> >
> >Actually SMMU doesn't know which device is faulting when CMD_SYNC
> 
> VT-d is able to find out the SID of the device for which the device TLB
> invalidation timeout occurred by using the SID reported in the
> "Invalidation Queue Error Record Register" (VT-d Specs 11.4.9.9).

Yes, but when there are multiple submissions (each with a wait descriptor)
fetched/handled by the hw and then an invalidation timeout comes, all
pending wait descriptors will be aborted (not just the one corresponding
to the timeout). In this case all affected submitters need to retry.

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Nicolin Chen 2 weeks, 4 days ago

On Thu, Mar 19, 2026 at 03:08:05AM +0000, Tian, Kevin wrote:
> > From: Samiullah Khawaja <skhawaja@google.com>
> > Sent: Thursday, March 19, 2026 6:07 AM
> > 
> > Hi Nicolin,
> > 
> > On Wed, Mar 18, 2026 at 12:26:33PM -0700, Nicolin Chen wrote:
> > >On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
> > >> > From: Nicolin Chen <nicolinc@nvidia.com>
> > >> > Sent: Wednesday, March 18, 2026 3:16 AM
> > >> >
> > >> > An ATC invalidation timeout is a fatal error. While the SMMUv3
> > hardware is
> > >> > aware of the timeout via a GERROR interrupt, the driver thread issuing
> > the
> > >> > commands lacks a direct mechanism to verify whether its specific batch
> > was
> > >> > the cause or not, as polling the CMD_SYNC status doesn't natively return
> > a
> > >> > failure code, making it very difficult to coordinate per-device recovery.
> > >> >
> > >> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge
> > this
> > >> > gap. When the ISR detects an ATC timeout, set the bit corresponding to
> > the
> > >> > physical CMDQ index of the faulting CMD_SYNC command.
> > >> >
> > >>
> > >> It's nice to see the ability of allowing sw to identify the faulting sync
> > command
> > >> upon an ATC timeout! On VT-d it's not feasible when multiple wait
> > descriptors
> > >> (similar to CMD_SYNC) are in flight... :/
> > >
> > >Actually SMMU doesn't know which device is faulting when CMD_SYNC
> > 
> > VT-d is able to find out the SID of the device for which the device TLB
> > invalidation timeout occurred by using the SID reported in the
> > "Invalidation Queue Error Record Register" (VT-d Specs 11.4.9.9).
> 
> yes. but when there are multiple submissions (each with a wait descriptor)
> fetched/handled by the hw and then an invalidation timeout comes, all
> pending wait descriptors will be aborted (not just the one corresponding
> to the timeout). In this case all affected submitters need to re-try.

This sounds similar to SMMU then.

Nicolin

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

Posted by Jason Gunthorpe 1 week, 6 days ago

On Wed, Mar 18, 2026 at 08:12:04PM -0700, Nicolin Chen wrote:
> > > VT-d is able to find out the SID of the device for which the device TLB
> > > invalidation timeout occurred by using the SID reported in the
> > > "Invalidation Queue Error Record Register" (VT-d Specs 11.4.9.9).
> > 
> > yes. but when there are multiple submissions (each with a wait descriptor)
> > fetched/handled by the hw and then an invalidation timeout comes, all
> > pending wait descriptors will be aborted (not just the one corresponding
> > to the timeout). In this case all affected submitters need to re-try.
> 
> This sounds similar to SMMU then.

Not entirely.. The SMMU HW stops processing at a SYNC and waits for
everything pending to complete, then moves forward. If there is a
HW-reported ATC timeout then it is contained to the SYNC that followed
the ATC invalidation. The errored SYNC is skipped and whatever follows
continues forward, so it doesn't contaminate future work.

VT-d's wait descriptor with fence FN=1 sounds identical???

I guess if FN=0 then it starts to become indeterminate what the
wait actually waits for..

Jason