[PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Jacob Pan 1 year, 6 months ago
From: Sanjay K Kumar <sanjay.k.kumar@intel.com>

If qi_submit_sync() is invoked with 0 invalidation descriptors (for
instance, for DMA draining purposes), we can run into a bug where a
submitting thread fails to detect the completion of invalidation_wait.
Subsequently, this leads to a soft lockup.

Suppose thread T1 invokes qi_submit_sync() with a non-zero descriptor count
while, concurrently, thread T2 calls qi_submit_sync() with zero descriptors.
Both threads then enter a while loop, waiting for their respective descriptors
to complete. T1 detects its completion (i.e., T1's invalidation_wait status
is changed to QI_DONE by HW) and proceeds to call reclaim_free_desc() to
reclaim all descriptors, potentially including adjacent ones of other
threads that are also marked as QI_DONE.

During this time, while T2 is waiting to acquire the qi->q_lock, the IOMMU
hardware may complete the invalidation for T2, setting its status to
QI_DONE. However, if T1's execution of reclaim_free_desc() frees T2's
invalidation_wait descriptor and changes its status to QI_FREE, T2 will
not observe the QI_DONE status for its invalidation_wait and will
remain stuck indefinitely.

This soft lockup does not occur when only non-zero descriptors are
submitted. In such cases, invalidation descriptors are interspersed among
wait descriptors with the status QI_IN_USE, acting as barriers. These
barriers prevent the reclaim code from mistakenly freeing descriptors
belonging to other submitters.
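
For illustration, a much-simplified sketch of the submission side of
qi_submit_sync() (not the verbatim driver code; hardware-queue writes,
locking and error handling are omitted) shows why a non-zero count leaves
QI_IN_USE barriers ahead of the reclaim loop, while a zero-count submission
leaves only the wait descriptor:

	index = qi->free_head;
	wait_index = (index + count) % QI_LENGTH;

	/* Invalidation descriptors act as QI_IN_USE barriers for reclaim. */
	for (i = 0; i < count; i++)
		qi->desc_status[(index + i) % QI_LENGTH] = QI_IN_USE;

	/*
	 * The trailing wait descriptor is also QI_IN_USE until the hardware
	 * writes QI_DONE into it. With count == 0 this is the only slot the
	 * submission owns, so nothing shields it from another thread's
	 * reclaim_free_desc() once the hardware marks it QI_DONE.
	 */
	qi->desc_status[wait_index] = QI_IN_USE;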

Consider the following example timeline:
	T1			T2
========================================
	ID1
	WD1
	while(WD1!=QI_DONE)
	unlock
				lock
	WD1=QI_DONE*		WD2
				while(WD2!=QI_DONE)
				unlock
	lock
	WD1==QI_DONE?
	ID1=QI_DONE		WD2=QI_DONE*
	reclaim()
	ID1=FREE
	WD1=FREE
	WD2=FREE
	unlock
				soft lockup! T2 never sees QI_DONE in WD2

Where:
ID = invalidation descriptor
WD = wait descriptor
* Written by hardware
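
For reference, the waiting side follows roughly this pattern (a simplified
sketch of the polling loop in qi_submit_sync(); fault handling is abridged).
T2 is stuck here once its wait descriptor has been flipped from QI_DONE to
QI_FREE behind its back:

	while (qi->desc_status[wait_index] != QI_DONE) {
		rc = qi_check_fault(iommu, index, wait_index);
		if (rc)
			break;

		raw_spin_unlock(&qi->q_lock);
		cpu_relax();
		raw_spin_lock(&qi->q_lock);
	}
	/*
	 * If another thread's reclaim_free_desc() has already overwritten
	 * desc_status[wait_index] with QI_FREE, QI_DONE is never observed
	 * and the loop spins forever -- the soft lockup described above.
	 */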

The root of the problem is that the QI_DONE descriptor status is used for
two conflicting purposes:
1. to signal that a descriptor is ready for reclaim (to be freed)
2. to signal, by the hardware, that a wait descriptor is complete

The solution in this patch is state separation: introduce a new descriptor
status called QI_TO_BE_FREED.

Once a thread's invalidation descriptors are complete, their status is set
to QI_TO_BE_FREED. The reclaim_free_desc() function then only frees
descriptors marked as QI_TO_BE_FREED instead of those marked as QI_DONE.
This change ensures that T2 (from the previous example) will correctly
observe the completion of its invalidation_wait (marked as QI_DONE).

Currently, this bug has no impact on existing users because no caller
submits invalidations with 0 descriptors.

Signed-off-by: Sanjay K Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/dmar.c  | 13 +++++++++----
 drivers/iommu/intel/iommu.h |  3 ++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 304e84949ca7..00e0f5f801c5 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu *iommu)
  */
 static inline void reclaim_free_desc(struct q_inval *qi)
 {
-	while (qi->desc_status[qi->free_tail] == QI_DONE ||
-	       qi->desc_status[qi->free_tail] == QI_ABORT) {
+	while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
 		qi->desc_status[qi->free_tail] = QI_FREE;
 		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
 		qi->free_cnt++;
@@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 		raw_spin_lock(&qi->q_lock);
 	}
 
-	for (i = 0; i < count; i++)
-		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
+	/*
+	 * The reclaim code can free descriptors from multiple submissions
+	 * starting from the tail of the queue. When count == 0, the
+	 * status of the standalone wait descriptor at the tail of the queue
+	 * must be set to QI_TO_BE_FREED to allow the reclaim code to proceed.
+	 */
+	for (i = 0; i <= count; i++)
+		qi->desc_status[(index + i) % QI_LENGTH] = QI_TO_BE_FREED;
 
 	reclaim_free_desc(qi);
 	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index eaf015b4353b..1ab39f9145f2 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -382,7 +382,8 @@ enum {
 	QI_FREE,
 	QI_IN_USE,
 	QI_DONE,
-	QI_ABORT
+	QI_ABORT,
+	QI_TO_BE_FREED
 };
 
 #define QI_CC_TYPE		0x1
-- 
2.25.1
RE: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Tian, Kevin 1 year, 6 months ago
> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Saturday, July 20, 2024 2:17 AM
> 
> From: Sanjay K Kumar <sanjay.k.kumar@intel.com>
> 
> If qi_submit_sync() is invoked with 0 invalidation descriptors (for
> instance, for DMA draining purposes), we can run into a bug where a
> submitting thread fails to detect the completion of invalidation_wait.
> Subsequently, this led to a soft lockup.
> 
> Suppose thread T1 invokes qi_submit_sync() with non-zero descriptors,
> while
> concurrently, thread T2 calls qi_submit_sync() with zero descriptors. Both
> threads then enter a while loop, waiting for their respective descriptors
> to complete. T1 detects its completion (i.e., T1's invalidation_wait status
> changes to QI_DONE by HW) and proceeds to call reclaim_free_desc() to
> reclaim all descriptors, potentially including adjacent ones of other
> threads that are also marked as QI_DONE.
> 
> During this time, while T2 is waiting to acquire the qi->q_lock, the IOMMU
> hardware may complete the invalidation for T2, setting its status to
> QI_DONE. However, if T1's execution of reclaim_free_desc() frees T2's
> invalidation_wait descriptor and changes its status to QI_FREE, T2 will
> not observe the QI_DONE status for its invalidation_wait and will
> indefinitely remain stuck.
> 
> This soft lockup does not occur when only non-zero descriptors are
> submitted.In such cases, invalidation descriptors are interspersed among
> wait descriptors with the status QI_IN_USE, acting as barriers. These
> barriers prevent the reclaim code from mistakenly freeing descriptors
> belonging to other submitters.
> 
> Considered the following example timeline:
> 	T1			T2
> ========================================
> 	ID1
> 	WD1
> 	while(WD1!=QI_DONE)
> 	unlock
> 				lock
> 	WD1=QI_DONE*		WD2
> 				while(WD2!=QI_DONE)
> 				unlock
> 	lock
> 	WD1==QI_DONE?
> 	ID1=QI_DONE		WD2=DONE*
> 	reclaim()
> 	ID1=FREE
> 	WD1=FREE
> 	WD2=FREE
> 	unlock
> 				soft lockup! T2 never sees QI_DONE in WD2
> 
> Where:
> ID = invalidation descriptor
> WD = wait descriptor
> * Written by hardware
> 
> The root of the problem is that the descriptor status QI_DONE flag is used
> for two conflicting purposes:
> 1. signal a descriptor is ready for reclaim (to be freed)
> 2. signal by the hardware that a wait descriptor is complete
> 
> The solution (in this patch) is state separation by introducing a new flag
> for the descriptors called QI_TO_BE_FREED.
> 
> Once a thread's invalidation descriptors are complete, their status would
> be set to QI_TO_BE_FREED. The reclaim_free_desc() function would then
> only
> free descriptors marked as QI_TO_BE_FREED instead of those marked as
> QI_DONE. This change ensures that T2 (from the previous example) will
> correctly observe the completion of its invalidation_wait (marked as
> QI_DONE).
> 
> Currently, there is no impact by this bug on the existing users because no
> callers are submitting invalidations with 0 descriptors.

A bug fix is for existing users. Please revise the subject line and this
message to make it clear that this is preparation for a new usage.

> 
> Signed-off-by: Sanjay K Kumar <sanjay.k.kumar@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/intel/dmar.c  | 13 +++++++++----
>  drivers/iommu/intel/iommu.h |  3 ++-
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 304e84949ca7..00e0f5f801c5 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu
> *iommu)
>   */
>  static inline void reclaim_free_desc(struct q_inval *qi)
>  {
> -	while (qi->desc_status[qi->free_tail] == QI_DONE ||
> -	       qi->desc_status[qi->free_tail] == QI_ABORT) {
> +	while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
>  		qi->desc_status[qi->free_tail] = QI_FREE;
>  		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
>  		qi->free_cnt++;
> @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu *iommu,
> struct qi_desc *desc,
>  		raw_spin_lock(&qi->q_lock);
>  	}
> 
> -	for (i = 0; i < count; i++)
> -		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
> +	/*
> +	 * The reclaim code can free descriptors from multiple submissions
> +	 * starting from the tail of the queue. When count == 0, the
> +	 * status of the standalone wait descriptor at the tail of the queue
> +	 * must be set to QI_TO_BE_FREED to allow the reclaim code to
> proceed.
> +	 */
> +	for (i = 0; i <= count; i++)
> +		qi->desc_status[(index + i) % QI_LENGTH] = QI_TO_BE_FREED;

We don't really need a new flag. Just set them to QI_FREE and then
reclaim QI_FREE slots until hitting qi->head in reclaim_free_desc().

> 
>  	reclaim_free_desc(qi);
>  	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index eaf015b4353b..1ab39f9145f2 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -382,7 +382,8 @@ enum {
>  	QI_FREE,
>  	QI_IN_USE,
>  	QI_DONE,
> -	QI_ABORT
> +	QI_ABORT,
> +	QI_TO_BE_FREED
>  };
> 
>  #define QI_CC_TYPE		0x1
> --
> 2.25.1
> 
Re: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Jacob Pan 1 year, 6 months ago
On Wed, 24 Jul 2024 07:40:25 +0000, "Tian, Kevin" <kevin.tian@intel.com>
wrote:

> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Saturday, July 20, 2024 2:17 AM
> > 
> > From: Sanjay K Kumar <sanjay.k.kumar@intel.com>
> > 
> > If qi_submit_sync() is invoked with 0 invalidation descriptors (for
> > instance, for DMA draining purposes), we can run into a bug where a
> > submitting thread fails to detect the completion of invalidation_wait.
> > Subsequently, this led to a soft lockup.
> > 
> > Suppose thread T1 invokes qi_submit_sync() with non-zero descriptors,
> > while
> > concurrently, thread T2 calls qi_submit_sync() with zero descriptors.
> > Both threads then enter a while loop, waiting for their respective
> > descriptors to complete. T1 detects its completion (i.e., T1's
> > invalidation_wait status changes to QI_DONE by HW) and proceeds to call
> > reclaim_free_desc() to reclaim all descriptors, potentially including
> > adjacent ones of other threads that are also marked as QI_DONE.
> > 
> > During this time, while T2 is waiting to acquire the qi->q_lock, the
> > IOMMU hardware may complete the invalidation for T2, setting its status
> > to QI_DONE. However, if T1's execution of reclaim_free_desc() frees T2's
> > invalidation_wait descriptor and changes its status to QI_FREE, T2 will
> > not observe the QI_DONE status for its invalidation_wait and will
> > indefinitely remain stuck.
> > 
> > This soft lockup does not occur when only non-zero descriptors are
> > submitted.In such cases, invalidation descriptors are interspersed among
> > wait descriptors with the status QI_IN_USE, acting as barriers. These
> > barriers prevent the reclaim code from mistakenly freeing descriptors
> > belonging to other submitters.
> > 
> > Considered the following example timeline:
> > 	T1			T2
> > ========================================
> > 	ID1
> > 	WD1
> > 	while(WD1!=QI_DONE)
> > 	unlock
> > 				lock
> > 	WD1=QI_DONE*		WD2
> > 				while(WD2!=QI_DONE)
> > 				unlock
> > 	lock
> > 	WD1==QI_DONE?
> > 	ID1=QI_DONE		WD2=DONE*
> > 	reclaim()
> > 	ID1=FREE
> > 	WD1=FREE
> > 	WD2=FREE
> > 	unlock
> > 				soft lockup! T2 never sees QI_DONE in
> > WD2
> > 
> > Where:
> > ID = invalidation descriptor
> > WD = wait descriptor
> > * Written by hardware
> > 
> > The root of the problem is that the descriptor status QI_DONE flag is
> > used for two conflicting purposes:
> > 1. signal a descriptor is ready for reclaim (to be freed)
> > 2. signal by the hardware that a wait descriptor is complete
> > 
> > The solution (in this patch) is state separation by introducing a new
> > flag for the descriptors called QI_TO_BE_FREED.
> > 
> > Once a thread's invalidation descriptors are complete, their status
> > would be set to QI_TO_BE_FREED. The reclaim_free_desc() function would
> > then only
> > free descriptors marked as QI_TO_BE_FREED instead of those marked as
> > QI_DONE. This change ensures that T2 (from the previous example) will
> > correctly observe the completion of its invalidation_wait (marked as
> > QI_DONE).
> > 
> > Currently, there is no impact by this bug on the existing users because
> > no callers are submitting invalidations with 0 descriptors.  
> 
> bug fix is for existing users. Please revise the subject line and this msg
> to make it clear that it's for preparation of a new usage.
The bug is in qi_submit_sync() itself, since it permits callers to pass 0 as
the count. It is a bug regardless of the callers.

I put "potential" in the subject line to indicate, perhaps it is too vague.
How about just stating what it is fixing:
"Fix potential lockup if qi_submit_sync called with 0 count"

Also change this paragraph to:
"Currently, there is no impact by this bug on the existing users because
 no callers are submitting invalidations with 0 descriptors. This fix will
 enable future users (such as DMA drain) calling qi_submit_sync() with 0
 count."

> > 
> > Signed-off-by: Sanjay K Kumar <sanjay.k.kumar@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/intel/dmar.c  | 13 +++++++++----
> >  drivers/iommu/intel/iommu.h |  3 ++-
> >  2 files changed, 11 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index 304e84949ca7..00e0f5f801c5 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu
> > *iommu)
> >   */
> >  static inline void reclaim_free_desc(struct q_inval *qi)
> >  {
> > -	while (qi->desc_status[qi->free_tail] == QI_DONE ||
> > -	       qi->desc_status[qi->free_tail] == QI_ABORT) {
> > +	while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
> >  		qi->desc_status[qi->free_tail] = QI_FREE;
> >  		qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
> >  		qi->free_cnt++;
> > @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu *iommu,
> > struct qi_desc *desc,
> >  		raw_spin_lock(&qi->q_lock);
> >  	}
> > 
> > -	for (i = 0; i < count; i++)
> > -		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
> > +	/*
> > +	 * The reclaim code can free descriptors from multiple
> > submissions
> > +	 * starting from the tail of the queue. When count == 0, the
> > +	 * status of the standalone wait descriptor at the tail of the
> > queue
> > +	 * must be set to QI_TO_BE_FREED to allow the reclaim code to
> > proceed.
> > +	 */
> > +	for (i = 0; i <= count; i++)
> > +		qi->desc_status[(index + i) % QI_LENGTH] =
> > QI_TO_BE_FREED;  
> 
> We don't really need a new flag. Just set them to QI_FREE and then
> reclaim QI_FREE slots until hitting qi->head in reclaim_free_desc().
We do need a separate state for descriptors pending to be freed.
Otherwise, the reclaim code will advance past the intended range.

> > 
> >  	reclaim_free_desc(qi);
> >  	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index eaf015b4353b..1ab39f9145f2 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -382,7 +382,8 @@ enum {
> >  	QI_FREE,
> >  	QI_IN_USE,
> >  	QI_DONE,
> > -	QI_ABORT
> > +	QI_ABORT,
> > +	QI_TO_BE_FREED
> >  };
> > 
> >  #define QI_CC_TYPE		0x1
> > --
> > 2.25.1
> >   
> 


Thanks,

Jacob
RE: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Tian, Kevin 1 year, 6 months ago
> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Thursday, July 25, 2024 12:26 AM
> 
> On Wed, 24 Jul 2024 07:40:25 +0000, "Tian, Kevin" <kevin.tian@intel.com>
> wrote:
> 
> > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > Sent: Saturday, July 20, 2024 2:17 AM
> > >
> > > Currently, there is no impact by this bug on the existing users because
> > > no callers are submitting invalidations with 0 descriptors.
> >
> > bug fix is for existing users. Please revise the subject line and this msg
> > to make it clear that it's for preparation of a new usage.
> The bug is in the qi_submit_sync function itself since it permits callers
> to give 0 as count. It is a bug regardless of users.
> 
> I put "potential" in the subject line to indicate, perhaps it is too vague.
> How about just stating what it is fixing:
> "Fix potential lockup if qi_submit_sync called with 0 count"
> 
> Also change this paragraph to:
> "Currently, there is no impact by this bug on the existing users because
>  no callers are submitting invalidations with 0 descriptors. This fix will
>  enable future users (such as DMA drain) calling qi_submit_sync() with 0
>  count."

Then please move it to the start.

> > > @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu
> *iommu,
> > > struct qi_desc *desc,
> > >  		raw_spin_lock(&qi->q_lock);
> > >  	}
> > >
> > > -	for (i = 0; i < count; i++)
> > > -		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
> > > +	/*
> > > +	 * The reclaim code can free descriptors from multiple
> > > submissions
> > > +	 * starting from the tail of the queue. When count == 0, the
> > > +	 * status of the standalone wait descriptor at the tail of the
> > > queue
> > > +	 * must be set to QI_TO_BE_FREED to allow the reclaim code to
> > > proceed.
> > > +	 */
> > > +	for (i = 0; i <= count; i++)
> > > +		qi->desc_status[(index + i) % QI_LENGTH] =
> > > QI_TO_BE_FREED;
> >
> > We don't really need a new flag. Just set them to QI_FREE and then
> > reclaim QI_FREE slots until hitting qi->head in reclaim_free_desc().
> We do need to have a separate state for descriptors pending to be freed.
> Otherwise, reclaim code will advance pass the intended range.
> 

The commit msg says that QI_DONE is currently used for conflicting purposes.

Using QI_FREE keeps QI_DONE only for signaling that a wait descriptor is completed.

The key is that reclaim() should not change a descriptor's state before it
is consumed by the owner. Instead, we always let the owner change the
state, and reclaim() only scans and adjusts the tracking fields; then the
race condition disappears.

In this example, T2's slots are changed to QI_FREE by T2 after it completes
all the checks. Only at this point can those slots be reclaimed.
Re: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Jacob Pan 1 year, 6 months ago
On Thu, 25 Jul 2024 00:44:05 +0000, "Tian, Kevin" <kevin.tian@intel.com>
wrote:

> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Thursday, July 25, 2024 12:26 AM
> > 
> > On Wed, 24 Jul 2024 07:40:25 +0000, "Tian, Kevin" <kevin.tian@intel.com>
> > wrote:
> >   
> > > > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > > > Sent: Saturday, July 20, 2024 2:17 AM
> > > >
> > > > Currently, there is no impact by this bug on the existing users
> > > > because no callers are submitting invalidations with 0 descriptors.
> > > >  
> > >
> > > bug fix is for existing users. Please revise the subject line and
> > > this msg to make it clear that it's for preparation of a new usage.  
> > The bug is in the qi_submit_sync function itself since it permits
> > callers to give 0 as count. It is a bug regardless of users.
> > 
> > I put "potential" in the subject line to indicate, perhaps it is too
> > vague. How about just stating what it is fixing:
> > "Fix potential lockup if qi_submit_sync called with 0 count"
> > 
> > Also change this paragraph to:
> > "Currently, there is no impact by this bug on the existing users because
> >  no callers are submitting invalidations with 0 descriptors. This fix
> > will enable future users (such as DMA drain) calling qi_submit_sync()
> > with 0 count."  
> 
> Then please move it to the start.
> 
> > > > @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu  
> > *iommu,  
> > > > struct qi_desc *desc,
> > > >  		raw_spin_lock(&qi->q_lock);
> > > >  	}
> > > >
> > > > -	for (i = 0; i < count; i++)
> > > > -		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
> > > > +	/*
> > > > +	 * The reclaim code can free descriptors from multiple
> > > > submissions
> > > > +	 * starting from the tail of the queue. When count == 0,
> > > > the
> > > > +	 * status of the standalone wait descriptor at the tail of
> > > > the queue
> > > > +	 * must be set to QI_TO_BE_FREED to allow the reclaim code
> > > > to proceed.
> > > > +	 */
> > > > +	for (i = 0; i <= count; i++)
> > > > +		qi->desc_status[(index + i) % QI_LENGTH] =
> > > > QI_TO_BE_FREED;  
> > >
> > > We don't really need a new flag. Just set them to QI_FREE and then
> > > reclaim QI_FREE slots until hitting qi->head in reclaim_free_desc().  
> > We do need to have a separate state for descriptors pending to be freed.
> > Otherwise, reclaim code will advance pass the intended range.
> >   
> 
> The commit msg said that QI_DONE is currently used for conflicting
> purpose.
> 
> Using QI_FREE keeps it only for signaling that a wait desc is completed.
> 
> The key is that reclaim() should not change a desc's state before it's
> consumed by the owner. Instead we always let the owner to change the
> state and reclaim() only does scan and adjust the tracking fields then
> such race condition disappears.
> 
> In this example T2's slots are changed to QI_FREE by T2 after it completes
> all the checks. Only at this point those slots can be reclaimed.

The problem is that without the TO_BE_FREED state, the reclaim code would
have no way of knowing which ones are to be reclaimed and which ones are
currently free. Therefore, it cannot track free_cnt.

The current reclaim code is not aware of owners, nor of how many descriptors to reclaim.

If I make the following changes and run, free_cnt keeps going up and the
system cannot boot. Perhaps you meant another way?

--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu *iommu)
  */
 static inline void reclaim_free_desc(struct q_inval *qi)
 {
-       while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
-               qi->desc_status[qi->free_tail] = QI_FREE;
+       while (qi->desc_status[qi->free_tail] == QI_FREE) {
                qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
                qi->free_cnt++;
        }
@@ -1466,10 +1465,10 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
         * The reclaim code can free descriptors from multiple submissions
         * starting from the tail of the queue. When count == 0, the
         * status of the standalone wait descriptor at the tail of the queue
-        * must be set to QI_TO_BE_FREED to allow the reclaim code to proceed.
+        * must be set to QI_FREE to allow the reclaim code to proceed.
         */
        for (i = 0; i <= count; i++)
-               qi->desc_status[(index + i) % QI_LENGTH] = QI_TO_BE_FREED;
+               qi->desc_status[(index + i) % QI_LENGTH] = QI_FREE;

        reclaim_free_desc(qi);
        raw_spin_unlock_irqrestore(&qi->q_lock, flags);
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 1ab39f9145f2..eaf015b4353b 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -382,8 +382,7 @@ enum {
        QI_FREE,
        QI_IN_USE,
        QI_DONE,
-       QI_ABORT,
-       QI_TO_BE_FREED
+       QI_ABORT
 };

Thanks,

Jacob
RE: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Tian, Kevin 1 year, 6 months ago
> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Friday, July 26, 2024 5:11 AM
> > > > > @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu
> > > *iommu,
> > > > > struct qi_desc *desc,
> > > > >  		raw_spin_lock(&qi->q_lock);
> > > > >  	}
> > > > >
> > > > > -	for (i = 0; i < count; i++)
> > > > > -		qi->desc_status[(index + i) % QI_LENGTH] = QI_DONE;
> > > > > +	/*
> > > > > +	 * The reclaim code can free descriptors from multiple
> > > > > submissions
> > > > > +	 * starting from the tail of the queue. When count == 0,
> > > > > the
> > > > > +	 * status of the standalone wait descriptor at the tail of
> > > > > the queue
> > > > > +	 * must be set to QI_TO_BE_FREED to allow the reclaim code
> > > > > to proceed.
> > > > > +	 */
> > > > > +	for (i = 0; i <= count; i++)
> > > > > +		qi->desc_status[(index + i) % QI_LENGTH] =
> > > > > QI_TO_BE_FREED;
> > > >
> > > > We don't really need a new flag. Just set them to QI_FREE and then
> > > > reclaim QI_FREE slots until hitting qi->head in reclaim_free_desc().
> > > We do need to have a separate state for descriptors pending to be freed.
> > > Otherwise, reclaim code will advance pass the intended range.
> > >
> >
> > The commit msg said that QI_DONE is currently used for conflicting
> > purpose.
> >
> > Using QI_FREE keeps it only for signaling that a wait desc is completed.
> >
> > The key is that reclaim() should not change a desc's state before it's
> > consumed by the owner. Instead we always let the owner to change the
> > state and reclaim() only does scan and adjust the tracking fields then
> > such race condition disappears.
> >
> > In this example T2's slots are changed to QI_FREE by T2 after it completes
> > all the checks. Only at this point those slots can be reclaimed.
> 
> The problem is that without the TO_BE_FREED state, the reclaim code would
> have no way of knowing which ones are to be reclaimed and which ones are
> currently free. Therefore, it cannot track free_cnt.
> 
> The current reclaim code is not aware of owners nor how many to reclaim.
> 
> If I make the following changes and run, free_cnt will keep going up and
> system cannot boot. Perhaps you meant another way?
> 
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu
> *iommu)
>   */
>  static inline void reclaim_free_desc(struct q_inval *qi)
>  {
> -       while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
> -               qi->desc_status[qi->free_tail] = QI_FREE;
> +       while (qi->desc_status[qi->free_tail] == QI_FREE) {
>                 qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
>                 qi->free_cnt++;

This misses a check to prevent reclaiming unused slots:

		if (qi->free_tail == qi->free_head)
			break;

In the example flow, reclaim_free_desc() in T1 will only reclaim slots
used by T1, as T2's slots are either QI_IN_USE or QI_DONE. T2's slots
will be reclaimed when T2 calls reclaim_free_desc() after setting them
to QI_FREE, and reclaim will stop at qi->free_head.

If for some reason T2 completes earlier than T1, reclaim_free_desc()
in T2 does nothing, as the first slot at qi->free_tail belongs to T1 and is
still QI_IN_USE. T2's slots will then wait until reclaim is triggered by T1 later.
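
Putting the two pieces together, the approach being discussed would look
roughly like this (a sketch that combines the diff above with the missing
free_head check; shown for illustration only, not necessarily the exact
code that was merged):

	/* Owner sets its own slots to QI_FREE, then calls this to reclaim. */
	static inline void reclaim_free_desc(struct q_inval *qi)
	{
		while (qi->desc_status[qi->free_tail] == QI_FREE) {
			/* Stop at free_head so unused slots are not reclaimed. */
			if (qi->free_tail == qi->free_head)
				break;
			qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
			qi->free_cnt++;
		}
	}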

>         }
> @@ -1466,10 +1465,10 @@ int qi_submit_sync(struct intel_iommu *iommu,
> struct qi_desc *desc,
>          * The reclaim code can free descriptors from multiple submissions
>          * starting from the tail of the queue. When count == 0, the
>          * status of the standalone wait descriptor at the tail of the queue
> -        * must be set to QI_TO_BE_FREED to allow the reclaim code to proceed.
> +        * must be set to QI_FREE to allow the reclaim code to proceed.
>          */
>         for (i = 0; i <= count; i++)
> -               qi->desc_status[(index + i) % QI_LENGTH] = QI_TO_BE_FREED;
> +               qi->desc_status[(index + i) % QI_LENGTH] = QI_FREE;
> 
>         reclaim_free_desc(qi);
>         raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 1ab39f9145f2..eaf015b4353b 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -382,8 +382,7 @@ enum {
>         QI_FREE,
>         QI_IN_USE,
>         QI_DONE,
> -       QI_ABORT,
> -       QI_TO_BE_FREED
> +       QI_ABORT
>  };
> 
> Thanks,
> 
> Jacob
Re: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Jacob Pan 1 year, 6 months ago
On Fri, 26 Jul 2024 00:18:13 +0000, "Tian, Kevin" <kevin.tian@intel.com>
wrote:

> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Friday, July 26, 2024 5:11 AM  
> > > > > > @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu  
> > > > *iommu,  
> > > > > > struct qi_desc *desc,
> > > > > >  		raw_spin_lock(&qi->q_lock);
> > > > > >  	}
> > > > > >
> > > > > > -	for (i = 0; i < count; i++)
> > > > > > -		qi->desc_status[(index + i) % QI_LENGTH] =
> > > > > > QI_DONE;
> > > > > > +	/*
> > > > > > +	 * The reclaim code can free descriptors from multiple
> > > > > > submissions
> > > > > > +	 * starting from the tail of the queue. When count ==
> > > > > > 0, the
> > > > > > +	 * status of the standalone wait descriptor at the
> > > > > > tail of the queue
> > > > > > +	 * must be set to QI_TO_BE_FREED to allow the reclaim
> > > > > > code to proceed.
> > > > > > +	 */
> > > > > > +	for (i = 0; i <= count; i++)
> > > > > > +		qi->desc_status[(index + i) % QI_LENGTH] =
> > > > > > QI_TO_BE_FREED;  
> > > > >
> > > > > We don't really need a new flag. Just set them to QI_FREE and then
> > > > > reclaim QI_FREE slots until hitting qi->head in
> > > > > reclaim_free_desc().  
> > > > We do need to have a separate state for descriptors pending to be
> > > > freed. Otherwise, reclaim code will advance pass the intended range.
> > > >  
> > >
> > > The commit msg said that QI_DONE is currently used for conflicting
> > > purpose.
> > >
> > > Using QI_FREE keeps it only for signaling that a wait desc is
> > > completed.
> > >
> > > The key is that reclaim() should not change a desc's state before it's
> > > consumed by the owner. Instead we always let the owner to change the
> > > state and reclaim() only does scan and adjust the tracking fields then
> > > such race condition disappears.
> > >
> > > In this example T2's slots are changed to QI_FREE by T2 after it
> > > completes all the checks. Only at this point those slots can be
> > > reclaimed.  
> > 
> > The problem is that without the TO_BE_FREED state, the reclaim code
> > would have no way of knowing which ones are to be reclaimed and which
> > ones are currently free. Therefore, it cannot track free_cnt.
> > 
> > The current reclaim code is not aware of owners nor how many to reclaim.
> > 
> > If I make the following changes and run, free_cnt will keep going up and
> > system cannot boot. Perhaps you meant another way?
> > 
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu
> > *iommu)
> >   */
> >  static inline void reclaim_free_desc(struct q_inval *qi)
> >  {
> > -       while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
> > -               qi->desc_status[qi->free_tail] = QI_FREE;
> > +       while (qi->desc_status[qi->free_tail] == QI_FREE) {
> >                 qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
> >                 qi->free_cnt++;  
> 
> Here miss a check to prevent reclaiming unused slots:
> 
> 		if (qi->free_tail == qi->free_head)
> 			break;
> 
> In the example flow reclaim_free_desc() in T1 will only reclaim slots
> used by T1 as slots of T2 are either QI_IN_USE or QI_DONE. T2 slots
> will be reclaimed when T2 calls reclaim_free_desc() after setting them
> to QI_FREE, and reclaim will stop at qi->free_head.
> 
> If for some reason T2 completes earlier than T1. reclaim_free_desc()
> in T2 does nothing as the first slot qi->free_tail belongs to T1 still
> IN_USE. T2's slots will then wait until reclaim is triggered by T1 later.
> 
This makes sense. Unlike the original code, we now only allow freeing the
submitter's own descriptors.

> >         }
> > @@ -1466,10 +1465,10 @@ int qi_submit_sync(struct intel_iommu *iommu,
> > struct qi_desc *desc,
> >          * The reclaim code can free descriptors from multiple
> > submissions
> >          * starting from the tail of the queue. When count == 0, the
> >          * status of the standalone wait descriptor at the tail of the
> > queue
> > -        * must be set to QI_TO_BE_FREED to allow the reclaim code to
> > proceed.
> > +        * must be set to QI_FREE to allow the reclaim code to proceed.
> >          */
> >         for (i = 0; i <= count; i++)
> > -               qi->desc_status[(index + i) % QI_LENGTH] =
> > QI_TO_BE_FREED;
> > +               qi->desc_status[(index + i) % QI_LENGTH] = QI_FREE;
> > 
> >         reclaim_free_desc(qi);
> >         raw_spin_unlock_irqrestore(&qi->q_lock, flags);
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index 1ab39f9145f2..eaf015b4353b 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -382,8 +382,7 @@ enum {
> >         QI_FREE,
> >         QI_IN_USE,
> >         QI_DONE,
> > -       QI_ABORT,
> > -       QI_TO_BE_FREED
> > +       QI_ABORT
> >  };
> > 
> > Thanks,
> > 
> > Jacob  
> 


Thanks,

Jacob
RE: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Kumar, Sanjay K 1 year, 5 months ago
Sorry, just catching up with emails.
I noticed another thing. 
Among qi_submit_sync()'s input parameters, when count is 0, the expectation should be that desc is NULL, right?
In that case, the below code will cause a problem.

type = desc->qw0 & GENMASK_ULL(3, 0);

The above line requires the caller (when calling with count = 0) to pass a fake non-NULL desc pointer. Should we fix this as well? Alternatively, we can fix it whenever we create the use case of a caller passing count = 0.
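
One hypothetical way to avoid the dereference (shown purely for
illustration; the count-based guard below is an assumption, not something
proposed in this thread) would be to derive the type only when at least one
invalidation descriptor was actually supplied:

	u64 type = 0;

	if (!qi)
		return 0;

	/*
	 * Hypothetical guard: with count == 0 there is no invalidation
	 * descriptor, so do not dereference desc in that case.
	 */
	if (count)
		type = desc->qw0 & GENMASK_ULL(3, 0);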

Thanks,
Sanjay

-----Original Message-----
From: Jacob Pan <jacob.jun.pan@linux.intel.com> 
Sent: Thursday, July 25, 2024 7:42 PM
To: Tian, Kevin <kevin.tian@intel.com>
Cc: iommu@lists.linux.dev; LKML <linux-kernel@vger.kernel.org>; Lu Baolu <baolu.lu@linux.intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Zhang, Tina <tina.zhang@intel.com>; Kumar, Sanjay K <sanjay.k.kumar@intel.com>; jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim


On Fri, 26 Jul 2024 00:18:13 +0000, "Tian, Kevin" <kevin.tian@intel.com>
wrote:

> > From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Sent: Friday, July 26, 2024 5:11 AM
> > > > > > @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu
> > > > *iommu,
> > > > > > struct qi_desc *desc,
> > > > > >  		raw_spin_lock(&qi->q_lock);
> > > > > >  	}
> > > > > >
> > > > > > -	for (i = 0; i < count; i++)
> > > > > > -		qi->desc_status[(index + i) % QI_LENGTH] =
> > > > > > QI_DONE;
> > > > > > +	/*
> > > > > > +	 * The reclaim code can free descriptors from multiple
> > > > > > submissions
> > > > > > +	 * starting from the tail of the queue. When count ==
> > > > > > 0, the
> > > > > > +	 * status of the standalone wait descriptor at the
> > > > > > tail of the queue
> > > > > > +	 * must be set to QI_TO_BE_FREED to allow the reclaim
> > > > > > code to proceed.
> > > > > > +	 */
> > > > > > +	for (i = 0; i <= count; i++)
> > > > > > +		qi->desc_status[(index + i) % QI_LENGTH] =
> > > > > > QI_TO_BE_FREED;
> > > > >
> > > > > We don't really need a new flag. Just set them to QI_FREE and 
> > > > > then reclaim QI_FREE slots until hitting qi->head in 
> > > > > reclaim_free_desc().
> > > > We do need to have a separate state for descriptors pending to 
> > > > be freed. Otherwise, reclaim code will advance pass the intended range.
> > > >  
> > >
> > > The commit msg said that QI_DONE is currently used for conflicting 
> > > purpose.
> > >
> > > Using QI_FREE keeps it only for signaling that a wait desc is 
> > > completed.
> > >
> > > The key is that reclaim() should not change a desc's state before 
> > > it's consumed by the owner. Instead we always let the owner to 
> > > change the state and reclaim() only does scan and adjust the 
> > > tracking fields then such race condition disappears.
> > >
> > > In this example T2's slots are changed to QI_FREE by T2 after it 
> > > completes all the checks. Only at this point those slots can be 
> > > reclaimed.
> > 
> > The problem is that without the TO_BE_FREED state, the reclaim code 
> > would have no way of knowing which ones are to be reclaimed and 
> > which ones are currently free. Therefore, it cannot track free_cnt.
> > 
> > The current reclaim code is not aware of owners nor how many to reclaim.
> > 
> > If I make the following changes and run, free_cnt will keep going up 
> > and system cannot boot. Perhaps you meant another way?
> > 
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu
> > *iommu)
> >   */
> >  static inline void reclaim_free_desc(struct q_inval *qi)  {
> > -       while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
> > -               qi->desc_status[qi->free_tail] = QI_FREE;
> > +       while (qi->desc_status[qi->free_tail] == QI_FREE) {
> >                 qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
> >                 qi->free_cnt++;
> 
> Here miss a check to prevent reclaiming unused slots:
> 
> 		if (qi->free_tail == qi->free_head)
> 			break;
> 
> In the example flow reclaim_free_desc() in T1 will only reclaim slots 
> used by T1 as slots of T2 are either QI_IN_USE or QI_DONE. T2 slots 
> will be reclaimed when T2 calls reclaim_free_desc() after setting them 
> to QI_FREE, and reclaim will stop at qi->free_head.
> 
> If for some reason T2 completes earlier than T1. reclaim_free_desc() 
> in T2 does nothing as the first slot qi->free_tail belongs to T1 still 
> IN_USE. T2's slots will then wait until reclaim is triggered by T1 later.
> 
This makes sense. Unlike the original code, we now only allow freeing the submitter's own descriptors.

> >         }
> > @@ -1466,10 +1465,10 @@ int qi_submit_sync(struct intel_iommu 
> > *iommu, struct qi_desc *desc,
> >          * The reclaim code can free descriptors from multiple 
> > submissions
> >          * starting from the tail of the queue. When count == 0, the
> >          * status of the standalone wait descriptor at the tail of 
> > the queue
> > -        * must be set to QI_TO_BE_FREED to allow the reclaim code to
> > proceed.
> > +        * must be set to QI_FREE to allow the reclaim code to proceed.
> >          */
> >         for (i = 0; i <= count; i++)
> > -               qi->desc_status[(index + i) % QI_LENGTH] =
> > QI_TO_BE_FREED;
> > +               qi->desc_status[(index + i) % QI_LENGTH] = QI_FREE;
> > 
> >         reclaim_free_desc(qi);
> >         raw_spin_unlock_irqrestore(&qi->q_lock, flags); diff --git 
> > a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 
> > 1ab39f9145f2..eaf015b4353b 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -382,8 +382,7 @@ enum {
> >         QI_FREE,
> >         QI_IN_USE,
> >         QI_DONE,
> > -       QI_ABORT,
> > -       QI_TO_BE_FREED
> > +       QI_ABORT
> >  };
> > 
> > Thanks,
> > 
> > Jacob
> 


Thanks,

Jacob
Re: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
Posted by Baolu Lu 1 year, 5 months ago
On 2024/9/2 13:44, Kumar, Sanjay K wrote:
> Sorry, just catching up with emails.
> I noticed another thing.
> In qi_submit_sync() input parameters, when the count is 0, the expectation should be that desc should be NULL right?
> In that case, the below code will cause a problem.
> 
> type = desc->qw0 & GENMASK_ULL(3, 0);
> 
> The above line requires caller (when calling with count = 0) to pass a fake non-NULL desc pointer. Should we fix this as well? Alternatively, we can fix it whenever we create the use case of caller calling with count=0.

No worries, Sanjay. I will take care of this later. This is actually not
a fix, but rather an extension of the helper to support a new use case
(count=0).

Thanks,
baolu

> 
> Thanks,
> Sanjay
> 
> -----Original Message-----
> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Sent: Thursday, July 25, 2024 7:42 PM
> To: Tian, Kevin <kevin.tian@intel.com>
> Cc: iommu@lists.linux.dev; LKML <linux-kernel@vger.kernel.org>; Lu Baolu <baolu.lu@linux.intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Zhang, Tina <tina.zhang@intel.com>; Kumar, Sanjay K <sanjay.k.kumar@intel.com>; jacob.jun.pan@linux.intel.com
> Subject: Re: [PATCH] iommu/vt-d: Fix potential soft lockup due to reclaim
> 
> 
> On Fri, 26 Jul 2024 00:18:13 +0000, "Tian, Kevin" <kevin.tian@intel.com>
> wrote:
> 
>>> From: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Sent: Friday, July 26, 2024 5:11 AM
>>>>>>> @@ -1463,8 +1462,14 @@ int qi_submit_sync(struct intel_iommu
>>>>> *iommu,
>>>>>>> struct qi_desc *desc,
>>>>>>>   		raw_spin_lock(&qi->q_lock);
>>>>>>>   	}
>>>>>>>
>>>>>>> -	for (i = 0; i < count; i++)
>>>>>>> -		qi->desc_status[(index + i) % QI_LENGTH] =
>>>>>>> QI_DONE;
>>>>>>> +	/*
>>>>>>> +	 * The reclaim code can free descriptors from multiple
>>>>>>> submissions
>>>>>>> +	 * starting from the tail of the queue. When count ==
>>>>>>> 0, the
>>>>>>> +	 * status of the standalone wait descriptor at the
>>>>>>> tail of the queue
>>>>>>> +	 * must be set to QI_TO_BE_FREED to allow the reclaim
>>>>>>> code to proceed.
>>>>>>> +	 */
>>>>>>> +	for (i = 0; i <= count; i++)
>>>>>>> +		qi->desc_status[(index + i) % QI_LENGTH] =
>>>>>>> QI_TO_BE_FREED;
>>>>>>
>>>>>> We don't really need a new flag. Just set them to QI_FREE and
>>>>>> then reclaim QI_FREE slots until hitting qi->head in
>>>>>> reclaim_free_desc().
>>>>> We do need to have a separate state for descriptors pending to
>>>>> be freed. Otherwise, reclaim code will advance pass the intended range.
>>>>>   
>>>>
>>>> The commit msg said that QI_DONE is currently used for conflicting
>>>> purpose.
>>>>
>>>> Using QI_FREE keeps it only for signaling that a wait desc is
>>>> completed.
>>>>
>>>> The key is that reclaim() should not change a desc's state before
>>>> it's consumed by the owner. Instead we always let the owner to
>>>> change the state and reclaim() only does scan and adjust the
>>>> tracking fields then such race condition disappears.
>>>>
>>>> In this example T2's slots are changed to QI_FREE by T2 after it
>>>> completes all the checks. Only at this point those slots can be
>>>> reclaimed.
>>>
>>> The problem is that without the TO_BE_FREED state, the reclaim code
>>> would have no way of knowing which ones are to be reclaimed and
>>> which ones are currently free. Therefore, it cannot track free_cnt.
>>>
>>> The current reclaim code is not aware of owners nor how many to reclaim.
>>>
>>> If I make the following changes and run, free_cnt will keep going up
>>> and system cannot boot. Perhaps you meant another way?
>>>
>>> --- a/drivers/iommu/intel/dmar.c
>>> +++ b/drivers/iommu/intel/dmar.c
>>> @@ -1204,8 +1204,7 @@ static void free_iommu(struct intel_iommu
>>> *iommu)
>>>    */
>>>   static inline void reclaim_free_desc(struct q_inval *qi)  {
>>> -       while (qi->desc_status[qi->free_tail] == QI_TO_BE_FREED) {
>>> -               qi->desc_status[qi->free_tail] = QI_FREE;
>>> +       while (qi->desc_status[qi->free_tail] == QI_FREE) {
>>>                  qi->free_tail = (qi->free_tail + 1) % QI_LENGTH;
>>>                  qi->free_cnt++;
>>
>> Here miss a check to prevent reclaiming unused slots:
>>
>> 		if (qi->free_tail == qi->free_head)
>> 			break;
>>
>> In the example flow reclaim_free_desc() in T1 will only reclaim slots
>> used by T1 as slots of T2 are either QI_IN_USE or QI_DONE. T2 slots
>> will be reclaimed when T2 calls reclaim_free_desc() after setting them
>> to QI_FREE, and reclaim will stop at qi->free_head.
>>
>> If for some reason T2 completes earlier than T1. reclaim_free_desc()
>> in T2 does nothing as the first slot qi->free_tail belongs to T1 still
>> IN_USE. T2's slots will then wait until reclaim is triggered by T1 later.
>>
> This makes sense. Unlike the original code, we now only allow freeing the submitter's own descriptors.
> 
>>>          }
>>> @@ -1466,10 +1465,10 @@ int qi_submit_sync(struct intel_iommu
>>> *iommu, struct qi_desc *desc,
>>>           * The reclaim code can free descriptors from multiple
>>> submissions
>>>           * starting from the tail of the queue. When count == 0, the
>>>           * status of the standalone wait descriptor at the tail of
>>> the queue
>>> -        * must be set to QI_TO_BE_FREED to allow the reclaim code to
>>> proceed.
>>> +        * must be set to QI_FREE to allow the reclaim code to proceed.
>>>           */
>>>          for (i = 0; i <= count; i++)
>>> -               qi->desc_status[(index + i) % QI_LENGTH] =
>>> QI_TO_BE_FREED;
>>> +               qi->desc_status[(index + i) % QI_LENGTH] = QI_FREE;
>>>
>>>          reclaim_free_desc(qi);
>>>          raw_spin_unlock_irqrestore(&qi->q_lock, flags); diff --git
>>> a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index
>>> 1ab39f9145f2..eaf015b4353b 100644
>>> --- a/drivers/iommu/intel/iommu.h
>>> +++ b/drivers/iommu/intel/iommu.h
>>> @@ -382,8 +382,7 @@ enum {
>>>          QI_FREE,
>>>          QI_IN_USE,
>>>          QI_DONE,
>>> -       QI_ABORT,
>>> -       QI_TO_BE_FREED
>>> +       QI_ABORT
>>>   };
>>>
>>> Thanks,
>>>
>>> Jacob
>>
> 
> 
> Thanks,
> 
> Jacob
>