[PATCH v2] drm/xe: Replace use of system_wq with tlb_inval->fence_signal_wq

Marco Crivellari posted 1 patch 4 weeks, 1 day ago
This patch continues the effort to refactor workqueue APIs, which began
with the changes that introduced new workqueues and a new alloc_workqueue flag:

   commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
   commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.

Before that can happen, workqueue users must be converted to the better-named
new workqueues, with no intended behaviour change:

   system_wq -> system_percpu_wq
   system_unbound_wq -> system_dfl_wq

This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.
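To make the mechanical part of such a conversion concrete, here is a hedged
sketch of a typical call site; the work item and handler names are
hypothetical and not taken from this patch:

```c
/*
 * Illustrative sketch only: the work item and handler below are
 * hypothetical. system_wq and system_percpu_wq behave identically
 * today; the new name merely makes the per-CPU placement explicit
 * at the call site.
 */
#include <linux/workqueue.h>

static void example_handler(struct work_struct *work)
{
	/* driver-specific work */
}

static DECLARE_WORK(example_work, example_handler);

static void example_queue(void)
{
	/* Before: queue_work(system_wq, &example_work); */
	queue_work(system_percpu_wq, &example_work);
}
```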

After careful evaluation, and because this is the fence signaling path, the
code was changed to use one of Xe's own workqueues.

A new field named 'fence_signal_wq' has been added to 'struct xe_tlb_inval';
it is initialized with 'gt->ordered_wq', and the system_wq uses have been
replaced with tlb_inval->fence_signal_wq.
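For context, an ordered workqueue executes at most one work item at a time,
in queueing order, which suits a timeout/signaling path. A hedged sketch of
how such a queue is typically created follows; the queue name and flags here
are illustrative, not necessarily those Xe uses for 'gt->ordered_wq':

```c
/*
 * Illustrative sketch only. alloc_ordered_workqueue() returns NULL on
 * failure (not an ERR_PTR). WQ_MEM_RECLAIM is commonly set on queues
 * that may be relied upon during memory reclaim.
 */
#include <linux/workqueue.h>

static struct workqueue_struct *example_ordered_wq;

static int example_init(void)
{
	example_ordered_wq = alloc_ordered_workqueue("example-ordered-wq",
						     WQ_MEM_RECLAIM);
	if (!example_ordered_wq)
		return -ENOMEM;
	return 0;
}
```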

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
Changes in v2:
- added 'fence_signal_wq', initialized with 'gt->ordered_wq', to be used in
  the fence signaling path instead of system_wq.

- rebased on v6.19-rc4


 drivers/gpu/drm/xe/xe_tlb_inval.c       | 10 +++++++---
 drivers/gpu/drm/xe/xe_tlb_inval_types.h |  2 ++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_tlb_inval.c b/drivers/gpu/drm/xe/xe_tlb_inval.c
index 918a59e686ea..2e98f407c59d 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval.c
+++ b/drivers/gpu/drm/xe/xe_tlb_inval.c
@@ -94,7 +94,7 @@ static void xe_tlb_inval_fence_timeout(struct work_struct *work)
 		xe_tlb_inval_fence_signal(fence);
 	}
 	if (!list_empty(&tlb_inval->pending_fences))
-		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
+		queue_delayed_work(tlb_inval->fence_signal_wq, &tlb_inval->fence_tdr,
 				   timeout_delay);
 	spin_unlock_irq(&tlb_inval->pending_lock);
 }
@@ -146,6 +146,10 @@ int xe_gt_tlb_inval_init_early(struct xe_gt *gt)
 	if (IS_ERR(tlb_inval->job_wq))
 		return PTR_ERR(tlb_inval->job_wq);
 
+	tlb_inval->fence_signal_wq = gt->ordered_wq;
+	if (IS_ERR(tlb_inval->fence_signal_wq))
+		return PTR_ERR(tlb_inval->fence_signal_wq);
+
 	/* XXX: Blindly setting up backend to GuC */
 	xe_guc_tlb_inval_init_early(&gt->uc.guc, tlb_inval);
 
@@ -226,7 +230,7 @@ static void xe_tlb_inval_fence_prep(struct xe_tlb_inval_fence *fence)
 	list_add_tail(&fence->link, &tlb_inval->pending_fences);
 
 	if (list_is_singular(&tlb_inval->pending_fences))
-		queue_delayed_work(system_wq, &tlb_inval->fence_tdr,
+		queue_delayed_work(tlb_inval->fence_signal_wq, &tlb_inval->fence_tdr,
 				   tlb_inval->ops->timeout_delay(tlb_inval));
 	spin_unlock_irq(&tlb_inval->pending_lock);
 
@@ -378,7 +382,7 @@ void xe_tlb_inval_done_handler(struct xe_tlb_inval *tlb_inval, int seqno)
 	}
 
 	if (!list_empty(&tlb_inval->pending_fences))
-		mod_delayed_work(system_wq,
+		mod_delayed_work(tlb_inval->fence_signal_wq,
 				 &tlb_inval->fence_tdr,
 				 tlb_inval->ops->timeout_delay(tlb_inval));
 	else
diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
index 8f8b060e9005..1a3e239ea3a7 100644
--- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
+++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
@@ -106,6 +106,8 @@ struct xe_tlb_inval {
 	struct workqueue_struct *job_wq;
 	/** @tlb_inval.lock: protects TLB invalidation fences */
 	spinlock_t lock;
+	/** @fence_signal_wq: schedules fence signaling path jobs */
+	struct workqueue_struct *fence_signal_wq;
 };
 
 /**
-- 
2.52.0
Re: [PATCH v2] drm/xe: Replace use of system_wq with tlb_inval->fence_signal_wq
Posted by Matthew Brost 4 weeks ago
On Fri, Jan 09, 2026 at 04:57:17PM +0100, Marco Crivellari wrote:
> [ full patch quoted; trimmed to the hunk commented on below ]
> diff --git a/drivers/gpu/drm/xe/xe_tlb_inval_types.h b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> index 8f8b060e9005..1a3e239ea3a7 100644
> --- a/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> +++ b/drivers/gpu/drm/xe/xe_tlb_inval_types.h
> @@ -106,6 +106,8 @@ struct xe_tlb_inval {
>  	struct workqueue_struct *job_wq;
>  	/** @tlb_inval.lock: protects TLB invalidation fences */
>  	spinlock_t lock;
> +	/** @fence_signal_wq: schedule fence signaling path jobs  */
> +	struct workqueue_struct *fence_signal_wq;

I hate to be nitpicky but this name / kernel doc isn't great.

How about:

/** @timeout_wq: schedules TLB invalidation fence timeouts */
struct workqueue_struct *timeout_wq;

Functionally, everything in this patch looks correct.

Matt

>  };
>  
>  /**
> -- 
> 2.52.0
>
Re: [PATCH v2] drm/xe: Replace use of system_wq with tlb_inval->fence_signal_wq
Posted by Marco Crivellari 3 weeks, 5 days ago
On Fri, Jan 9, 2026 at 7:54 PM Matthew Brost <matthew.brost@intel.com> wrote:
>
> I hate to be nitpicky but this name / kernel doc isn't great.
>
> How about:
>
> /** @timeout_wq: schedules TLB invalidation fence timeouts */
> struct workqueue_struct *timeout_wq;
>
> Functionally, everything in this patch looks correct.

Hi,

Sure, sounds good; I will send a v3.

Thanks!

-- 

Marco Crivellari

L3 Support Engineer