Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
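For reference, a simplified sketch of how these helpers relate today, based
on their definitions in include/linux/workqueue.h (trimmed for illustration;
schedule_work()/queue_work() are analogous):

	static inline bool schedule_delayed_work(struct delayed_work *dwork,
						 unsigned long delay)
	{
		/* always targets the per-CPU system_wq */
		return queue_delayed_work(system_wq, dwork, delay);
	}

	static inline bool queue_delayed_work(struct workqueue_struct *wq,
					      struct delayed_work *dwork,
					      unsigned long delay)
	{
		/* no CPU specified: WORK_CPU_UNBOUND */
		return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
	}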
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
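For example (a minimal sketch; "example_wq" is a made-up name):

	/* per-CPU by default today */
	wq = alloc_workqueue("example_wq", 0, 0);

	/* unbound behavior must be requested explicitly */
	wq = alloc_workqueue("example_wq", WQ_UNBOUND, 0);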
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
This change adds the new WQ_PERCPU flag to explicitly request that
alloc_workqueue() be per-CPU when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
drivers/crypto/intel/qat/qat_common/adf_aer.c | 4 ++--
drivers/crypto/intel/qat/qat_common/adf_isr.c | 3 ++-
drivers/crypto/intel/qat/qat_common/adf_sriov.c | 3 ++-
drivers/crypto/intel/qat/qat_common/adf_vf_isr.c | 3 ++-
4 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c
index 35679b21ff63..667d5e320f50 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_aer.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c
@@ -276,11 +276,11 @@ int adf_notify_fatal_error(struct adf_accel_dev *accel_dev)
int adf_init_aer(void)
{
device_reset_wq = alloc_workqueue("qat_device_reset_wq",
- WQ_MEM_RECLAIM, 0);
+ WQ_MEM_RECLAIM | WQ_PERCPU, 0);
if (!device_reset_wq)
return -EFAULT;
- device_sriov_wq = alloc_workqueue("qat_device_sriov_wq", 0, 0);
+ device_sriov_wq = alloc_workqueue("qat_device_sriov_wq", WQ_PERCPU, 0);
if (!device_sriov_wq) {
destroy_workqueue(device_reset_wq);
device_reset_wq = NULL;
diff --git a/drivers/crypto/intel/qat/qat_common/adf_isr.c b/drivers/crypto/intel/qat/qat_common/adf_isr.c
index 12e565613661..4639d7fd93e6 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_isr.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_isr.c
@@ -384,7 +384,8 @@ EXPORT_SYMBOL_GPL(adf_isr_resource_alloc);
*/
int __init adf_init_misc_wq(void)
{
- adf_misc_wq = alloc_workqueue("qat_misc_wq", WQ_MEM_RECLAIM, 0);
+ adf_misc_wq = alloc_workqueue("qat_misc_wq",
+ WQ_MEM_RECLAIM | WQ_PERCPU, 0);
return !adf_misc_wq ? -ENOMEM : 0;
}
diff --git a/drivers/crypto/intel/qat/qat_common/adf_sriov.c b/drivers/crypto/intel/qat/qat_common/adf_sriov.c
index 31d1ef0cb1f5..bb904ba4bf84 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_sriov.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_sriov.c
@@ -299,7 +299,8 @@ EXPORT_SYMBOL_GPL(adf_sriov_configure);
int __init adf_init_pf_wq(void)
{
/* Workqueue for PF2VF responses */
- pf2vf_resp_wq = alloc_workqueue("qat_pf2vf_resp_wq", WQ_MEM_RECLAIM, 0);
+ pf2vf_resp_wq = alloc_workqueue("qat_pf2vf_resp_wq",
+ WQ_MEM_RECLAIM | WQ_PERCPU, 0);
return !pf2vf_resp_wq ? -ENOMEM : 0;
}
diff --git a/drivers/crypto/intel/qat/qat_common/adf_vf_isr.c b/drivers/crypto/intel/qat/qat_common/adf_vf_isr.c
index a4636ec9f9ca..d0fef20a3df4 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_vf_isr.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_vf_isr.c
@@ -299,7 +299,8 @@ EXPORT_SYMBOL_GPL(adf_flush_vf_wq);
*/
int __init adf_init_vf_wq(void)
{
- adf_vf_stop_wq = alloc_workqueue("adf_vf_stop_wq", WQ_MEM_RECLAIM, 0);
+ adf_vf_stop_wq = alloc_workqueue("adf_vf_stop_wq",
+ WQ_MEM_RECLAIM | WQ_PERCPU, 0);
return !adf_vf_stop_wq ? -EFAULT : 0;
}
--
2.51.1
On Fri, Nov 07, 2025 at 12:23:54PM +0100, Marco Crivellari wrote:
> Currently, if a user enqueues a work item using schedule_delayed_work(), the
> wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
> WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
> schedule_work(), which uses system_wq, and queue_work(), which again makes
> use of WORK_CPU_UNBOUND.
> This lack of consistency cannot be addressed without refactoring the API.
>
> alloc_workqueue() treats all queues as per-CPU by default, while unbound
> workqueues must opt-in via WQ_UNBOUND.
>
> This default is suboptimal: most workloads benefit from unbound queues,
> allowing the scheduler to place worker threads where they’re needed and
> reducing noise when CPUs are isolated.
>
> This continues the effort to refactor workqueue APIs, which began with
> the introduction of new workqueues and a new alloc_workqueue flag in:
>
> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
>
> This change adds the new WQ_PERCPU flag to explicitly request that
> alloc_workqueue() be per-CPU when WQ_UNBOUND has not been specified.
>
> With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
> any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
> must now use WQ_PERCPU.
>
> Once migration is complete, WQ_UNBOUND can be removed and unbound will
> become the implicit default.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
> ---
> drivers/crypto/intel/qat/qat_common/adf_aer.c | 4 ++--
> drivers/crypto/intel/qat/qat_common/adf_isr.c | 3 ++-
> drivers/crypto/intel/qat/qat_common/adf_sriov.c | 3 ++-
> drivers/crypto/intel/qat/qat_common/adf_vf_isr.c | 3 ++-
> 4 files changed, 8 insertions(+), 5 deletions(-)
Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Hi Marco,
On Fri, Nov 07, 2025 at 12:23:54PM +0100, Marco Crivellari wrote:
> Currently, if a user enqueues a work item using schedule_delayed_work(), the
> wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
> WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
> schedule_work(), which uses system_wq, and queue_work(), which again makes
> use of WORK_CPU_UNBOUND.
> This lack of consistency cannot be addressed without refactoring the API.
The reference to WORK_CPU_UNBOUND in this paragraph got me a bit
confused :-). As I understand it, if a workqueue is allocated with default
parameters (i.e., no flags), it is per-CPU, so using queue_work() or
queue_delayed_work() on such a queue would behave similarly to
schedule_work() or schedule_delayed_work() in terms of CPU affinity.
Is the `lack of consistency` you are referring to in this paragraph about
developer expectations? IOW, developers might assume they're getting
unbound behavior?
> alloc_workqueue() treats all queues as per-CPU by default, while unbound
> workqueues must opt-in via WQ_UNBOUND.
>
> This default is suboptimal: most workloads benefit from unbound queues,
> allowing the scheduler to place worker threads where they’re needed and
> reducing noise when CPUs are isolated.
>
> This continues the effort to refactor workqueue APIs, which began with
> the introduction of new workqueues and a new alloc_workqueue flag in:
>
> commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
> commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
>
> This change adds the new WQ_PERCPU flag to explicitly request that
> alloc_workqueue() be per-CPU when WQ_UNBOUND has not been specified.
>
> With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
> any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
> must now use WQ_PERCPU.
>
> Once migration is complete, WQ_UNBOUND can be removed and unbound will
> become the implicit default.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
On Wed, Nov 12, 2025 at 6:02 PM Giovanni Cabiddu
<giovanni.cabiddu@intel.com> wrote:
>
> Hi Marco,
>
> On Fri, Nov 07, 2025 at 12:23:54PM +0100, Marco Crivellari wrote:
> > Currently, if a user enqueues a work item using schedule_delayed_work(), the
> > wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
> > WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
> > schedule_work(), which uses system_wq, and queue_work(), which again makes
> > use of WORK_CPU_UNBOUND.
> > This lack of consistency cannot be addressed without refactoring the API.
> The reference to WORK_CPU_UNBOUND in this paragraph got me a bit
> confused :-). As I understand it, if a workqueue is allocated with default
> parameters (i.e., no flags), it is per-CPU, so using queue_work() or
> queue_delayed_work() on such a queue would behave similarly to
> schedule_work() or schedule_delayed_work() in terms of CPU affinity.
>
> Is the `lack of consistency` you are referring to in this paragraph about
> developer expectations? IOW, developers might assume they're getting
> unbound behavior?
Hi Giovanni,
Sorry for the confusion. The first paragraph is mostly there to give some
background on the reason for the change.
What you are saying is indeed correct.
I will share the cover letter (for subsystems that need one):
----
Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, while for !WQ_UNBOUND the local CPU is selected.
This leads to different scenarios if a work item is scheduled on an
isolated CPU, depending on whether the "delay" value is 0 or greater than 0:

schedule_delayed_work(, 0);

This will be handled by __queue_work(), which will queue the work item on the
current local (isolated) CPU, while:

schedule_delayed_work(, 1);

will move the timer to a housekeeping CPU and schedule the work there.
----
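To illustrate, a paraphrased sketch of the corresponding logic in
__queue_delayed_work() in kernel/workqueue.c (simplified from recent
kernels; details omitted):

	if (!delay) {
		/* delay == 0: enqueue immediately; for a per-CPU wq
		 * this means the current (possibly isolated) CPU */
		__queue_work(cpu, wq, &dwork->work);
		return;
	}
	...
	if (housekeeping_enabled(HK_TYPE_TIMER)) {
		/* delay > 0: keep the timer (and hence the work)
		 * off isolated CPUs */
		cpu = smp_processor_id();
		if (!housekeeping_test_cpu(cpu, HK_TYPE_TIMER))
			cpu = housekeeping_any_cpu(HK_TYPE_TIMER);
		add_timer_on(timer, cpu);
	}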
You can find more information and details at (also the reasons about the WQ
API change):
https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
In short, anyhow: that paragraph is not directly related to the changes
introduced here.
Here we only add WQ_PERCPU explicitly where WQ_UNBOUND is not present.
Thanks!
--
Marco Crivellari
L3 Support Engineer, Technology & Product