[v1] replace old wq(s), added WQ_PERCPU to alloc_workqueue

[PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq

Posted by Marco Crivellari 3 months, 1 week ago

Currently, if a user enqueues a work item using schedule_delayed_work(), the
workqueue used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again uses
WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Adding system_dfl_wq to encourage its use when unbound work should be used.

The old system_unbound_wq will be kept for a few release cycles.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
 drivers/gpu/drm/amd/amdgpu/aldebaran.c     | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/aldebaran.c b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
index 9569dc16dd3d..7957e6c4c416 100644
--- a/drivers/gpu/drm/amd/amdgpu/aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/aldebaran.c
@@ -175,7 +175,7 @@ aldebaran_mode2_perform_reset(struct amdgpu_reset_control *reset_ctl,
 	list_for_each_entry(tmp_adev, reset_device_list, reset_list) {
 		/* For XGMI run all resets in parallel to speed up the process */
 		if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
-			if (!queue_work(system_unbound_wq,
+			if (!queue_work(system_dfl_wq,
 					&tmp_adev->reset_cntl->reset_work))
 				r = -EALREADY;
 		} else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7a899fb4de29..8c4d79f6c14f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6033,7 +6033,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 		list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
 			/* For XGMI run all resets in parallel to speed up the process */
 			if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {
-				if (!queue_work(system_unbound_wq,
+				if (!queue_work(system_dfl_wq,
 						&tmp_adev->xgmi_reset_work))
 					r = -EALREADY;
 			} else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 28c4ad62f50e..9c4631608526 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -116,7 +116,7 @@ static int amdgpu_reset_xgmi_reset_on_init_perform_reset(
 	/* Mode1 reset needs to be triggered on all devices together */
 	list_for_each_entry(tmp_adev, reset_device_list, reset_list) {
 		/* For XGMI run all resets in parallel to speed up the process */
-		if (!queue_work(system_unbound_wq, &tmp_adev->xgmi_reset_work))
+		if (!queue_work(system_dfl_wq, &tmp_adev->xgmi_reset_work))
 			r = -EALREADY;
 		if (r) {
 			dev_err(tmp_adev->dev,
-- 
2.51.0

Re: [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq

Posted by Christian König 3 months, 1 week ago

On 10/30/25 17:10, Marco Crivellari wrote:
> Currently, if a user enqueues a work item using schedule_delayed_work(), the
> workqueue used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
> WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
> schedule_work(), which uses system_wq, and queue_work(), which again uses
> WORK_CPU_UNBOUND.
> 
> This lack of consistency cannot be addressed without refactoring the API.
> 
> system_unbound_wq should be the default workqueue so as not to enforce
> locality constraints for random work whenever it's not required.
> 
> Adding system_dfl_wq to encourage its use when unbound work should be used.
> 
> The old system_unbound_wq will be kept for a few release cycles.

In all the cases below we actually want the work to run on a different CPU than the current one.

So using system_unbound_wq seems to be more appropriate.

Regards,
Christian.

Re: [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq

Posted by Marco Crivellari 3 months, 1 week ago

On Thu, Oct 30, 2025 at 6:14 PM Christian König
<christian.koenig@amd.com> wrote:
>[...]
> In all the cases below we actually want the work to run on a different CPU than the current one.
>
> So using system_unbound_wq seems to be more appropriate.

Hello Christian,

system_dfl_wq is the new workqueue that will replace
system_unbound_wq, but the behavior is the same.
So, if you need system_unbound_wq, it means system_dfl_wq is fine here.
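
A minimal sketch of what the conversion looks like from a caller's point
of view, assuming a kernel that already provides system_dfl_wq. The names
example_reset_fn, example_reset_work and example_queue_reset are
hypothetical and are not taken from amdgpu:

```c
#include <linux/errno.h>
#include <linux/workqueue.h>

/*
 * Hypothetical handler, for illustration only. Because the work is queued
 * on an unbound workqueue, it is not tied to the CPU that queued it.
 */
static void example_reset_fn(struct work_struct *work)
{
	/* device-specific reset would go here */
}

static DECLARE_WORK(example_reset_work, example_reset_fn);

static int example_queue_reset(void)
{
	/*
	 * Before: queue_work(system_unbound_wq, &example_reset_work)
	 * After:  same call, only the workqueue name changes.
	 *
	 * Per-CPU system workqueue: system_wq (new name: system_percpu_wq)
	 * Unbound system workqueue: system_unbound_wq (new name: system_dfl_wq)
	 */
	if (!queue_work(system_dfl_wq, &example_reset_work))
		return -EALREADY;	/* work item was already pending */

	return 0;
}
```

The only difference for the caller is the workqueue name; the work item
is still unbound, which matches the requirement you described above.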

Thanks!
-- 

Marco Crivellari

L3 Support Engineer, Technology & Product

Re: [PATCH 1/4] drm/amdgpu: replace use of system_unbound_wq with system_dfl_wq

Posted by Christian König 3 months, 1 week ago

On 10/31/25 09:42, Marco Crivellari wrote:
> On Thu, Oct 30, 2025 at 6:14 PM Christian König
> <christian.koenig@amd.com> wrote:
>> [...]
>> In all the cases below we actually want the work to run on a different CPU than the current one.
>>
>> So using system_unbound_wq seems to be more appropriate.
> 
> Hello Christian,
> 
> system_dfl_wq is the new workqueue that will replace
> system_unbound_wq, but the behavior is the same.
> So, if you need system_unbound_wq, it means system_dfl_wq is fine here.

Ah, ok thanks! In that case I'm fine with the change.

It sounded like system_dfl_wq was the new per-CPU wq.

Regards,
Christian.
