[v4] replace system_unbound_wq, add WQ_PERCPU to alloc_workqueue

[PATCH v4 0/2] replace system_unbound_wq, add WQ_PERCPU to alloc_workqueue

Posted by Marco Crivellari 5 days, 4 hours ago

Hi,

=== Current situation: problems ===

Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, while for !WQ_UNBOUND workqueues the local CPU
is selected.

This leads to different outcomes when a work item is scheduled from an
isolated CPU, depending on whether the "delay" value is 0 or greater than 0:
        schedule_delayed_work(, 0);

This is handled by __queue_work(), which queues the work item on the
current local (isolated) CPU, while:

        schedule_delayed_work(, 1);

will move the timer to a housekeeping CPU and schedule the work there.
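
To make the asymmetry concrete, here is a hypothetical userspace model of
where the work item ends up. It is not kernel code: the four-CPU layout,
the model_* names and the choice of the lowest-numbered housekeeping CPU
are assumptions for illustration only, since the real placement is decided
by the timer and workqueue code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the nohz_full scenario; it calls no
 * kernel API. CPUs 0-1 are housekeeping CPUs, CPUs 2-3 are isolated.
 */
#define MODEL_NR_CPUS 4

static const bool model_housekeeping[MODEL_NR_CPUS] = { true, true, false, false };

/* Stand-in for "some housekeeping CPU": the lowest-numbered one. */
static int model_first_housekeeping_cpu(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_housekeeping[cpu])
			return cpu;
	return -1;
}

/*
 * CPU on which a work item queued with schedule_delayed_work() from
 * local_cpu ends up running, as described above: delay == 0 queues it
 * directly on the local CPU, delay > 0 arms a timer that fires on a
 * housekeeping CPU, and the work is queued there.
 */
static int model_delayed_work_cpu(int local_cpu, unsigned long delay)
{
	assert(local_cpu >= 0 && local_cpu < MODEL_NR_CPUS);

	if (delay == 0)
		return local_cpu;	/* stays on the (isolated) local CPU */
	return model_first_housekeeping_cpu();	/* moved with the timer */
}
```

With this model, queuing from isolated CPU 3 gives
model_delayed_work_cpu(3, 0) == 3 (the isolated CPU) and
model_delayed_work_cpu(3, 1) == 0 (a housekeeping CPU).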

Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

=== Recent changes to the WQ API ===

This series follows up on these recent changes to the Workqueue API:

- commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
- commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")

The old workqueues will be removed in a future release cycle.

=== Changes introduced by this series ===

1) [P 1] Replace uses of system_unbound_wq

    system_unbound_wq is meant to be used when locality is not required.

    For this reason, system_unbound_wq has been replaced with
    system_dfl_wq, so that it becomes the default choice when locality
    is not important.

    system_dfl_wq behaves like system_unbound_wq.

2) [P 2] Add WQ_PERCPU to alloc_workqueue() users

    This change adds the new WQ_PERCPU flag to alloc_workqueue() callers
    that do not specify WQ_UNBOUND, explicitly requesting a per-cpu
    workqueue.

    The behavior is unchanged.
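
A minimal sketch of the two changes, again as a hypothetical userspace
model rather than kernel code. The kernel-side call sites in the comment
use a made-up workqueue name and work item, and the MODEL_WQ_* bit values
are invented for the model; they are not the kernel's WQ_UNBOUND/WQ_PERCPU
values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Kernel-side shape of the two patches (illustrative only; "example-wq"
 * and "work" are hypothetical):
 *
 *   [P 1] queue_work(system_unbound_wq, &work);
 *           -> queue_work(system_dfl_wq, &work);
 *   [P 2] alloc_workqueue("example-wq", 0, 0);
 *           -> alloc_workqueue("example-wq", WQ_PERCPU, 0);
 *
 * Userspace model of how the flags select the workqueue type. The bit
 * values below are made up and do NOT match the kernel's WQ_* flags.
 */
#define MODEL_WQ_UNBOUND (1u << 0)
#define MODEL_WQ_PERCPU  (1u << 1)

/* Returns true if a workqueue allocated with these flags is per-CPU. */
static bool model_wq_is_percpu(unsigned int flags)
{
	/* Setting both flags at once is outside the scope of this sketch. */
	assert(!((flags & MODEL_WQ_UNBOUND) && (flags & MODEL_WQ_PERCPU)));

	/*
	 * Without the unbound flag the workqueue is per-CPU, with or
	 * without the explicit per-CPU flag: the flag states intent, it
	 * does not select a different behavior.
	 */
	return !(flags & MODEL_WQ_UNBOUND);
}
```

Under this model, model_wq_is_percpu(0) and
model_wq_is_percpu(MODEL_WQ_PERCPU) both return true, which is why patch 2
does not change behavior; only MODEL_WQ_UNBOUND yields an unbound
workqueue.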

Thanks!

---
Changes in v4:
- rebased on drm-xe

Changes in v3:
- rebased on v6.19-rc6 (on master specifically)

- commit logs improved

Changes in v2:
- rebased on v6.18-rc4.

- commit logs now reference the appropriate workqueue API commit hashes.


Marco Crivellari (2):
  drm/xe: replace use of system_unbound_wq with system_dfl_wq
  drm/xe: add WQ_PERCPU to alloc_workqueue users

 drivers/gpu/drm/xe/xe_devcoredump.c     | 2 +-
 drivers/gpu/drm/xe/xe_device.c          | 4 ++--
 drivers/gpu/drm/xe/xe_execlist.c        | 2 +-
 drivers/gpu/drm/xe/xe_ggtt.c            | 2 +-
 drivers/gpu/drm/xe/xe_guc_ct.c          | 4 ++--
 drivers/gpu/drm/xe/xe_hw_engine_group.c | 3 ++-
 drivers/gpu/drm/xe/xe_oa.c              | 2 +-
 drivers/gpu/drm/xe/xe_sriov.c           | 2 +-
 drivers/gpu/drm/xe/xe_vm.c              | 4 ++--
 9 files changed, 13 insertions(+), 12 deletions(-)

-- 
2.52.0

Re: [PATCH v4 0/2] replace system_unbound_wq, add WQ_PERCPU to alloc_workqueue

Posted by Rodrigo Vivi 4 days, 23 hours ago

On Mon, Feb 02, 2026 at 11:37:54AM +0100, Marco Crivellari wrote:
> [...]

series is

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

I just resent it for CI and will push to drm-xe-next as soon as I get
the greenlight from CI.

Thanks,
Rodrigo.

Re: [PATCH v4 0/2] replace system_unbound_wq, add WQ_PERCPU to alloc_workqueue

Posted by Marco Crivellari 4 days, 23 hours ago

On Mon, Feb 2, 2026 at 5:21 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> [...]
> series is
>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>
> I just resent it for CI and will push to drm-xe-next as soon as I get
> the greenlight from CI.
>

Many thanks Rodrigo!

-- 

Marco Crivellari

L3 Support Engineer

Re: [PATCH v4 0/2] replace system_unbound_wq, add WQ_PERCPU to alloc_workqueue

Posted by Rodrigo Vivi 4 days, 15 hours ago

On Mon, Feb 02, 2026 at 05:22:05PM +0100, Marco Crivellari wrote:
> On Mon, Feb 2, 2026 at 5:21 PM Rodrigo Vivi <rodrigo.vivi@intel.com> wrote:
> > [...]
> > series is
> >
> > Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> >
> > I just resent it for CI and will push to drm-xe-next as soon as I get
> > the greenlight from CI.
> >
> 
> Many thanks Rodrigo!

Thank you. Pushed to drm-xe-next.

> 
> -- 
> 
> Marco Crivellari
> 
> L3 Support Engineer