[RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag

Marco Crivellari posted 2 patches 1 month, 1 week ago
include/linux/workqueue.h | 108 ++++++++++++++++++++++++++++++++++++++
kernel/workqueue.c        |   6 ++-
2 files changed, 113 insertions(+), 1 deletion(-)
[RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Marco Crivellari 1 month, 1 week ago
Hi,

The following is part of the Workqueue refactoring, and a first RFC about
the rename of the schedule_*() interfaces along with the introduction of
the "wq prefer per-cpu" workqueue and workqueue flag.

Any feedback is more than welcome!

~~~

More information about the reasons behind the workqueue refactoring can
be found at:

  https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/

Currently, schedule_work() and schedule_work_on() enqueue work items on
system_percpu_wq. The function names don't suggest this and, on top of that,
only the per-cpu version is provided.

Because of that, the following changes are introduced:

- queue_{bound|unbound}_work() as future replacement of schedule_work()

- queue_bound_work_on() as future replacement of schedule_work_on()

- queue_bound_delayed_work() as future replacement of
  schedule_delayed_work()

- queue_unbound_delayed_work() to offer the unbound version

- queue_bound_delayed_work_on() as future replacement of
  schedule_delayed_work_on()

The addition of queue_unbound_delayed_work() is because "delayed" functions
make use of a global timer and that means the work will be executed on the
CPU where the timer fired.
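
To make the intended mapping concrete, here is a minimal userspace sketch
(not the code from the patches). The mock_* queues and the mocked
queue_work() are stand-ins invented for illustration. It assumes the new
helpers are thin wrappers and that the unbound variant targets
system_dfl_wq; the actual patches may differ.

```c
/* Userspace mock for illustration only: none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct workqueue_struct { const char *name; };
struct work_struct { const struct workqueue_struct *queued_on; };

/* Stand-ins for the kernel's system workqueues. */
static struct workqueue_struct mock_percpu_wq = { "system_percpu_wq" };
static struct workqueue_struct mock_dfl_wq = { "system_dfl_wq" };

/* Mock of queue_work(): records the target queue, false if already queued. */
static bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
	assert(wq != NULL && work != NULL);
	if (work->queued_on)
		return false;
	work->queued_on = wq;
	return true;
}

/* What schedule_work() does today, as described above. */
static inline bool schedule_work(struct work_struct *work)
{
	return queue_work(&mock_percpu_wq, work);
}

/* Proposed names, ASSUMED here to be thin wrappers around queue_work(). */
static inline bool queue_bound_work(struct work_struct *work)
{
	return queue_work(&mock_percpu_wq, work);
}

static inline bool queue_unbound_work(struct work_struct *work)
{
	return queue_work(&mock_dfl_wq, work);
}
```

With this shape, a caller that does not need per-cpu execution would move
from schedule_work(&w) to queue_unbound_work(&w), and the choice becomes
visible at the call site.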

The Workqueue API currently does not distinguish between use cases where
locality is important for correctness and those where it is important for
efficiency. So introduce WQ_PREFER_PERCPU and system_prefer_percpu_wq, so
that work items that prefer to be per-cpu but don't strictly require it can
use such a workqueue / workqueue flag.


Thanks!

Marco Crivellari (2):
  workqueue: Add queue_*() functions, future schedule_*() replacement
  workqueue: Add WQ_PREFER_PERCPU and system_prefer_percpu_wq

 include/linux/workqueue.h | 108 ++++++++++++++++++++++++++++++++++++++
 kernel/workqueue.c        |   6 ++-
 2 files changed, 113 insertions(+), 1 deletion(-)

-- 
2.53.0
Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Tejun Heo 1 month, 1 week ago
(cc'ing Breno)

Hello,

On Tue, May 05, 2026 at 06:16:56PM +0200, Marco Crivellari wrote:
> Currently, schedule_work() and schedule_work_on() enqueue work items on
> system_percpu_wq. The function names don't suggest this and, on top of that,
> only the per-cpu version is provided.

I was hoping to just retire schedule_work[_on]() and let people use e.g.
system_percpu_wq directly. Is that too verbose for casual users?

> Because of that, the following changes are introduced:
> 
> - queue_{bound|unbound}_work() as future replacement of schedule_work()

If we do this, I think "percpu" is a lot clearer than "bound". percpu <->
(nothing) combination would be nice eventually but maybe that's too
confusing now. Does percpu <-> unbound combination sound weird?

...
> The Workqueue API currently does not distinguish between use cases where
> locality is important for correctness and those where it is important for
> efficiency. So introduce WQ_PREFER_PERCPU and system_prefer_percpu_wq, so
> that work items that prefer to be per-cpu but don't strictly require it can
> use such a workqueue / workqueue flag.

What's requested through WQ_PREFER_PERCPU is similar to what WQ_AFFN_CPU
does, so that might just work out. The only problem is that WQ_AFFN_CPU will
create nr_cpus workers to populate the per-cpu pods on boot. Maybe that's
not a problem if this gets used widely.

Thanks.

-- 
tejun
Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Breno Leitao 1 month, 1 week ago
On Tue, May 05, 2026 at 10:18:49AM -1000, Tejun Heo wrote:
> (cc'ing Breno)

Thanks!

> On Tue, May 05, 2026 at 06:16:56PM +0200, Marco Crivellari wrote:
> > Currently, schedule_work() and schedule_work_on() enqueue work items on
> > system_percpu_wq. The function names don't suggest this and, on top of that,
> > only the per-cpu version is provided.
> 
> I was hoping to just retire schedule_work[_on]() and let people use e.g.
> system_percpu_wq directly. Is that too verbose for casual users?

I think schedule_work() doesn't help much, and makes the system a bit harder to
understand. When I started reading this code, I would have preferred to see
queue_work(system_percpu_wq, work) instead of schedule_work(work).

In fact, I suspect this patchset exists partly because we have the
schedule_work() helper.

Would this proposal exist if schedule_work() had never been added?

> > Because of that, the following changes are introduced:
> > 
> > - queue_{bound|unbound}_work() as future replacement of schedule_work()
> 
> If we do this, I think "percpu" is a lot clearer than "bound". percpu <->
> (nothing) combination would be nice eventually but maybe that's too
> confusing now. Does percpu <-> unbound combination sound weird?

Would percpu <-> global sound less weird?

> > The Workqueue API currently does not distinguish between use cases where
> > locality is important for correctness and those where it is important for
> > efficiency.

If you enqueue work to system_unbound_wq with the default affinitization, you
already get locality (WQ_AFFN_CACHE groups CPUs sharing the same LLC). This is
the way to say that locality is important for efficiency, and WQ_AFFN_CPU
is the way to specify that locality is important for correctness.

On top of that, WQ_AFFN_SYSTEM is a way to specify that locality is not
necessary at all.
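
As a rough illustration of the three scopes mentioned above, the following
userspace sketch groups CPUs into pods. The topology table, the MOCK_*
names and both helper functions are hypothetical and exist only for this
example; the kernel derives pods from the real topology.

```c
/* Userspace model of affinity scopes; not kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Named after three of the kernel's scopes; values are local to this sketch. */
enum mock_affn_scope { MOCK_AFFN_CPU, MOCK_AFFN_CACHE, MOCK_AFFN_SYSTEM };

#define MOCK_NR_CPUS 8

/* Hypothetical topology: CPUs 0-3 share one LLC, CPUs 4-7 share another. */
static const int mock_cpu_to_llc[MOCK_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Return the pod a CPU falls into under the given scope. */
static int cpu_to_pod(enum mock_affn_scope scope, int cpu)
{
	assert(cpu >= 0 && cpu < MOCK_NR_CPUS);
	switch (scope) {
	case MOCK_AFFN_CPU:
		return cpu;			/* one pod per CPU */
	case MOCK_AFFN_CACHE:
		return mock_cpu_to_llc[cpu];	/* one pod per LLC */
	case MOCK_AFFN_SYSTEM:
		return 0;			/* a single system-wide pod */
	}
	return -1;
}

/* True when both CPUs fall into the same pod under the given scope. */
static bool same_pod(enum mock_affn_scope scope, int cpu_a, int cpu_b)
{
	return cpu_to_pod(scope, cpu_a) == cpu_to_pod(scope, cpu_b);
}
```

In this model, a work item issued on CPU 0 shares a pod with CPU 3 under
the cache scope, with no other CPU under the CPU scope, and with every CPU
under the system scope.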

Also, how does WQ_PREFER_PERCPU behave differently from WQ_AFFN_CPU?

Thanks for the RFC,
--breno
Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Marco Crivellari 1 month, 1 week ago
On Wed, May 6, 2026 at 3:40 PM Breno Leitao <leitao@debian.org> wrote:
>
> On Tue, May 05, 2026 at 10:18:49AM -1000, Tejun Heo wrote:
> > (cc'ing Breno)

Thanks, Tejun and Breno, for your feedback.

> > On Tue, May 05, 2026 at 06:16:56PM +0200, Marco Crivellari wrote:
> > > Currently, schedule_work() and schedule_work_on() enqueue work items on
> > > system_percpu_wq. The function names don't suggest this and, on top of that,
> > > only the per-cpu version is provided.
> >
> > I was hoping to just retire schedule_work[_on]() and let people use e.g.
> > system_percpu_wq directly. Is that too verbose for casual users?
>
> I think schedule_work() doesn't help much, and makes the system a bit harder to
> understand. When I started reading this code, I would have preferred to see
> queue_work(system_percpu_wq, work) instead of schedule_work(work).
>
> In fact, I suspect this patchset exists partly because we have the
> schedule_work() helper.

Yes, correct. Perhaps retiring schedule_work[_on](), as Tejun
suggested, would be the easiest way indeed.

I proposed this in light of our next step (which I would say is the
first): ensuring that every schedule_work() really needs to use the
per-cpu workqueue and offloads work that can be unbound to the unbound
workqueue.

So either we are going to have an "unbound" version or we use
queue_work() directly; both sound good to me. I guess retiring
schedule_work[_on]() in the future would be cleaner, so that users must
also specify the workqueue they really need to use.

> > > Because of that, the following changes are introduced:
> > >
> > > - queue_{bound|unbound}_work() as future replacement of schedule_work()
> >
> > If we do this, I think "percpu" is a lot clearer than "bound". percpu <->
> > (nothing) combination would be nice eventually but maybe that's too
> > confusing now. Does percpu <-> unbound combination sound weird?
>
> Would percpu <-> global sound less weird?

Now that I have read your input, if we rename, perhaps we can keep the
current abbreviation for unbound ("dfl") to avoid introducing
something new.

What do you both think about:

- queue_percpu_work()
- queue_dfl_work()

?

They somehow reflect the newly introduced system_dfl_wq and system_percpu_wq.

> > > The Workqueue API currently does not distinguish between use cases where
> > > locality is important for correctness and those where it is important for
> > > efficiency.
>
> If you enqueue work to system_unbound_wq with the default affinitization, you
> already get locality (WQ_AFFN_CACHE groups CPUs sharing the same LLC). This is
> the way to say that locality is important for efficiency, and WQ_AFFN_CPU
> is the way to specify that locality is important for correctness.
>
> On top of that, WQ_AFFN_SYSTEM is a way to specify that locality is not
> necessary at all.
>
> Also, how does WQ_PREFER_PERCPU behave differently from WQ_AFFN_CPU?

Let me share where this was discussed a year ago:

https://lore.kernel.org/all/Z79E_gbWm9j9bkfR@slm.duckdns.org/

Perhaps - likely - I haven't understood the WQ_PREFER_PERCPU proposal
here; I thought it was a workqueue flag, to be used like WQ_PERCPU or
WQ_UNBOUND.
Tejun's reply also makes this clearer now.

Anyhow, this idea is based on customer reports I've seen previously.
We noticed that with certain workloads, specific per-cpu work creates
noise on isolated CPUs. With a flag like that we can identify which
workqueues prefer to be per-cpu and *not* for correctness. This allows
using a boot parameter / sysctl, for example, to keep those workqueues
affined only to housekeeping CPUs.

Of course, if we can achieve the same with a system workqueue (like
system_prefer_percpu_wq), that would also be fine. I think it would be
way easier, it should be similar to what we're doing with
system_power_efficient_wq [1].

Tejun, Breno (and others), what do you think? Bad idea? :-)

Thanks!

- [1] https://elixir.bootlin.com/linux/v7.0.1/source/kernel/workqueue.c#L7907

--

Marco Crivellari

SUSE Labs
Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Tejun Heo 1 month ago
Hello,

On Thu, May 07, 2026 at 12:25:30PM +0200, Marco Crivellari wrote:
> So either we are going to have an "unbound" version or we use
> queue_work() directly; both sound good to me. I guess retiring
> schedule_work[_on]() in the future would be cleaner, so that users must
> also specify the workqueue they really need to use.

Yeah, retiring would be my preference if we need to update them anyway. I
don't think the thin wrappers add anything useful.

> What do you both think about:
> 
> - queue_percpu_work()
> - queue_dfl_work()

But if we were to keep the wrappers, yeah, these are better names.

> Let me share where this was discussed a year ago:
> 
> https://lore.kernel.org/all/Z79E_gbWm9j9bkfR@slm.duckdns.org/
> 
> Perhaps - likely - I haven't understood the WQ_PREFER_PERCPU proposal
> here; I thought it was a workqueue flag, to be used like WQ_PERCPU or
> WQ_UNBOUND.
> Tejun's reply also makes this clearer now.

Yeah, that was what was discussed then.

> Anyhow, this idea is based on customer reports I've seen previously.
> We noticed that with certain workloads, specific per-cpu work creates
> noise on isolated CPUs. With a flag like that we can identify which
> workqueues prefer to be per-cpu and *not* for correctness. This allows
> using a boot parameter / sysctl, for example, to keep those workqueues
> affined only to housekeeping CPUs.
> 
> Of course, if we can achieve the same with a system workqueue (like
> system_prefer_percpu_wq), that would also be fine. I think it would be
> way easier, it should be similar to what we're doing with
> system_power_efficient_wq [1].

WQ_AFFN_CPU is more flexible as the tasks aren't pinned to the CPU but there
may be downsides:

- Concurrency management isn't available.

- Would create more kworkers.

Maybe the original plan can be adapted to:

- Add WQ_PREFER_PERCPU as discussed before.

- At boot time, allow selecting whether to back them with percpu wqs or
  WQ_AFFN_X unbound ones. Maybe we can even experiment with default to
  WQ_AFFN_CPU.
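
One way to picture the adapted plan is the userspace sketch below. Every
name in it (the MOCK_* flag bits, enum prefer_backing, resolve_wq()) is
hypothetical, and the boot-time knob itself is an assumption; nothing here
reflects an existing kernel interface. It only shows a "prefer per-cpu"
request being resolved, once, into either a per-cpu workqueue or an unbound
one with a chosen affinity scope.

```c
/* Userspace model of resolving a "prefer per-cpu" request; not kernel code. */
#include <assert.h>

/* Hypothetical flag bits; the real kernel values differ. */
#define MOCK_WQ_PERCPU		(1u << 0)
#define MOCK_WQ_UNBOUND		(1u << 1)
#define MOCK_WQ_PREFER_PERCPU	(1u << 2)

enum mock_affn { MOCK_AFFN_NONE, MOCK_AFFN_CPU, MOCK_AFFN_CACHE };

/* Hypothetical boot-time choice for backing "prefer per-cpu" workqueues. */
enum prefer_backing {
	BACK_WITH_PERCPU,
	BACK_WITH_AFFN_CPU,
	BACK_WITH_AFFN_CACHE,
};

struct resolved_wq {
	unsigned int flags;
	enum mock_affn affn;
};

/* Turn the requested flags into the flags and scope actually used. */
static struct resolved_wq resolve_wq(unsigned int flags,
				     enum prefer_backing backing)
{
	struct resolved_wq r = { flags, MOCK_AFFN_NONE };

	/* A preference must not be combined with an explicit choice. */
	assert(!((flags & MOCK_WQ_PREFER_PERCPU) &&
		 (flags & (MOCK_WQ_PERCPU | MOCK_WQ_UNBOUND))));

	if (!(flags & MOCK_WQ_PREFER_PERCPU))
		return r;	/* explicit requests are left untouched */

	r.flags &= ~MOCK_WQ_PREFER_PERCPU;
	switch (backing) {
	case BACK_WITH_PERCPU:
		r.flags |= MOCK_WQ_PERCPU;
		break;
	case BACK_WITH_AFFN_CPU:
		r.flags |= MOCK_WQ_UNBOUND;
		r.affn = MOCK_AFFN_CPU;
		break;
	case BACK_WITH_AFFN_CACHE:
		r.flags |= MOCK_WQ_UNBOUND;
		r.affn = MOCK_AFFN_CACHE;
		break;
	}
	return r;
}
```

Workqueues that ask for MOCK_WQ_PERCPU explicitly keep it regardless of the
knob, which is the point of separating "needs per-cpu for correctness" from
"prefers per-cpu for efficiency".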

Thanks.

-- 
tejun
Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Marco Crivellari 1 month ago
Hello,

On Thu, May 7, 2026 at 11:27 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Thu, May 07, 2026 at 12:25:30PM +0200, Marco Crivellari wrote:
> > So either we are going to have an "unbound" version or we use
> > queue_work() directly; both sound good to me. I guess retiring
> > schedule_work[_on]() in the future would be cleaner, so that users must
> > also specify the workqueue they really need to use.
>
> Yeah, retiring would be my preference if we need to update them anyway. I
> don't think the thin wrappers add anything useful.

Fine, this sounds good to me if we do it that way.

> WQ_AFFN_CPU is more flexible as the tasks aren't pinned to the CPU but there
> may be downsides:
>
> - Concurrency management isn't available.
>
> - Would create more kworkers.
>
> Maybe the original plan can be adapted to:
>
> - Add WQ_PREFER_PERCPU as discussed before.
>
> - At boot time, allow selecting whether to back them with percpu wqs or
>   WQ_AFFN_X unbound ones. Maybe we can even experiment with default to
>   WQ_AFFN_CPU.

Cool, maybe I will ask something else when the time comes!

Thanks!
--

Marco Crivellari

SUSE Labs
Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Frederic Weisbecker 1 month ago
On Thu, May 07, 2026 at 11:27:52AM -1000, Tejun Heo wrote:
> > Anyhow, this idea is based on customer reports I've seen previously.
> > We noticed that with certain workloads, specific per-cpu work creates
> > noise on isolated CPUs. With a flag like that we can identify which
> > workqueues prefer to be per-cpu and *not* for correctness. This allows
> > using a boot parameter / sysctl, for example, to keep those workqueues
> > affined only to housekeeping CPUs.
> > 
> > Of course, if we can achieve the same with a system workqueue (like
> > system_prefer_percpu_wq), that would also be fine. I think it would be
> > way easier, it should be similar to what we're doing with
> > system_power_efficient_wq [1].
> 
> WQ_AFFN_CPU is more flexible as the tasks aren't pinned to the CPU but there
> may be downsides:
> 
> - Concurrency management isn't available.
> 
> - Would create more kworkers.
> 
> Maybe the original plan can be adapted to:
> 
> - Add WQ_PREFER_PERCPU as discussed before.
> 
> - At boot time, allow selecting whether to back them with percpu wqs or
>   WQ_AFFN_X unbound ones. Maybe we can even experiment with default to
>   WQ_AFFN_CPU.

Isn't WQ_POWER_EFFICIENT enough for what we want here? ie: it does a per-cpu
preference except when some config is enabled or isolation is on. It could be
renamed to WQ_PREFER_PERCPU to generalize its meaning for more than just power
purposes.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
Re: [RFC PATCH 0/2] Add queue_*() functions and prefer per-cpu workqueue and flag
Posted by Tejun Heo 1 month ago
Hello,

On Fri, May 08, 2026 at 02:09:20PM +0200, Frederic Weisbecker wrote:
> Isn't WQ_POWER_EFFICIENT enough for what we want here? ie: it does a per-cpu
> preference except when some config is enabled or isolation is on. It could be
> renamed to WQ_PREFER_PERCPU to generalize its meaning for more than just power
> purposes.

That may satisfy the minimum requirement but I think it'd be a shame if we
do all the work and still leave the semantics overloaded, which was the
initial problem to begin with. I really want the intent of each specific
selection expressed unambiguously.

Besides, even outside of isolation use cases, having relaxed affinity can be
useful as it gives the scheduler more leeway in placement decisions. E.g.
there's no real downside to running such a work item on an SMT sibling, or
maybe that CPU is particularly overloaded due to net irq and rx processing
and some work items are better off running on another CPU in the same LLC,
and so on. If we can manage that without causing perf issues, I want the
default to be soft affinity, not a hard one.

Thanks.

-- 
tejun