Ordered workqueues do not currently follow changes made to the
global unbound cpumask because per-pool workqueue changes may break
the ordering guarantee. IOW, a work function in an ordered workqueue
may run on a cpuset isolated CPU.

This series enables ordered workqueues to follow changes made to the
global unbound cpumask by temporarily saving the work items in an
internal queue until the old pwq has been properly flushed and is
about to be freed. At that point, those work items, if present, are
queued back to the new pwq to be executed.
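[Editorial illustration: the following is a minimal standalone userspace
sketch of the idea described above, not the kernel implementation. All
identifiers (struct ordered_wq, "switching", pwq_switch_complete(), etc.)
are invented for this sketch. It only models the "park work items while
the old pwq is flushed, then replay them in submission order on the new
pwq" behaviour.]

#include <stdio.h>
#include <stdlib.h>

struct work_item {
        int id;
        struct work_item *next;
};

struct ordered_wq {
        int switching;                  /* old pwq still being flushed */
        struct work_item *held_head;    /* internal holding queue (FIFO) */
        struct work_item **held_tail;
};

static void execute(struct work_item *w)
{
        printf("executing work %d\n", w->id);
        free(w);
}

static void queue_work(struct ordered_wq *wq, int id)
{
        struct work_item *w = malloc(sizeof(*w));

        if (!w)
                return;
        w->id = id;
        w->next = NULL;

        if (wq->switching) {
                /* Park the item internally; submission order is kept. */
                *wq->held_tail = w;
                wq->held_tail = &w->next;
        } else {
                /* Stands in for queueing directly on the current pwq. */
                execute(w);
        }
}

/* Called once the old pwq has been flushed and is about to be freed. */
static void pwq_switch_complete(struct ordered_wq *wq)
{
        struct work_item *w = wq->held_head;

        wq->switching = 0;
        wq->held_head = NULL;
        wq->held_tail = &wq->held_head;

        while (w) {                     /* replay parked items in order */
                struct work_item *next = w->next;

                execute(w);
                w = next;
        }
}

int main(void)
{
        struct ordered_wq wq = { 0, NULL, &wq.held_head };

        queue_work(&wq, 1);             /* runs right away */

        wq.switching = 1;               /* unbound cpumask update begins */
        queue_work(&wq, 2);             /* parked */
        queue_work(&wq, 3);             /* parked */

        pwq_switch_complete(&wq);       /* old pwq gone: work 2, 3 run */
        return 0;
}

Because only one of "execute directly" or "park" is ever in effect, a
single work item never overtakes another, which is the ordering
guarantee the series has to preserve while switching pwqs.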
Waiman Long (3):
workqueue: Skip __WQ_DESTROYING workqueues when updating global
unbound cpumask
workqueue: Break out __queue_work_rcu_locked() from __queue_work()
workqueue: Enable unbound cpumask update on ordered workqueues
kernel/workqueue.c | 217 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 183 insertions(+), 34 deletions(-)
--
2.39.3
Hi Waiman,

Thanks for working on this!

On 30/01/24 13:33, Waiman Long wrote:
> Ordered workqueues do not currently follow changes made to the
> global unbound cpumask because per-pool workqueue changes may break
> the ordering guarantee. IOW, a work function in an ordered workqueue
> may run on a cpuset isolated CPU.
>
> This series enables ordered workqueues to follow changes made to the
> global unbound cpumask by temporarily saving the work items in an
> internal queue until the old pwq has been properly flushed and is
> about to be freed. At that point, those work items, if present, are
> queued back to the new pwq to be executed.

I took it for a quick first spin (on top of wq/for-6.9) and this is what
I'm seeing.

Let's take the edac-poller ordered wq, as the behavior seems to be the
same for the rest.

Initially we have (using wq_dump.py)

wq_unbound_cpumask=0xffffffff 000000ff
...
pool[80] ref= 44 nice= 0 idle/workers= 2/ 2 cpus=0xffffffff 000000ff pod_cpus=0xffffffff 000000ff
...
edac-poller ordered 80 80 80 80 80 80 80 80 ...
...
edac-poller 0xffffffff 000000ff 345 0xffffffff 000000ff

after I

# echo 3 >/sys/devices/virtual/workqueue/cpumask

I get

wq_unbound_cpumask=00000003
...
pool[86] ref= 44 nice= 0 idle/workers= 2/ 2 cpus=00000003 pod_cpus=00000003
...
edac-poller ordered 86 86 86 86 86 86 86 86 86 86 ...
...
edac-poller 0xffffffff 000000ff 345 0xffffffff 000000ff

So, IIUC, the pool and wq -> pool settings are updated correctly, but
the wq.unbound_cpus (and its associated rescuer affinity) are left
untouched. Is this expected or are we maybe still missing an additional
step?

Best,
Juri
On 1/31/24 08:01, Juri Lelli wrote:
> Hi Waiman,
>
> Thanks for working on this!
>
> On 30/01/24 13:33, Waiman Long wrote:
>> ...
>
> I took it for a quick first spin (on top of wq/for-6.9) and this is what
> I'm seeing.
>
> ...
>
> So, IIUC, the pool and wq -> pool settings are updated correctly, but
> the wq.unbound_cpus (and its associated rescuer affinity) are left
> untouched. Is this expected or are we maybe still missing an additional
> step?

Isn't this what the 4th patch of your RFC workqueue patch series does?

https://lore.kernel.org/lkml/20240116161929.232885-5-juri.lelli@redhat.com/

The focus of this series is to make sure that we can update the pool
cpumask of ordered workqueues to follow changes in the global unbound
workqueue cpumask, so I haven't touched anything related to the rescuer
at all.

I will include your 4th patch in the next version of this series.

Cheers,
Longman
On 31/01/24 10:31, Waiman Long wrote:
> On 1/31/24 08:01, Juri Lelli wrote:
> > ...
>
> Isn't this what the 4th patch of your RFC workqueue patch series does?
>
> https://lore.kernel.org/lkml/20240116161929.232885-5-juri.lelli@redhat.com/
>
> The focus of this series is to make sure that we can update the pool
> cpumask of ordered workqueues to follow changes in the global unbound
> workqueue cpumask, so I haven't touched anything related to the rescuer
> at all.

My patch only uses the wq->unbound_attrs->cpumask to change the
associated rescuer cpumask, but I don't think your series modifies the
former?

Thanks,
Juri
On 2/1/24 05:18, Juri Lelli wrote:
> On 31/01/24 10:31, Waiman Long wrote:
>> ...
>
> My patch only uses the wq->unbound_attrs->cpumask to change the
> associated rescuer cpumask, but I don't think your series modifies the
> former?

I don't think so. The calling sequence of apply_wqattrs_prepare() and
apply_wqattrs_commit() will copy unbound_cpumask into ctx->attrs, which
is then copied into unbound_attrs. So unbound_attrs->cpumask should
reflect the new global unbound cpumask. This code has been there all
along. The only difference is that ordered workqueues were skipped for
unbound cpumask updates before; this patch series now includes those
ordered workqueues when the unbound cpumask is updated.

Cheers,
Longman
On 01/02/24 09:28, Waiman Long wrote:
> On 2/1/24 05:18, Juri Lelli wrote:
> > On 31/01/24 10:31, Waiman Long wrote:

...

> > My patch only uses the wq->unbound_attrs->cpumask to change the
> > associated rescuer cpumask, but I don't think your series modifies the
> > former?
>
> I don't think so. The calling sequence of apply_wqattrs_prepare() and
> apply_wqattrs_commit() will copy unbound_cpumask into ctx->attrs, which
> is then copied into unbound_attrs. So unbound_attrs->cpumask should
> reflect the new global unbound cpumask. This code has been there all
> along.

Indeed. I believe this is what my 3/4 [1] was trying to cure, though. I
still think that with the current code the new_attrs->cpumask first gets
correctly initialized considering unbound_cpumask

apply_wqattrs_prepare ->
    copy_workqueue_attrs(new_attrs, attrs);
    wqattrs_actualize_cpumask(new_attrs, unbound_cpumask);

but is then overwritten further below using cpu_possible_mask

apply_wqattrs_prepare ->
    copy_workqueue_attrs(new_attrs, attrs);
    cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);

an operation that I honestly still fail to grasp the need for. :)

In the end we commit that last (overwritten) cpumask

apply_wqattrs_commit ->
    copy_workqueue_attrs(ctx->wq->unbound_attrs, ctx->attrs);

Now, my patch was wrong, as you pointed out, as it wasn't taking the
ordering guarantee into consideration. I thought maybe your changes
(plus an additional change to the above?) might fix the problem
correctly.

Best,
Juri

1 - https://lore.kernel.org/lkml/20240116161929.232885-4-juri.lelli@redhat.com/
Hello,

On Fri, Feb 02, 2024 at 03:55:15PM +0100, Juri Lelli wrote:
> Indeed. I believe this is what my 3/4 [1] was trying to cure, though. I
> still think that with the current code the new_attrs->cpumask first gets
> correctly initialized considering unbound_cpumask
>
> apply_wqattrs_prepare ->
>     copy_workqueue_attrs(new_attrs, attrs);
>     wqattrs_actualize_cpumask(new_attrs, unbound_cpumask);
>
> but is then overwritten further below using cpu_possible_mask
>
> apply_wqattrs_prepare ->
>     copy_workqueue_attrs(new_attrs, attrs);
>     cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);
>
> an operation that I honestly still fail to grasp the need for. :)

So, imagine the following scenario on a system with four CPUs:

1. Initially both wq_unbound_cpumask and wq A's cpumask are 0xf.

2. wq_unbound_cpumask is set to 0x3. A's effective is 0x3.

3. A's cpumask is set to 0xe. A's effective is 0x2.

4. wq_unbound_cpumask is restored to 0xf. A's effective should become 0xe.

The reason why we're saving what the user requested rather than the
effective mask is to be able to do #4, so that the effective mask is
always what's currently allowed from what the user specified for the
workqueue.

Now, if you want the current effective cpumask, that always coincides
with the workqueue's dfl_pwq's __pod_cpumask, and if you look at the
current wq/for-6.9 branch, that's accessible through the
unbound_effective_cpumask() helper.

Thanks.

--
tejun
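[Editorial illustration: a small standalone userspace C sketch walking
through the four-step scenario above; not kernel code. "requested"
stands for the per-workqueue cpumask the user asked for, "unbound" for
wq_unbound_cpumask, and effective() for their intersection. The
empty-intersection fallback is an assumption modeled loosely on
wqattrs_actualize_cpumask() and is included only for completeness.]

#include <stdio.h>

static unsigned int effective(unsigned int requested, unsigned int unbound)
{
        unsigned int eff = requested & unbound;

        /* Assumed fallback: use the unbound mask if the result is empty. */
        return eff ? eff : unbound;
}

int main(void)
{
        unsigned int unbound = 0xf;     /* step 1: wq_unbound_cpumask */
        unsigned int requested = 0xf;   /* step 1: wq A's cpumask     */

        printf("1. effective = 0x%x\n", effective(requested, unbound));

        unbound = 0x3;                  /* step 2 */
        printf("2. effective = 0x%x\n", effective(requested, unbound));

        requested = 0xe;                /* step 3 */
        printf("3. effective = 0x%x\n", effective(requested, unbound));

        unbound = 0xf;                  /* step 4 */
        printf("4. effective = 0x%x\n", effective(requested, unbound));

        return 0;
}

This prints 0xf, 0x3, 0x2 and 0xe. Step 4 only comes out as 0xe because
the requested mask is kept separately instead of being overwritten with
the step-2 effective value, which is the point being made above.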
On 2/2/24 12:07, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 02, 2024 at 03:55:15PM +0100, Juri Lelli wrote:
>> ...
>
> ...
>
> Now, if you want the current effective cpumask, that always coincides
> with the workqueue's dfl_pwq's __pod_cpumask, and if you look at the
> current wq/for-6.9 branch, that's accessible through the
> unbound_effective_cpumask() helper.

Thanks for the explanation; we will use the new
unbound_effective_cpumask() helper.

It does look like there is a major restructuring of the workqueue code
in 6.9. I will adapt my patch series to be based on the for-6.9 branch.

Cheers,
Longman
On 02/02/24 14:03, Waiman Long wrote:
> On 2/2/24 12:07, Tejun Heo wrote:
> > ...
> >
> > The reason why we're saving what the user requested rather than the
> > effective mask is to be able to do #4, so that the effective mask is
> > always what's currently allowed from what the user specified for the
> > workqueue.

Thanks for the explanation!

> > Now, if you want the current effective cpumask, that always coincides
> > with the workqueue's dfl_pwq's __pod_cpumask, and if you look at the
> > current wq/for-6.9 branch, that's accessible through the
> > unbound_effective_cpumask() helper.
>
> Thanks for the explanation; we will use the new
> unbound_effective_cpumask() helper.

Right, that should indeed work.

Best,
Juri