When unbind_workers() reads wq_unbound_cpumask to set the affinity of
freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
This is made more obvious as of recent commit
46a4d679ef88 ("workqueue: Avoid a false warning in unbind_workers()")
e.g.
  unbind_workers()                           workqueue_set_unbound_cpumask()
    kthread_set_per_cpu(p, -1);
                                               if (cpumask_intersects(wq_unbound_cpumask, cpu_active_mask))
                                                 cpumask_copy(wq_unbound_cpumask, cpumask);
    WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
Make workqueue_offline_cpu() invoke unbind_workers() with wq_pool_mutex
held.
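
For reference, the two racing contexts look roughly like this (a heavily
trimmed sketch against the v5.19-era workqueue.c, not a verbatim
excerpt):

	/* CPU-hotplug path: holds wq_pool_attach_mutex only. */
	static void unbind_workers(int cpu)
	{
		struct worker_pool *pool;
		struct worker *worker;

		for_each_cpu_worker_pool(pool, cpu) {
			mutex_lock(&wq_pool_attach_mutex);
			/* ... mark the pool DISASSOCIATED ... */
			for_each_pool_worker(worker, pool) {
				kthread_set_per_cpu(worker->task, -1);
				/* reads wq_unbound_cpumask without wq_pool_mutex */
				WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
								  wq_unbound_cpumask) < 0);
			}
			mutex_unlock(&wq_pool_attach_mutex);
		}
	}

	/* sysfs path: wq_unbound_cpumask is written under wq_pool_mutex. */
	int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
	{
		int ret;

		apply_wqattrs_lock();	/* cpus_read_lock() + wq_pool_mutex */
		cpumask_copy(wq_unbound_cpumask, cpumask);
		ret = workqueue_apply_unbound_cpumask();
		apply_wqattrs_unlock();
		return ret;
	}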
Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
kernel/workqueue.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aa8a82bc6738..97cc41430a76 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5143,14 +5143,15 @@ int workqueue_offline_cpu(unsigned int cpu)
 	if (WARN_ON(cpu != smp_processor_id()))
 		return -1;
 
+	mutex_lock(&wq_pool_mutex);
+
 	unbind_workers(cpu);
 
 	/* update NUMA affinity of unbound workqueues */
-	mutex_lock(&wq_pool_mutex);
 	list_for_each_entry(wq, &workqueues, list)
 		wq_update_unbound_numa(wq, cpu, false);
-	mutex_unlock(&wq_pool_mutex);
 
+	mutex_unlock(&wq_pool_mutex);
 	return 0;
 }
--
2.31.1
On Tue, Aug 2, 2022 at 4:42 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> When unbind_workers() reads wq_unbound_cpumask to set the affinity of
> freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
> sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
>

Hello Valentin,

Updating wq_unbound_cpumask requires cpus_read_lock(), and
unbind_workers() is in the CPU hotplug path, so the existing protection
is sufficient for accessing wq_unbound_cpumask in unbind_workers().
The extra protection is only required when the logic is also moved to
destroy_worker().

Thanks
Lai
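
For context, the update side serializes against CPU hotplug roughly like
this (a simplified sketch of the v5.19-era helpers, not a verbatim
excerpt): workqueue_set_unbound_cpumask() writes the mask under
apply_wqattrs_lock(), which takes cpus_read_lock(), while
workqueue_offline_cpu() runs as a hotplug callback with the hotplug lock
held for write, so the two paths cannot run concurrently.

	static void apply_wqattrs_lock(void)
	{
		/* CPUs should stay stable across pwq creations and installations */
		cpus_read_lock();
		mutex_lock(&wq_pool_mutex);
	}

	static void apply_wqattrs_unlock(void)
	{
		mutex_unlock(&wq_pool_mutex);
		cpus_read_unlock();
	}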
On 2022/8/2 16:41, Valentin Schneider wrote:
> When unbind_workers() reads wq_unbound_cpumask to set the affinity of
> freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
> sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
>
> This is made more obvious as of recent commit
>
> 46a4d679ef88 ("workqueue: Avoid a false warning in unbind_workers()")
>
> e.g.
>
>   unbind_workers()                           workqueue_set_unbound_cpumask()
>     kthread_set_per_cpu(p, -1);
>                                                if (cpumask_intersects(wq_unbound_cpumask, cpu_active_mask))
>                                                  cpumask_copy(wq_unbound_cpumask, cpumask);
>     WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
>
> Make workqueue_offline_cpu() invoke unbind_workers() with wq_pool_mutex
> held.
I would prefer to protect wq_unbound_cpumask with wq_pool_attach_mutex.
From df7b4672db4dfd3e480b1873b9d346e8a7dfc69f Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Date: Wed, 3 Aug 2022 10:52:04 +0800
Subject: [PATCH] workqueue: Protects wq_unbound_cpumask with
wq_pool_attach_mutex
When unbind_workers() reads wq_unbound_cpumask to set the affinity of
freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
Make wq_unbound_cpumask protected by wq_pool_attach_mutex as well, and
remove the need for the temporary saved_cpumask.
Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Reported-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
kernel/workqueue.c | 41 ++++++++++++++++-------------------------
1 file changed, 16 insertions(+), 25 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6b2b66940530..eaea73e7e365 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -327,7 +327,7 @@ static struct rcuwait manager_wait = __RCUWAIT_INITIALIZER(manager_wait);
 static LIST_HEAD(workqueues);		/* PR: list of all workqueues */
 static bool workqueue_freezing;		/* PL: have wqs started freezing? */
 
-/* PL: allowable cpus for unbound wqs and work items */
+/* PL&A: allowable cpus for unbound wqs and work items */
 static cpumask_var_t wq_unbound_cpumask;
 
 /* CPU where unbound work was last round robin scheduled from this CPU */
@@ -3933,7 +3933,8 @@ static void apply_wqattrs_cleanup(struct apply_wqattrs_ctx *ctx)
 /* allocate the attrs and pwqs for later installation */
 static struct apply_wqattrs_ctx *
 apply_wqattrs_prepare(struct workqueue_struct *wq,
-		      const struct workqueue_attrs *attrs)
+		      const struct workqueue_attrs *attrs,
+		      const cpumask_var_t unbound_cpumask)
 {
 	struct apply_wqattrs_ctx *ctx;
 	struct workqueue_attrs *new_attrs, *tmp_attrs;
@@ -3949,14 +3950,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 		goto out_free;
 
 	/*
-	 * Calculate the attrs of the default pwq.
+	 * Calculate the attrs of the default pwq with unbound_cpumask
+	 * which is wq_unbound_cpumask or to set to wq_unbound_cpumask.
 	 * If the user configured cpumask doesn't overlap with the
 	 * wq_unbound_cpumask, we fallback to the wq_unbound_cpumask.
 	 */
 	copy_workqueue_attrs(new_attrs, attrs);
-	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
+	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbound_cpumask);
 	if (unlikely(cpumask_empty(new_attrs->cpumask)))
-		cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask);
+		cpumask_copy(new_attrs->cpumask, unbound_cpumask);
 
 	/*
 	 * We may create multiple pwqs with differing cpumasks. Make a
@@ -4053,7 +4055,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 		wq->flags &= ~__WQ_ORDERED;
 	}
 
-	ctx = apply_wqattrs_prepare(wq, attrs);
+	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (!ctx)
 		return -ENOMEM;
 
@@ -5311,7 +5313,7 @@ void thaw_workqueues(void)
 }
 #endif /* CONFIG_FREEZER */
 
-static int workqueue_apply_unbound_cpumask(void)
+static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 {
 	LIST_HEAD(ctxs);
 	int ret = 0;
@@ -5327,7 +5329,7 @@ static int workqueue_apply_unbound_cpumask(void)
 		if (wq->flags & __WQ_ORDERED)
 			continue;
 
-		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs);
+		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
 		if (!ctx) {
 			ret = -ENOMEM;
 			break;
@@ -5342,6 +5344,11 @@ static int workqueue_apply_unbound_cpumask(void)
 		apply_wqattrs_cleanup(ctx);
 	}
 
+	if (!ret) {
+		mutex_lock(&wq_pool_attach_mutex);
+		cpumask_copy(wq_unbound_cpumask, unbound_cpumask);
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
 	return ret;
 }
 
@@ -5360,7 +5367,6 @@ static int workqueue_apply_unbound_cpumask(void)
 int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 {
 	int ret = -EINVAL;
-	cpumask_var_t saved_cpumask;
 
 	/*
 	 * Not excluding isolated cpus on purpose.
@@ -5374,23 +5380,8 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 			goto out_unlock;
 		}
 
-		if (!zalloc_cpumask_var(&saved_cpumask, GFP_KERNEL)) {
-			ret = -ENOMEM;
-			goto out_unlock;
-		}
-
-		/* save the old wq_unbound_cpumask. */
-		cpumask_copy(saved_cpumask, wq_unbound_cpumask);
-
-		/* update wq_unbound_cpumask at first and apply it to wqs. */
-		cpumask_copy(wq_unbound_cpumask, cpumask);
-		ret = workqueue_apply_unbound_cpumask();
-
-		/* restore the wq_unbound_cpumask when failed. */
-		if (ret < 0)
-			cpumask_copy(wq_unbound_cpumask, saved_cpumask);
+		ret = workqueue_apply_unbound_cpumask(cpumask);
 
-		free_cpumask_var(saved_cpumask);
 out_unlock:
 		apply_wqattrs_unlock();
 	}
--
2.19.1.6.gb485710b
On Wed, Aug 03, 2022 at 11:40:28AM +0800, Lai Jiangshan wrote:
> From df7b4672db4dfd3e480b1873b9d346e8a7dfc69f Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> Date: Wed, 3 Aug 2022 10:52:04 +0800
> Subject: [PATCH] workqueue: Protects wq_unbound_cpumask with
> wq_pool_attach_mutex
>
> When unbind_workers() reads wq_unbound_cpumask to set the affinity of
> freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
> sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
>
> Make wq_unbound_cpumask protected by wq_pool_attach_mutex as well, and
> remove the need for the temporary saved_cpumask.
>
> Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
> Reported-by: Valentin Schneider <vschneid@redhat.com>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
This patch looks fine to me but is whitespace corrupted (two leading spaces
on all context lines). Can you please resend?
Thanks.
--
tejun
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
When unbind_workers() reads wq_unbound_cpumask to set the affinity of
freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
Make wq_unbound_cpumask protected by wq_pool_attach_mutex as well, and
remove the need for the temporary saved_cpumask.
Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Reported-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
kernel/workqueue.c | 41 ++++++++++++++++-------------------------
1 file changed, 16 insertions(+), 25 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6b2b66940530..eaea73e7e365 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -327,7 +327,7 @@ static struct rcuwait manager_wait = __RCUWAIT_INITIALIZER(manager_wait);
 static LIST_HEAD(workqueues);		/* PR: list of all workqueues */
 static bool workqueue_freezing;		/* PL: have wqs started freezing? */
 
-/* PL: allowable cpus for unbound wqs and work items */
+/* PL&A: allowable cpus for unbound wqs and work items */
 static cpumask_var_t wq_unbound_cpumask;
 
 /* CPU where unbound work was last round robin scheduled from this CPU */
@@ -3933,7 +3933,8 @@ static void apply_wqattrs_cleanup(struct apply_wqattrs_ctx *ctx)
 /* allocate the attrs and pwqs for later installation */
 static struct apply_wqattrs_ctx *
 apply_wqattrs_prepare(struct workqueue_struct *wq,
-		      const struct workqueue_attrs *attrs)
+		      const struct workqueue_attrs *attrs,
+		      const cpumask_var_t unbound_cpumask)
 {
 	struct apply_wqattrs_ctx *ctx;
 	struct workqueue_attrs *new_attrs, *tmp_attrs;
@@ -3949,14 +3950,15 @@ apply_wqattrs_prepare(struct workqueue_struct *wq,
 		goto out_free;
 
 	/*
-	 * Calculate the attrs of the default pwq.
+	 * Calculate the attrs of the default pwq with unbound_cpumask
+	 * which is wq_unbound_cpumask or to set to wq_unbound_cpumask.
 	 * If the user configured cpumask doesn't overlap with the
 	 * wq_unbound_cpumask, we fallback to the wq_unbound_cpumask.
 	 */
 	copy_workqueue_attrs(new_attrs, attrs);
-	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
+	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, unbound_cpumask);
 	if (unlikely(cpumask_empty(new_attrs->cpumask)))
-		cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask);
+		cpumask_copy(new_attrs->cpumask, unbound_cpumask);
 
 	/*
 	 * We may create multiple pwqs with differing cpumasks. Make a
@@ -4053,7 +4055,7 @@ static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
 		wq->flags &= ~__WQ_ORDERED;
 	}
 
-	ctx = apply_wqattrs_prepare(wq, attrs);
+	ctx = apply_wqattrs_prepare(wq, attrs, wq_unbound_cpumask);
 	if (!ctx)
 		return -ENOMEM;
 
@@ -5311,7 +5313,7 @@ void thaw_workqueues(void)
 }
 #endif /* CONFIG_FREEZER */
 
-static int workqueue_apply_unbound_cpumask(void)
+static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 {
 	LIST_HEAD(ctxs);
 	int ret = 0;
@@ -5327,7 +5329,7 @@ static int workqueue_apply_unbound_cpumask(void)
 		if (wq->flags & __WQ_ORDERED)
 			continue;
 
-		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs);
+		ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
 		if (!ctx) {
 			ret = -ENOMEM;
 			break;
@@ -5342,6 +5344,11 @@ static int workqueue_apply_unbound_cpumask(void)
 		apply_wqattrs_cleanup(ctx);
 	}
 
+	if (!ret) {
+		mutex_lock(&wq_pool_attach_mutex);
+		cpumask_copy(wq_unbound_cpumask, unbound_cpumask);
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
 	return ret;
 }
 
@@ -5360,7 +5367,6 @@ static int workqueue_apply_unbound_cpumask(void)
 int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 {
 	int ret = -EINVAL;
-	cpumask_var_t saved_cpumask;
 
 	/*
 	 * Not excluding isolated cpus on purpose.
@@ -5374,23 +5380,8 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 			goto out_unlock;
 		}
 
-		if (!zalloc_cpumask_var(&saved_cpumask, GFP_KERNEL)) {
-			ret = -ENOMEM;
-			goto out_unlock;
-		}
-
-		/* save the old wq_unbound_cpumask. */
-		cpumask_copy(saved_cpumask, wq_unbound_cpumask);
-
-		/* update wq_unbound_cpumask at first and apply it to wqs. */
-		cpumask_copy(wq_unbound_cpumask, cpumask);
-		ret = workqueue_apply_unbound_cpumask();
-
-		/* restore the wq_unbound_cpumask when failed. */
-		if (ret < 0)
-			cpumask_copy(wq_unbound_cpumask, saved_cpumask);
+		ret = workqueue_apply_unbound_cpumask(cpumask);
 
-		free_cpumask_var(saved_cpumask);
out_unlock:
 		apply_wqattrs_unlock();
 	}
--
2.19.1.6.gb485710b
Hello,
On Thu, Aug 18, 2022 at 10:33:48PM +0800, Lai Jiangshan wrote:
> @@ -5342,6 +5344,11 @@ static int workqueue_apply_unbound_cpumask(void)
>  		apply_wqattrs_cleanup(ctx);
>  	}
>
> +	if (!ret) {
> +		mutex_lock(&wq_pool_attach_mutex);
> +		cpumask_copy(wq_unbound_cpumask, unbound_cpumask);
> +		mutex_unlock(&wq_pool_attach_mutex);
Is this enough? Shouldn't the lock be protecting a wider scope? If there's
someone reading the flag with just pool_attach_mutex, what prevents them
reading it right before the new value is committed and keeps using the stale
value?
Thanks.
--
tejun
On Sat, Aug 27, 2022 at 8:33 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Thu, Aug 18, 2022 at 10:33:48PM +0800, Lai Jiangshan wrote:
> > @@ -5342,6 +5344,11 @@ static int workqueue_apply_unbound_cpumask(void)
> >  		apply_wqattrs_cleanup(ctx);
> >  	}
> >
> > +	if (!ret) {
> > +		mutex_lock(&wq_pool_attach_mutex);
> > +		cpumask_copy(wq_unbound_cpumask, unbound_cpumask);
> > +		mutex_unlock(&wq_pool_attach_mutex);
>
> Is this enough? Shouldn't the lock be protecting a wider scope? If there's
> someone reading the flag with just pool_attach_mutex, what prevents them
> reading it right before the new value is committed and keeps using the stale
> value?
Which "flag"? wq_unbound_cpumask?

This code is adding protection for wq_unbound_cpumask and makes
unbind_workers() use a stable version of wq_unbound_cpumask during
its operation.

It doesn't really matter if the pool's mask becomes stale again later
with respect to wq_unbound_cpumask.

No code ensures the disassociated pool's mask is kept in sync with the
newest wq_unbound_cpumask; 10a5a651e3af ("workqueue: Restrict kworker
in the offline CPU pool running on housekeeping CPUs") was the first to
use wq_unbound_cpumask for the disassociated pools.

What matters is that the pool's mask should be the wq_unbound_cpumask
at the time the pool becomes disassociated, which contains no isolated
CPUs.

I don't like 10a5a651e3af for not syncing the pool's mask with
wq_unbound_cpumask, but I think it works anyway.
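
As for why the wq_unbound_cpumask seen at disassociation time has no
isolated CPUs: at boot it is seeded from the housekeeping masks (a
simplified sketch of the v5.19-era workqueue_init_early(), not a
verbatim excerpt):

	/* in workqueue_init_early(), simplified */
	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
	cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask,
		    housekeeping_cpumask(HK_TYPE_DOMAIN));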
>
> Thanks.
>
> --
> tejun
Hello,
On Tue, Aug 30, 2022 at 05:32:17PM +0800, Lai Jiangshan wrote:
> > Is this enough? Shouldn't the lock be protecting a wider scope? If there's
> > someone reading the flag with just pool_attach_mutex, what prevents them
> > reading it right before the new value is committed and keeps using the stale
> > value?
>
> Which "flag"? wq_unbound_cpumask?
Oh, yeah, sorry.
> This code is adding protection for wq_unbound_cpumask and makes
> unbind_workers() use a stable version of wq_unbound_cpumask during
> its operation.
>
> It doesn't really matter if the pool's mask becomes stale again later
> with respect to wq_unbound_cpumask.
>
> No code ensures the disassociated pool's mask is kept in sync with the
> newest wq_unbound_cpumask; 10a5a651e3af ("workqueue: Restrict kworker
> in the offline CPU pool running on housekeeping CPUs") was the first to
> use wq_unbound_cpumask for the disassociated pools.
>
> What matters is that the pool's mask should be the wq_unbound_cpumask
> at the time the pool becomes disassociated, which contains no isolated
> CPUs.
>
> I don't like 10a5a651e3af for not syncing the pool's mask with
> wq_unbound_cpumask, but I think it works anyway.
Hmm... I see. Can you add a comment explaining why we're grabbing
wq_pool_attach_mutex there?
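
Something along these lines, maybe (hypothetical wording, just to
illustrate what I'm after):

	if (!ret) {
		/*
		 * Publish the new mask under wq_pool_attach_mutex so that
		 * unbind_workers(), which reads wq_unbound_cpumask while
		 * holding only wq_pool_attach_mutex, always sees a fully
		 * updated value.
		 */
		mutex_lock(&wq_pool_attach_mutex);
		cpumask_copy(wq_unbound_cpumask, unbound_cpumask);
		mutex_unlock(&wq_pool_attach_mutex);
	}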
Thanks.
--
tejun
On 03/08/22 11:40, Lai Jiangshan wrote:
> On 2022/8/2 16:41, Valentin Schneider wrote:
>> When unbind_workers() reads wq_unbound_cpumask to set the affinity of
>> freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
>> sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
>>
>> This is made more obvious as of recent commit
>>
>> 46a4d679ef88 ("workqueue: Avoid a false warning in unbind_workers()")
>>
>> e.g.
>>
>>   unbind_workers()                           workqueue_set_unbound_cpumask()
>>     kthread_set_per_cpu(p, -1);
>>                                                if (cpumask_intersects(wq_unbound_cpumask, cpu_active_mask))
>>                                                  cpumask_copy(wq_unbound_cpumask, cpumask);
>>     WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
>>
>> Make workqueue_offline_cpu() invoke unbind_workers() with wq_pool_mutex
>> held.
>
> I would prefer to protect wq_unbound_cpumask with wq_pool_attach_mutex.
That looks alright to me. Do you want to push that separately, as it's a
standalone patch, or should I carry it with this series?
On Thu, Aug 4, 2022 at 7:40 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 03/08/22 11:40, Lai Jiangshan wrote:
> > On 2022/8/2 16:41, Valentin Schneider wrote:
> >> When unbind_workers() reads wq_unbound_cpumask to set the affinity of
> >> freshly-unbound kworkers, it only holds wq_pool_attach_mutex. This isn't
> >> sufficient as wq_unbound_cpumask is only protected by wq_pool_mutex.
> >>
> >> This is made more obvious as of recent commit
> >>
> >> 46a4d679ef88 ("workqueue: Avoid a false warning in unbind_workers()")
> >>
> >> e.g.
> >>
> >>   unbind_workers()                           workqueue_set_unbound_cpumask()
> >>     kthread_set_per_cpu(p, -1);
> >>                                                if (cpumask_intersects(wq_unbound_cpumask, cpu_active_mask))
> >>                                                  cpumask_copy(wq_unbound_cpumask, cpumask);
> >>     WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
> >>
> >> Make workqueue_offline_cpu() invoke unbind_workers() with wq_pool_mutex
> >> held.
> >
> > I would prefer to protect wq_unbound_cpumask with wq_pool_attach_mutex.
>
> That looks alright to me. Do you want to push that separately, as it's a
> standalone patch, or should I carry it with this series?
>
I'm okay with both.
It needs review from Tejun. If Tejun has not queued it before you send
a new update of this series, I will be glad if you carry it.