[RFC PATCH 1/8] workqueue: Unconditionally set cpumask in worker_attach_to_pool()

Lai Jiangshan posted 8 patches 3 years, 8 months ago
Posted by Lai Jiangshan 3 years, 8 months ago
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

If a worker is spuriously woken up after kthread_bind_mask() but before
worker_attach_to_pool(), and a CPU hot-[un]plug happens during the same
interval, the scheduler might push the worker task away from its bound
CPU and change its affinity, and worker_attach_to_pool() doesn't rebind
it properly.

Unconditionally bind the affinity in worker_attach_to_pool() to fix
the problem.

This also prepares for moving worker_attach_to_pool() from create_worker()
to the start of worker_thread(), which will open the said interval even
without a spurious wakeup.
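
The interval described above can be illustrated with a small userspace
model. This is only a sketch of one possible reading of the race, not
kernel code: every model_* name is hypothetical, CPU masks are plain bit
masks, and the scheduler's fallback is assumed to widen the affinity to
all possible CPUs when none of the allowed CPUs is online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
typedef uint64_t model_cpumask;		/* bit n set => CPU n */

struct model_worker {
	model_cpumask allowed;	/* stands in for the task's allowed CPUs */
	bool is_rescuer;	/* stands in for worker->rescue_wq != NULL */
};

/* Models kthread_bind_mask(): bind the new worker to the pool's mask. */
static void model_bind(struct model_worker *w, model_cpumask pool_mask)
{
	w->allowed = pool_mask;
}

/*
 * Models a (spurious) wakeup. Assumption: if none of the allowed CPUs is
 * online, the scheduler falls back and widens the affinity to all
 * possible CPUs so that the task can run somewhere.
 */
static void model_wakeup(struct model_worker *w, model_cpumask online,
			 model_cpumask possible)
{
	assert(online != 0);
	if ((w->allowed & online) == 0)
		w->allowed = possible;
}

/* Models set_cpus_allowed_ptr(): fails if the mask has no online CPU. */
static int model_set_allowed(struct model_worker *w, model_cpumask mask,
			     model_cpumask online)
{
	if ((mask & online) == 0)
		return -1;
	w->allowed = mask;
	return 0;
}

/* Models the attach step: conditional before the patch, unconditional after. */
static void model_attach(struct model_worker *w, model_cpumask pool_mask,
			 model_cpumask online, bool unconditional)
{
	if (unconditional || w->is_rescuer)
		(void)model_set_allowed(w, pool_mask, online);
}

/* Pool bound to CPU 1; CPU 1 goes down and comes back before the attach. */
model_cpumask model_scenario(bool unconditional)
{
	const model_cpumask possible = 0x3;	/* CPUs 0 and 1 */
	const model_cpumask pool_mask = 0x2;	/* CPU 1 only */
	struct model_worker w = { .allowed = 0, .is_rescuer = false };

	model_bind(&w, pool_mask);
	model_wakeup(&w, 0x1, possible);	/* CPU 1 offline: affinity widened */
	model_wakeup(&w, 0x3, possible);	/* CPU 1 back online: no change */
	model_attach(&w, pool_mask, 0x3, unconditional);
	return w.allowed;
}
```

In this model the conditional attach leaves a non-rescuer worker with
the widened mask, while the unconditional attach restores the pool's
mask once one of its CPUs is online again.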

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wedson Almeida Filho <wedsonaf@google.com>
Fixes: 640f17c82460 ("workqueue: Restrict affinity change to rescuer")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
---
 kernel/workqueue.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1ea50f6be843..928aad7d6123 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1872,8 +1872,11 @@ static void worker_attach_to_pool(struct worker *worker,
 	else
 		kthread_set_per_cpu(worker->task, pool->cpu);
 
-	if (worker->rescue_wq)
-		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	/*
+	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
+	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
+	 */
+	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	list_add_tail(&worker->node, &pool->workers);
 	worker->pool = pool;
-- 
2.19.1.6.gb485710b
Re: [RFC PATCH 1/8] workqueue: Unconditionally set cpumask in worker_attach_to_pool()
Posted by Peter Zijlstra 3 years, 7 months ago
On Thu, Aug 04, 2022 at 04:41:28PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> If a worker is spuriously woken up after kthread_bind_mask() but before
> worker_attach_to_pool(), and there are some cpu-hot-[un]plug happening
> during the same interval, the worker task might be pushed away from its
> bound CPU with its affinity changed by the scheduler and worker_attach_to_pool()
> doesn't rebind it properly.

Can you *please* be more explicit. The above doesn't give me enough clue
to reconstruct the actual scenario you're fixing.

Draw a picture or something.
Re: [RFC PATCH 1/8] workqueue: Unconditionally set cpumask in worker_attach_to_pool()
Posted by Tejun Heo 3 years, 7 months ago
cc'ing Waiman.

On Thu, Aug 04, 2022 at 04:41:28PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> If a worker is spuriously woken up after kthread_bind_mask() but before
> worker_attach_to_pool(), and there are some cpu-hot-[un]plug happening
> during the same interval, the worker task might be pushed away from its
> bound CPU with its affinity changed by the scheduler and worker_attach_to_pool()
> doesn't rebind it properly.
> 
> Do unconditionally affinity binding in worker_attach_to_pool() to fix
> the problem.
> 
> Prepare for moving worker_attach_to_pool() from create_worker() to the
> starting of worker_thread() which will really cause the said interval
> even without spurious wakeup.

So, this looks fine but I think the whole thing can be simplified if we
integrate this with the persistent user cpumask change that Waiman is
working on. We can just set the cpumask once during init and let the
scheduler core figure out what the current effective mask is as CPU
availability changes.

 http://lkml.kernel.org/r/20220816192734.67115-4-longman@redhat.com

Thanks.

-- 
tejun
Re: [RFC PATCH 1/8] workqueue: Unconditionally set cpumask in worker_attach_to_pool()
Posted by Lai Jiangshan 3 years, 7 months ago
On Wed, Aug 17, 2022 at 5:18 AM Tejun Heo <tj@kernel.org> wrote:
>
> cc'ing Waiman.
>
> On Thu, Aug 04, 2022 at 04:41:28PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> >
> > If a worker is spuriously woken up after kthread_bind_mask() but before
> > worker_attach_to_pool(), and there are some cpu-hot-[un]plug happening
> > during the same interval, the worker task might be pushed away from its
> > bound CPU with its affinity changed by the scheduler and worker_attach_to_pool()
> > doesn't rebind it properly.
> >
> > Do unconditionally affinity binding in worker_attach_to_pool() to fix
> > the problem.
> >
> > Prepare for moving worker_attach_to_pool() from create_worker() to the
> > starting of worker_thread() which will really cause the said interval
> > even without spurious wakeup.
>
> So, this looks fine but I think the whole thing can be simplified if we
> integrate this with the persistent user cpumask change that Waiman is
> working on. We can just set the cpumask once during init and let the
> scheduler core figure out what the current effective mask is as CPU
> availability changes.
>
>  http://lkml.kernel.org/r/20220816192734.67115-4-longman@redhat.com
>

I like this approach.