Currently, nr_running can be modified from the timer tick, which means
the timer tick can interrupt a critical section that modifies
nr_running without irq protection. Consider the following scenario:
CPU0
kworker/0:2 (events)
   worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
   ->pool->nr_running++;	(1)

   process_one_work()
   ->worker->current_func(work);
   ->schedule()
      ->wq_worker_sleeping()
         ->worker->sleeping = 1;
         ->pool->nr_running--;	(0)
      ....
      ->wq_worker_running()
      ....

CPU0 by interrupt:
   wq_worker_tick()
      ->worker_set_flags(worker, WORKER_CPU_INTENSIVE);
         ->pool->nr_running--;	(-1)
         ->worker->flags |= WORKER_CPU_INTENSIVE;
   ....

         ->if (!(worker->flags & WORKER_NOT_RUNNING))
            ->pool->nr_running++;	(will not execute)
         ->worker->sleeping = 0;
      ....
      ->worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
         ->pool->nr_running++;	(0)
   ....

   worker_set_flags(worker, WORKER_PREP);
      ->pool->nr_running--;	(-1)
   ....
   worker_enter_idle()
      ->WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
If nr_workers is equal to nr_idle, the non-zero nr_running triggers
the WARN_ON_ONCE():
[ 2.460602] WARNING: CPU: 0 PID: 63 at kernel/workqueue.c:1999 worker_enter_idle+0xb2/0xc0
[ 2.462163] Modules linked in:
[ 2.463401] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.4.0-rc2-next-20230519 #1
[ 2.463771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[ 2.465127] Workqueue: 0x0 (events)
[ 2.465678] RIP: 0010:worker_enter_idle+0xb2/0xc0
...
[ 2.472614] Call Trace:
[ 2.473152] <TASK>
[ 2.474182] worker_thread+0x71/0x430
[ 2.474992] ? _raw_spin_unlock_irqrestore+0x28/0x50
[ 2.475263] kthread+0x103/0x120
[ 2.475493] ? __pfx_worker_thread+0x10/0x10
[ 2.476355] ? __pfx_kthread+0x10/0x10
[ 2.476635] ret_from_fork+0x2c/0x50
[ 2.477051] </TASK>
This commit therefore adds a check of worker->sleeping in
wq_worker_tick(): if worker->sleeping is not zero, return directly.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Closes: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230519/testrun/17078554/suite/boot/test/clang-nightly-lkftconfig/log
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
kernel/workqueue.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9c5c1cfa478f..329b84c42062 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1144,13 +1144,12 @@ void wq_worker_tick(struct task_struct *task)
* longer than wq_cpu_intensive_thresh_us, it's automatically marked
* CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
*/
- if ((worker->flags & WORKER_NOT_RUNNING) ||
+ if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
worker->task->se.sum_exec_runtime - worker->current_at <
wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
return;
raw_spin_lock(&pool->lock);
-
worker_set_flags(worker, WORKER_CPU_INTENSIVE);
wq_cpu_intensive_report(worker->current_func);
pwq->stats[PWQ_STAT_CPU_INTENSIVE]++;
--
2.17.1
On Tue, May 23, 2023 at 10:09:41PM +0800, Zqiang wrote:
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 9c5c1cfa478f..329b84c42062 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1144,13 +1144,12 @@ void wq_worker_tick(struct task_struct *task)
>  	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
>  	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
>  	 */
> -	if ((worker->flags & WORKER_NOT_RUNNING) ||
> +	if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
>  	    worker->task->se.sum_exec_runtime - worker->current_at <
>  	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
>  		return;

Ah, right, this isn't just interrupted read-modify-write. It has to consider
sleeping. This is subtle. We'll definitely need more comments. Will think
more about it.

Thanks.

--
tejun
Hello,

On Tue, May 23, 2023 at 09:40:16AM -1000, Tejun Heo wrote:
> On Tue, May 23, 2023 at 10:09:41PM +0800, Zqiang wrote:
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 9c5c1cfa478f..329b84c42062 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -1144,13 +1144,12 @@ void wq_worker_tick(struct task_struct *task)
> >  	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
> >  	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
> >  	 */
> > -	if ((worker->flags & WORKER_NOT_RUNNING) ||
> > +	if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
> >  	    worker->task->se.sum_exec_runtime - worker->current_at <
> >  	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
> >  		return;
>
> Ah, right, this isn't just interrupted read-modify-write. It has to consider
> sleeping. This is subtle. We'll definitely need more comments. Will think
> more about it.

So, there already are enough barriers to make this safe but that's kinda
brittle because e.g. it'd depend on the barrier in preempt_disable() which
is there for an unrelated reason. Can you please change ->sleeping accesses
to use WRITE/READ_ONCE() and explain in wq_worker_tick() that the worker
doesn't contribute to ->nr_running while ->sleeping regardless of
NOT_RUNNING and thus the operation shouldn't proceed? We probably need to
make it prettier but I think that should do for now.

Thanks.

--
tejun
>
> Hello,
>
> On Tue, May 23, 2023 at 09:40:16AM -1000, Tejun Heo wrote:
> > On Tue, May 23, 2023 at 10:09:41PM +0800, Zqiang wrote:
> > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > > index 9c5c1cfa478f..329b84c42062 100644
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -1144,13 +1144,12 @@ void wq_worker_tick(struct task_struct *task)
> > >  	 * longer than wq_cpu_intensive_thresh_us, it's automatically marked
> > >  	 * CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
> > >  	 */
> > > -	if ((worker->flags & WORKER_NOT_RUNNING) ||
> > > +	if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
> > >  	    worker->task->se.sum_exec_runtime - worker->current_at <
> > >  	    wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
> > >  		return;
> >
> > Ah, right, this isn't just interrupted read-modify-write. It has to consider
> > sleeping. This is subtle. We'll definitely need more comments. Will think
> > more about it.
>
> So, there already are enough barriers to make this safe but that's kinda
> brittle because e.g. it'd depend on the barrier in preempt_disable() which
> is there for an unrelated reason. Can you please change ->sleeping accesses
> to use WRITE/READ_ONCE() and explain in wq_worker_tick() that the worker
> doesn't contribute to ->nr_running while ->sleeping regardless of
> NOT_RUNNING and thus the operation shouldn't proceed? We probably need to
> make it prettier but I think that should do for now.

Thanks for the suggestion, I will resend.

> Thanks.
>
> --
> tejun
This commit disables the CPU-hogging work check in wq_worker_tick()
when 'workqueue.cpu_intensive_thresh_us=0' is set in the boot parameters.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
kernel/workqueue.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e548b2eda12a..ccbc9f2dafa6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1145,6 +1145,7 @@ void wq_worker_tick(struct task_struct *task)
* CPU_INTENSIVE to avoid stalling other concurrency-managed work items.
*/
if ((worker->flags & WORKER_NOT_RUNNING) || worker->sleeping ||
+ !wq_cpu_intensive_thresh_us ||
worker->task->se.sum_exec_runtime - worker->current_at <
wq_cpu_intensive_thresh_us * NSEC_PER_USEC)
return;
--
2.17.1