[PATCH] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog

Tio Zhang posted 1 patch 2 years, 6 months ago
There is a newer version of this series
kernel/workqueue.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
[PATCH] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog
Posted by Tio Zhang 2 years, 6 months ago
When a pool has a worker with WORKER_CPU_INTENSIVE set but other workers
are not that busy, the pool->worklist will mostly be empty, which leads
the intensive work always having a chance of escaping from the watchdog's
check. This may cause watchdog miss finding out a forever running work
in WQ_CPU_INTENSIVE.

Also, after commit '616db8779b1e3f93075df691432cccc5ef3c3ba0',
workers with potentially intensive works will automatically be converted
into WORKER_CPU_INTENSIVE. This might let watchdog to miss all work
potentially running forever.

Signed-off-by: Tio Zhang <tiozhang@didiglobal.com>
---
 kernel/workqueue.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 02a8f402eeb5..29875b680f5b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6280,10 +6280,23 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 	rcu_read_lock();
 
 	for_each_pool(pool, pi) {
+		struct worker *worker;
 		unsigned long pool_ts, touched, ts;
+		bool check_intensive = false;
 
 		pool->cpu_stall = false;
-		if (list_empty(&pool->worklist))
+
+		/* Not sure if we should let WORKER_UNBOUND to
+		 * be included? Since let a unbound work to last
+		 * more than e,g, 30 seconds seem also unacceptable.
+		 */
+		for_each_pool_worker(worker, pool) {
+			if (worker->flags & WORKER_CPU_INTENSIVE) {
+				check_intensive = true;
+				break;
+			}
+		}
+		if (list_empty(&pool->worklist) && !check_intensive)
 			continue;
 
 		/*
-- 
2.17.1
Re: [PATCH] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog
Posted by kernel test robot 2 years, 5 months ago

Hello,

kernel test robot noticed "WARNING:at_kernel/workqueue.c:#wq_watchdog_timer_fn" on:

commit: f5d265c1a77104897fad14235b2637b155c01efd ("[PATCH] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog")
url: https://github.com/intel-lab-lkp/linux/commits/Tio-Zhang/workqueue-let-WORKER_CPU_INTENSIVE-be-included-in-watchdog/20230811-182610
base: https://git.kernel.org/cgit/linux/kernel/git/tj/wq.git for-next
patch link: https://lore.kernel.org/all/20230811102250.GA7959@didi-ThinkCentre-M930t-N000/
patch subject: [PATCH] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog

in testcase: rcutorture
version: 
with following parameters:

	runtime: 300s
	test: default
	torture_type: srcu



compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308142140.cf9be57a-oliver.sang@intel.com



[   35.935095][    C0] ------------[ cut here ]------------
[   35.936505][    C0] WARNING: CPU: 0 PID: 100 at kernel/workqueue.c:6400 wq_watchdog_timer_fn+0x185/0x3b0
[   35.938627][    C0] Modules linked in:
[   35.939641][    C0] CPU: 0 PID: 100 Comm: systemd-journal Not tainted 6.5.0-rc1-00043-gf5d265c1a771 #1
[   35.941708][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   35.948000][    C0] EIP: wq_watchdog_timer_fn+0x185/0x3b0
[   35.949262][    C0] Code: b8 bc bc 2b c2 e8 db ee c8 00 31 d2 89 c3 85 c0 b8 28 91 89 c2 6a 00 0f 94 c2 31 c9 e8 44 b9 10 00 85 db 58 0f 85 6d ff ff ff <0f> 0b ba
 01 00 00 00 e9 63 ff ff ff 8d b4 26 00 00 00 00 8b 75 e8
[   35.953232][    C0] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000001
[   35.954818][    C0] ESI: c22bb970 EDI: c3aa3500 EBP: c4067f0c ESP: c4067ee8
[   35.956302][    C0] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00210246
[   35.957911][    C0] CR0: 80050033 CR2: b7ae9360 CR3: 2d43e000 CR4: 00040690
[   35.959401][    C0] Call Trace:
[   35.960250][    C0]  <SOFTIRQ>
[   35.961061][    C0]  ? show_regs+0x50/0x60
[   35.962135][    C0]  ? wq_watchdog_timer_fn+0x185/0x3b0
[   35.963335][    C0]  ? __warn+0x6f/0x1d0
[   35.964381][    C0]  ? wq_watchdog_timer_fn+0x185/0x3b0
[   35.965658][    C0]  ? report_bug+0x169/0x190
[   35.966864][    C0]  ? exc_overflow+0x40/0x40
[   35.968144][    C0]  ? handle_bug+0x28/0x50
[   35.969223][    C0]  ? exc_invalid_op+0x1a/0x60
[   35.970336][    C0]  ? handle_exception+0x14a/0x14a
[   35.971590][    C0]  ? poke_int3_handler+0x1eb/0x2e0
[   35.972787][    C0]  ? exc_overflow+0x40/0x40
[   35.973818][    C0]  ? wq_watchdog_timer_fn+0x185/0x3b0
[   35.975100][    C0]  ? exc_overflow+0x40/0x40
[   35.976255][    C0]  ? wq_watchdog_timer_fn+0x185/0x3b0
[   35.977590][    C0]  ? show_all_workqueues+0x300/0x300
[   35.978887][    C0]  call_timer_fn+0xb7/0x310
[   35.980130][    C0]  ? show_all_workqueues+0x300/0x300
[   35.981379][    C0]  ? show_all_workqueues+0x300/0x300
[   35.982658][    C0]  __run_timers+0x2a3/0x3b0
[   35.983860][    C0]  run_timer_softirq+0x1c/0x20
[   35.985029][    C0]  __do_softirq+0x144/0x518
[   35.986182][    C0]  ? __lock_text_end+0xc/0xc
[   35.987384][    C0]  call_on_stack+0x45/0x50
[   35.988532][    C0]  </SOFTIRQ>
[   35.989438][    C0]  ? irq_exit_rcu+0xb3/0xf0
[   35.990584][    C0]  ? sysvec_apic_timer_interrupt+0x1f/0x30
[   35.992046][    C0]  ? handle_exception+0x14a/0x14a
[   35.993299][    C0]  ? percpu_ref_put_many+0x64/0x140
[   35.994831][    C0]  ? vmware_sched_clock+0x100/0x100
[   35.996171][    C0]  ? lock_release+0x7b/0xe0
[   35.997313][    C0]  ? vmware_sched_clock+0x100/0x100
[   35.998584][    C0]  ? lock_release+0x7b/0xe0
[   35.999799][    C0]  ? percpu_ref_put_many+0x78/0x140
[   36.001302][    C0]  ? uncharge_folio+0x198/0x3a0
[   36.002498][    C0]  ? __mem_cgroup_uncharge_list+0x52/0x90
[   36.003965][    C0]  ? release_pages+0x17b/0x4b0
[   36.005193][    C0]  ? __folio_batch_release+0x1d/0x40
[   36.006440][    C0]  ? shmem_undo_range+0x2f8/0x7f0
[   36.007799][    C0]  ? shmem_evict_inode+0x111/0x2e0
[   36.008857][    C0]  ? __lock_release+0x152/0x2f0
[   36.009924][    C0]  ? check_preemption_disabled+0x2a/0x50
[   36.011540][    C0]  ? preempt_count_sub+0x74/0x150
[   36.012814][    C0]  ? _raw_spin_unlock+0x57/0x80
[   36.014052][    C0]  ? evict+0xed/0x220
[   36.015188][    C0]  ? evict+0xed/0x220
[   36.016345][    C0]  ? iput_final+0x148/0x190
[   36.017550][    C0]  ? iput+0x14f/0x180
[   36.024349][    C0]  ? dentry_unlink_inode+0xaf/0x110
[   36.025748][    C0]  ? __dentry_kill+0x11f/0x200
[   36.027051][    C0]  ? dentry_kill+0x7b/0x1f0
[   36.028271][    C0]  ? dput+0x2d8/0x2f0
[   36.029343][    C0]  ? __fput+0x164/0x400
[   36.030465][    C0]  ? ____fput+0xd/0x10
[   36.031626][    C0]  ? task_work_run+0x94/0xf0
[   36.033020][    C0]  ? exit_to_user_mode_prepare+0x335/0x340
[   36.034506][    C0]  ? syscall_exit_to_user_mode+0x1a/0x50
[   36.035998][    C0]  ? do_int80_syscall_32+0x62/0xa0
[   36.037305][    C0]  ? entry_INT80_32+0x107/0x107
[   36.038557][    C0] irq event stamp: 234966
[   36.039817][    C0] hardirqs last  enabled at (234978): [<c10eaca6>] __up_console_sem+0x56/0x60
[   36.041931][    C0] hardirqs last disabled at (234987): [<c10eac8d>] __up_console_sem+0x3d/0x60
[   36.044105][    C0] softirqs last  enabled at (233030): [<c1d34934>] __do_softirq+0x2a4/0x518
[   36.046155][    C0] softirqs last disabled at (234197): [<c1023fb5>] call_on_stack+0x45/0x50
[   36.048296][    C0] ---[ end trace 0000000000000000 ]---



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230814/202308142140.cf9be57a-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
[PATCH v2] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog
Posted by Tio Zhang 2 years, 5 months ago
When a pool has a worker with WORKER_CPU_INTENSIVE set but other workers
are not that busy, the pool->worklist will mostly be empty, which leads
the intensive work always having a chance of escaping from the watchdog's
check. This may cause watchdog miss finding out a forever running work
in WQ_CPU_INTENSIVE.

Also, after commit '616db8779b1e3f93075df691432cccc5ef3c3ba0',
workers with potentially intensive works will automatically be converted
into WORKER_CPU_INTENSIVE. This might let watchdog to miss all work
potentially running forever.

Signed-off-by: Tio Zhang <tiozhang@didiglobal.com>
---
 kernel/workqueue.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 02a8f402eeb5..564d96c38d4d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -6277,13 +6277,29 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 	if (!thresh)
 		return;
 
-	rcu_read_lock();
+	mutex_lock(&wq_pool_mutex);
 
 	for_each_pool(pool, pi) {
+		struct worker *worker;
 		unsigned long pool_ts, touched, ts;
+		bool check_intensive = false;
 
 		pool->cpu_stall = false;
-		if (list_empty(&pool->worklist))
+
+		/* Not sure if we should let WORKER_UNBOUND to
+		 * be included? Since let a unbound work to last
+		 * more than e,g, 30 seconds seem also unacceptable.
+		 */
+		mutex_lock(&wq_pool_attach_mutex);
+		for_each_pool_worker(worker, pool) {
+			if (worker->flags & WORKER_CPU_INTENSIVE) {
+				check_intensive = true;
+				break;
+			}
+		}
+		mutex_unlock(&wq_pool_attach_mutex);
+
+		if (list_empty(&pool->worklist) && !check_intensive)
 			continue;
 
 		/*
@@ -6320,7 +6336,7 @@ static void wq_watchdog_timer_fn(struct timer_list *unused)
 
 	}
 
-	rcu_read_unlock();
+	mutex_unlock(&wq_pool_mutex);
 
 	if (lockup_detected)
 		show_all_workqueues();
-- 
2.17.1
Re: [PATCH v2] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog
Posted by kernel test robot 2 years, 5 months ago

Hello,

kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c" on:

commit: f87686da5e324dd5fbc44ae39677d5111fdbef0f ("[PATCH v2] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog")
url: https://github.com/intel-lab-lkp/linux/commits/Tio-Zhang/workqueue-let-WORKER_CPU_INTENSIVE-be-included-in-watchdog/20230822-190755
base: https://git.kernel.org/cgit/linux/kernel/git/tj/wq.git for-next
patch link: https://lore.kernel.org/all/20230822110609.GA3702@didi-ThinkCentre-M930t-N000/
patch subject: [PATCH v2] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog

in testcase: boot

compiler: clang-16
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308241637.304413c4-oliver.sang@intel.com


[   34.238546][    C0] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
[   34.240594][    C0] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/0
[   34.242381][    C0] preempt_count: 101, expected: 0
[   34.243435][    C0] 1 lock held by swapper/0/0:
[   34.244371][    C0]  #0: 84fdbf20 ((&wq_watchdog_timer)){+.-.}-{0:0}, at: call_timer_fn+0x52/0x230
[   34.245809][    C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.5.0-rc1-00045-gf87686da5e32 #1
[   34.247236][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   34.247425][    C0] Call Trace:
[   34.251534][    C0]  <SOFTIRQ>
[   34.251534][    C0]  dump_stack_lvl+0x80/0xd0
[   34.251534][    C0]  dump_stack+0xd/0x14
[   34.251534][    C0]  __might_resched+0x245/0x260
[   34.254546][    C0]  __might_sleep+0x3a/0x60
[   34.254546][    C0]  __mutex_lock_common+0x53/0x1028
[   34.254546][    C0]  ? lock_acquire+0xce/0x260
[   34.254546][    C0]  mutex_lock_nested+0x21/0x30
[   34.254546][    C0]  ? wq_watchdog_timer_fn+0x41/0x4b0
[   34.254546][    C0]  ? pool_mayday_timeout+0x160/0x160
[   34.254546][    C0]  wq_watchdog_timer_fn+0x41/0x4b0
[   34.254546][    C0]  ? pool_mayday_timeout+0x160/0x160
[   34.254546][    C0]  call_timer_fn+0x9c/0x230
[   34.254546][    C0]  run_timer_softirq+0x2d8/0x380
[   34.254546][    C0]  __do_softirq+0x128/0x41f
[   34.254546][    C0]  ? do_softirq_own_stack+0x22/0x30
[   34.254546][    C0]  ? __kprobes_text_end+0xc/0xc
[   34.254546][    C0]  do_softirq_own_stack+0x22/0x30
[   34.254546][    C0]  </SOFTIRQ>
[   34.254546][    C0]  ? sysvec_call_function_single+0x30/0x30
[   34.254546][    C0]  irq_exit_rcu+0x6a/0xa0
[   34.254546][    C0]  sysvec_apic_timer_interrupt+0x1f/0x34
[   34.254546][    C0]  handle_exception+0x133/0x133
[   34.254546][    C0] EIP: intel_idle_hlt+0x10/0x18
[   34.254546][    C0] Code: 64 8b 3d 40 41 ed 83 f0 80 4f 02 20 f7 07 08 00 00 00 75 b1 e9 67 ff ff ff 90 55 89 e5 89 c8 eb 07 0f 00 2d f4 24 d7 82 fb f4 <fa> 5d 31
 c9 c3 90 90 90 55 89 e5 89 c8 fb eb 07 0f 00 2d f4 24 d7
[   34.254546][    C0] EAX: 00000002 EBX: 00000002 ECX: 00000002 EDX: 83b1c114
[   34.254546][    C0] ESI: e96152b4 EDI: 83b1c1d8 EBP: 83709ee4 ESP: 83709ee4
[   34.254546][    C0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00200246
[   34.254546][    C0]  ? sysvec_call_function_single+0x30/0x30
[   34.254546][    C0]  ? intel_idle_hlt+0x10/0x18
[   34.254546][    C0]  cpuidle_enter_state+0xfa/0x48f
[   34.254546][    C0]  ? ktime_get+0x89/0x130
[   34.254546][    C0]  cpuidle_enter+0x2f/0x50
[   34.254546][    C0]  do_idle+0x1ab/0x250
[   34.254546][    C0]  cpu_startup_entry+0x25/0x30
[   34.254546][    C0]  rest_init+0x185/0x190
[   34.254546][    C0]  arch_call_rest_init+0xd/0x10
[   34.254546][    C0]  start_kernel+0x331/0x3a0
[   34.254546][    C0]  i386_start_kernel+0x21e/0x228
[   34.254546][    C0]  startup_32_smp+0x151/0x154
[   34.301725][    C0]
[   34.302134][    C0] =============================
[   34.303085][    C0] [ BUG: Invalid wait context ]
[   34.304063][    C0] 6.5.0-rc1-00045-gf87686da5e32 #1 Tainted: G        W
[   34.305573][    C0] -----------------------------
[   34.305718][    C0] swapper/0/0 is trying to lock:
[   34.305718][    C0] 837409e4 (wq_pool_mutex){+.+.}-{3:3}, at: wq_watchdog_timer_fn+0x41/0x4b0
[   34.305718][    C0] other info that might help us debug this:
[   34.305718][    C0] context-{2:2}
[   34.305718][    C0] 1 lock held by swapper/0/0:
[   34.305718][    C0]  #0: 84fdbf20 ((&wq_watchdog_timer)){+.-.}-{0:0}, at: call_timer_fn+0x52/0x230
[   34.305718][    C0] stack backtrace:
[   34.305718][    C0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W          6.5.0-rc1-00045-gf87686da5e32 #1
[   34.305718][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   34.305718][    C0] Call Trace:
[   34.305718][    C0]  <SOFTIRQ>
[   34.305718][    C0]  dump_stack_lvl+0x80/0xd0
[   34.305718][    C0]  dump_stack+0xd/0x14
[   34.305718][    C0]  __lock_acquire+0xd5e/0x2c80
[   34.305718][    C0]  ? irqtime_account_irq+0x66/0x90
[   34.305718][    C0]  ? irqentry_exit+0x83/0x8c
[   34.305718][    C0]  ? sysvec_call_function_single+0x30/0x30
[   34.305718][    C0]  ? sysvec_apic_timer_interrupt+0x29/0x34
[   34.305718][    C0]  ? handle_exception+0x133/0x133
[   34.305718][    C0]  lock_acquire+0xce/0x260
[   34.305718][    C0]  ? wq_watchdog_timer_fn+0x41/0x4b0
[   34.305718][    C0]  ? dump_stack_lvl+0xaf/0xd0
[   34.305718][    C0]  ? dump_stack+0xd/0x14
[   34.305718][    C0]  ? __might_resched+0x251/0x260
[   34.305718][    C0]  __mutex_lock_common+0xa8/0x1028
[   34.305718][    C0]  ? wq_watchdog_timer_fn+0x41/0x4b0
[   34.305718][    C0]  ? lock_acquire+0xce/0x260
[   34.305718][    C0]  mutex_lock_nested+0x21/0x30
[   34.305718][    C0]  ? wq_watchdog_timer_fn+0x41/0x4b0
[   34.305718][    C0]  ? pool_mayday_timeout+0x160/0x160
[   34.305718][    C0]  wq_watchdog_timer_fn+0x41/0x4b0
[   34.305718][    C0]  ? pool_mayday_timeout+0x160/0x160
[   34.305718][    C0]  call_timer_fn+0x9c/0x230
[   34.305718][    C0]  run_timer_softirq+0x2d8/0x380
[   34.305718][    C0]  __do_softirq+0x128/0x41f
[   34.305718][    C0]  ? do_softirq_own_stack+0x22/0x30
[   34.305718][    C0]  ? __kprobes_text_end+0xc/0xc
[   34.305718][    C0]  do_softirq_own_stack+0x22/0x30
[   34.305718][    C0]  </SOFTIRQ>
[   34.305718][    C0]  ? sysvec_call_function_single+0x30/0x30
[   34.305718][    C0]  irq_exit_rcu+0x6a/0xa0
[   34.305718][    C0]  sysvec_apic_timer_interrupt+0x1f/0x34
[   34.305718][    C0]  handle_exception+0x133/0x133
[   34.305718][    C0] EIP: intel_idle_hlt+0x10/0x18
[   34.305718][    C0] Code: 64 8b 3d 40 41 ed 83 f0 80 4f 02 20 f7 07 08 00 00 00 75 b1 e9 67 ff ff ff 90 55 89 e5 89 c8 eb 07 0f 00 2d f4 24 d7 82 fb f4 <fa> 5d 31
 c9 c3 90 90 90 55 89 e5 89 c8 fb eb 07 0f 00 2d f4 24 d7
[   34.305718][    C0] EAX: 00000002 EBX: 00000002 ECX: 00000002 EDX: 83b1c114
[   34.305718][    C0] ESI: e96152b4 EDI: 83b1c1d8 EBP: 83709ee4 ESP: 83709ee4
[   34.305718][    C0] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00200246
[   34.305718][    C0]  ? sysvec_call_function_single+0x30/0x30
[   34.305718][    C0]  ? intel_idle_hlt+0x10/0x18
[   34.305718][    C0]  cpuidle_enter_state+0xfa/0x48f
[   34.305718][    C0]  ? ktime_get+0x89/0x130
[   34.305718][    C0]  cpuidle_enter+0x2f/0x50
[   34.305718][    C0]  do_idle+0x1ab/0x250
[   34.305718][    C0]  cpu_startup_entry+0x25/0x30
[   34.305718][    C0]  rest_init+0x185/0x190
[   34.305718][    C0]  arch_call_rest_init+0xd/0x10
[   34.305718][    C0]  start_kernel+0x331/0x3a0
[   34.305718][    C0]  i386_start_kernel+0x21e/0x228
[   34.305718][    C0]  startup_32_smp+0x151/0x154



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230824/202308241637.304413c4-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH v2] workqueue: let WORKER_CPU_INTENSIVE be included in watchdog
Posted by Tejun Heo 2 years, 5 months ago
On Tue, Aug 22, 2023 at 07:06:51PM +0800, Tio Zhang wrote:
> When a pool has a worker with WORKER_CPU_INTENSIVE set but other workers
> are not that busy, the pool->worklist will mostly be empty, which leads
> the intensive work always having a chance of escaping from the watchdog's
> check. This may cause watchdog miss finding out a forever running work
> in WQ_CPU_INTENSIVE.
> 
> Also, after commit '616db8779b1e3f93075df691432cccc5ef3c3ba0',
> workers with potentially intensive works will automatically be converted
> into WORKER_CPU_INTENSIVE. This might let watchdog to miss all work
> potentially running forever.

The watchdog is there to ensure forward progress. CPU_INTENSIVE work items
don't block other work items, so I'm not sure why it should trigger the
watchdog.

Thanks.

-- 
tejun