[PATCH v2] io_uring/io-wq: re-check IO_WQ_BIT_EXIT for each linked work item

Runyu Xiao posted 1 patch 1 week, 4 days ago
io_uring/io-wq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Posted by Runyu Xiao 1 week, 4 days ago
commit 10dc95939817 ("io_uring/io-wq: check IO_WQ_BIT_EXIT inside work
run loop") fixed the obvious case where io_worker_handle_work() took one
exit-bit snapshot before draining pending work, but the fix stops one
level too early.

io_worker_handle_work() now re-checks IO_WQ_BIT_EXIT in its outer work
run loop, yet it still snapshots that bit once before processing a
whole dependent linked-work chain. If io_wq_exit_start() sets
IO_WQ_BIT_EXIT after the first linked item has started, the remaining
linked items can still reuse the stale do_kill = false value, skip
IO_WQ_WORK_CANCEL, and continue running after exit has begun.

That means the previous fix did not fully eliminate the exit-latency
problem; it only narrowed it to linked chains. A long or slow linked
chain can still keep io-wq exit waiting for work that should already
have been canceled.

The issue was found on Linux v6.18.21 by our static-analysis tool,
which flags linked-work loops that snapshot shared exit state outside
the per-item cancel decision. A manual audit of io_worker_handle_work()
then confirmed it. It was later reproduced with a no-device QEMU
validation selftest that preserves the same contract: a three-node
unbound linked chain, an exit actor that sets IO_WQ_BIT_EXIT after
work1, and slow post-exit linked work. With a 3000 ms delay injected
into each post-exit item, the buggy path spends about 6066 ms after
exit running work2 and work3, while the fixed path cancels both and
finishes in about 2 ms.

Re-check test_bit(IO_WQ_BIT_EXIT, &wq->state) for each iteration of the
dependent-link loop, right before deciding whether to cancel the
current work item. That closes the remaining stale-snapshot window and
prevents linked post-exit work from stretching shutdown latency.

Build-tested by compiling io_uring/io-wq.o on x86_64 with the local
.config. No special hardware was required.

Fixes: 10dc95939817 ("io_uring/io-wq: check IO_WQ_BIT_EXIT inside work run loop")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
v2:
  - resend for upstream instead of stable
  - point commit text and Fixes: to the upstream commit

 io_uring/io-wq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 49a9c914b4e9..28d81398ebee 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -601,7 +601,6 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 	struct io_wq *wq = worker->wq;
 
 	do {
-		bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
 		struct io_wq_work *work;
 
 		/*
@@ -637,6 +636,7 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
 
 		/* handle a whole dependent link */
 		do {
+			bool do_kill = test_bit(IO_WQ_BIT_EXIT, &wq->state);
 			struct io_wq_work *next_hashed, *linked;
 			unsigned int work_flags = atomic_read(&work->flags);
 			unsigned int hash = __io_wq_is_hashed(work_flags)
-- 
2.34.1
Re: [PATCH v2] io_uring/io-wq: re-check IO_WQ_BIT_EXIT for each linked work item
Posted by Jens Axboe 1 week, 4 days ago
On Thu, 28 May 2026 01:22:03 +0800, Runyu Xiao wrote:
> commit 10dc95939817 ("io_uring/io-wq: check IO_WQ_BIT_EXIT inside work
> run loop") fixed the obvious case where io_worker_handle_work() took one
> exit-bit snapshot before draining pending work, but the fix stops one
> level too early.
> 
> io_worker_handle_work() now re-checks IO_WQ_BIT_EXIT in its outer work
> run loop, yet it still snapshots that bit once before processing a
> whole dependent linked-work chain. If io_wq_exit_start() sets
> IO_WQ_BIT_EXIT after the first linked item has started, the remaining
> linked items can still reuse stale do_kill = false, skip
> IO_WQ_WORK_CANCEL, and continue running after exit has begun.
> 
> [...]

Applied, thanks!

[1/1] io_uring/io-wq: re-check IO_WQ_BIT_EXIT for each linked work item
      commit: 29bef9934b2521f787bb15dd1985d4c0d12ae02a

Best regards,
-- 
Jens Axboe