io-wq uses an idle timeout to shrink the pool, but keeps the last worker
around indefinitely to avoid churn.

For tasks that use io_uring for file I/O and then stop using io_uring,
this can leave an iou-wrk-* thread behind even after all io_uring instances
are gone. This is unnecessary overhead and also gets in the way of process
checkpoint/restore.

Add an exit-on-idle mode that makes all io-wq workers exit as soon as they
become idle, and provide io_wq_set_exit_on_idle() to toggle it.
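For illustration only, a caller (the follow-up patch in this series, not
shown here) could flip this when the task's last ring goes away; the
context and field names below are placeholders, not part of this patch:

	/* hypothetical caller sketch - tctx/field names are placeholders */
	static void io_uring_last_ring_gone(struct io_uring_task *tctx)
	{
		if (tctx->io_wq)
			io_wq_set_exit_on_idle(tctx->io_wq, true);
	}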
Signed-off-by: Li Chen <me@linux.beauty>
---
io_uring/io-wq.c | 31 +++++++++++++++++++++++++++++++
io_uring/io-wq.h | 1 +
2 files changed, 32 insertions(+)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 5d0928f37471..97e7eb847c6e 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -35,6 +35,7 @@ enum {
enum {
IO_WQ_BIT_EXIT = 0, /* wq exiting */
+ IO_WQ_BIT_EXIT_ON_IDLE = 1, /* allow all workers to exit on idle */
};
enum {
@@ -655,6 +656,18 @@ static int io_wq_worker(void *data)
io_worker_handle_work(acct, worker);
raw_spin_lock(&wq->lock);
+ /*
+ * If wq is marked idle-exit, drop this worker as soon as it
+ * becomes idle. This is used to avoid keeping io-wq worker
+ * threads around for tasks that no longer have any active
+ * io_uring instances.
+ */
+ if (test_bit(IO_WQ_BIT_EXIT_ON_IDLE, &wq->state)) {
+ acct->nr_workers--;
+ raw_spin_unlock(&wq->lock);
+ __set_current_state(TASK_RUNNING);
+ break;
+ }
/*
* Last sleep timed out. Exit if we're not the last worker,
* or if someone modified our affinity.
@@ -894,6 +907,24 @@ static bool io_wq_worker_wake(struct io_worker *worker, void *data)
return false;
}
+void io_wq_set_exit_on_idle(struct io_wq *wq, bool enable)
+{
+ if (!wq->task)
+ return;
+
+ if (!enable) {
+ clear_bit(IO_WQ_BIT_EXIT_ON_IDLE, &wq->state);
+ return;
+ }
+
+ if (test_and_set_bit(IO_WQ_BIT_EXIT_ON_IDLE, &wq->state))
+ return;
+
+ rcu_read_lock();
+ io_wq_for_each_worker(wq, io_wq_worker_wake, NULL);
+ rcu_read_unlock();
+}
+
static void io_run_cancel(struct io_wq_work *work, struct io_wq *wq)
{
do {
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index b3b004a7b625..f7f17a23693e 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -46,6 +46,7 @@ struct io_wq_data {
struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data);
void io_wq_exit_start(struct io_wq *wq);
void io_wq_put_and_exit(struct io_wq *wq);
+void io_wq_set_exit_on_idle(struct io_wq *wq, bool enable);
void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work);
void io_wq_hash_work(struct io_wq_work *work, void *val);
--
2.52.0
On 2/2/26 7:37 AM, Li Chen wrote:
> diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
> index 5d0928f37471..97e7eb847c6e 100644
> --- a/io_uring/io-wq.c
> +++ b/io_uring/io-wq.c
> @@ -655,6 +656,18 @@ static int io_wq_worker(void *data)
> io_worker_handle_work(acct, worker);
>
> raw_spin_lock(&wq->lock);
> + /*
> + * If wq is marked idle-exit, drop this worker as soon as it
> + * becomes idle. This is used to avoid keeping io-wq worker
> + * threads around for tasks that no longer have any active
> + * io_uring instances.
> + */
> + if (test_bit(IO_WQ_BIT_EXIT_ON_IDLE, &wq->state)) {
> + acct->nr_workers--;
> + raw_spin_unlock(&wq->lock);
> + __set_current_state(TASK_RUNNING);
> + break;
> + }
> /*
> * Last sleep timed out. Exit if we're not the last worker,
> * or if someone modified our affinity.

One more note - just add this test_bit() to the check right below, then
you avoid duplicating all of that exit logic. They do the exact same
thing.
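Roughly (sketched against the existing branch in io_wq_worker(), so treat
the exact condition as an approximation rather than a verbatim copy of the
tree), that would collapse to something like:

	/* exit if idle-exit is set, or on the usual last-timeout conditions */
	if (test_bit(IO_WQ_BIT_EXIT_ON_IDLE, &wq->state) ||
	    (last_timeout && (exit_mask || acct->nr_workers > 1))) {
		acct->nr_workers--;
		raw_spin_unlock(&wq->lock);
		__set_current_state(TASK_RUNNING);
		break;
	}

which keeps a single exit path and a single place that decrements
acct->nr_workers.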
--
Jens Axboe

On 2/2/26 7:37 AM, Li Chen wrote:
> io-wq uses an idle timeout to shrink the pool, but keeps the last worker
> around indefinitely to avoid churn.
>
> For tasks that use io_uring for file I/O and then stop using io_uring,
> this can leave an iou-wrk-* thread behind even after all io_uring instances
> are gone. This is unnecessary overhead and also gets in the way of process
> checkpoint/restore.
>
> Add an exit-on-idle mode that makes all io-wq workers exit as soon as they
> become idle, and provide io_wq_set_exit_on_idle() to toggle it.

Was going to say, rather than add a mode for this, why not just have the
idle single worker exit when the last ring is closed? But that is indeed
exactly what these two patches do. So I think this is fine, I just don't
think using the word "mode" for it is correct. "state" would be a lot
better - if we have all rings exited, then that's a state change in terms
of yeah let's just dump that idle worker.

With that in mind, I think these two patches look fine. I'll give them a
closer look. Could you perhaps write a test case for this?

--
Jens Axboe
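As a starting point for such a test, here is a hypothetical userspace
sketch (not part of this series) that assumes liburing is available, uses
IOSQE_ASYNC to force the read to be punted to io-wq, and then checks
/proc/self/task/*/comm for a leftover iou-wrk-* thread after the only ring
has been torn down. File path, sleep duration, and the pass/fail criterion
are all illustrative.

	#include <dirent.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <liburing.h>

	/* scan this task's threads for an io-wq worker */
	static int have_iou_wrk_thread(void)
	{
		DIR *dir = opendir("/proc/self/task");
		struct dirent *de;
		char path[256], comm[32];
		int found = 0;

		while (dir && (de = readdir(dir))) {
			FILE *f;

			if (de->d_name[0] == '.')
				continue;
			snprintf(path, sizeof(path), "/proc/self/task/%s/comm",
				 de->d_name);
			f = fopen(path, "r");
			if (!f)
				continue;
			if (fgets(comm, sizeof(comm), f) &&
			    !strncmp(comm, "iou-wrk", 7))
				found = 1;
			fclose(f);
		}
		if (dir)
			closedir(dir);
		return found;
	}

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		char buf[4096];
		int fd;

		if (io_uring_queue_init(8, &ring, 0))
			return 1;
		fd = open("/etc/hostname", O_RDONLY);
		if (fd < 0)
			return 1;

		/* IOSQE_ASYNC forces the request to be punted to io-wq */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
		io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
		io_uring_submit(&ring);
		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);

		/* tear down the only ring; the idle worker should now exit */
		io_uring_queue_exit(&ring);
		close(fd);

		/* give the worker a moment to notice and exit */
		sleep(1);
		return have_iou_wrk_thread() ? 1 : 0;
	}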