If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
the rescuer thread will be woken up to process works scheduled on
@pool, resulting in sequential execution of all works. This may lead
to a situation where one work blocks others. However, the initial
rescue timeout defaults to 10 milliseconds, a threshold that is
easily exceeded in heavy-load environments.
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++++
kernel/workqueue.c | 5 ++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 149bfa7..be3f488 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7376,6 +7376,10 @@
 			When enabled, memory and cache locality will be
 			impacted.
 
+	workqueue.mayday_initial_timeout
+			Set the initial timeout (jiffies) for the mayday timer.
+			Default is MAYDAY_INITIAL_TIMEOUT.
+
 	writecombine=	[LOONGARCH,EARLY] Control the MAT (Memory Access
 			Type) of ioremap_wc().
 
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 003474c..c810b61 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -481,6 +481,9 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct irq_work [NR_STD_WORKER_POOLS],
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 				     bh_worker_pools);
 
+static unsigned long wq_mayday_initial_timeout = MAYDAY_INITIAL_TIMEOUT;
+module_param_named(mayday_initial_timeout, wq_mayday_initial_timeout, ulong, 0644);
+
 /* the per-cpu worker pools */
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 				     cpu_worker_pools);
@@ -3050,7 +3053,7 @@ static void maybe_create_worker(struct worker_pool *pool)
 	raw_spin_unlock_irq(&pool->lock);
 
 	/* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
-	mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
+	mod_timer(&pool->mayday_timer, jiffies + wq_mayday_initial_timeout);
 
 	while (true) {
 		if (create_worker(pool) || !need_to_create_worker(pool))
--
1.8.3.1
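For reference, if this patch were applied, the knob would be reachable in
the two usual ways for a built-in module parameter (a sketch of expected
usage, not something that exists in mainline; the value is in jiffies, and
the 0644 permission in the patch makes it writable at runtime):

	# on the kernel command line:
	workqueue.mayday_initial_timeout=100

	# or at runtime via sysfs:
	echo 100 > /sys/module/workqueue/parameters/mayday_initial_timeout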
Hello,

On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> the rescuer thread will be woken up to process works scheduled on
> @pool, resulting in sequential execution of all works. This may lead
> to a situation where one work blocks others. However, the initial
> rescue timeout defaults to 10 milliseconds, a threshold that is
> easily exceeded in heavy-load environments.

This is not how workqueue works. Rescuer doesn't exclude other workers. If
other workers become available, they will run the workqueue concurrently.
All that initial timeout achieves is delaying the initial execution from the
rescuer.

Is this from observing real behaviors? If so, what was the test case and
how did the behavior change after the patch? It couldn't have gotten better.

Thanks.

--
tejun
On Wed, Nov 12, 2025 at 4:40 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> > If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> > the rescuer thread will be woken up to process works scheduled on
> > @pool, resulting in sequential execution of all works. This may lead
> > to a situation where one work blocks others. However, the initial
> > rescue timeout defaults to 10 milliseconds, a threshold that is
> > easily exceeded in heavy-load environments.
>
> This is not how workqueue works. Rescuer doesn't exclude other workers. If
> other workers become available, they will run the workqueue concurrently.
> All that initial timeout achieves is delaying the initial execution from the
> rescuer.
>
> Is this from observing real behaviors? If so, what was the test case and
> how did the behavior change after the patch? It couldn't have gotten better.
>
> Thanks.
>
> --
> tejun
We encountered an XFS deadlock issue related to the following patch:

    xfs: don't use BMBT btree split workers for IO completion
    (commit c85007e2e3942da1f9361e4b5a9388ea3a8dcc5b)

However, unlike the scenario described in that patch, in our case the
rescuer thread was still woken up even when memory was sufficient, likely
due to heavy load.
Works that have already been scheduled will be executed sequentially
within the rescuer thread.
static int rescuer_thread(void *__rescuer)
{
	......
	/*
	 * Slurp in all works issued via this workqueue and
	 * process'em.
	 */
	WARN_ON_ONCE(!list_empty(scheduled));
	list_for_each_entry_safe(work, n, &pool->worklist, entry) {
		if (get_work_pwq(work) == pwq) {
			if (first)
				pool->watchdog_ts = jiffies;
			move_linked_works(work, scheduled, &n);
		}
		first = false;
	}

	if (!list_empty(scheduled)) {
		process_scheduled_works(rescuer);
	......
}
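To make the blocking concern concrete, consider a hypothetical pair of
works on one rescuer-backed (WQ_MEM_RECLAIM) workqueue; the names and
bodies are illustrative only, not from our system:

	#include <linux/workqueue.h>

	static void work_b_fn(struct work_struct *work)
	{
		/* e.g. completes I/O and frees memory */
	}
	static DECLARE_WORK(work_b, work_b_fn);

	static void work_a_fn(struct work_struct *work)
	{
		/*
		 * A waits on B, but B was queued behind A on the same
		 * workqueue. Once the rescuer slurps both into
		 * 'scheduled', it runs A first and blocks here until a
		 * regular worker can be created to run B.
		 */
		flush_work(&work_b);
	}
	static DECLARE_WORK(work_a, work_a_fn);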
Hello,
On Wed, Nov 12, 2025 at 10:01:10AM +0800, ying chen wrote:
> Works that have already been scheduled will be executed sequentially
> within the rescuer thread.
> static int rescuer_thread(void *__rescuer)
> {
> 	......
> 	/*
> 	 * Slurp in all works issued via this workqueue and
> 	 * process'em.
> 	 */
> 	WARN_ON_ONCE(!list_empty(scheduled));
> 	list_for_each_entry_safe(work, n, &pool->worklist, entry) {
> 		if (get_work_pwq(work) == pwq) {
> 			if (first)
> 				pool->watchdog_ts = jiffies;
> 			move_linked_works(work, scheduled, &n);
> 		}
> 		first = false;
> 	}
>
> 	if (!list_empty(scheduled)) {
> 		process_scheduled_works(rescuer);
> 	......
> }
Ah, I see what you mean. The slurping is to avoid potentially O(N^2)
scanning, but that's probably the wrong trade-off to make here. I think the
right solution is making it break out after finding the first matching work
item and loop outside so that it processes work items one by one.
Thanks.
--
tejun
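A minimal sketch of the break-out approach described above, reusing the
identifiers from the rescuer_thread() excerpt; this is hypothetical, and
the handling of pool->lock (which process_scheduled_works() releases and
reacquires) is elided:

	/* Take one matching work item at a time instead of slurping
	 * the whole worklist into 'scheduled'. */
	while (true) {
		struct work_struct *work, *n;
		bool found = false;

		list_for_each_entry_safe(work, n, &pool->worklist, entry) {
			if (get_work_pwq(work) == pwq) {
				pool->watchdog_ts = jiffies;
				move_linked_works(work, scheduled, &n);
				found = true;
				break;	/* stop at the first match */
			}
		}
		if (!found)
			break;
		process_scheduled_works(rescuer);
	}

Rescanning from the head of the worklist on each iteration is where the
potential O(N^2) cost returns; that is what the positional-marker variant
mentioned below avoids.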
Hello,

On Thu, Nov 13, 2025 at 12:03 AM Tejun Heo <tj@kernel.org> wrote:
>
> Ah, I see what you mean. The slurping is to avoid potentially O(N^2)
> scanning, but that's probably the wrong trade-off to make here. I think the
> right solution is making it break out after finding the first matching work
> item and loop outside so that it processes work items one by one.

I'm implementing it. A positional dummy work item is added to mark the
position for the next scan after processing the previous one. I'll send it
soon.

Still, I suggest that the workqueue user eliminate any dependencies among
the work items when using a rescuer. If the memory pressure is not relieved
and no normal workers come to help, it simply won't work.

Thanks,
Lai
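A rough sketch of how such a positional dummy item might work (hypothetical,
not Lai's actual patch; a real implementation would also need every other
walker of pool->worklist to skip the marker):

	/* A dummy work_struct anchors the scan position so that each
	 * rescan resumes where the previous one stopped, keeping the
	 * overall scan O(N). */
	struct work_struct marker = { };	/* list anchor, never executed */
	struct work_struct *work;

	list_add(&marker.entry, &pool->worklist);
	while (!list_is_last(&marker.entry, &pool->worklist)) {
		work = list_next_entry(&marker, entry);
		if (get_work_pwq(work) != pwq) {
			/* not ours: advance the marker past this work */
			list_move(&marker.entry, &work->entry);
			continue;
		}
		pool->watchdog_ts = jiffies;
		move_linked_works(work, scheduled, NULL);
		process_scheduled_works(rescuer);	/* may drop pool->lock */
	}
	list_del(&marker.entry);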
On Thu, Nov 13, 2025 at 12:03 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Wed, Nov 12, 2025 at 10:01:10AM +0800, ying chen wrote:
> > Works that have already been scheduled will be executed sequentially
> > within the rescuer thread.
> > static int rescuer_thread(void *__rescuer)
> > {
> > 	......
> > 	/*
> > 	 * Slurp in all works issued via this workqueue and
> > 	 * process'em.
> > 	 */
> > 	WARN_ON_ONCE(!list_empty(scheduled));
> > 	list_for_each_entry_safe(work, n, &pool->worklist, entry) {
> > 		if (get_work_pwq(work) == pwq) {
> > 			if (first)
> > 				pool->watchdog_ts = jiffies;
> > 			move_linked_works(work, scheduled, &n);
> > 		}
> > 		first = false;
> > 	}
> >
> > 	if (!list_empty(scheduled)) {
> > 		process_scheduled_works(rescuer);
> > 	......
> > }
>
> Ah, I see what you mean. The slurping is to avoid potentially O(N^2)
> scanning, but that's probably the wrong trade-off to make here. I think the
> right solution is making it break out after finding the first matching work
> item and loop outside so that it processes work items one by one.
Processing work items one-by-one is indeed an excellent solution.
However, wouldn't it also be necessary to provide a method for
adjusting the mayday initial timeout?
Thanks.
Hello,

On Thu, Nov 13, 2025 at 10:34:43AM +0800, ying chen wrote:
> Processing work items one-by-one is indeed an excellent solution.
> However, wouldn't it also be necessary to provide a method for
> adjusting the mayday initial timeout?

Adding an interface like that isn't difficult but I'm not sure what it
would achieve. A rescuer is there to guarantee forward progress when the
system is under memory pressure and processing work items of the workqueue
may be required to free up memory. IOW, it's there for the cases where that
workqueue not making forward progress can lead to system deadlock. As such,
this doesn't have much in the way of system performance implications (aside
from the serialization effect that you raised). If the rescuer is needed,
the system is in tatters anyway, especially in terms of latency response,
so I'm not sure what fine-tuning the rescuer response time would help with.

Thanks.

--
tejun
On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> the rescuer thread will be woken up to process works scheduled on
> @pool, resulting in sequential execution of all works. This may lead
> to a situation where one work blocks others. However, the initial
> rescue timeout defaults to 10 milliseconds, a threshold that is
> easily exceeded in heavy-load environments.

Then we should tune it automatically, not blame the admin for not tuning
this parameter.
Hello, Chen

Thanks for trying to improve this.

On Tue, Nov 11, 2025 at 10:55 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> > If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> > the rescuer thread will be woken up to process works scheduled on
> > @pool, resulting in sequential execution of all works. This may lead
> > to a situation where one work blocks others. However, the initial
> > rescue timeout defaults to 10 milliseconds, a threshold that is
> > easily exceeded in heavy-load environments.

This is what it was intended for. A workqueue equipped with a rescuer is
effectively a single-threaded workqueue, where deadlock can occur when
works depend on each other. It is recommended that the user modify the
works to eliminate any dependencies, or consider using multiple workqueues.
Increasing the timeout doesn't solve the problem; it merely hides it.

Changing the workqueue implementation to support concurrency for the
rescuer would complicate the design unnecessarily. The use of a rescuer is
only for memory reclamation, where the works should be simple, focusing on
freeing memory rather than waiting on each other.

Thanks,
Lai

> Then we should tune it automatically, not blame the admin for not tuning
> this parameter.
On Tue, Nov 11, 2025 at 10:55 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 11, 2025 at 10:52:44AM +0800, ying chen wrote:
> > If creating a new worker takes longer than MAYDAY_INITIAL_TIMEOUT,
> > the rescuer thread will be woken up to process works scheduled on
> > @pool, resulting in sequential execution of all works. This may lead
> > to a situation where one work blocks others. However, the initial
> > rescue timeout defaults to 10 milliseconds, a threshold that is
> > easily exceeded in heavy-load environments.
>
> Then we should tune it automatically, not blame the admin for not tuning
> this parameter.

Currently, dynamic tuning of the mayday timer's initial timeout is not
supported.