[PATCH] thread-pool: Allow at least 1 thread in thread_pool_adjust_max_threads_to_work()

Maciej S. Szmigiero posted 1 patch 1 week, 1 day ago
Failed in applying to current master (apply log)
util/thread-pool.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] thread-pool: Allow at least 1 thread in thread_pool_adjust_max_threads_to_work()
Posted by Maciej S. Szmigiero 1 week, 1 day ago
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>

thread_pool_adjust_max_threads_to_work() is supposed to give each task its
own thread by setting the pool max thread count limit accordingly.

However, if there aren't any tasks currently in the pool the pool max
thread count will be set to 0, which will trigger an assertion failure
in thread_pool_set_max_threads() - because setting this value would
completely block the pool by not allowing it to process any submitted
tasks.

This also can happen if a task is submitted via
thread_pool_submit_immediate() to an empty pool but the task completes so
quickly that by the time this function calls
thread_pool_adjust_max_threads_to_work() the pool again has no unfinished
tasks in it.

Fix this by making sure that the pool is allowed to create at least 1
thread.

Fixes: b5aa74968b27 ("thread-pool: Implement generic (non-AIO) pool support")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
 util/thread-pool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/thread-pool.c b/util/thread-pool.c
index 8f8cb38d5ce0..4e75191c983e 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -493,5 +493,5 @@ bool thread_pool_adjust_max_threads_to_work(ThreadPool *pool)
 {
     QEMU_LOCK_GUARD(&pool->cur_work_lock);
 
-    return thread_pool_set_max_threads(pool, pool->cur_work);
+    return thread_pool_set_max_threads(pool, MAX(pool->cur_work, 1));
 }
Re: [PATCH] thread-pool: Allow at least 1 thread in thread_pool_adjust_max_threads_to_work()
Posted by Peter Xu 3 days, 20 hours ago
On Thu, May 21, 2026 at 09:06:07PM +0200, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> thread_pool_adjust_max_threads_to_work() is supposed to give each task its
> own thread by setting the pool max thread count limit accordingly.
> 
> However, if there aren't any tasks currently in the pool the pool max
> thread count will be set to 0, which will trigger an assertion failure
> in thread_pool_set_max_threads() - because setting this value would
> completely block the pool by not allowing it to process any submitted
> tasks.
> 
> This also can happen if a task is submitted via
> thread_pool_submit_immediate() to an empty pool but the task completes so
> quickly that by the time this function calls
> thread_pool_adjust_max_threads_to_work() the pool again has no unfinished
> tasks in it.

Sorry for a late comment. Just curious: how easy is this to reproduce?

> 
> Fix this by making sure that the pool is allowed to create at least 1
> thread.

But then it means we have no work and then we will create one thread does
nothing..

I suspect the real culprit is we released the cur_work_lock during the
whole process of thread_pool_submit_immediate().  If we take it during the
whole window this will be a no-issue too.

The other question is, if it is awkward to manually adjust num of threads,
shall we set num to be -1 (unlimited) while pool created?

Thanks,

> 
> Fixes: b5aa74968b27 ("thread-pool: Implement generic (non-AIO) pool support")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
>  util/thread-pool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/util/thread-pool.c b/util/thread-pool.c
> index 8f8cb38d5ce0..4e75191c983e 100644
> --- a/util/thread-pool.c
> +++ b/util/thread-pool.c
> @@ -493,5 +493,5 @@ bool thread_pool_adjust_max_threads_to_work(ThreadPool *pool)
>  {
>      QEMU_LOCK_GUARD(&pool->cur_work_lock);
>  
> -    return thread_pool_set_max_threads(pool, pool->cur_work);
> +    return thread_pool_set_max_threads(pool, MAX(pool->cur_work, 1));
>  }
> 

-- 
Peter Xu
Re: [PATCH] thread-pool: Allow at least 1 thread in thread_pool_adjust_max_threads_to_work()
Posted by Maciej S. Szmigiero 3 days, 19 hours ago
On 26.05.2026 22:33, Peter Xu wrote:
> On Thu, May 21, 2026 at 09:06:07PM +0200, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> thread_pool_adjust_max_threads_to_work() is supposed to give each task its
>> own thread by setting the pool max thread count limit accordingly.
>>
>> However, if there aren't any tasks currently in the pool the pool max
>> thread count will be set to 0, which will trigger an assertion failure
>> in thread_pool_set_max_threads() - because setting this value would
>> completely block the pool by not allowing it to process any submitted
>> tasks.
>>
>> This also can happen if a task is submitted via
>> thread_pool_submit_immediate() to an empty pool but the task completes so
>> quickly that by the time this function calls
>> thread_pool_adjust_max_threads_to_work() the pool again has no unfinished
>> tasks in it.
> 
> Sorry for a late comment. Just curious: how easy is this to reproduce?

It's difficult to reproduce in most setups.

My main VFIO live migration setup never hit it for more than a year, other
similar setup hit it recently 3 times.

On the other hand, putting sleep(5) in the middle of
thread_pool_submit_immediate() makes it reproduce nearly always for me.

>>
>> Fix this by making sure that the pool is allowed to create at least 1
>> thread.
> 
> But then it means we have no work and then we will create one thread does
> nothing..

thread_pool_adjust_max_threads_to_work() is currently called only from
thread_pool_submit_immediate().

If the API user truly wants no threads in the pool for time being
(even though this will completely block the pool) they can use
thread_pool_submit() to submit their task(s).
This won't call thread_pool_adjust_max_threads_to_work() by itself.

Also, since the thread pool is created with zero initial threads by
default the patch does *not* mean that the each newly created pool
will have one idle thread now.

> 
> I suspect the real culprit is we released the cur_work_lock during the
> whole process of thread_pool_submit_immediate().  If we take it during the
> whole window this will be a no-issue too.

True, however this will need splitting both thread_pool_submit() and
thread_pool_adjust_max_threads_to_work() into two versions each: one
which takes pool->cur_work_lock, another which does not and then
take pool->cur_work_lock on their behalf in thread_pool_submit_immediate().

Not to mention this would also mean putting the whole g_thread_pool_push()
-> g_thread_pool_start_thread() machinery under pool->cur_work_lock in this
case instead of just pool->cur_work increment.

I think doing this such way would add unnecessary complexity considering
that its alternative of this patch is a single-line trivial change.

> The other question is, if it is awkward to manually adjust num of threads,
> shall we set num to be -1 (unlimited) while pool created?

We can't - Glib thread pool API does not allow unlimited exclusive pools. 
> Thanks,
>
Thanks,
Maciej
Re: [PATCH] thread-pool: Allow at least 1 thread in thread_pool_adjust_max_threads_to_work()
Posted by Peter Xu 3 days, 18 hours ago
On Tue, May 26, 2026 at 11:06:58PM +0200, Maciej S. Szmigiero wrote:
> On 26.05.2026 22:33, Peter Xu wrote:
> > On Thu, May 21, 2026 at 09:06:07PM +0200, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > > 
> > > thread_pool_adjust_max_threads_to_work() is supposed to give each task its
> > > own thread by setting the pool max thread count limit accordingly.
> > > 
> > > However, if there aren't any tasks currently in the pool the pool max
> > > thread count will be set to 0, which will trigger an assertion failure
> > > in thread_pool_set_max_threads() - because setting this value would
> > > completely block the pool by not allowing it to process any submitted
> > > tasks.
> > > 
> > > This also can happen if a task is submitted via
> > > thread_pool_submit_immediate() to an empty pool but the task completes so
> > > quickly that by the time this function calls
> > > thread_pool_adjust_max_threads_to_work() the pool again has no unfinished
> > > tasks in it.
> > 
> > Sorry for a late comment. Just curious: how easy is this to reproduce?
> 
> It's difficult to reproduce in most setups.
> 
> My main VFIO live migration setup never hit it for more than a year, other
> similar setup hit it recently 3 times.
> 
> On the other hand, putting sleep(5) in the middle of
> thread_pool_submit_immediate() makes it reproduce nearly always for me.

Thanks, I'll attach this info to the patch when queuing.

> 
> > > 
> > > Fix this by making sure that the pool is allowed to create at least 1
> > > thread.
> > 
> > But then it means we have no work and then we will create one thread does
> > nothing..
> 
> thread_pool_adjust_max_threads_to_work() is currently called only from
> thread_pool_submit_immediate().
> 
> If the API user truly wants no threads in the pool for time being
> (even though this will completely block the pool) they can use
> thread_pool_submit() to submit their task(s).
> This won't call thread_pool_adjust_max_threads_to_work() by itself.
> 
> Also, since the thread pool is created with zero initial threads by
> default the patch does *not* mean that the each newly created pool
> will have one idle thread now.

This is another small problem.  I think the ideal case is we create threads
before hand, not on-demand here.  It should be faster. I don't know how
much, maybe it's not measurable, but IIUC that's one good thing about
exclusive thread pool that we didn't really leverage.

I think the system on both sides should know exactly how many threads we
need, hence they can create them before downtime starts.  During the
process, we shouldn't need to adjust num of threads, also avoid the bug
like this.  But I'll leave that to you to decide if it's worthwhile.

> 
> > 
> > I suspect the real culprit is we released the cur_work_lock during the
> > whole process of thread_pool_submit_immediate().  If we take it during the
> > whole window this will be a no-issue too.
> 
> True, however this will need splitting both thread_pool_submit() and
> thread_pool_adjust_max_threads_to_work() into two versions each: one
> which takes pool->cur_work_lock, another which does not and then
> take pool->cur_work_lock on their behalf in thread_pool_submit_immediate().
> 
> Not to mention this would also mean putting the whole g_thread_pool_push()
> -> g_thread_pool_start_thread() machinery under pool->cur_work_lock in this
> case instead of just pool->cur_work increment.
> 
> I think doing this such way would add unnecessary complexity considering
> that its alternative of this patch is a single-line trivial change.
> 
> > The other question is, if it is awkward to manually adjust num of threads,
> > shall we set num to be -1 (unlimited) while pool created?
> 
> We can't - Glib thread pool API does not allow unlimited exclusive pools.

If we don't care about pre-init of thread pools, I'm actually not sure if
we need exclusive pool at all.. maybe it'll still help, to e.g. inherit
scheduler setup of migration thread.

But it's ok anyway, I'll queue this patch for now.

Thanks,

-- 
Peter Xu
Re: [PATCH] thread-pool: Allow at least 1 thread in thread_pool_adjust_max_threads_to_work()
Posted by Maciej S. Szmigiero 1 day, 18 hours ago
On 27.05.2026 00:48, Peter Xu wrote:
> On Tue, May 26, 2026 at 11:06:58PM +0200, Maciej S. Szmigiero wrote:
>> On 26.05.2026 22:33, Peter Xu wrote:
>>> On Thu, May 21, 2026 at 09:06:07PM +0200, Maciej S. Szmigiero wrote:
>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>
>>>> thread_pool_adjust_max_threads_to_work() is supposed to give each task its
>>>> own thread by setting the pool max thread count limit accordingly.
>>>>
>>>> However, if there aren't any tasks currently in the pool the pool max
>>>> thread count will be set to 0, which will trigger an assertion failure
>>>> in thread_pool_set_max_threads() - because setting this value would
>>>> completely block the pool by not allowing it to process any submitted
>>>> tasks.
>>>>
>>>> This also can happen if a task is submitted via
>>>> thread_pool_submit_immediate() to an empty pool but the task completes so
>>>> quickly that by the time this function calls
>>>> thread_pool_adjust_max_threads_to_work() the pool again has no unfinished
>>>> tasks in it.
>>>
>>> Sorry for a late comment. Just curious: how easy is this to reproduce?
>>
>> It's difficult to reproduce in most setups.
>>
>> My main VFIO live migration setup never hit it for more than a year, other
>> similar setup hit it recently 3 times.
>>
>> On the other hand, putting sleep(5) in the middle of
>> thread_pool_submit_immediate() makes it reproduce nearly always for me.
> 
> Thanks, I'll attach this info to the patch when queuing.

Thanks.

>>
>>>>
>>>> Fix this by making sure that the pool is allowed to create at least 1
>>>> thread.
>>>
>>> But then it means we have no work and then we will create one thread does
>>> nothing..
>>
>> thread_pool_adjust_max_threads_to_work() is currently called only from
>> thread_pool_submit_immediate().
>>
>> If the API user truly wants no threads in the pool for time being
>> (even though this will completely block the pool) they can use
>> thread_pool_submit() to submit their task(s).
>> This won't call thread_pool_adjust_max_threads_to_work() by itself.
>>
>> Also, since the thread pool is created with zero initial threads by
>> default the patch does *not* mean that the each newly created pool
>> will have one idle thread now.
> 
> This is another small problem.  I think the ideal case is we create threads
> before hand, not on-demand here.  It should be faster. I don't know how
> much, maybe it's not measurable, but IIUC that's one good thing about
> exclusive thread pool that we didn't really leverage.
> 
> I think the system on both sides should know exactly how many threads we
> need, hence they can create them before downtime starts.  During the
> process, we shouldn't need to adjust num of threads, also avoid the bug
> like this.  But I'll leave that to you to decide if it's worthwhile.

That's indeed a good idea, although this would need some common VFIO
migration handler (not per-VFIO-device one) to count the number of
VFIO devices with multifd migration enabled and scale the pool number
of threads accordingly.

Also, as you wrote, the performance gains for this additional mechanism
could easily be very slight.

However, I will keep this idea in mind when I am going to be working on
further improvements there.

>>
>>>
>>> I suspect the real culprit is we released the cur_work_lock during the
>>> whole process of thread_pool_submit_immediate().  If we take it during the
>>> whole window this will be a no-issue too.
>>
>> True, however this will need splitting both thread_pool_submit() and
>> thread_pool_adjust_max_threads_to_work() into two versions each: one
>> which takes pool->cur_work_lock, another which does not and then
>> take pool->cur_work_lock on their behalf in thread_pool_submit_immediate().
>>
>> Not to mention this would also mean putting the whole g_thread_pool_push()
>> -> g_thread_pool_start_thread() machinery under pool->cur_work_lock in this
>> case instead of just pool->cur_work increment.
>>
>> I think doing this such way would add unnecessary complexity considering
>> that its alternative of this patch is a single-line trivial change.
>>
>>> The other question is, if it is awkward to manually adjust num of threads,
>>> shall we set num to be -1 (unlimited) while pool created?
>>
>> We can't - Glib thread pool API does not allow unlimited exclusive pools.
> 
> If we don't care about pre-init of thread pools, I'm actually not sure if
> we need exclusive pool at all.. maybe it'll still help, to e.g. inherit
> scheduler setup of migration thread.
> 
> But it's ok anyway, I'll queue this patch for now.
> 
> Thanks,
> 

Thanks,
Maciej
Re: [PATCH] thread-pool: Allow at least 1 thread in thread_pool_adjust_max_threads_to_work()
Posted by Fabiano Rosas 1 week, 1 day ago
"Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:

> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> thread_pool_adjust_max_threads_to_work() is supposed to give each task its
> own thread by setting the pool max thread count limit accordingly.
>
> However, if there aren't any tasks currently in the pool the pool max
> thread count will be set to 0, which will trigger an assertion failure
> in thread_pool_set_max_threads() - because setting this value would
> completely block the pool by not allowing it to process any submitted
> tasks.
>
> This also can happen if a task is submitted via
> thread_pool_submit_immediate() to an empty pool but the task completes so
> quickly that by the time this function calls
> thread_pool_adjust_max_threads_to_work() the pool again has no unfinished
> tasks in it.
>
> Fix this by making sure that the pool is allowed to create at least 1
> thread.
>
> Fixes: b5aa74968b27 ("thread-pool: Implement generic (non-AIO) pool support")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
>  util/thread-pool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/util/thread-pool.c b/util/thread-pool.c
> index 8f8cb38d5ce0..4e75191c983e 100644
> --- a/util/thread-pool.c
> +++ b/util/thread-pool.c
> @@ -493,5 +493,5 @@ bool thread_pool_adjust_max_threads_to_work(ThreadPool *pool)
>  {
>      QEMU_LOCK_GUARD(&pool->cur_work_lock);
>  
> -    return thread_pool_set_max_threads(pool, pool->cur_work);
> +    return thread_pool_set_max_threads(pool, MAX(pool->cur_work, 1));
>  }

Reviewed-by: Fabiano Rosas <farosas@suse.de>