This avoids the following deadlock:

1) a thread calls run_on_cpu for CPU 2 from a timer, and single_tcg_halt_cond
is signaled

2) CPU 1 is running and exits. It finds no work item and enters CPU 2

3) because the I/O thread is stuck in run_on_cpu, the round-robin kick
timer never triggers, and CPU 2 never runs the work item

4) run_on_cpu never completes

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cpus.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/cpus.c b/cpus.c
index a2b33ccb29..0ddeeefc14 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1220,16 +1220,20 @@ static void qemu_wait_io_event_common(CPUState *cpu)
     process_queued_cpu_work(cpu);
 }
-static void qemu_tcg_rr_wait_io_event(CPUState *cpu)
+static void qemu_tcg_rr_wait_io_event(void)
 {
+    CPUState *cpu;
+
     while (all_cpu_threads_idle()) {
         stop_tcg_kick_timer();
-        qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+        qemu_cond_wait(first_cpu->halt_cond, &qemu_global_mutex);
     }
     start_tcg_kick_timer();
-    qemu_wait_io_event_common(cpu);
+    CPU_FOREACH(cpu) {
+        qemu_wait_io_event_common(cpu);
+    }
 }
 static void qemu_wait_io_event(CPUState *cpu)
@@ -1562,7 +1566,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
             qemu_notify_event();
         }
-        qemu_tcg_rr_wait_io_event(cpu ? cpu : first_cpu);
+        qemu_tcg_rr_wait_io_event();
         deal_with_unplugged_cpus();
     }
--
2.17.1
On Wed, Nov 14, 2018 at 12:44:00 +0100, Paolo Bonzini wrote:
> This avoids the following deadlock:
>
> 1) a thread calls run_on_cpu for CPU 2 from a timer, and single_tcg_halt_cond
> is signaled
>
> 2) CPU 1 is running and exits. It finds no work item and enters CPU 2
>
> 3) because the I/O thread is stuck in run_on_cpu, the round-robin kick
> timer never triggers, and CPU 2 never runs the work item
>
> 4) run_on_cpu never completes
I'm having trouble understanding (2)->(3).
When the vCPU thread enters CPU 2, shouldn't it detect that work is
pending? As in:
/* assume cpu == cpu2 in the example above */
while (cpu && !cpu->queued_work_first && !cpu->exit_request) {
Both cpu->queued_work_first and cpu->exit_request will be set for cpu2.
I can see though how with an additional CPU the deadlock
could happen. For example, the I/O thread does run_on_cpu(cpu3),
which kicks cpu1 (i.e. the tcg_current_rr_cpu) and cpu3, but not cpu2.
Then cpu1 exits, and cpu2 starts executing; unless cpu2 exits on its
own volition, it will run forever.
Thanks,
Emilio
On 14/11/2018 20:42, Emilio G. Cota wrote:
> On Wed, Nov 14, 2018 at 12:44:00 +0100, Paolo Bonzini wrote:
>> This avoids the following deadlock:
>>
>> 1) a thread calls run_on_cpu for CPU 2 from a timer, and single_tcg_halt_cond
>> is signaled
>>
>> 2) CPU 1 is running and exits. It finds no work item and enters CPU 2
>>
>> 3) because the I/O thread is stuck in run_on_cpu, the round-robin kick
>> timer never triggers, and CPU 2 never runs the work item
>>
>> 4) run_on_cpu never completes
>
> I'm having trouble understanding (2)->(3).
>
> When the vCPU thread enters CPU 2, shouldn't it detect that work is
> pending? As in:
>
> /* assume cpu == cpu2 in the example above */
> while (cpu && !cpu->queued_work_first && !cpu->exit_request) {
>
> Both cpu->queued_work_first and cpu->exit_request will be set for cpu2.
>
> I can see though how with an additional CPU the deadlock
> could happen. For example, the I/O thread does run_on_cpu(cpu3),
> which kicks cpu1 (i.e. the tcg_current_rr_cpu) and cpu3, but not cpu2.
> Then cpu1 exits, and cpu2 starts executing; unless cpu2 exits on its
> own volition, it will run forever.
Yes, the thread must call run_on_cpu for CPU *3* from a timer.
Paolo
On Fri, Nov 16, 2018 at 00:15:53 +0100, Paolo Bonzini wrote:
> On 14/11/2018 20:42, Emilio G. Cota wrote:
> > On Wed, Nov 14, 2018 at 12:44:00 +0100, Paolo Bonzini wrote:
> >> This avoids the following deadlock:
> >>
> >> 1) a thread calls run_on_cpu for CPU 2 from a timer, and single_tcg_halt_cond
> >> is signaled
> >>
> >> 2) CPU 1 is running and exits. It finds no work item and enters CPU 2
> >>
> >> 3) because the I/O thread is stuck in run_on_cpu, the round-robin kick
> >> timer never triggers, and CPU 2 never runs the work item
> >>
> >> 4) run_on_cpu never completes
(snip)
> > I can see though how with an additional CPU the deadlock
> > could happen. For example, the I/O thread does run_on_cpu(cpu3),
> > which kicks cpu1 (i.e. the tcg_current_rr_cpu) and cpu3, but not cpu2.
> > Then cpu1 exits, and cpu2 starts executing; unless cpu2 exits on its
> > own volition, it will run forever.
>
> Yes, the thread must call run_on_cpu for CPU *3* from a timer.

Thanks! Please add my

Reviewed-by: Emilio G. Cota <cota@braap.org>

tag when fixing up the commit message.

E.
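
For readers following the thread, the corrected scenario (work queued on CPU 3 while CPU 1 is the current round-robin CPU) can be replayed with a toy model. This is only a sketch under stated assumptions: ToyCPU and every toy_* name are hypothetical stand-ins, not QEMU code, and the model ignores locking, the kick timer and exit_request. It shows one thing: when the wait function is handed CPU 1, draining only that CPU leaves CPU 3's work item pending, while visiting every CPU (as the CPU_FOREACH loop in the patch does) clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CPUS 3

/* Hypothetical stand-in for CPUState: only the field this sketch needs. */
typedef struct {
    bool work_pending;   /* models a non-empty queued work list */
} ToyCPU;

/* Models process_queued_cpu_work(): run and clear this CPU's work. */
static void toy_process_work(ToyCPU *cpu)
{
    cpu->work_pending = false;
}

/* Pre-patch shape: only the CPU passed in has its work drained. */
static void toy_wait_io_event_old(ToyCPU *cpu)
{
    toy_process_work(cpu);
}

/* Post-patch shape: every CPU is visited, in the spirit of CPU_FOREACH. */
static void toy_wait_io_event_new(ToyCPU *cpus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        toy_process_work(&cpus[i]);
    }
}

/*
 * Replay the scenario: work is queued on CPU 3 while CPU 1 is current,
 * then CPU 1 exits and reaches the wait function.
 * Returns how many CPUs still have pending work afterwards.
 */
static int toy_pending_after_wait(bool drain_all)
{
    ToyCPU cpus[NUM_CPUS] = { { false }, { false }, { false } };
    size_t current = 0;   /* index of CPU 1 */
    size_t target = 2;    /* index of CPU 3 */
    int pending = 0;

    assert(current < NUM_CPUS && target < NUM_CPUS);
    cpus[target].work_pending = true;   /* models run_on_cpu(cpu3) queuing work */

    if (drain_all) {
        toy_wait_io_event_new(cpus, NUM_CPUS);
    } else {
        toy_wait_io_event_old(&cpus[current]);
    }

    for (size_t i = 0; i < NUM_CPUS; i++) {
        if (cpus[i].work_pending) {
            pending++;
        }
    }
    return pending;
}
```

Calling toy_pending_after_wait(false) models the pre-patch behaviour and toy_pending_after_wait(true) the patched one. In the real code the leftover item matters because the thread blocked in run_on_cpu is also the one that would fire the kick timer.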