mttcg asserts that an execution ending with EXCP_HALTED must have
cpu->halted. However between the event or instruction that sets
cpu->halted and requests exit and the assertion here, an
asynchronous event could clear cpu->halted.
This leads to crashes running AIX on ppc/pseries because it uses
H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
H_PROD sets other cpu->halted = 0 and kicks it.
H_PROD could be turned into an interrupt to wake, but several other
places in ppc, sparc, and semihosting follow what looks like a similar
pattern setting halted = 0 directly. So remove this assertion.
Reported-by: Ivan Warren <ivan@vmfacility.fr>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
accel/tcg/tcg-accel-ops-mttcg.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index b276262007..d0b6f288d9 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
case EXCP_DEBUG:
cpu_handle_guest_debug(cpu);
break;
- case EXCP_HALTED:
- /*
- * during start-up the vCPU is reset and the thread is
- * kicked several times. If we don't ensure we go back
- * to sleep in the halted state we won't cleanly
- * start-up when the vCPU is enabled.
- *
- * cpu->halted should ensure we sleep in wait_io_event
- */
- g_assert(cpu->halted);
- break;
case EXCP_ATOMIC:
qemu_mutex_unlock_iothread();
cpu_exec_step_atomic(cpu);
--
2.40.1
29.08.2023 04:06, Nicholas Piggin wrote:
> mttcg asserts that an execution ending with EXCP_HALTED must have
> cpu->halted. However between the event or instruction that sets
> cpu->halted and requests exit and the assertion here, an
> asynchronous event could clear cpu->halted.
>
> This leads to crashes running AIX on ppc/pseries because it uses
> H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> H_PROD sets other cpu->halted = 0 and kicks it.
>
> H_PROD could be turned into an interrupt to wake, but several other
> places in ppc, sparc, and semihosting follow what looks like a similar
> pattern setting halted = 0 directly. So remove this assertion.
>
> Reported-by: Ivan Warren <ivan@vmfacility.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

This one also smells like a stable material, is it not?

Thanks,

/mjt

> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> index b276262007..d0b6f288d9 100644
> --- a/accel/tcg/tcg-accel-ops-mttcg.c
> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
> case EXCP_DEBUG:
> cpu_handle_guest_debug(cpu);
> break;
> - case EXCP_HALTED:
> - /*
> - * during start-up the vCPU is reset and the thread is
> - * kicked several times. If we don't ensure we go back
> - * to sleep in the halted state we won't cleanly
> - * start-up when the vCPU is enabled.
> - *
> - * cpu->halted should ensure we sleep in wait_io_event
> - */
> - g_assert(cpu->halted);
> - break;
> case EXCP_ATOMIC:
> qemu_mutex_unlock_iothread();
> cpu_exec_step_atomic(cpu);
On Fri Sep 22, 2023 at 4:25 AM AEST, Michael Tokarev wrote:
> 29.08.2023 04:06, Nicholas Piggin wrote:
> > mttcg asserts that an execution ending with EXCP_HALTED must have
> > cpu->halted. However between the event or instruction that sets
> > cpu->halted and requests exit and the assertion here, an
> > asynchronous event could clear cpu->halted.
> >
> > This leads to crashes running AIX on ppc/pseries because it uses
> > H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> > H_PROD sets other cpu->halted = 0 and kicks it.
> >
> > H_PROD could be turned into an interrupt to wake, but several other
> > places in ppc, sparc, and semihosting follow what looks like a similar
> > pattern setting halted = 0 directly. So remove this assertion.
> >
> > Reported-by: Ivan Warren <ivan@vmfacility.fr>
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>
> This one also smells like a stable material, is it not?

Yeah I would say it is.

Thanks,
Nick

>
> Thanks,
>
> /mjt
>
> > diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> > index b276262007..d0b6f288d9 100644
> > --- a/accel/tcg/tcg-accel-ops-mttcg.c
> > +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> > @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
> > case EXCP_DEBUG:
> > cpu_handle_guest_debug(cpu);
> > break;
> > - case EXCP_HALTED:
> > - /*
> > - * during start-up the vCPU is reset and the thread is
> > - * kicked several times. If we don't ensure we go back
> > - * to sleep in the halted state we won't cleanly
> > - * start-up when the vCPU is enabled.
> > - *
> > - * cpu->halted should ensure we sleep in wait_io_event
> > - */
> > - g_assert(cpu->halted);
> > - break;
> > case EXCP_ATOMIC:
> > qemu_mutex_unlock_iothread();
> > cpu_exec_step_atomic(cpu);
On 8/28/23 18:06, Nicholas Piggin wrote:
> mttcg asserts that an execution ending with EXCP_HALTED must have
> cpu->halted. However between the event or instruction that sets
> cpu->halted and requests exit and the assertion here, an
> asynchronous event could clear cpu->halted.
>
> This leads to crashes running AIX on ppc/pseries because it uses
> H_CEDE/H_PROD hcalls, where H_CEDE sets self->halted = 1 and
> H_PROD sets other cpu->halted = 0 and kicks it.
>
> H_PROD could be turned into an interrupt to wake, but several other
> places in ppc, sparc, and semihosting follow what looks like a similar
> pattern setting halted = 0 directly. So remove this assertion.
>
> Reported-by: Ivan Warren <ivan@vmfacility.fr>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> accel/tcg/tcg-accel-ops-mttcg.c | 11 -----------
> 1 file changed, 11 deletions(-)

The adjustments of 'halted' and 'prod' are done under the io lock in
both cases, so there's no race there. It is perfectly reasonable that
after thread A sets halted and drops the lock, thread B may acquire the
lock and clear halted before thread A has a chance to complete longjmp
and cycle through its main loop.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

>
> diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
> index b276262007..d0b6f288d9 100644
> --- a/accel/tcg/tcg-accel-ops-mttcg.c
> +++ b/accel/tcg/tcg-accel-ops-mttcg.c
> @@ -98,17 +98,6 @@ static void *mttcg_cpu_thread_fn(void *arg)
> case EXCP_DEBUG:
> cpu_handle_guest_debug(cpu);
> break;
> - case EXCP_HALTED:
> - /*
> - * during start-up the vCPU is reset and the thread is
> - * kicked several times. If we don't ensure we go back
> - * to sleep in the halted state we won't cleanly
> - * start-up when the vCPU is enabled.
> - *
> - * cpu->halted should ensure we sleep in wait_io_event
> - */
> - g_assert(cpu->halted);
> - break;

I adjusted the patch to keep the case label and update the comment,
still dropping the assert.

Queued to tcg-next.

r~
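
The ordering described in the review above can be replayed in a few lines of C. This is only a minimal sketch, not QEMU code: FakeCPU, lock(), unlock(), cede_self(), prod_other() and replay_cede_then_prod() are hypothetical names, the iothread lock is modelled as a plain flag, and the two threads are replayed sequentially in the order the review describes rather than run concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for illustration; not QEMU's CPUState. */
typedef struct {
    int halted;
} FakeCPU;

/* Models the io lock as a simple flag (assumption, single-threaded replay). */
static bool lock_held;

static void lock(void)   { assert(!lock_held); lock_held = true; }
static void unlock(void) { assert(lock_held);  lock_held = false; }

/* H_CEDE-like step: thread A halts its own vCPU while holding the lock. */
static void cede_self(FakeCPU *self)
{
    lock();
    self->halted = 1;
    unlock();   /* A has not yet returned to its main loop */
}

/* H_PROD-like step: thread B clears the target's halted while holding the lock. */
static void prod_other(FakeCPU *target)
{
    lock();
    target->halted = 0;
    unlock();
}

/*
 * Replays the sequence: A cedes, B prods, A reaches its EXCP_HALTED case.
 * Returns the halted value A would observe at that point.
 */
static int replay_cede_then_prod(void)
{
    FakeCPU a = { .halted = 0 };

    cede_self(&a);    /* A: halted = 1, lock dropped */
    prod_other(&a);   /* B: takes the lock, halted = 0 */
    return a.halted;  /* what A sees when it handles EXCP_HALTED */
}
```

Both updates happen with the lock held, yet replay_cede_then_prod() returns 0, so a check equivalent to g_assert(cpu->halted) at that point would fail even though no access is racy.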