The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
which is invalidated by a tb_flush before we execute it. This doesn't
affect the other cpu_exec modes, as a tb_flush by its nature can only
occur on a quiescent system. The race was described as:
B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
B3. tcg_tb_alloc obtains a new TB
C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
(same TB as B2)
A3. start_exclusive critical section entered
A4. do_tb_flush is called, TB memory freed/re-allocated
A5. end_exclusive exits critical section
B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
B3. tcg_tb_alloc reallocates TB from B2
C4. start_exclusive critical section entered
C5. cpu_tb_exec executes the TB code that was freed in A4
The simplest fix is to widen the exclusive period to include the TB
lookup. As a result we can drop the complication of checking we are in
the exclusive region before we end it.
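
To make the effect concrete, here is a condensed sketch of the control
flow in cpu_exec_step_atomic after this patch (the error path, tracing
and the cpu_exec_enter/exit bookkeeping are elided; the full diff is
below):

    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
        start_exclusive();          /* all other vCPUs are now quiescent */

        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
        if (tb == NULL) {
            mmap_lock();
            /* codegen now also happens inside the exclusive region */
            tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
            mmap_unlock();
        }

        parallel_cpus = false;
        cpu_tb_exec(cpu, tb);       /* no tb_flush can free this TB */
    }
    /* a longjmp from codegen or execution lands here, still exclusive */
    g_assert(cpu_in_exclusive_context(cpu));
    parallel_cpus = true;
    end_exclusive();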
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Yifan <me@yifanlu.com>
Cc: Bug 1863025 <1863025@bugs.launchpad.net>
---
accel/tcg/cpu-exec.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2560c90eec7..d95c4848a47 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -240,6 +240,8 @@ void cpu_exec_step_atomic(CPUState *cpu)
     uint32_t cf_mask = cflags & CF_HASH_MASK;

     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+        start_exclusive();
+
         tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
         if (tb == NULL) {
             mmap_lock();
@@ -247,8 +249,6 @@ void cpu_exec_step_atomic(CPUState *cpu)
             mmap_unlock();
         }

-        start_exclusive();
-
         /* Since we got here, we know that parallel_cpus must be true. */
         parallel_cpus = false;
         cc->cpu_exec_enter(cpu);
@@ -271,14 +271,15 @@ void cpu_exec_step_atomic(CPUState *cpu)
         qemu_plugin_disable_mem_helpers(cpu);
     }

-    if (cpu_in_exclusive_context(cpu)) {
-        /* We might longjump out of either the codegen or the
-         * execution, so must make sure we only end the exclusive
-         * region if we started it.
-         */
-        parallel_cpus = true;
-        end_exclusive();
-    }
+
+    /*
+     * As we start the exclusive region before codegen we must still
+     * be in the region if we longjump out of either the codegen or
+     * the execution.
+     */
+    g_assert(cpu_in_exclusive_context(cpu));
+    parallel_cpus = true;
+    end_exclusive();
 }

 struct tb_desc {
--
2.20.1
On 2/14/20 6:49 AM, Alex Bennée wrote:
> The simplest fix is to widen the exclusive period to include the TB
> lookup. As a result we can drop the complication of checking we are in
> the exclusive region before we end it.

I'm not 100% keen on having the tb_gen_code within the exclusive
region. It implies a much larger delay on (at least) the first
execution of the atomic operation.

But I suppose until recently we had a global lock around code
generation, and this is only slightly worse. Plus, it has the
advantage of being dead simple, and without the races vs
tb_ctx.tb_flush_count that exist in Yifan's patch.

Applied to tcg-next.


r~
What race are you thinking of in my patch? The obvious race I can
think of is benign:

Case 1:
A: does TB flush
B: read tb_flush_count
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries

Case 2:
B: read tb_flush_count
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (increment seen)
B: retries

Case 3:
A: does TB flush
A: increment tb_flush_count
A: end_exclusive
B: read tb_flush_count
B: tb_lookup__cpu_state/tb_gen_code
B: start_exclusive
B: read tb_flush_count again (no increment seen)
B: proceeds

Case 1 is the expected case. In case 2 we thought the TB was stale but
it wasn't, so we fetch it again with tb_lookup__cpu_state at minimal
extra overhead. Case 3 looks bad because we could read tb_flush_count
and find it already incremented; but if so, that means thread A is at
the end of do_tb_flush, the lookup tables are already cleared and the
TCG context is already reset, so it should be safe for thread B to
call tb_lookup__cpu_state or tb_gen_code.

Yifan

On Fri, Feb 14, 2020 at 3:31 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> But I suppose until recently we had a global lock around code
> generation, and this is only slightly worse. Plus, it has the
> advantage of being dead simple, and without the races vs
> tb_ctx.tb_flush_count that exist in Yifan's patch.
>
> Applied to tcg-next.
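
The retry scheme described in the cases above maps onto roughly the
following shape. This is a hypothetical reconstruction from the three
cases Yifan lists, not his literal patch: tb_ctx.tb_flush_count is the
counter Richard refers to, but the loop structure and the atomic_read()
accesses are assumptions.

    /* Hypothetical sketch of the retry-based alternative; reconstructed
     * from the cases above, not the literal patch. */
    for (;;) {
        unsigned flush_count = atomic_read(&tb_ctx.tb_flush_count);

        /* Look up (and possibly generate) the TB outside the region. */
        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
        if (tb == NULL) {
            mmap_lock();
            tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
            mmap_unlock();
        }

        start_exclusive();
        if (atomic_read(&tb_ctx.tb_flush_count) == flush_count) {
            break;           /* no flush raced with us: the TB is valid */
        }
        end_exclusive();     /* a flush intervened: retry the lookup */
    }
    /* ... execute tb inside the exclusive region, then end_exclusive() ... */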
On 14/02/20 15:49, Alex Bennée wrote:
> The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
> which is invalidated by a tb_flush before we execute it.
>
> The simplest fix is to widen the exclusive period to include the TB
> lookup. As a result we can drop the complication of checking we are in
> the exclusive region before we end it.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>