With tcg_last_op(), we always get the last op of the stream.
With TCGContext.emit_before_op, the most recently emitted op
is no longer the last op.

Instead, pass the op being emitted back from the allocator so
that we can link it to the label without needing to look it up.


r~


Richard Henderson (2):
  tcg: Return TCGOp from tcg_gen_op[1-6]
  tcg: Propagate new TCGOp to add_as_label_use

 tcg/tcg-internal.h | 12 +++----
 tcg/tcg-op.c       | 86 +++++++++++++++++++++++++---------------------
 2 files changed, 53 insertions(+), 45 deletions(-)

-- 
2.43.0
On 9/10/24 14:23, Richard Henderson wrote:
> With tcg_last_op(), we always get the last op of the stream.
> With TCGContext.emit_before_op, the most recently emitted op
> is no longer the last op.
>
> Instead, pass the op being emitted back from the allocator so
> that we can link it to the label without needing to look it up.

Oh, I meant to point out from whence this comes.
The plugin uses a conditional

 ld_i32 tmp18,env,$0xffffffffffffdb10
 mul_i32 tmp18,tmp18,$0x18
 ext_i32_i64 tmp17,tmp18
 add_i64 tmp17,tmp17,$0x575410edadc8
 ld_i64 tmp21,tmp17,$0x0
 brcond_i64 tmp21,$0x0,ltu,$L1
 ld_i32 tmp18,env,$0xffffffffffffdb10
 call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
 set_label $L1

Note that the branch is X < 0 (unsigned), which is always false, and
thus the branch is optimized away.


r~
Richard Henderson <richard.henderson@linaro.org> writes:

> On 9/10/24 14:23, Richard Henderson wrote:
>> With tcg_last_op(), we always get the last op of the stream.
>> With TCGContext.emit_before_op, the most recently emitted op
>> is no longer the last op.
>> Instead, pass the op being emitted back from the allocator so
>> that we can link it to the label without needing to look it up.
>
> Oh, I meant to point out from whence this comes.
> The plugin uses a conditional

  size_t n_insns = qemu_plugin_tb_n_insns(tb);
  qemu_plugin_u64 quantum_insn =
      qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
  /* count (and eventually trap) once per tb */
  qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
      tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);

>  ld_i32 tmp18,env,$0xffffffffffffdb10
>  mul_i32 tmp18,tmp18,$0x18
>  ext_i32_i64 tmp17,tmp18
>  add_i64 tmp17,tmp17,$0x575410edadc8

  qemu_plugin_register_vcpu_tb_exec_cond_cb(
      tb, every_quantum_insn,
      QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
      quantum_insn, max_insn_per_quantum, NULL);

?

>  ld_i64 tmp21,tmp17,$0x0
>  brcond_i64 tmp21,$0x0,ltu,$L1
>  ld_i32 tmp18,env,$0xffffffffffffdb10
>  call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>  set_label $L1
>
> Note that the branch is X < 0 (unsigned), which is always false, and
> thus the branch is optimized away.

I'm obviously missing something reading this. How can TCG know the state
of the scoreboard variables and optimise away the branch?

>
>
> r~

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
On 9/13/24 03:23, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> On 9/10/24 14:23, Richard Henderson wrote:
>>> With tcg_last_op(), we always get the last op of the stream.
>>> With TCGContext.emit_before_op, the most recently emitted op
>>> is no longer the last op.
>>> Instead, pass the op being emitted back from the allocator so
>>> that we can link it to the label without needing to look it up.
>>
>> Oh, I meant to point out from whence this comes.
>> The plugin uses a conditional
>
>   size_t n_insns = qemu_plugin_tb_n_insns(tb);
>   qemu_plugin_u64 quantum_insn =
>       qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
>   /* count (and eventually trap) once per tb */
>   qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>       tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);
>
>>  ld_i32 tmp18,env,$0xffffffffffffdb10
>>  mul_i32 tmp18,tmp18,$0x18
>>  ext_i32_i64 tmp17,tmp18
>>  add_i64 tmp17,tmp17,$0x575410edadc8
>
>   qemu_plugin_register_vcpu_tb_exec_cond_cb(
>       tb, every_quantum_insn,
>       QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
>       quantum_insn, max_insn_per_quantum, NULL);
>
> ?
>
>>  ld_i64 tmp21,tmp17,$0x0
>>  brcond_i64 tmp21,$0x0,ltu,$L1
>>  ld_i32 tmp18,env,$0xffffffffffffdb10
>>  call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>>  set_label $L1
>>
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
>
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?
>

The constant against which we compare the scoreboard entry value is known
at translation time.

>>
>>
>> r~
>
On 9/13/24 03:23, Alex Bennée wrote:
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
>
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?

0 < 0 is of course false.


r~