v1: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg05682.html
Changes since v1:
- Drop the 2-pass translation. Instead, empty instrumentation
is injected during translation. If it turns out that this
empty instrumentation is not needed, it is removed from
the output. For this, add 2 TCG ops that mark the beginning
and end of this empty instrumentation.
This is cleaner than 2-pass translation, although it
ends up being quite a bit more code, since we have
to copy backend TCG ops, which is tedious. Performance-wise,
it is at worst ~9% slower (~1.3% avg) than 2-pass for SPEC06int:
https://imgur.com/a/bUNox3H
This is for an "empty" plugin (also added to tests/plugin/empty.c).
That is, it subscribes to TB translation events and does nothing
with them (i.e. no execution-time subscriptions).
This means the empty instrumentation has to be injected and then
removed, which is the worst-case scenario since all the injection
work is wasted.
- Add QTAILQ_REMOVE_SEVERAL, which helps speed up the removal
of empty instrumentation.
- Drop the "TCG runtime helper" support. We do not need it
for empty instrumentation; we just replace the function pointer
in the copied "call" op directly.
+ To detect when an instruction uses helpers, just strncmp
the helper's name against "plugin_".
- Drop tb->plugin_mask. Instead, read cpu->plugin_mask from
translator_loop.
- Drop the xxhash patches, since I submitted those as a separate
series.
- Move a lot of plugin-related code from translator.c to
plugin-gen.c, leaving only a few function calls in translator.c.
- Add support for only subscribing to an instruction's reads or
writes. This is implemented via a flag added to the memory
registration functions of the public API.
- Disentangle callbacks into separate arrays. Instead of just
having 3 arrays (tb, insn and mem callbacks), have 5 arrays
(tb, insn, virt. mem, hostaddr mem) of 2 arrays each (udata_cb
and inline). This takes a bit more space per TB, but note that
this struct is allocated only once in each TCGContext. OTOH,
it makes the code much simpler. The union in struct dyn_cb
remains, since for instrumenting memory accesses from helpers
we still coalesce all types of memory callbacks into a single
array.
- Add get_page_addr_code_hostp to get the host address of code
from common code. Use this to export the host address of
instructions (qemu_plugin_insn_haddr() added to the public API).
- Define TCGMemOp MO_HADDR. If set, then on a TLB hit the TCG
backend copies the corresponding host address to env->hostaddr.
This allows us to do this copy only when needed.
- Use helpers for reading and setting env->hostaddr, so that
we minimize the use of #ifdef CONFIG_PLUGIN.
- Only define env->hostaddr if CONFIG_PLUGIN.
- Drop the trailing 'S' in CONFIG_PLUGINS: it is now CONFIG_PLUGIN.
- Drop a few optional features from the RFC:
+ lockstep execution
+ plugin-chan + guest hooks
+ virtual clock control
- Define translator_ld* helpers and use them, as suggested
by Alex and rth. All target ISAs that use translator_loop
have been converted, except s390x and mips.
- Do not bloat TCGContext if !CONFIG_PLUGIN.
- Define TCGContext.plugin_tb as a pointer, instead of the
whole struct.
- Test on 32-bit and 64-bit hosts (i386, x86_64, ppc64, aarch64).
- Add cpu_in_exclusive_work_context() and use it in tb_flush(),
as suggested by Alex.
- configure fixes, including MacOSX builds thanks to Roman's help.
- Remove macros in atomic_template.h, as suggested by Alex.
It turns out they aren't needed; inline functions are enough.
- Fix a bug whereby cpu->plugin_mem was not being cleared
if the instruction that used helpers was the last one in
a TB (e.g. an exception). The fix adds checks (1) when
returning from longjmp, and (2) when finishing a TB from
tcg, so that we're sure to leave cpu->plugin_mem
in a good state. (I noticed the bug by uninstalling a plugin
that had registered memory callbacks, which resulted in
callbacks to the uninstalled [dlclose'd] plugin.)
- Make sure tcg_ctx->plugin_mem_cb is always NULL after finishing
the translation of a TB. This fixes a bug on uninstall.
- Do not abort when qemu_plugin_uninstall is called more than
once. This is actually quite common, so just silently return
on subsequent calls to uninstall.
- Drop the "qemu"/QEMU from some overly long function/macro
names. This applies to qemu-internal files, of course.
- Keep the plugin's argument array in memory until the plugin
is uninstalled, so that plugins don't have to strdup their
arguments.
- Drop nargs argument from tcg_op_insert_before/after; it's
unused.
- Rename plugin-api.h to qemu-plugin.h, which is the same name
it gets in the final destination (after `make install').
- Add insn_inline function to the API.
- Add some sample plugins to tests/plugin.
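
As an illustration of the helper detection mentioned above (the
strncmp against "plugin_"), here is a minimal standalone sketch.
The macro and function names are made up for this example and are
not taken from the series; only the "plugin_" prefix comes from the
changelog.

```c
#include <stdbool.h>
#include <string.h>

/* Prefix named in the changelog; the macro name itself is hypothetical. */
#define PLUGIN_HELPER_PREFIX "plugin_"

/*
 * Hypothetical function, not part of the series: return true if a
 * helper's name starts with "plugin_". Only the first
 * strlen("plugin_") bytes are compared, so names shorter than the
 * prefix never match.
 */
static bool helper_name_is_plugin(const char *name)
{
    return name != NULL &&
           strncmp(name, PLUGIN_HELPER_PREFIX,
                   sizeof(PLUGIN_HELPER_PREFIX) - 1) == 0;
}
```

With this sketch, a name such as "plugin_foo" matches, while
"lookup_tb_ptr" or a bare "plugin" does not. How the series obtains
the helper's name from the call op is not shown here.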
You can fetch this series from:
https://github.com/cota/qemu/tree/plugin-v2
Thanks,
Emilio
---
.gitignore | 2 +
Makefile | 8 +-
Makefile.target | 18 +
accel/tcg/Makefile.objs | 1 +
accel/tcg/atomic_template.h | 117 +++-
accel/tcg/cpu-exec.c | 2 +
accel/tcg/cputlb.c | 23 +-
accel/tcg/plugin-gen.c | 1085 +++++++++++++++++++++++++++++
accel/tcg/plugin-helpers.h | 6 +
accel/tcg/softmmu_template.h | 43 +-
accel/tcg/translate-all.c | 15 +-
accel/tcg/translator.c | 16 +
bsd-user/syscall.c | 12 +
configure | 86 ++-
cpus-common.c | 2 +
cpus.c | 10 +
exec.c | 2 +
include/exec/cpu-defs.h | 9 +
include/exec/cpu_ldst.h | 9 +
include/exec/cpu_ldst_template.h | 43 +-
include/exec/cpu_ldst_useronly_template.h | 42 +-
include/exec/exec-all.h | 13 +
include/exec/helper-gen.h | 1 +
include/exec/helper-proto.h | 1 +
include/exec/helper-tcg.h | 1 +
include/exec/plugin-gen.h | 75 ++
include/exec/translator.h | 28 +
include/qemu/plugin.h | 253 +++++++
include/qemu/qemu-plugin.h | 241 +++++++
include/qemu/queue.h | 10 +
include/qom/cpu.h | 19 +
linux-user/exit.c | 1 +
linux-user/main.c | 18 +
linux-user/syscall.c | 3 +
plugin.c | 1030 +++++++++++++++++++++++++++
qemu-options.hx | 17 +
qemu-plugins.symbols | 34 +
qom/cpu.c | 2 +
target/alpha/translate.c | 2 +-
target/arm/translate-a64.c | 2 +
target/arm/translate.c | 8 +-
target/hppa/translate.c | 2 +-
target/i386/translate.c | 10 +-
target/m68k/translate.c | 2 +-
target/openrisc/translate.c | 2 +-
target/ppc/translate.c | 8 +-
target/riscv/translate.c | 2 +-
target/sh4/translate.c | 2 +-
target/sparc/translate.c | 2 +-
target/xtensa/translate.c | 4 +-
tcg/README | 2 +-
tcg/i386/tcg-target.inc.c | 7 +
tcg/optimize.c | 4 +-
tcg/tcg-op.c | 44 +-
tcg/tcg-op.h | 16 +
tcg/tcg-opc.h | 3 +
tcg/tcg.c | 27 +-
tcg/tcg.h | 32 +-
tests/plugin/Makefile | 28 +
tests/plugin/bb.c | 66 ++
tests/plugin/empty.c | 30 +
tests/plugin/insn.c | 63 ++
tests/plugin/mem.c | 93 +++
trace-events | 2 +-
vl.c | 11 +
65 files changed, 3653 insertions(+), 119 deletions(-)