Some analysis greatly benefits, or depends on, information about
certain types of discontinuities such as interrupts. For example, we may
need to handle the execution of a new translation block differently if
it is not the result of normal program flow but of an interrupt.
Even with the existing interfaces, it is more or less possible to
discern these situations, e.g. as done by the cflow plugin. However,
this process imposes considerable overhead on the core analysis one may
intend to perform.
These changes introduce a generic and easy-to-use interface for plugin
authors in the form of a callback for discontinuities. Patch 1 defines
an enumeration of some trap-related discontinuities, including somewhat
narrow definitions of the discontinuity events, and a callback type.
Patch 2 defines the callback registration function. Patch 3 adds some
hooks for triggering the callbacks. Patch 4 adds an example plugin
showcasing the new API.
Patches 5 through 22 call the hooks for all architectures but hexagon,
mapping architecture specific events to the three categories defined in
patch 1. We don't plan to add hooks for hexagon since, despite having
exceptions, it apparently doesn't have any discontinuities associated
with them.
Patch 23 supplies a test plugin asserting some behavior of the new API
w.r.t. the PCs it reports. Finally, patches 24 and 25 add new tests for
riscv which serve as test cases for the test plugin.
Sidenote: I'm likely doing something wrong for one architecture or
the other. These patches are untested for most of them.
Richard Henderson proposed streamlining interrupts and exceptions for
all targets and calling the hooks from a higher level rather than in
each target's code. However, there are a few obstacles and I decided to
not do this as part of this series.
Since v5:
- The internal function plugin_vcpu_cb__discon now takes the
qemu_plugin_event as a parameter instead of determining the event
from the discon type.
- Fixed computation of the last PC for ARM platforms.
- Code mapping ARM exception index to discon type is now shared
between m- and a-profile.
- Fixed mapping of interrupt number to discon type for HPPA platforms.
- Removed exception hook for some internal events for Motorola 68000.
- Call hook for unaligned access exceptions on MicroBlaze platforms.
- Prevented calling of exception hooks for resets on OpenRISC.
- Made the discon test plugin compare hardware addresses translated
  with qemu_plugin_translate_vaddr instead of a crude bitmask.
Since v4:
- Fixed a typo in the documentation of the
qemu_plugin_vcpu_discon_cb_t function type (pointed out by Pierrick
Bouvier)
- Fixed a reference in the documentation of the
qemu_plugin_vcpu_discon_cb_t function type
- Added hooks for SuperH and TriCore targets
- Fixed typos in commit messages (pointed out by Daniel Henrique
Barboza)
Since v3 (RFC):
- Switched to shifting 1 notation for qemu_plugin_discon_type values
(as requested by Pierrick Bouvier)
- Added missing documentation of function parameters of function
pointer type qemu_plugin_vcpu_discon_cb_t
- Added missing documentation of function parameters of
qemu_plugin_register_vcpu_discon_cb
- Eliminated "to" argument from hooks called from target specific
code, i.e. qemu_plugin_vcpu_interrupt_cb and friends, determine "to"
address using CPUClass::get_pc
- Replaced comment declaring switch-case unreachable with
g_assert_not_reached()
- Call qemu_plugin_register_vcpu_discon_cb with QEMU_PLUGIN_DISCON_ALL
rather than QEMU_PLUGIN_DISCON_TRAPS in "traps" example plugin
- Take max_vcpus from qemu_info_t in "traps" example plugin, don't
determine it based on VCPU activation
- Added a description of the "traps" example plugin (as requested by
Pierrick Bouvier)
- Added section for the "traps" example plugin in documentation's
"Emulation" chapter
- Fixed messed-up switch-case in alpha_cpu_do_interrupt
- Added hooks for PA-RISC, x86, LoongArch, Motorola 68000, MicroBlaze,
  OpenRISC, PowerPC, Renesas RX, IBM System/390 and Xtensa
  targets.
- Made "discon" test plugin check PCs in vcpu_discon callback (as
requested by Pierrick Bouvier)
- Added parameter to "discon" test plugin for controlling which
address bits are compared to cope with TBs being used under
different virtual addresses
- Added parameter to "discon" test plugin for printing a full
instruction trace for debugging purposes
- Made "discon" test plugin abort by default on address mismatches
- Added test-cases for RISC-V
Since v2 (tcg-plugins: add hooks for interrupts, exceptions and traps):
- Switched from traps as core concept to more generic discontinuities
- Switched from semihosting to hostcall as term for emulated traps
- Added enumeration of events and dedicated callback type
- Make callback receive event type as well as origin and target PC
(as requested by Pierrick Bouvier)
- Combined registration functions for different traps into a single
one for all types of discontinuities (as requested by Pierrick
Bouvier)
- Migrated records in example plugin from fully pre-allocated to a
scoreboard (as suggested by Pierrick Bouvier)
- Handle PSCI calls as hostcall (as pointed out by Peter Maydell)
- Added hooks for ARM Cortex M arches (as pointed out by Peter
Maydell)
- Added hooks for Alpha targets
- Added hooks for MIPS targets
- Added a plugin for testing some of the interface behaviour
Since v1:
- Split the one callback into multiple callbacks
- Added a target-agnostic definition of the relevant event(s)
- Call hooks from architecture-code rather than accel/tcg/cpu-exec.c
- Added a plugin showcasing API usage
Julian Ganz (25):
plugins: add types for callbacks related to certain discontinuities
plugins: add API for registering discontinuity callbacks
plugins: add hooks for new discontinuity related callbacks
contrib/plugins: add plugin showcasing new discontinuity related API
target/alpha: call plugin trap callbacks
target/arm: call plugin trap callbacks
target/avr: call plugin trap callbacks
target/hppa: call plugin trap callbacks
target/i386: call plugin trap callbacks
target/loongarch: call plugin trap callbacks
target/m68k: call plugin trap callbacks
target/microblaze: call plugin trap callbacks
target/mips: call plugin trap callbacks
target/openrisc: call plugin trap callbacks
target/ppc: call plugin trap callbacks
target/riscv: call plugin trap callbacks
target/rx: call plugin trap callbacks
target/s390x: call plugin trap callbacks
target/sh4: call plugin trap callbacks
target/sparc: call plugin trap callbacks
target/tricore: call plugin trap callbacks
target/xtensa: call plugin trap callbacks
tests: add plugin asserting correctness of discon event's to_pc
tests: add test for double-traps on rv64
tests: add test with interrupted memory accesses on rv64
contrib/plugins/meson.build | 3 +-
contrib/plugins/traps.c | 84 +++++++++
docs/about/emulation.rst | 8 +
include/qemu/plugin-event.h | 3 +
include/qemu/plugin.h | 13 ++
include/qemu/qemu-plugin.h | 60 +++++++
plugins/core.c | 57 ++++++
target/alpha/helper.c | 13 ++
target/arm/helper.c | 24 +++
target/arm/internals.h | 1 +
target/arm/tcg/m_helper.c | 5 +
target/avr/helper.c | 3 +
target/hppa/int_helper.c | 44 +++++
target/i386/tcg/excp_helper.c | 3 +
target/i386/tcg/seg_helper.c | 4 +
target/loongarch/cpu.c | 4 +
target/m68k/op_helper.c | 22 +++
target/microblaze/helper.c | 10 ++
target/mips/tcg/system/tlb_helper.c | 11 ++
target/openrisc/interrupt.c | 15 ++
target/ppc/excp_helper.c | 41 +++++
target/riscv/cpu_helper.c | 9 +
target/rx/helper.c | 12 ++
target/s390x/tcg/excp_helper.c | 8 +
target/sh4/helper.c | 4 +
target/sparc/int32_helper.c | 7 +
target/sparc/int64_helper.c | 10 ++
target/tricore/op_helper.c | 5 +
target/xtensa/exc_helper.c | 6 +
tests/tcg/plugins/discons.c | 210 ++++++++++++++++++++++
tests/tcg/plugins/meson.build | 2 +-
tests/tcg/riscv64/Makefile.softmmu-target | 12 ++
tests/tcg/riscv64/doubletrap.S | 73 ++++++++
tests/tcg/riscv64/interruptedmemory.S | 67 +++++++
34 files changed, 851 insertions(+), 2 deletions(-)
create mode 100644 contrib/plugins/traps.c
create mode 100644 tests/tcg/plugins/discons.c
create mode 100644 tests/tcg/riscv64/doubletrap.S
create mode 100644 tests/tcg/riscv64/interruptedmemory.S
--
2.49.1
Hi Julian,

On 4/9/25 22:46, Julian Ganz wrote:
> Richard Henderson proposed streamlining interrupts and exceptions for
> all targets and calling the hooks from a higher level rather than in
> each target's code. However, there are a few obstacles and I decided
> to not do this as part of this series.

Does that mean another part is planned, and when it lands then these
patches will be reverted?
Hi Philippe,

September 22, 2025 at 1:31 PM, "Philippe Mathieu-Daudé" wrote:
> > Richard Henderson proposed streamlining interrupts and exceptions
> > for all targets and calling the hooks from a higher level rather
> > than in each target's code. However, there are a few obstacles and
> > I decided to not do this as part of this series.
>
> Does that mean another part is planned, and when it lands then these
> patches will be reverted?

I don't have any tangible plans for a follow-up series. If I end up
drafting one, it will likely take a while.

A follow-up series will likely not straight-revert these changes, since
they essentially mark points where we would want to return some required
additional information to said higher level. The hooks would instead
vanish one by one as part of a migration.

Regards,
Julian
On Thu, 4 Sep 2025, Julian Ganz wrote:
> Even with the existing interfaces, it is more or less possible to
> discern these situations, e.g. as done by the cflow plugin. However,
> this process poses a considerable overhead to the core analysis one
> may intend to perform.

I'd rather have overhead in the plugin than in interrupt and exception
handling on every target, unless this can be completely disabled somehow
when not needed, so as not to pose any overhead on interrupt handling in
the guest. Have you done any testing on how much overhead this adds to
interrupt-heavy guest workloads? At least for PPC these are already much
slower than a real CPU, so I'd like it to get faster, not slower.

Regards,
BALATON Zoltan
September 5, 2025 at 1:38 PM, "BALATON Zoltan" wrote:
> On Thu, 4 Sep 2025, Julian Ganz wrote:
> > Even with the existing interfaces, it is more or less possible to
> > discern these situations, e.g. as done by the cflow plugin. However,
> > this process poses a considerable overhead to the core analysis one
> > may intend to perform.
>
> I'd rather have overhead in the plugin than in interrupt and exception
> handling on every target unless this can be completely disabled
> somehow when not needed to not pose any overhead on interrupt handling
> in the guest.

The "more or less" is rather heavy here: with the current API there is
no way to distinguish between interrupts and exceptions. Double-traps
can probably only be detected by relying on weird, very error-prone
heuristics around TB translations.

And as Alex Bennée pointed out, qemu can easily be built with plugins
disabled.

> Have you done any testing on how much overhead this adds
> to interrupt heavy guest workloads? At least for PPC these are already
> much slower than real CPU so I'd like it to get faster not slower.

No, I have not made any performance measurements. However, given that a
similar hook is already called for every single TB execution, the impact
relative to other existing plugin infrastructure _should_ be negligible.

That is, if your workload actually runs any code and is not constantly
bombarded with interrupts that _do_ result in a trap (which _may_ happen
during some tests).

So if you are performance-sensitive enough to care, you will very likely
want to disable plugins anyway.

Regards,
Julian
On Fri, 5 Sep 2025, Julian Ganz wrote:
> And as Alex Bennée pointed out, qemu can easily be built with plugins
> disabled.
>
> So if you are performance-sensitive enough to care, you will very
> likely want to disable plugins anyway.

I can disable plugins and do that normally, but that does not help those
who get QEMU from their distro (i.e. most users). If this infrastructure
was disabled in default builds and needed an explicit option to enable,
then those who need it could enable it without imposing it on everyone
else who just gets a default build from a distro and never uses plugins.
Having an option which needs a rebuild is like not having the option for
most people. I guess the question is which is the larger group: those
who just run guests, or those who use this instrumentation with plugins?
The default may better be what the larger group needs. Even then,
distros may still change the default, so it would be best if the
overhead can be minimised even when enabled. I think the log
infrastructure does that; would a similar solution work here?

For testing, I've found that because embedded PPC CPUs have a
software-controlled MMU (and in addition QEMU may flush TLB entries too
often), running something that does a lot of memory access, like the
STREAM benchmark on sam460ex, is hit by this IIRC; but anything else
causing a lot of interrupts, like reading from emulated disk or sound,
is probably affected as well. I've tried to optimise PPC exception
handling a bit before, but whenever I optimise something it is later
undone by other changes not caring about performance.

Regards,
BALATON Zoltan
September 5, 2025 at 9:25 PM, "BALATON Zoltan" wrote:
> I can disable plugins and do that normally but that does not help
> those who get QEMU from their distro (i.e. most users). If this
> infrastructure was disabled in default builds and needed an explicit
> option to enable then those who need it could enable it and not
> imposed it on everyone else who just get a default build from a
> distro and never use plugins. Having an option which needs rebuild is
> like not having the option for most people. I guess the question is
> which is the larger group? Those who just run guests or those who use
> this instrumentation with plugins.

Hard to say.

> The default may better be what the larger group needs. Even then
> distros may still change the default so it would be best if the
> overhead can be minimised even if enabled. I think the log
> infrastructure does that, would a similar solution work here?
>
> For testing I've found that because embedded PPC CPUs have a software
> controlled MMU (and in addition to that QEMU may flush TLB entries
> too often) running something that does a lot of memory access like
> running the STREAM benchmark on sam460ex is hit by this IIRC but
> anything else causing a lot of interrupts like reading from emulated
> disk or sound is probably affected as well.

I could try running the benchmark on multiple versions:

* qemu with plugins disabled,
* with plugins enabled but without these patches and
* with plugins enabled and with these patches.

However, I'll likely only report back with results next week. Do you
happen to have an image you can point me to? Either something that has
the benchmark already or some unixoid running on the platform? I'm
currently not motivated enough to cook up some bare-metal testbed for a
platform I'm not familiar with.

Regards,
Julian
On Fri, 5 Sep 2025, Julian Ganz wrote:
> September 5, 2025 at 9:25 PM, "BALATON Zoltan" wrote:
>> On Fri, 5 Sep 2025, Julian Ganz wrote:
>>> September 5, 2025 at 1:38 PM, "BALATON Zoltan" wrote:
>>>> Have you done any testing on how much overhead this adds
>>>> to interrupt heavy guest workloads? At least for PPC these are already
>>>> much slower than real CPU so I'd like it to get faster not slower.
>>>
>>> No, I have not made any performance measurements. However, given that
>>> for every single TB execution a similar hook is called already, the
>>> impact relative to other existing plugin infrastructure _should_ be
>>> negligible.
>>>
>>> That is, if your workload actually runs any code and is not constantly
>>> bombarded with interrupts that _do_ result in a trap (which _may_ happen
>>> during some tests).
>>>
>>> So if you are performance sensitive enough to care, you will very likely
>>> want to disable plugins anyway.
>>>
>> I can disable plugins and do that normally, but that does not help those
>> who get QEMU from their distro (i.e. most users). If this infrastructure
>> were disabled in default builds and needed an explicit option to enable,
>> then those who need it could enable it without imposing it on everyone
>> else who just gets a default build from a distro and never uses plugins.
>> Having an option which needs a rebuild is like not having the option for
>> most people. I guess the question is which is the larger group? Those
>> who just run guests or those who use this instrumentation with plugins.
>
> Hard to say.
>
>> The default may better be what the larger group needs. Even then distros
>> may still change the default, so it would be best if the overhead can be
>> minimised even when enabled. I think the log infrastructure does that;
>> would a similar solution work here?
>>
>> For testing I've found that, because embedded PPC CPUs have a software
>> controlled MMU (and in addition QEMU may flush TLB entries too often),
>> running something that does a lot of memory access, like running the
>> STREAM benchmark on sam460ex, is hit by this IIRC, but anything else
>> causing a lot of interrupts, like reading from emulated disk or sound,
>> is probably affected as well. I've tried to optimise PPC exception
>> handling a bit before, but whenever I optimise something it is later
>> undone by other changes not caring about performance.
>
> I could try running the benchmark on multiple versions:
>
> * qemu with plugins disabled,
> * with plugins enabled but without these patches and
> * with plugins enabled and with these patches.
>
> However, I'll likely only report back with results next week.
> Do you happen to have an image you can point me to? Either something
> that has the benchmark already or some unixoid running on the platform?
> I'm currently not motivated enough to cook up some bare-metal testbed
> for a platform I'm not familiar with.

I don't have ready images to test embedded PPC MMU exceptions, which I
think this may affect most. I had an image for pegasos2 for a general
test used here:
https://lists.nongnu.org/archive/html/qemu-discuss/2023-12/msg00008.html
but that machine has a G4 CPU which has a hardware MMU so is likely not
affected.

I have uploaded some PPC binaries for the STREAM benchmark that I tested
with before here:
http://zero.eik.bme.hu/~balaton/qemu/stream-test.zip
which may exercise this if run on sam460ex or ppce500 machines, but I
don't have a scripted test case for that. There are some docs on how to
run Linux on these machines here:
https://www.qemu.org/docs/master/system/target-ppc.html

Alternatively, running a disk IO benchmark on an emulated IDE controller
using PIO mode, or some other device that generates a lot of interrupts,
may test this. I think you can use the "info irq" command in the QEMU
monitor to check how many interrupts you get.

Regards,
BALATON Zoltan
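(The "log infrastructure" approach referred to above boils down to testing a cheap cached flag before doing any work, the way QEMU's qemu_loglevel_mask() guards qemu_log() calls. A simplified sketch of the pattern — illustrative only, not actual QEMU code:)

```python
# Sketch of the guard pattern: like qemu_loglevel_mask() in QEMU's
# logging code, the hot path tests one cached flag and bails out
# before any expensive dispatch when nothing is registered.

class DisconHooks:
    def __init__(self):
        self._callbacks = []   # callbacks registered by plugins
        self.enabled = False   # cheap flag consulted on the hot path

    def register(self, cb):
        self._callbacks.append(cb)
        self.enabled = True    # flips only once a plugin subscribes

    def fire(self, vcpu, from_pc, to_pc):
        if not self.enabled:   # disabled case: a single branch
            return
        for cb in self._callbacks:
            cb(vcpu, from_pc, to_pc)

hooks = DisconHooks()
hooks.fire(0, 0x1000, 0x800)   # no-op: nothing registered yet

seen = []
hooks.register(lambda vcpu, f, t: seen.append((vcpu, f, t)))
hooks.fire(0, 0x2000, 0x800)   # now reaches the plugin callback
```

In C the disabled case would compile down to one load and one predictable branch, which is why the cost with no plugins loaded is hard to measure.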
September 7, 2025 at 10:21 PM, "BALATON Zoltan" wrote:
> I have uploaded some PPC binaries for the STREAM benchmark that I
> tested with before here:
> http://zero.eik.bme.hu/~balaton/qemu/stream-test.zip
> which may exercise this if run on sam460ex or ppce500 machines but I
> don't have a scripted test case for that. There are some docs on how
> to run Linux on these machines here:
> https://www.qemu.org/docs/master/system/target-ppc.html

After spending too much time looking for usable root images (and then
giving up and just stuffing the executables into an ext2), I got to run
these benchmarks on Linux 4.4.5 configured for the Sam460ex from [1].

I ran streamPPCpowerpcO3 on qemu with these patches:

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2867.6     0.056828     0.055795     0.061792
Scale:           1057.5     0.153282     0.151305     0.158115
Add:             1308.8     0.187095     0.183380     0.193672
Triad:           1111.6     0.220863     0.215902     0.230440
-------------------------------------------------------------

After doing a clean build, with the fans still audible:

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2932.9     0.055131     0.054554     0.055667
Scale:           1067.9     0.151520     0.149832     0.155000
Add:             1324.9     0.184807     0.181150     0.191386
Triad:           1122.0     0.220080     0.213896     0.229302
-------------------------------------------------------------

On qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
patches, but plugins enabled:

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2972.1     0.054407     0.053834     0.054675
Scale:           1068.6     0.151503     0.149726     0.154594
Add:             1327.6     0.185160     0.180784     0.193181
Triad:           1127.2     0.219249     0.212915     0.229230
-------------------------------------------------------------

And on qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
patches, with plugins disabled:

-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            2983.4     0.055141     0.053630     0.060013
Scale:           1058.3     0.152353     0.151186     0.155072
Add:             1323.9     0.184707     0.181279     0.188868
Triad:           1128.2     0.218674     0.212734     0.230314
-------------------------------------------------------------

I fail to see any significant indication that these patches, or
plugins in general, would result in a degradation of performance.

Regards,
Julian

[1]: http://www.supertuxkart-amiga.de/amiga/sam.html#downloads
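(For context on what these tables measure: STREAM times four simple array kernels and reports the sustained memory bandwidth for each. The kernels, per McCalpin's benchmark, are roughly the following — sketched in Python; the real benchmark is C/Fortran over arrays far larger than cache:)

```python
# The four STREAM kernels; each table row above is the bandwidth
# achieved by one of these loops, converted to MB/s.
n = 1000
a = [1.0] * n
b = [2.0] * n
c = [0.0] * n
scalar = 3.0

c = [a[i] for i in range(n)]                  # Copy:  c = a
b = [scalar * c[i] for i in range(n)]         # Scale: b = s*c
c = [a[i] + b[i] for i in range(n)]           # Add:   c = a + b
a = [b[i] + scalar * c[i] for i in range(n)]  # Triad: a = b + s*c
```

Copy touches the fewest operands per element, which is why it is the row most sensitive to raw memory-access overhead such as software TLB refills.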
On Tue, 9 Sep 2025, Julian Ganz wrote:
> September 7, 2025 at 10:21 PM, "BALATON Zoltan" wrote:
>> I have uploaded some PPC binaries for the STREAM benchmark that I
>> tested with before here:
>> http://zero.eik.bme.hu/~balaton/qemu/stream-test.zip
>> which may exercise this if run on sam460ex or ppce500 machines but I
>> don't have a scripted test case for that. There are some docs on how
>> to run Linux on these machines here:
>> https://www.qemu.org/docs/master/system/target-ppc.html
>
> After spending too much time looking for usable root images (and then
> giving up and just stuffing the executables into an ext2), I got to run
> these benchmarks on Linux 4.4.5 configured for the Sam460ex from [1].

Thank you for testing this.

> I ran streamPPCpowerpcO3 on qemu with these patches:
>
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            2867.6     0.056828     0.055795     0.061792
> Scale:           1057.5     0.153282     0.151305     0.158115
> Add:             1308.8     0.187095     0.183380     0.193672
> Triad:           1111.6     0.220863     0.215902     0.230440
> -------------------------------------------------------------
>
> After doing a clean build, with the fans still audible:
>
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            2932.9     0.055131     0.054554     0.055667
> Scale:           1067.9     0.151520     0.149832     0.155000
> Add:             1324.9     0.184807     0.181150     0.191386
> Triad:           1122.0     0.220080     0.213896     0.229302
> -------------------------------------------------------------

What was different between the above two runs? I guess maybe one is with
plugins disabled but it's not clear from the description.

> On qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
> patches, but plugins enabled:
>
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            2972.1     0.054407     0.053834     0.054675
> Scale:           1068.6     0.151503     0.149726     0.154594
> Add:             1327.6     0.185160     0.180784     0.193181
> Triad:           1127.2     0.219249     0.212915     0.229230
> -------------------------------------------------------------
>
> And on qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
> patches, with plugins disabled:
>
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            2983.4     0.055141     0.053630     0.060013
> Scale:           1058.3     0.152353     0.151186     0.155072
> Add:             1323.9     0.184707     0.181279     0.188868
> Triad:           1128.2     0.218674     0.212734     0.230314
> -------------------------------------------------------------
>
> I fail to see any significant indication that these patches, or
> plugins in general, would result in a degradation of performance.

With the worst case Copy test it seems to be about 3.5% (and about 1.7%
with plugins disabled?) and should be less than that normally, so it
does not add much more overhead to plugins than there is already, so
this should be acceptable. It may still be interesting to see if the
overhead with plugins disabled can be avoided in a similar way as
logging does it.

Regards,
BALATON Zoltan

> Regards,
> Julian
>
> [1]: http://www.supertuxkart-amiga.de/amiga/sam.html#downloads
September 10, 2025 at 12:06 PM, "BALATON Zoltan" wrote:
> On Tue, 9 Sep 2025, Julian Ganz wrote:
> > I ran streamPPCpowerpcO3 on qemu with these patches:
> >
> > -------------------------------------------------------------
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:            2867.6     0.056828     0.055795     0.061792
> > Scale:           1057.5     0.153282     0.151305     0.158115
> > Add:             1308.8     0.187095     0.183380     0.193672
> > Triad:           1111.6     0.220863     0.215902     0.230440
> > -------------------------------------------------------------
> >
> > After doing a clean build, with the fans still audible:
> >
> > -------------------------------------------------------------
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:            2932.9     0.055131     0.054554     0.055667
> > Scale:           1067.9     0.151520     0.149832     0.155000
> > Add:             1324.9     0.184807     0.181150     0.191386
> > Triad:           1122.0     0.220080     0.213896     0.229302
> > -------------------------------------------------------------
> >
> What was different between the above two runs? I guess maybe one is
> with plugins disabled but it's not clear from the description.

The difference is nothing but a clean rebuild of qemu. As you see,
there are fluctuations already. Plugins are enabled in both cases.

> > On qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
> > patches, but plugins enabled:
> >
> > -------------------------------------------------------------
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:            2972.1     0.054407     0.053834     0.054675
> > Scale:           1068.6     0.151503     0.149726     0.154594
> > Add:             1327.6     0.185160     0.180784     0.193181
> > Triad:           1127.2     0.219249     0.212915     0.229230
> > -------------------------------------------------------------
> >
> > And on qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
> > patches, with plugins disabled:
> >
> > -------------------------------------------------------------
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:            2983.4     0.055141     0.053630     0.060013
> > Scale:           1058.3     0.152353     0.151186     0.155072
> > Add:             1323.9     0.184707     0.181279     0.188868
> > Triad:           1128.2     0.218674     0.212734     0.230314
> > -------------------------------------------------------------
> >
> > I fail to see any significant indication that these patches, or
> > plugins in general, would result in a degradation of performance.
> >
> With worst case Copy test it seems to be about 3.5% (and about 1.7%
> with plugins disabled?) and should be less than that normally so it
> does not add much more overhead to plugins than there is already so
> this should be acceptable. It may still be interesting to see if the
> overhead with plugins disabled can be avoided in a similar way as
> logging does it.

The thing is: that's probably just the usual fluctuation. As you have
seen with the first two measurements, the values fluctuate quite a bit
between runs of the test on the very same qemu (assuming that a clean
build did not incur any _other_ relevant change). For example, the best
rate for Scale shown with plugins enabled is one percent faster than
with plugins disabled. Is this significant? Probably not. Or at least it
doesn't make much sense.

TL;DR: run the very same test multiple times without any changes and
you get "vastly" different results. And just from these coarse
statistics it's hard to judge whether especially min/max have any
significance anyway. That's why you usually also include
deviation/variance when writing performance tests, and some percentiles
if you have enough individual measurements and the means to store those.
What you _can_ tell from these numbers is that the spread for a single
function and run is in the percents.

I may do some more tests this week, with runtimes longer than a few
seconds, if I can find the motivation to set up everything I'd need to
compile your benchmark. In the meantime, you are welcome to make your
own measurements if you want to. The patches are also available at [1]
if you don't want to apply them to your local tree yourself.

Regards,
Julian

[1]: https://github.com/patchew-project/qemu/tree/patchew/cover.1757018626.git.neither@nut.email
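(The spread and the two percentages being debated can be recomputed directly from the Copy rows of the four tables quoted in this thread. Note the pairing behind the "about 1.7%" figure is an assumption: comparing the rebuilt plugins-on run against the plugins-off run is the only pairing of the quoted numbers that reproduces it.)

```python
# Recompute mean/stddev and the two deltas from the Copy best rates
# (MB/s) quoted in the thread.
from statistics import mean, pstdev

copy_rates = {
    "patched, plugins on":    2867.6,
    "rebuild, plugins on":    2932.9,
    "unpatched, plugins on":  2972.1,
    "unpatched, plugins off": 2983.4,
}

rates = list(copy_rates.values())
print(f"mean = {mean(rates):.1f} MB/s, stddev = {pstdev(rates):.1f} MB/s")

# "about 3.5%": patched vs unpatched, both with plugins enabled
overhead = (2972.1 - 2867.6) / 2972.1 * 100
# "about 1.7%": the rebuilt plugins-on run vs the plugins-off run
noise = (2983.4 - 2932.9) / 2983.4 * 100
print(f"worst-case delta = {overhead:.1f}%, rebuild spread = {noise:.1f}%")
```

With a rebuild-to-rebuild spread about half the size of the worst-case delta, four single runs cannot separate patch overhead from noise, which is the point made above.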
"Julian Ganz" <neither@nut.email> writes: > September 10, 2025 at 12:06 PM, "BALATON Zoltan" wrote: >> On Tue, 9 Sep 2025, Julian Ganz wrote: >> > I ran streamPPCpowerpcO3 on qemu with these patches: >> > >> > ------------------------------------------------------------- >> > Function Best Rate MB/s Avg time Min time Max time >> > Copy: 2867.6 0.056828 0.055795 0.061792 >> > Scale: 1057.5 0.153282 0.151305 0.158115 >> > Add: 1308.8 0.187095 0.183380 0.193672 >> > Triad: 1111.6 0.220863 0.215902 0.230440 >> > ------------------------------------------------------------- >> > >> > After doing a clean build, with the fans still audible: >> > >> > ------------------------------------------------------------- >> > Function Best Rate MB/s Avg time Min time Max time >> > Copy: 2932.9 0.055131 0.054554 0.055667 >> > Scale: 1067.9 0.151520 0.149832 0.155000 >> > Add: 1324.9 0.184807 0.181150 0.191386 >> > Triad: 1122.0 0.220080 0.213896 0.229302 >> > ------------------------------------------------------------- >> > >> What was different between the above two runs? I guess maybe one is with plugins disabled but it's not clear from the description. > > The difference is nothing but a a clean rebuild of qemu. As you see > there are fluctuations already. Plugins are enabled for both cases. 
> >> > On qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these >> > patches, but plugins enabled: >> > >> > ------------------------------------------------------------- >> > Function Best Rate MB/s Avg time Min time Max time >> > Copy: 2972.1 0.054407 0.053834 0.054675 >> > Scale: 1068.6 0.151503 0.149726 0.154594 >> > Add: 1327.6 0.185160 0.180784 0.193181 >> > Triad: 1127.2 0.219249 0.212915 0.229230 >> > ------------------------------------------------------------- >> > >> > And on qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these >> > patches, with plugins disabled: >> > >> > ------------------------------------------------------------- >> > Function Best Rate MB/s Avg time Min time Max time >> > Copy: 2983.4 0.055141 0.053630 0.060013 >> > Scale: 1058.3 0.152353 0.151186 0.155072 >> > Add: 1323.9 0.184707 0.181279 0.188868 >> > Triad: 1128.2 0.218674 0.212734 0.230314 >> > ------------------------------------------------------------- >> > >> > I fail to see any significant indication that these patches, or >> > plugins in general, would result in a degredation of performance. >> > >> With worst case Copy test it seems to be about 3.5% (and about 1.7% >> with plugins disabled?) and should be less than that normally so it >> does not add much more overhead to plugins than there is already so >> this should be acceptable. It may still be interesting to see if the >> overhead with plugins disabled can be avoided with a similar way as >> logging does it. > > The thing is: that's probably just usual fluctuations. As you have seen > with the first two measurements the values fluctuate quite a bit between > runs of the test on the very same qemu (assuming that a clean build did > not incur any _other_ relevant change). For example, the best rate for > scale shown with plugins enabled is one percent faster than with plugins > disabled. Is this significant? Probably not. Or at least it doesn't make > much sense. 
I wouldn't spend too much time chasing this down. As you say this fluctuation is well within the noise range. I can recommend hyperfine as a runner: https://github.com/sharkdp/hyperfine as it does some work on how many times you need to run a test before the results are statistically relevant. > I may do some more tests this week, with runtimes longer than a few > seconds if I can find the motivation to set up everything I'd need to > compile your benchmark. In the mean-time, you are welcome to make your > own measurements if you want to. The patches are also availible at [1] > if you don't want to apply them to your local tree yourself. Balton, I don't think worries about performance impact are justified and Julian has certainly done enough due diligence here. If you can come up with a repeatable test that shows a measurable impact then please do so. > > Regards, > Julian > > [1]: https://github.com/patchew-project/qemu/tree/patchew/cover.1757018626.git.neither@nut.email -- Alex Bennée Virtualisation Tech Lead @ Linaro
On Wed, 10 Sep 2025, Alex Bennée wrote:
> "Julian Ganz" <neither@nut.email> writes:
>> September 10, 2025 at 12:06 PM, "BALATON Zoltan" wrote:
>>> On Tue, 9 Sep 2025, Julian Ganz wrote:
>>>> I ran streamPPCpowerpcO3 on qemu with these patches:
>>>>
>>>> -------------------------------------------------------------
>>>> Function    Best Rate MB/s  Avg time     Min time     Max time
>>>> Copy:            2867.6     0.056828     0.055795     0.061792
>>>> Scale:           1057.5     0.153282     0.151305     0.158115
>>>> Add:             1308.8     0.187095     0.183380     0.193672
>>>> Triad:           1111.6     0.220863     0.215902     0.230440
>>>> -------------------------------------------------------------
>>>>
>>>> After doing a clean build, with the fans still audible:
>>>>
>>>> -------------------------------------------------------------
>>>> Function    Best Rate MB/s  Avg time     Min time     Max time
>>>> Copy:            2932.9     0.055131     0.054554     0.055667
>>>> Scale:           1067.9     0.151520     0.149832     0.155000
>>>> Add:             1324.9     0.184807     0.181150     0.191386
>>>> Triad:           1122.0     0.220080     0.213896     0.229302
>>>> -------------------------------------------------------------
>>>>
>>> What was different between the above two runs? I guess maybe one is
>>> with plugins disabled but it's not clear from the description.
>>
>> The difference is nothing but a clean rebuild of qemu. As you see
>> there are fluctuations already. Plugins are enabled in both cases.
>>
>>>> On qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
>>>> patches, but plugins enabled:
>>>>
>>>> -------------------------------------------------------------
>>>> Function    Best Rate MB/s  Avg time     Min time     Max time
>>>> Copy:            2972.1     0.054407     0.053834     0.054675
>>>> Scale:           1068.6     0.151503     0.149726     0.154594
>>>> Add:             1327.6     0.185160     0.180784     0.193181
>>>> Triad:           1127.2     0.219249     0.212915     0.229230
>>>> -------------------------------------------------------------
>>>>
>>>> And on qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these
>>>> patches, with plugins disabled:
>>>>
>>>> -------------------------------------------------------------
>>>> Function    Best Rate MB/s  Avg time     Min time     Max time
>>>> Copy:            2983.4     0.055141     0.053630     0.060013
>>>> Scale:           1058.3     0.152353     0.151186     0.155072
>>>> Add:             1323.9     0.184707     0.181279     0.188868
>>>> Triad:           1128.2     0.218674     0.212734     0.230314
>>>> -------------------------------------------------------------
>>>>
>>>> I fail to see any significant indication that these patches, or
>>>> plugins in general, would result in a degradation of performance.
>>>>
>>> With worst case Copy test it seems to be about 3.5% (and about 1.7%
>>> with plugins disabled?) and should be less than that normally so it
>>> does not add much more overhead to plugins than there is already so
>>> this should be acceptable. It may still be interesting to see if the
>>> overhead with plugins disabled can be avoided in a similar way as
>>> logging does it.
>>
>> The thing is: that's probably just the usual fluctuation. As you have
>> seen with the first two measurements the values fluctuate quite a bit
>> between runs of the test on the very same qemu (assuming that a clean
>> build did not incur any _other_ relevant change). For example, the
>> best rate for Scale shown with plugins enabled is one percent faster
>> than with plugins disabled. Is this significant? Probably not. Or at
>> least it doesn't make much sense.
>
> I wouldn't spend too much time chasing this down. As you say this
> fluctuation is well within the noise range.
>
> I can recommend hyperfine as a runner:
>
>   https://github.com/sharkdp/hyperfine
>
> as it does some work on how many times you need to run a test before
> the results are statistically relevant.
>
>> I may do some more tests this week, with runtimes longer than a few
>> seconds if I can find the motivation to set up everything I'd need to
>> compile your benchmark. In the meantime, you are welcome to make your
>> own measurements if you want to. The patches are also available at [1]
>> if you don't want to apply them to your local tree yourself.
>
> Balaton,
>
> I don't think worries about performance impact are justified and Julian
> has certainly done enough due diligence here. If you can come up with a
> repeatable test that shows a measurable impact then please do so.

I agree this testing is enough to ensure there is no big impact. I just
wanted to make sure there is some testing and not just adding stuff
without worrying about performance. I'd like to keep QEMU quick and only
add unavoidable overhead where possible, but I don't demand spending too
much time on that. If Julian got interested and does more testing, that
may give some interesting results for possible optimisation, but if
there is no time for that, this was enough to measure the impact for
this series.

Regards,
BALATON Zoltan

>> Regards,
>> Julian
>>
>> [1]: https://github.com/patchew-project/qemu/tree/patchew/cover.1757018626.git.neither@nut.email
September 7, 2025 at 10:21 PM, "BALATON Zoltan" wrote:
> On Fri, 5 Sep 2025, Julian Ganz wrote:
>
> >
> > September 5, 2025 at 9:25 PM, "BALATON Zoltan" wrote:
> > > The default may better be what the larger group needs. Even then distros may still change the default so it would be best if the overhead can be minimised even if enabled. I think the log infrastructure does that, would a similar solution work here?
> > >
> > > For testing I've found that, because embedded PPC CPUs have a software controlled MMU (and in addition QEMU may flush TLB entries too often), running something that does a lot of memory access, like running the STREAM benchmark on sam460ex, is hit by this IIRC, but anything else causing a lot of interrupts, like reading from emulated disk or sound, is probably affected as well. I've tried to optimise PPC exception handling a bit before, but whenever I optimise something it is later undone by other changes not caring about performance.
> > >
> > I could try running the benchmark on multiple versions:
> >
> > * qemu with plugins disabled,
> > * with plugins enabled but without these patches and
> > * with plugins enabled and with these patches.
> >
> > However, I'll likely only report back with results next week.
> > Do you happen to have an image you can point me to? Either something
> > that has the benchmark already or some unixoid running on the platform?
> > I'm currently not motivated enough to cook up some bare-metal testbed
> > for a platform I'm not familiar with.
> >
> I don't have ready images to test embedded PPC MMU exceptions which I think this may affect most. I had an image for pegasos2 for a general test used here:
> https://lists.nongnu.org/archive/html/qemu-discuss/2023-12/msg00008.html
> but that machine has a G4 CPU which has hardware MMU so is likely not affected.
I ran this test anyway, because it was easy enough to run. I tweaked the
script to do 10 runs, each with one of the aforementioned variants.
Isolating the printed time stats gives the following:
Qemu with patches and plugins enabled:
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2557x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2621x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2536x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2394x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2529x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2565x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2456x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2450x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2526x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2528x| 0:00
Qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these patches,
but plugins enabled:
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2309x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2399x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2547x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2511x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2265x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2156x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2401x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2460x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2472x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2370x| 0:00
Qemu (6a9fa5ef3230a7d51e0d953a59ee9ef10af705b8) without these patches,
with plugins disabled:
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2478x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2509x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2500x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2019x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2439x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2453x| 0:00
1149/1149 (100%)| 0:25/ 0:25| 0:25/ 0:25| 1.1945x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2331x| 0:00
1149/1149 (100%)| 0:23/ 0:23| 0:23/ 0:23| 1.2510x| 0:00
1149/1149 (100%)| 0:24/ 0:24| 0:24/ 0:24| 1.2467x| 0:00
So nothing to see here. If anything, we see a slight reduction in
runtime with these patches, which doesn't make any sense. I did not do
a fresh clean build for those, and the order I ran the tests may have
had some influence on the results.
> I have uploaded some PPC binaries for the STREAM benchmark that I tested with before here:
> http://zero.eik.bme.hu/~balaton/qemu/stream-test.zip
> which may exercise this if run on sam460ex or ppce500 machines but I don't have a scripted test case for that. There are some docs on how to run Linux on these machines here:
> https://www.qemu.org/docs/master/system/target-ppc.html
Thanks, I'll have a look at how to run those over the course of this
week.
> Alternatively maybe running a disk IO benchmark on an emulated IDE controller using PIO mode or some other device that generates a lot of interrupts may test this. I think you can use the "info irq" command in QEMU Monitor to check how many interrupts you get.
I started writing a small exception/interrupt torture test for the
PPC440 with the help of robots. If and when I finish it I'll do
some measurements with that.
Regards,
Julian
BALATON Zoltan <balaton@eik.bme.hu> writes:

> On Thu, 4 Sep 2025, Julian Ganz wrote:
>> Some analysis greatly benefits from, or depends on, information about
>> certain types of discontinuities such as interrupts. For example, we
>> may need to handle the execution of a new translation block
>> differently if it is not the result of normal program flow but of an
>> interrupt.
>>
>> Even with the existing interfaces, it is more or less possible to
>> discern these situations, e.g. as done by the cflow plugin. However,
>> this process poses a considerable overhead to the core analysis one
>> may intend to perform.
>
> I'd rather have overhead in the plugin than in interrupt and exception
> handling on every target, unless this can be completely disabled
> somehow when not needed, so as not to pose any overhead on interrupt
> handling in the guest.

If you build with --disable-plugins the compiler should dead-code away
all the plugin hooks. But in general the overhead from unused plugins is
in the noise.

> Have you done any testing on how much overhead this adds
> to interrupt heavy guest workloads? At least for PPC these are already
> much slower than real CPU so I'd like it to get faster not slower.

I have a vague memory that this is due to ppc running the interrupt
handling code more often than it should. But I forget the details. Are
there any functional tests that exhibit this slow IRQ handling
behaviour?

> Regards,
> BALATON Zoltan
>
>> These changes introduce a generic and easy-to-use interface for plugin
>> authors in the form of a callback for discontinuities. Patch 1 defines
>> an enumeration of some trap-related discontinuities, including
>> somewhat narrow definitions of the discontinuity events, and a
>> callback type. Patch 2 defines the callback registration function.
>> Patch 3 adds some hooks for triggering the callbacks. Patch 4 adds an
>> example plugin showcasing the new API.
>>
>> Patches 5 through 22 call the hooks for all architectures but hexagon,
>> mapping architecture specific events to the three categories defined
>> in patch 1. We don't plan to add hooks for hexagon since, despite
>> having exceptions, it apparently doesn't have any discontinuities
>> associated with them.
>>
>> Patch 23 supplies a test plugin asserting some behavior of the plugin
>> API w.r.t. the PCs reported by the new API. Finally, patches 24 and 25
>> add new tests for riscv which serve as test-cases for the test plugin.
>>
>> Sidenote: I'm likely doing something wrong for one architecture or
>> the other. These patches are untested for most of them.
>>
>> Richard Henderson proposed streamlining interrupts and exceptions for
>> all targets and calling the hooks from a higher level rather than in
>> each target's code. However, there are a few obstacles and I decided
>> not to do this as part of this series.
>>
>> Since v5:
>> - The internal function plugin_vcpu_cb__discon now takes the
>>   qemu_plugin_event as a parameter instead of determining the event
>>   from the discon type.
>> - Fixed computation of the last PC for ARM platforms.
>> - Code mapping ARM exception index to discon type is now shared
>>   between m- and a-profile.
>> - Fixed mapping of interrupt number to discon type for HPPA platforms.
>> - Removed exception hook for some internal events for Motorola 68000.
>> - Call hook for unaligned access exceptions on MicroBlaze platforms.
>> - Prevented calling of exception hooks for resets on OpenRISC.
>> - Made the discon test plugin compare hardware addresses translated
>>   with qemu_plugin_translate_vaddr when comparing addresses. Before
>>   we'd use a crude bitmask.
>>
>> Since v4:
>> - Fixed a typo in the documentation of the
>>   qemu_plugin_vcpu_discon_cb_t function type (pointed out by Pierrick
>>   Bouvier)
>> - Fixed a reference in the documentation of the
>>   qemu_plugin_vcpu_discon_cb_t function type
>> - Added hooks for SuperH and TriCore targets
>> - Fixed typos in commit messages (pointed out by Daniel Henrique
>>   Barboza)
>>
>> Since v3 (RFC):
>> - Switched to shifting 1 notation for qemu_plugin_discon_type values
>>   (as requested by Pierrick Bouvier)
>> - Added missing documentation of function parameters of function
>>   pointer type qemu_plugin_vcpu_discon_cb_t
>> - Added missing documentation of function parameters of
>>   qemu_plugin_register_vcpu_discon_cb
>> - Eliminated "to" argument from hooks called from target specific
>>   code, i.e. qemu_plugin_vcpu_interrupt_cb and friends, determine "to"
>>   address using CPUClass::get_pc
>> - Replaced comment declaring switch-case unreachable with
>>   g_assert_not_reached()
>> - Call qemu_plugin_register_vcpu_discon_cb with QEMU_PLUGIN_DISCON_ALL
>>   rather than QEMU_PLUGIN_DISCON_TRAPS in "traps" example plugin
>> - Take max_vcpus from qemu_info_t in "traps" example plugin, don't
>>   determine it based on VCPU activation
>> - Added a description of the "traps" example plugin (as requested by
>>   Pierrick Bouvier)
>> - Added section for the "traps" example plugin in documentation's
>>   "Emulation" chapter
>> - Fixed messed-up switch-case in alpha_cpu_do_interrupt
>> - Added hooks for PA-RISC, x86, loongarch, Motorola 68000, MicroBlaze,
>>   OpenRISC, Power PC, Renesas RX, IBM System/390 and xtensa targets.
>> - Made "discon" test plugin check PCs in vcpu_discon callback (as
>>   requested by Pierrick Bouvier)
>> - Added parameter to "discon" test plugin for controlling which
>>   address bits are compared to cope with TBs being used under
>>   different virtual addresses
>> - Added parameter to "discon" test plugin for printing a full
>>   instruction trace for debugging purposes
>> - Made "discon" test plugin abort by default on address mismatches
>> - Added test-cases for RISC-V
>>
>> Since v2 (tcg-plugins: add hooks for interrupts, exceptions and traps):
>> - Switched from traps as core concept to more generic discontinuities
>> - Switched from semihosting to hostcall as term for emulated traps
>> - Added enumeration of events and dedicated callback type
>> - Make callback receive event type as well as origin and target PC
>>   (as requested by Pierrick Bouvier)
>> - Combined registration functions for different traps into a single
>>   one for all types of discontinuities (as requested by Pierrick
>>   Bouvier)
>> - Migrated records in example plugin from fully pre-allocated to a
>>   scoreboard (as suggested by Pierrick Bouvier)
>> - Handle PSCI calls as hostcall (as pointed out by Peter Maydell)
>> - Added hooks for ARM Cortex M arches (as pointed out by Peter
>>   Maydell)
>> - Added hooks for Alpha targets
>> - Added hooks for MIPS targets
>> - Added a plugin for testing some of the interface behaviour
>>
>> Since v1:
>> - Split the one callback into multiple callbacks
>> - Added a target-agnostic definition of the relevant event(s)
>> - Call hooks from architecture-code rather than accel/tcg/cpu-exec.c
>> - Added a plugin showcasing API usage
>>
>> Julian Ganz (25):
>>   plugins: add types for callbacks related to certain discontinuities
>>   plugins: add API for registering discontinuity callbacks
>>   plugins: add hooks for new discontinuity related callbacks
>>   contrib/plugins: add plugin showcasing new discontinuity related API
>>   target/alpha: call plugin trap callbacks
>>   target/arm: call plugin trap callbacks
>>   target/avr: call plugin trap callbacks
>>   target/hppa: call plugin trap callbacks
>>   target/i386: call plugin trap callbacks
>>   target/loongarch: call plugin trap callbacks
>>   target/m68k: call plugin trap callbacks
>>   target/microblaze: call plugin trap callbacks
>>   target/mips: call plugin trap callbacks
>>   target/openrisc: call plugin trap callbacks
>>   target/ppc: call plugin trap callbacks
>>   target/riscv: call plugin trap callbacks
>>   target/rx: call plugin trap callbacks
>>   target/s390x: call plugin trap callbacks
>>   target/sh4: call plugin trap callbacks
>>   target/sparc: call plugin trap callbacks
>>   target/tricore: call plugin trap callbacks
>>   target/xtensa: call plugin trap callbacks
>>   tests: add plugin asserting correctness of discon event's to_pc
>>   tests: add test for double-traps on rv64
>>   tests: add test with interrupted memory accesses on rv64
>>
>>  contrib/plugins/meson.build               |   3 +-
>>  contrib/plugins/traps.c                   |  84 +++++++++
>>  docs/about/emulation.rst                  |   8 +
>>  include/qemu/plugin-event.h               |   3 +
>>  include/qemu/plugin.h                     |  13 ++
>>  include/qemu/qemu-plugin.h                |  60 +++++++
>>  plugins/core.c                            |  57 ++++++
>>  target/alpha/helper.c                     |  13 ++
>>  target/arm/helper.c                       |  24 +++
>>  target/arm/internals.h                    |   1 +
>>  target/arm/tcg/m_helper.c                 |   5 +
>>  target/avr/helper.c                       |   3 +
>>  target/hppa/int_helper.c                  |  44 +++++
>>  target/i386/tcg/excp_helper.c             |   3 +
>>  target/i386/tcg/seg_helper.c              |   4 +
>>  target/loongarch/cpu.c                    |   4 +
>>  target/m68k/op_helper.c                   |  22 +++
>>  target/microblaze/helper.c                |  10 ++
>>  target/mips/tcg/system/tlb_helper.c       |  11 ++
>>  target/openrisc/interrupt.c               |  15 ++
>>  target/ppc/excp_helper.c                  |  41 +++++
>>  target/riscv/cpu_helper.c                 |   9 +
>>  target/rx/helper.c                        |  12 ++
>>  target/s390x/tcg/excp_helper.c            |   8 +
>>  target/sh4/helper.c                       |   4 +
>>  target/sparc/int32_helper.c               |   7 +
>>  target/sparc/int64_helper.c               |  10 ++
>>  target/tricore/op_helper.c                |   5 +
>>  target/xtensa/exc_helper.c                |   6 +
>>  tests/tcg/plugins/discons.c               | 210 ++++++++++++++++++++++
>>  tests/tcg/plugins/meson.build             |   2 +-
>>  tests/tcg/riscv64/Makefile.softmmu-target |  12 ++
>>  tests/tcg/riscv64/doubletrap.S            |  73 ++++++++
>>  tests/tcg/riscv64/interruptedmemory.S     |  67 +++++++
>>  34 files changed, 851 insertions(+), 2 deletions(-)
>>  create mode 100644 contrib/plugins/traps.c
>>  create mode 100644 tests/tcg/plugins/discons.c
>>  create mode 100644 tests/tcg/riscv64/doubletrap.S
>>  create mode 100644 tests/tcg/riscv64/interruptedmemory.S

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
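(The API shape the cover letter describes — discontinuity types as shift-1 flags, one registration function taking a type mask, and callbacks receiving the event type plus origin and target PCs — can be summarised with a small model. This is a Python sketch of the design, not the actual C declarations from the series:)

```python
# Model of the proposed discontinuity-callback interface: shift-1
# flag values, mask-based registration, and a callback receiving
# (vcpu, type, from_pc, to_pc).
from enum import IntFlag

class DisconType(IntFlag):
    INTERRUPT = 1 << 0
    EXCEPTION = 1 << 1
    HOSTCALL  = 1 << 2          # the series' term for emulated traps
    ALL       = INTERRUPT | EXCEPTION | HOSTCALL

_registrations = []

def register_vcpu_discon_cb(mask, cb):
    """Model of the single registration function (patch 2)."""
    _registrations.append((mask, cb))

def vcpu_discon(vcpu, dtype, from_pc, to_pc):
    """Model of the hook target code calls on a discontinuity (patch 3)."""
    for mask, cb in _registrations:
        if dtype & mask:
            cb(vcpu, dtype, from_pc, to_pc)

# A toy "traps"-style plugin counting events per (vcpu, type):
counts = {}
def on_discon(vcpu, dtype, from_pc, to_pc):
    counts[(vcpu, dtype)] = counts.get((vcpu, dtype), 0) + 1

register_vcpu_discon_cb(DisconType.ALL, on_discon)
vcpu_discon(0, DisconType.INTERRUPT, 0x80001234, 0x80000000)
vcpu_discon(0, DisconType.EXCEPTION, 0x80001238, 0x80000004)
```

Registering with the full mask mirrors how the "traps" example plugin subscribes with QEMU_PLUGIN_DISCON_ALL; a profiler interested only in interrupts would pass a single flag instead.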