When migrating ARM guests between identical machines running different
host kernels, we are likely to encounter failures such as:
"failed to load cpu:cpreg_vmstate_array_len"
This is because KVM exposes a different number of registers to qemu
on the source and on the destination. When migrating a bigger
register set to a smaller one, the destination qemu cannot load the
CPU state.
For example, we recently faced this kind of situation with:
- unconditional exposure of the KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
register from kernel v6.16 onwards. Causes backward migration failure.
- removal of the unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
from kernel v6.13 onwards. Causes forward migration failure.
This situation is really problematic for distributions which want to
guarantee forward and backward migration of a given machine type
between different releases.
While the series mainly targets KVM acceleration, this problem
also exists with TCG. For instance, some registers may be exposed
while they shouldn't be. It is then tricky to fix that situation
without breaking forward migration. An example was provided by
Peter: 4f2b82f60 ("target/arm: Reinstate bogus AArch32 DBGDTRTX
register for migration compat").
This series introduces 2 CPU array properties that list:
- the CPU registers to hide from the exposed sysregs (aims
at removing registers from the destination)
- the CPU registers that may not exist but which can be found
in the incoming migration stream (aims at ignoring extra
registers in the incoming state)
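To illustrate the intent of the two lists, here is a minimal standalone
sketch. It is not the code from this series: the helper names
(contains, filter_hidden, incoming_is_acceptable) are hypothetical and
plain arrays of 64-bit register indexes stand in for the real CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if idx appears in list[0..len). */
static bool contains(const uint64_t *list, size_t len, uint64_t idx)
{
    for (size_t i = 0; i < len; i++) {
        if (list[i] == idx) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical model of the "hidden regs" list: copy the host-exposed
 * indexes into out[], dropping those found in hidden[]. Returns the
 * number of indexes kept. out[] must have room for n_exposed entries.
 */
static size_t filter_hidden(const uint64_t *exposed, size_t n_exposed,
                            const uint64_t *hidden, size_t n_hidden,
                            uint64_t *out)
{
    size_t kept = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n_exposed; i++) {
        if (!contains(hidden, n_hidden, exposed[i])) {
            out[kept++] = exposed[i];
        }
    }
    return kept;
}

/*
 * Hypothetical model of the "safe missing regs" list: the incoming
 * stream is acceptable if every index it carries either exists on the
 * destination or is listed in safe_missing[].
 */
static bool incoming_is_acceptable(const uint64_t *incoming, size_t n_incoming,
                                   const uint64_t *dest, size_t n_dest,
                                   const uint64_t *safe_missing, size_t n_safe)
{
    for (size_t i = 0; i < n_incoming; i++) {
        if (!contains(dest, n_dest, incoming[i]) &&
            !contains(safe_missing, n_safe, incoming[i])) {
            return false;
        }
    }
    return true;
}
```

In this model, hiding a register shrinks the set seen on one side, while
a safe-missing entry lets the destination tolerate an index it does not
know about instead of failing the load.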
An example is given to illustrate how those props
could be used to apply compats for machine types supposed to "see" the
same register set across various host kernels.
The DBGDTRTX issue would be mitigated by setting
x-mig-safe-missing-regs=0x40200000200e0298, which matches
the AArch32 DBGDTRTX register index.
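For readers who want to check such an index by hand, the sketch below
splits a 32-bit coprocessor register index into its fields. The bit
layout is an assumption based on the usual KVM/qemu encoding for
AArch32 coprocessor registers (arch tag in the top byte, size in bits
52-55, cp at bit 16, crn at bit 11, crm at bit 7, opc1 at bit 3, opc2
at bit 0, and bit 29 used as a non-secure flag). The struct and
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an AArch32 coprocessor register index (assumed layout). */
struct cp_fields {
    unsigned size_bytes; /* register width in bytes */
    unsigned ns;         /* bit 29, assumed to be the non-secure flag */
    unsigned cp;         /* coprocessor number */
    unsigned crn;
    unsigned crm;
    unsigned opc1;
    unsigned opc2;
};

static struct cp_fields decode_aarch32_index(uint64_t id)
{
    struct cp_fields f;

    /* Top byte 0x40 is assumed to tag a 32-bit ARM register index. */
    assert((id >> 56) == 0x40);

    f.size_bytes = 1u << ((id >> 52) & 0xf); /* log2(size) in bits 52-55 */
    f.ns   = (id >> 29) & 0x1;
    f.cp   = (id >> 16) & 0xf;
    f.crn  = (id >> 11) & 0xf;
    f.crm  = (id >> 7) & 0xf;
    f.opc1 = (id >> 3) & 0xf;
    f.opc2 = id & 0x7;
    return f;
}
```

Under this assumed layout, 0x40200000200e0298 decodes to a 4-byte
register with cp=14, crn=0, crm=5, opc1=3, opc2=0 and the bit 29 flag
set.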
The first patch improves the tracing so that we can quickly detect
which registers do not match between the incoming stream and the
exposed sysregs.
---
Available at:
https://github.com/eauger/qemu/tree/mitig-v5
---
Tests:
- migration with 10.2 machine between an old qemu featuring DBGDTRTX
and this one where it is removed. Forward migration works.
Backward migration doesn't, because the register is not present in
the incoming migration stream and write_list_to_cpustate() fails
when calling write_raw_cp_reg() and reading the value back.
write_raw_cp_reg() seems to read an uninitialized value from
cpu->cpreg_values[i]. The write has no effect since the type is
ARM_CP_CONST, but read_raw_cp_reg() returns ri->resetvalue, which
differs from the uninitialized value. I would have expected the
initial cpu->cpreg_values[i] to match the reset value, which is
obviously not the case. Also the comment hints that it should.
So maybe another issue? Nevertheless I am not totally sure
supporting backward migration for TCG is a must. This may be fixed
separately if it is confirmed this is a bug.
- migration with accel=kvm back and forth between an old host/qemu,
where the host features neither the fixes for TCR2_EL1, PIRE0_EL1,
PIR_EL1 nor the recent KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW register,
and a more recent kernel with this qemu, which features them.
Migration works forward and backward with 10.1 machine type.
History:
v4 -> v5:
- Fixed issue reported by Sebastian about aggregated array
props. This led to the introduction of
hw/arm/virt: Introduce framework to aggregate hidden-regs
and safe-missing-regs
- Collected additional hacks from Connie
v3 -> v4:
- Collected Connie's & Sebastian's R-bs
- Squashed patches 3 and 5
- various typos and rewording
v2 -> v3:
- revert target/arm: Reinstate bogus AArch32 DBGDTRTX register for migration compat
- fix some typos and rework target/arm/cpu.h hidden_regs comment (Connie)
- Even for TCG we use KVM index
v1 -> v2:
- fixed typos (Connie)
- Make it less KVM specific (tentative hiding of TCG regs, not
tested)
- Tested DBGDTRTX TCG case reported by Peter
- No change to the property format yet. Ran out of ideas. However
I changed the name of the property with x-mig prefix
- Changed the terminology, kept hiding but removed fake which was
confusing
- Simplified the logic for regs missing in the incoming stream and
no longer check that they are exposed on dest
Eric Auger (11):
hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults
target/arm/machine: Improve traces on register mismatch during
migration
target/arm/cpu: Allow registers to be hidden
target/arm/machine: Allow extra regs in the incoming stream
kvm-all: Enforce hidden regs are never accessed
target/arm/cpu: Implement hide_reg callback()
target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs
properties
hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming
stream
Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for
migration compat"
hw/arm/virt: Introduce framework to aggregate hidden-regs and
safe-missing-regs
hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older
kernels
include/hw/arm/virt.h | 23 +++++++++++
include/hw/core/cpu.h | 2 +
target/arm/cpu.h | 48 ++++++++++++++++++++++
accel/kvm/kvm-all.c | 12 ++++++
hw/arm/virt.c | 86 ++++++++++++++++++++++++++++++++++++---
target/arm/cpu.c | 11 +++++
target/arm/debug_helper.c | 29 -------------
target/arm/helper.c | 12 +++++-
target/arm/kvm.c | 35 +++++++++++++++-
target/arm/machine.c | 70 ++++++++++++++++++++++++++++---
target/arm/trace-events | 10 +++++
11 files changed, 295 insertions(+), 43 deletions(-)
--
2.52.0