[PATCH v2 0/8] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures

Eric Auger posted 8 patches 1 day, 18 hours ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20251118160920.554809-1-eric.auger@redhat.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, Zhao Liu <zhao1.liu@intel.com>
include/hw/core/cpu.h   |  2 ++
target/arm/cpu.h        | 51 +++++++++++++++++++++++++++++++
accel/kvm/kvm-all.c     | 12 ++++++++
hw/arm/virt.c           | 19 ++++++++++++
target/arm/cpu.c        | 11 +++++++
target/arm/helper.c     | 10 +++++-
target/arm/kvm.c        | 35 ++++++++++++++++++++-
target/arm/machine.c    | 67 +++++++++++++++++++++++++++++++++++++----
target/arm/trace-events |  9 ++++++
9 files changed, 208 insertions(+), 8 deletions(-)
When migrating ARM guests between identical machines running different
host kernels, we are likely to encounter failures such as:

"failed to load cpu:cpreg_vmstate_array_len"

This is due to the fact that KVM exposes a different number of
registers to QEMU on the source and on the destination. When a bigger
register set is migrated to a host exposing a smaller one, the
destination QEMU cannot load the CPU state.
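The mismatch can be modelled with a small, self-contained C sketch. It is a simplification, not QEMU's code: it assumes both sides hold sorted arrays of 64-bit register indexes, that the destination rejects an incoming list longer than its own (the bigger-to-smaller case above), and that any incoming index unknown to the destination is also fatal. `check_incoming_regs()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the index list is strictly increasing (sorted, no dups). */
static int is_sorted_unique(const uint64_t *r, size_t n)
{
    for (size_t k = 1; k < n; k++) {
        if (r[k - 1] >= r[k]) {
            return 0;
        }
    }
    return 1;
}

/*
 * Hypothetical model of the destination-side check (not QEMU code).
 * Returns 0 if the incoming register list can be loaded, -1 otherwise.
 */
static int check_incoming_regs(const uint64_t *in, size_t n_in,
                               const uint64_t *dst, size_t n_dst)
{
    size_t i = 0, v = 0;

    assert(is_sorted_unique(in, n_in) && is_sorted_unique(dst, n_dst));

    /* A bigger register set than the destination's is rejected up front. */
    if (n_in > n_dst) {
        return -1;
    }
    /* Merge-walk both sorted lists. */
    while (v < n_in) {
        if (i == n_dst || in[v] < dst[i]) {
            return -1;  /* in the stream, unknown to the destination */
        }
        if (in[v] > dst[i]) {
            i++;        /* destination-only register: keeps its reset value */
            continue;
        }
        i++;            /* matching register: its value would be copied */
        v++;
    }
    return 0;
}
```

With placeholder indexes and a destination exposing `{0x10, 0x20, 0x30}`, an incoming `{0x10, 0x30}` is accepted, while `{0x10, 0x15, 0x20}` (unknown 0x15) and any four-entry list are rejected.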

For example, we recently faced this kind of situation with:
- the unconditional exposure of the KVM_REG_ARM_VENDOR_HYP_BMAP_2
  firmware pseudo-register from v6.16 onwards, which causes backward
  migration failures;
- the removal of the unconditional exposure of TCR2_EL1, PIRE0_EL1 and
  PIR_EL1 from v6.13 onwards, which causes forward migration failures.

This situation is really problematic for distributions that want to
guarantee forward and backward migration of a given machine type
between different releases.

While the series mainly targets KVM acceleration, this problem
also exists with TCG. For instance, some registers may be exposed
while they shouldn't be, and it is then tricky to fix that situation
without breaking forward migration. An example was provided by
Peter: 4f2b82f60 ("target/arm: Reinstate bogus AArch32 DBGDTRTX
register for migration compat").

This series introduces 2 CPU array properties that list:
- the CPU registers to hide from the exposed sysregs
  (x-mig-hidden-regs, aims at removing registers from the destination);
- the CPU registers that may not exist but which can be found in the
  incoming migration stream (x-mig-safe-missing-regs, aims at ignoring
  extra registers in the incoming state).
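Continuing the same simplified model, the C sketch below shows how the two lists could act on the register sets. `hide_regs()` and `load_regs()` are hypothetical helpers, both properties are modelled as plain arrays of 64-bit indexes, and the series' real property format and code paths may differ. In particular, the array-length check from the failure case would also have to tolerate the extra incoming entries; this sketch does not model it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Linear membership test; the lists involved are expected to be short. */
static bool in_list(uint64_t idx, const uint64_t *list, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        if (list[k] == idx) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical model of x-mig-hidden-regs: copy 'regs' to 'out', dropping
 * every index listed in 'hidden'. Order is preserved, so a sorted input
 * gives a sorted output. Returns the number of entries kept.
 */
static size_t hide_regs(const uint64_t *regs, size_t n,
                        const uint64_t *hidden, size_t n_hidden,
                        uint64_t *out)
{
    size_t kept = 0;

    assert(n_hidden == 0 || hidden != NULL);
    for (size_t k = 0; k < n; k++) {
        if (!in_list(regs[k], hidden, n_hidden)) {
            out[kept++] = regs[k];
        }
    }
    return kept;
}

/*
 * Hypothetical model of x-mig-safe-missing-regs: merge-walk the sorted
 * incoming list against the sorted destination list, skipping unknown
 * incoming indexes listed in 'safe_missing'. Returns 0 or -1.
 */
static int load_regs(const uint64_t *in, size_t n_in,
                     const uint64_t *dst, size_t n_dst,
                     const uint64_t *safe_missing, size_t n_safe)
{
    size_t i = 0, v = 0;

    assert(n_safe == 0 || safe_missing != NULL);
    while (v < n_in) {
        if (i < n_dst && in[v] > dst[i]) {
            i++;        /* destination-only register: keeps reset value */
        } else if (i < n_dst && in[v] == dst[i]) {
            i++;        /* matching register: value would be copied */
            v++;
        } else if (in_list(in[v], safe_missing, n_safe)) {
            v++;        /* tolerated extra register: ignored */
        } else {
            return -1;  /* unknown register: the load fails */
        }
    }
    return 0;
}
```

With placeholder indexes, a list `{0x10, 0x20, 0x50}` fails to load on a destination exposing `{0x10, 0x20}` unless 0x50 is first filtered out with `hide_regs()`, mimicking the BMAP_2 case. Conversely, an incoming `{0x10, 0x20, 0x25}` loads on that destination only when 0x25 is listed as safe-missing, mimicking the TCR2_EL1 case.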

The last patch (not meant for upstream) gives an example of how those
properties could be used to apply compats for machine types that are
supposed to "see" the same register set across various host kernels.

Mitigation of the DBGDTRTX issue would be achieved by setting
x-mig-safe-missing-regs=0x40200000200e0298, which matches the
AArch32 DBGDTRTX register index.
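As a worked check of that index, the C sketch below splits it into fields. The bit layout is an assumption based on our reading of QEMU's `ENCODE_CP_REG()` / `cpreg_to_kvm_id()` and of the KVM register-ID encoding; verify it against the QEMU target/arm headers and the kernel's `KVM_REG_ARM_*` masks before relying on it. `decode_aa32_reg()` is a hypothetical helper. Under that assumption the value decodes to a 32-bit KVM_REG_ARM register with the non-secure bit set, cp14, opc1=3, CRn=0, CRm=5, opc2=0.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an AArch32 coprocessor register index (assumed layout). */
struct aa32_reg_fields {
    unsigned arch;  /* bits 63..56: 0x40 assumed to mean KVM_REG_ARM */
    unsigned size;  /* bits 55..52: log2 of the size in bytes, 2 = 32-bit */
    unsigned ns;    /* bit 29: non-secure bank flag */
    unsigned cp;    /* bits 27..16: coprocessor number */
    unsigned crn;   /* bits 14..11 */
    unsigned crm;   /* bits 10..7 */
    unsigned opc1;  /* bits 6..3 */
    unsigned opc2;  /* bits 2..0 */
};

/* Hypothetical decoder: shift and mask each assumed field out of 'idx'. */
static struct aa32_reg_fields decode_aa32_reg(uint64_t idx)
{
    struct aa32_reg_fields f;

    f.arch = (unsigned)(idx >> 56);
    f.size = (unsigned)((idx >> 52) & 0xf);
    f.ns   = (unsigned)((idx >> 29) & 0x1);
    f.cp   = (unsigned)((idx >> 16) & 0xfff);
    f.crn  = (unsigned)((idx >> 11) & 0xf);
    f.crm  = (unsigned)((idx >> 7) & 0xf);
    f.opc1 = (unsigned)((idx >> 3) & 0xf);
    f.opc2 = (unsigned)(idx & 0x7);
    return f;
}
```

Calling `decode_aa32_reg(0x40200000200e0298ULL)` yields `arch = 0x40`, `size = 2`, `ns = 1`, `cp = 14`, `crn = 0`, `crm = 5`, `opc1 = 3` and `opc2 = 0`.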

The first patch improves the tracing so that we can quickly detect
which registers do not match between the incoming stream and the
exposed sysregs.
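To illustrate what such a trace needs to report, here is a C sketch that walks both sorted index lists and prints each index present on only one side. The function name and output format are made up for illustration; they are not the trace events the series adds to target/arm/trace-events.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical diagnostic: walk two sorted index lists and print every
 * index that appears on only one side. The counts are returned through
 * the out parameters so a caller can decide whether to fail.
 */
static void report_reg_mismatch(const uint64_t *in, size_t n_in,
                                const uint64_t *dst, size_t n_dst,
                                size_t *only_in, size_t *only_dst)
{
    size_t v = 0, i = 0;

    assert(only_in != NULL && only_dst != NULL);
    *only_in = 0;
    *only_dst = 0;
    while (v < n_in || i < n_dst) {
        if (i == n_dst || (v < n_in && in[v] < dst[i])) {
            printf("only in incoming stream: 0x%016" PRIx64 "\n", in[v++]);
            (*only_in)++;
        } else if (v == n_in || dst[i] < in[v]) {
            printf("only on destination:     0x%016" PRIx64 "\n", dst[i++]);
            (*only_dst)++;
        } else {
            v++;  /* present on both sides */
            i++;
        }
    }
}
```

Reporting both directions in a single pass keeps the trace useful for forward and backward migration alike: registers listed as "only in incoming stream" are candidates for a safe-missing list, while the other side shows what the destination exposes beyond the source.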

---

History:

v1 -> v2:
- Fixed typos (Connie)
- Made it less KVM-specific (tentative hiding of TCG regs, not
  tested)
- Tested the DBGDTRTX TCG case reported by Peter
- No change to the property format yet, as I ran out of ideas.
  However, I changed the property names to use an x-mig prefix
- Changed the terminology: kept "hiding" but removed "fake", which
  was confusing
- Simplified the logic for regs missing in the incoming stream, and
  no longer check that they are exposed on the destination

Available at:
https://github.com/eauger/qemu/tree/mitig-v2


Eric Auger (8):
  target/arm/machine: Improve traces on register mismatch during
    migration
  target/arm/cpu: Allow registers to be hidden
  target/arm/machine: Allow extra regs in the incoming stream
  target/arm/helper: Skip hidden registers
  kvm-all: Add the capability to blacklist some KVM regs
  target/arm/cpu: Implement hide_reg callback()
  target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs
    properties
  hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older
    kernels


-- 
2.51.1