[PATCH v9 0/6] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures

Eric Auger posted 6 patches 1 month ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260306150302.203112-1-eric.auger@redhat.com
Maintainers: Peter Maydell <peter.maydell@linaro.org>
target/arm/cpu.h          | 33 +++++++++++++++++
target/arm/cpu.c          | 75 +++++++++++++++++++++++++++++++++++++++
target/arm/cpu64.c        | 34 ++++++++++++++++++
target/arm/debug_helper.c | 29 ---------------
target/arm/helper.c       |  5 +++
target/arm/machine.c      | 21 +++++++----
target/arm/trace-events   |  2 ++
7 files changed, 164 insertions(+), 35 deletions(-)
This series applies on top of
[PATCH v3 0/7] Improve traces on migration failure due to cpreg number mismatch

It introduces an infrastructure to explicitly tolerate some
mismatches between cpu cpreg indexes/values on migration.

In particular, we handle failures caused by an attempt to migrate
more registers than are exposed on the destination.
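
As a rough illustration of the idea only (not the code from this
series), the destination-side check could be sketched as below. All
names here (CpregTolerance, TOL_NOT_FOUND_ON_DEST, count_unexplained)
are hypothetical, and the register indexes used by callers are plain
integers standing in for real cpreg indexes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tolerance kind: the incoming reg may be absent on the destination. */
typedef enum {
    TOL_NOT_FOUND_ON_DEST,
} CpregTolType;

/* Hypothetical tolerance entry, keyed by cpreg index. */
typedef struct {
    uint64_t idx;
    CpregTolType type;
} CpregTolerance;

/* Linear search: is idx part of the given index list? */
static bool index_in(const uint64_t *v, size_t n, uint64_t idx)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i] == idx) {
            return true;
        }
    }
    return false;
}

/* Has a "missing on destination" tolerance been declared for idx? */
static bool is_tolerated(const CpregTolerance *tol, size_t n_tol, uint64_t idx)
{
    for (size_t i = 0; i < n_tol; i++) {
        if (tol[i].idx == idx && tol[i].type == TOL_NOT_FOUND_ON_DEST) {
            return true;
        }
    }
    return false;
}

/*
 * Count incoming registers that are neither exposed on the destination
 * nor covered by a tolerance. Zero means the mismatch is explained.
 */
static size_t count_unexplained(const uint64_t *in, size_t n_in,
                                const uint64_t *dst, size_t n_dst,
                                const CpregTolerance *tol, size_t n_tol)
{
    assert(in != NULL || n_in == 0);
    size_t bad = 0;

    for (size_t i = 0; i < n_in; i++) {
        if (!index_in(dst, n_dst, in[i]) &&
            !is_tolerated(tol, n_tol, in[i])) {
            bad++;
        }
    }
    return bad;
}
```

In this sketch, an incoming stream carrying one extra register loads
cleanly only if that register has a declared tolerance; without one,
the count is non-zero and the load would still be rejected.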

Once the infrastructure is in place we can handle the following
known issues:
KVM:
- removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
  from kernel v6.13 onwards. Causes forward migration failure.
- unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
  register from kernel v6.16 onwards. Causes backward migration failure.
TCG:
- This lets us remove the AArch32 DBGDTRTX register, which was
  erroneously defined.

Both series are available at:
https://github.com/eauger/qemu/tree/cpreg_vmstate_array_len_traces_v3_mitig_v3

History:
v8 -> v9:
- took into account Seb's comments (fixed ->type check, renamed one function)
- collected R-bs

v7 -> v8:
- rebased on top of v3 of the prerequisite series
- dropped "target/arm/machine: Fix detection of unknown incoming cpregs",
  which is now integrated in the prerequisite series

v6 -> v7:
- complete rework after Peter's comments in
  (https://lore.kernel.org/all/20260126165445.3033335-1-eric.auger@redhat.com/).
  In particular, it does not use properties at all and does not
  feature hidden regs. The initial series was also split into 2.


Eric Auger (6):
  target/arm/cpu: Introduce the infrastructure for cpreg migration
    tolerances
  target/arm/machine: Take account cpreg mig tolerances in case of
    mismatch
  target/arm/cpu64: Mitigate migration failures due to spurious TCR_EL1,
    PIRE0_EL1 and PIR_EL1
  target/arm/cpu64: Define cpreg migration tolerance for
    KVM_REG_ARM_VENDOR_HYP_BMAP_2
  target/arm/helper: Define cpreg migration tolerance for DGBDTR_EL0
  Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for
    migration compat"

-- 
2.53.0
Re: [PATCH v9 0/6] Mitigation of "failed to load cpu:cpreg_vmstate_array_len" migration failures
Posted by Peter Maydell 4 weeks ago
On Fri, 6 Mar 2026 at 15:03, Eric Auger <eric.auger@redhat.com> wrote:
>
> This series applies on top of
> [PATCH v3 0/7] Improve traces on migration failure due to cpreg number mismatch
>
> It introduces an infrastructure to explicitly tolerate some
> mismatches between cpu cpreg indexes/values on migration.
>
> In particular, we handle failures caused by an attempt to migrate
> more registers than are exposed on the destination.
>
> Once the infrastructure is in place we can handle the following
> known issues:
> KVM:
> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
>   from kernel v6.13 onwards. Causes forward migration failure.
> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
>   register from kernel v6.16 onwards. Causes backward migration failure.
> TCG:
> - This lets us remove the AArch32 DBGDTRTX register, which was
>   erroneously defined.

Hi; I have this on my list to review, but I think this series has
unfortunately missed the 11.0 freeze deadline, so we should aim to
get it in early in the 11.1 cycle.

thanks
-- PMM