When migrating ARM guests across otherwise identical machines running
different host kernels, we are likely to encounter failures such as:
"failed to load cpu:cpreg_vmstate_array_len"
This is because KVM exposes a different number of registers to QEMU
on the source and on the destination. When trying to migrate a bigger
register set to a smaller one, QEMU cannot load the CPU state on the
destination.
For example, we recently hit this situation with:
- the unconditional exposure of the KVM_REG_ARM_VENDOR_HYP_BMAP_2
firmware pseudo-register from v6.16 onwards, which causes backward
migration failures;
- the removal of the unconditional exposure of TCR2_EL1, PIRE0_EL1 and
PIR_EL1 from v6.13 onwards, which causes forward migration failures.
This situation is really problematic for distributions which want to
guarantee forward and backward migration of a given machine type
between different releases.
While the series mainly targets KVM acceleration, this problem also
exists with TCG. For instance, some registers may be exposed when they
should not be, and it is then tricky to fix that without breaking
forward migration. Peter provided an example: 4f2b82f60 ("target/arm:
Reinstate bogus AArch32 DBGDTRTX register for migration compat").
This series introduces two CPU array properties that list:
- the CPU registers to hide from the exposed sysregs (aimed at
removing registers from the destination);
- the CPU registers that may not exist on the destination but can be
found in the incoming migration stream (aimed at ignoring extra
registers in the incoming state).
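To make the two list semantics concrete, here is a simplified model written for illustration only; it is not code from the series. The function names (reg_in_list, filter_hidden, load_regs) are hypothetical, register indexes are plain integers, and the vmstate length check behind the "cpreg_vmstate_array_len" error is not modelled. The matching loop is loosely modelled on how QEMU matches sorted register lists on load: a register known only to the destination is skipped, and a register found only in the stream fails the load unless it is listed as safe to ignore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if idx appears in a short, unsorted list. */
static bool reg_in_list(uint64_t idx, const uint64_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == idx) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical "hidden regs" step: drop hidden registers from the list
 * the accelerator exposes. Filters in place, preserves order and
 * returns the new length.
 */
static size_t filter_hidden(uint64_t *exposed, size_t n,
                            const uint64_t *hidden, size_t n_hidden)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (!reg_in_list(exposed[i], hidden, n_hidden)) {
            exposed[out++] = exposed[i];
        }
    }
    return out;
}

/*
 * Hypothetical load step: walk the sorted incoming indexes and the
 * sorted destination indexes together, copying matching values.
 * Destination-only registers keep their current value; stream-only
 * registers fail the load unless they are in the "safe" list.
 */
static bool load_regs(const uint64_t *in_idx, const uint64_t *in_val,
                      size_t n_in, const uint64_t *dst_idx,
                      uint64_t *dst_val, size_t n_dst,
                      const uint64_t *safe, size_t n_safe)
{
    size_t i = 0, v = 0;

    while (v < n_in) {
        if (i < n_dst && dst_idx[i] < in_idx[v]) {
            i++;                          /* destination-only: skip */
        } else if (i < n_dst && dst_idx[i] == in_idx[v]) {
            dst_val[i++] = in_val[v++];   /* match: copy the value */
        } else if (reg_in_list(in_idx[v], safe, n_safe)) {
            v++;                          /* stream-only but safe: ignore */
        } else {
            return false;                 /* stream-only: fail the load */
        }
    }
    return true;
}
```

With an empty safe list, a stream-only register fails the load exactly as before; adding its index to the safe list lets the load continue while every other mismatch still fails.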
An example illustrates how these properties could be used to apply
compats to machine types that are supposed to "see" the same register
set across various host kernels.
The DBGDTRTX issue can be mitigated by setting
x-mig-safe-missing-regs=0x40200000200e0298, which matches the AArch32
DBGDTRTX register index.
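Because the property takes a raw 64-bit index, it helps to see how that value breaks down. The sketch below decodes it, assuming the field layout of QEMU's AArch32 coprocessor register encoding once converted to a KVM-style index (architecture byte, size field, non-secure bit, coprocessor number, then CRn/CRm/opc1/opc2). The struct and function names are hypothetical, and the layout is stated from memory rather than taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed field layout of an AArch32 register index (hypothetical
 * decoder, mirroring QEMU's cpreg encoding as I understand it):
 *   bits 63..56  architecture (0x40 = ARM)
 *   bits 55..52  log2 of the register size in bytes (2 = 32-bit)
 *   bit  29      non-secure bank flag
 *   bits 27..16  coprocessor number
 *   bit  15      64-bit (MCRR/MRRC) access flag
 *   bits 14..11  CRn
 *   bits 10..7   CRm
 *   bits  6..3   opc1
 *   bits  2..0   opc2
 */
struct aa32_reg {
    unsigned arch, size_log2, ns, cp, is64, crn, crm, opc1, opc2;
};

static struct aa32_reg decode_aa32_index(uint64_t idx)
{
    struct aa32_reg r;

    r.arch      = (unsigned)(idx >> 56);
    r.size_log2 = (unsigned)((idx >> 52) & 0xf);
    r.ns        = (unsigned)((idx >> 29) & 0x1);
    r.cp        = (unsigned)((idx >> 16) & 0xfff);
    r.is64      = (unsigned)((idx >> 15) & 0x1);
    r.crn       = (unsigned)((idx >> 11) & 0xf);
    r.crm       = (unsigned)((idx >> 7) & 0xf);
    r.opc1      = (unsigned)((idx >> 3) & 0xf);
    r.opc2      = (unsigned)(idx & 0x7);
    return r;
}
```

Under that assumed layout, decode_aa32_index(0x40200000200e0298) yields a 32-bit cp14 register with opc1=3, CRn=0, CRm=5, opc2=0 and the non-secure bit set, which is the kind of detail a raw hex property value hides from the reader.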
The "target/arm/machine: Improve traces on register mismatch during
migration" patch improves the tracing so that we can quickly detect
which registers do not match between the incoming stream and the
exposed sysregs.
---
Available at:
https://github.com/eauger/qemu/tree/mitig-v6
---
Tests:
- migration with the 10.2 machine type between an old QEMU featuring
DBGDTRTX and this one, where it is removed. Forward migration works.
Backward migration doesn't, because the register is not present in
the incoming migration stream and write_list_to_cpustate() fails when
writing the register with write_raw_cp_reg() and reading it back.
The value passed to write_raw_cp_reg() seems to come from an
uninitialized cpu->cpreg_values[i]. The write has no effect since the
register type is ARM_CP_CONST, but read_raw_cp_reg() returns
ri->resetvalue, which differs from the uninitialized value.
I would have expected the initial cpu->cpreg_values[i] to match the
reset value, which is obviously not the case. Also, the comment hints
that it should. So maybe this is another issue? Nevertheless, I am
not totally sure supporting backward migration for TCG is a must.
This may be fixed separately if it is confirmed to be a bug.
- migration with accel=kvm back and forth between an old host/QEMU,
where the host does not feature the fixes for TCR2_EL1, PIRE0_EL1,
PIR_EL1 and the recent KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW register, and
a more recent kernel/this QEMU that feature them. Migration works
forward and backward with the 10.1 machine type.
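The TCG backward-migration failure described above comes from a write-then-read-back check. The following cut-down model is for illustration only: fake_reginfo, fake_write_raw, fake_read_raw and write_and_verify are hypothetical stand-ins, not QEMU's write_raw_cp_reg()/read_raw_cp_reg(). It shows why an ARM_CP_CONST register fails the check whenever its cpreg_values slot holds anything other than its reset value, for example because the slot was never filled from the incoming stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down register descriptor. */
struct fake_reginfo {
    bool is_const;        /* models ARM_CP_CONST: writes are ignored */
    uint64_t resetvalue;  /* value a const register always reads as */
    uint64_t storage;     /* backing state for a normal register */
};

static void fake_write_raw(struct fake_reginfo *ri, uint64_t v)
{
    if (!ri->is_const) {
        ri->storage = v;  /* const registers silently drop the write */
    }
}

static uint64_t fake_read_raw(const struct fake_reginfo *ri)
{
    return ri->is_const ? ri->resetvalue : ri->storage;
}

/* Write v, then confirm it reads back unchanged. */
static bool write_and_verify(struct fake_reginfo *ri, uint64_t v)
{
    fake_write_raw(ri, v);
    return fake_read_raw(ri) == v;
}
```

In this model a const register passes only when the value being restored already equals resetvalue, which is why initializing the slot to the reset value would be expected to avoid the failure.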
History:
v5 -> v6:
- move GString init and collected Sebastian's R-b
v4 -> v5:
- Fixed issue reported by Sebastian about aggregated array
props. This led to the introduction of
hw/arm/virt: Introduce framework to aggregate hidden-regs
and safe-missing-regs
- Collected additional acks from Connie
v3 -> v4:
- Collected Connie's & Sebastian's R-bs
- Squashed patches 3 and 5
- various typos and rewording
v2 -> v3:
- revert target/arm: Reinstate bogus AArch32 DBGDTRTX register for migration compat
- fix some typos and rework target/arm/cpu.h hidden_regs comment (Connie)
- Even for TCG we use KVM index
v1 -> v2:
- fixed typos (Connie)
- Make it less KVM specific (tentative hiding of TCG regs, not
tested)
- Tested DBGDTRTX TCG case reported by Peter
- No change to the property format yet. Ran out of ideas. However,
I added an x-mig prefix to the property names
- Changed the terminology: kept "hiding" but removed "fake", which was
confusing
- Simplified the logic for regs missing in the incoming stream; no
longer check that they are exposed on dest
Eric Auger (11):
hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults
target/arm/machine: Improve traces on register mismatch during
migration
target/arm/cpu: Allow registers to be hidden
target/arm/machine: Allow extra regs in the incoming stream
kvm-all: Enforce hidden regs are never accessed
target/arm/cpu: Implement hide_reg callback()
target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs
properties
hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming
stream
Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for
migration compat"
hw/arm/virt: Introduce framework to aggregate hidden-regs and
safe-missing-regs
hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older
kernels
include/hw/arm/virt.h | 23 ++++++++++
include/hw/core/cpu.h | 2 +
target/arm/cpu.h | 48 +++++++++++++++++++++
accel/kvm/kvm-all.c | 12 ++++++
hw/arm/virt.c | 89 ++++++++++++++++++++++++++++++++++++---
target/arm/cpu.c | 11 +++++
target/arm/debug_helper.c | 29 -------------
target/arm/helper.c | 12 +++++-
target/arm/kvm.c | 35 ++++++++++++++-
target/arm/machine.c | 70 +++++++++++++++++++++++++++---
target/arm/trace-events | 10 +++++
11 files changed, 298 insertions(+), 43 deletions(-)
--
2.52.0
On Mon, 26 Jan 2026 at 16:54, Eric Auger <eric.auger@redhat.com> wrote:
>
> When migrating ARM guests across same machines with different host
> kernels we are likely to encounter failures such as:
>
> "failed to load cpu:cpreg_vmstate_array_len"
>
> This is due to the fact KVM exposes a different number of registers
> to qemu on source and destination. When trying to migrate a bigger
> register set to a smaller one, qemu cannot save the CPU state.
>
> For example, recently we faced such kind of situations with:
> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
> register from v6.16 onwards. Causes backward migration failure.
> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
> from v6.13 onwards. Causes forward migration failure.

Hi; sorry I haven't given this series any attention before.

(1) Yes, this is definitely a problem we need to solve.

(2) What are the requirements we have for this?

This series sets up CPU properties controlling this, and then sets
them in the virt machine model based on the machine type, but this
seems awkward for two reasons:
 * using properties confines us to using a "text string" way of
   describing the behaviour; if we could implement the handling in
   code and C data structures in target/arm we could potentially do
   it in a more flexible and readable way (e.g. being able to specify
   the register via something other than a raw hex value)
 * different host kernel versions aren't really related to the QEMU
   version, so tying it to a versioned machine type doesn't seem to
   fit

Q: Do we need the user to be able to control this (e.g. adding extra
registers to be ignored) on their command line, or can we say "you
need a newer QEMU that understands how to deal with this register if
you want to do migrations involving this newer kernel version"?

Q: This series adds a "hide this register" option which stops the
register appearing in the outbound migration data. Do we need that,
or would it be enough to have "ignore this register in the inbound
migration data"? Assuming we're not trying to migrate backwards to an
older QEMU version that's unaware of the new register, that seems to
me like it should be equivalent.

(3) Categories of sysreg that are causing problems:

a: "controls" -- like the PSCI_VERSION pseudoreg. Here the setting
controls what the kernel is exposing to the guest, and so we need to
be able to have the user tell QEMU to use a specific version that's
not the host kernel default if the default isn't one that's valid for
all older kernels. Sometimes the new kernel default is the same as
the old kernel's behaviour and in those cases we also want handling
of "if you see the control reg in the incoming data and its value is
the default then it's OK to ignore it".

b: "things exposed that should not have been" -- where the old kernel
exposed a register but the new one does not because exposing the
register was wrong (i.e. a bug). The handling here can be "ignore
this in migration input if present". Examples are the TCR2_EL1,
PIRE0_EL1, PIR_EL1 regs that shouldn't exist if the corresponding
feature was disabled for the guest.

c: "things not exposed that should have been" -- where a new kernel
exposes a new register that the old one does not, and so migration
from a host with the new kernel to the old one fails. In most cases
it should be possible to handle this with "ignore in migration input
if present", or "fail migration if incoming value is not some safe
default, but if it is that default value then ignore".

Have I missed anything?

(4) Mechanisms for handling them:

This series provides two mechanisms:
 "safe missing reg" -- these registers are ignored if they appear in
 the incoming migration data.
 "hidden" -- the behaviour here is that we effectively entirely
 ignore the register, so we do not read it from the kernel or write
 it back, do not send it in outbound migration data, and do not
 expect to see it in incoming migration data.

The "arm: add kvm-psci-version vcpu property" series handles one
specific "control" register, with a specific user-facing cpu
property. If new "control" type registers are rare, this seems like a
good way to go, because it means we can give the user an interface
that is reasonably clear about what it does, and we can provide
better errors on the migration-destination side (e.g. pointing the
user at the need to specify the property on the source side to get a
VM they can migrate to this destination).

The only use of "hidden" so far is for KVM_REG_ARM_VENDOR_HYP_BMAP_2.
However, I'm not sure this is the right way to handle this register.
Judging from the documentation, this seems to be a "control"
register: it would let QEMU enable certain things to be visible to
the guest. It is also odd to treat this differently from the existing
KVM_REG_ARM_VENDOR_HYP_BMAP register, which has exactly the same
semantics. I think that the right way to treat this register would be
"if this is present in the incoming migration stream and the host
kernel doesn't know about it, a value of zero is OK, but any other
value should fail migration".

In general I'm not convinced that "hidden" is a useful thing to
provide -- it should always be fine for QEMU to read and write back to
the same host kernel some sysreg it doesn't know about, so what
"hidden" is mostly doing is "don't put this into outgoing migration
data". Do we need to be able to do that, or can we instead always use
an "ignore in incoming migration data" strategy?

(5) My preferences

I think that, assuming that it meets the requirements, I would prefer
something like a mechanism where we use some kind of C data structure
/ code in target/arm/machine.c to represent "this register needs some
special handling", where the special handling might be:
 - ignore if present in input
 - if present in input, value must be X, otherwise fail migration
 - maybe some other things if we need them
and this is not tied to specific QEMU machine versions and isn't
something we expose via QOM properties. I'd rather avoid the "hidden"
register idea unless we definitely need it in addition to "ignore in
incoming data".

thanks
-- PMM
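One way to picture the "C data structure in target/arm/machine.c" preference is a static quirk table consulted when the incoming stream contains a register the destination does not know. This is a hypothetical sketch, not code from QEMU or from the series: the enum, struct, function names and EXAMPLE_CONTROL_REG_IDX are invented for illustration, and EXAMPLE_CONTROL_REG_IDX is a placeholder, not the real index of any register. Only the DBGDTRTX index is taken from the cover letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policies matching the two kinds of special handling. */
enum mig_reg_policy {
    MIG_REG_IGNORE_IF_PRESENT,  /* drop the incoming value silently */
    MIG_REG_REQUIRE_VALUE,      /* accept only a specific safe value */
};

struct mig_reg_quirk {
    uint64_t kvm_idx;           /* register index as seen in the stream */
    enum mig_reg_policy policy;
    uint64_t required;          /* only used by MIG_REG_REQUIRE_VALUE */
};

/* Placeholder index for a "control"-style register; NOT a real KVM ID. */
#define EXAMPLE_CONTROL_REG_IDX 0x1ULL

static const struct mig_reg_quirk quirks[] = {
    /* AArch32 DBGDTRTX index quoted in the cover letter */
    { 0x40200000200e0298ULL, MIG_REG_IGNORE_IF_PRESENT, 0 },
    /* "control" register: zero is the only acceptable incoming value */
    { EXAMPLE_CONTROL_REG_IDX, MIG_REG_REQUIRE_VALUE, 0 },
};

/*
 * Called for a register present in the incoming stream but unknown to
 * the destination. Returns true if the load may continue.
 */
static bool incoming_unknown_reg_ok(uint64_t idx, uint64_t value)
{
    for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
        if (quirks[i].kvm_idx != idx) {
            continue;
        }
        switch (quirks[i].policy) {
        case MIG_REG_IGNORE_IF_PRESENT:
            return true;
        case MIG_REG_REQUIRE_VALUE:
            return value == quirks[i].required;
        }
    }
    return false;  /* no quirk entry: keep today's behaviour and fail */
}
```

A table like this is not tied to a machine version or exposed as a QOM property, and each entry can carry a comment naming the register; the require-zero entry corresponds to the suggested treatment of KVM_REG_ARM_VENDOR_HYP_BMAP_2.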
On Mon, 26 Jan 2026, Eric Auger wrote:
> [...]
I gave these a spin - works as advertised, no issues found.
Tested-by: Sebastian Ott <sebott@redhat.com>
On 1/27/26 5:52 PM, Sebastian Ott wrote:
> On Mon, 26 Jan 2026, Eric Auger wrote:
>> [...]
>
> I gave these a spin - works as advertised, no issues found.
> Tested-by: Sebastian Ott <sebott@redhat.com>
Thanks!
Eric
Hi Peter, Richard,
On 1/26/26 5:52 PM, Eric Auger wrote:
> [...]
Most of the patches of the series have collected R-bs. Do you have
concerns with the approach?
This aims at solving real-life distro issues with cross-kernel
migration failures, and we would appreciate getting a generic
solution within the 11.0 timeframe.
Also, [PATCH v4 0/2] arm: add kvm-psci-version vcpu property
(https://lore.kernel.org/all/20251202160853.22560-3-sebott@redhat.com/)
is part of this initiative and has also collected R-bs/T-bs.
Looking forward to your feedback.
Eric