From: William Roche <william.roche@oracle.com>
Problem:
--------
A Qemu VM can survive a memory error, as Qemu can relay the error to the
VM kernel, which can then deal with it by poisoning/off-lining the impacted
page. This situation creates a hole in the VM memory address space (an
unreadable page or set of pages).
A migration request for this VM (live migration through the network or
pseudo-migration with the creation of a state file) will crash Qemu when
it sequentially reads the memory address space and reaches the existing
hole.
New fix proposal:
-----------------
Let's prevent the migration when we know that there is a poison page in
the VM address space.
History:
--------
My first fix proposal for this crash condition (latest version:
https://lore.kernel.org/all/20231106220319.456765-1-william.roche@oracle.com/ )
relied on a well-behaved kernel to guarantee that a known poison page is
not accessed. It also introduced ARM-specific behavior.
I haven't received any feedback about the ARM-specific part, which was
meant to avoid a possible memory corruption after a migration transforms
a poisoned page into an all-zero page.
I also accept that when a memory error leads to memory poisoning, this
platform functionality has to be honored, just as a physical platform
would provide it.
Peter asked for a complete correction of this problem (transferring
the memory hole information with the migration and recreating these
holes on the destination platform).
In the meantime, this is a very small fix to avoid the current crash
when reading the poisoned memory pages: I'm simply preventing the
migration when we know that it would crash, i.e. when there is a
poisoned page in the VM address space.
This is generic protection code. Instead of crashing the VM, it reports
the following error message:
"Error: Can't migrate this vm with hardware poisoned memory, please reboot the vm and try again"
This fix is scripts/checkpatch.pl clean.
Unit tested on ARM and x86.
William Roche (1):
  migration: prevent migration when VM has poisoned memory

 accel/kvm/kvm-all.c    | 10 ++++++++++
 accel/stubs/kvm-stub.c |  5 +++++
 include/sysemu/kvm.h   |  6 ++++++
 migration/migration.c  |  7 +++++++
 4 files changed, 28 insertions(+)
--
2.39.3