[PATCH v4 0/2] Qemu crashes on VM migration after an handled memory error

William Roche posted 2 patches 7 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20231013150839.867164-1-william.roche@oracle.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Leonardo Bras <leobras@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Marcelo Tosatti <mtosatti@redhat.com>
There is a newer version of this series
From: William Roche <william.roche@oracle.com>

A Qemu VM can survive a memory error: qemu relays the error to the VM
kernel, which can deal with it by poisoning/off-lining the impacted page.
This creates a hole in the VM memory address space that the VM kernel
knows about (an unreadable page or set of pages).

But the migration of this VM (live migration through the network or
pseudo-migration with the creation of a state file) will crash Qemu when
it sequentially reads the memory address space and stumbles on the
existing hole.

To fix this problem thoroughly, the poison information should follow the
migration, which presents several difficulties:
- poisoning a page on the destination machine to replicate the source
  poison requires CAP_SYS_ADMIN privileges, and the qemu process may not
  always run as root
- the destination kernel needs to be configured with CONFIG_MEMORY_FAILURE
- the poison information would require a memory transfer protocol
  enhancement to provide this information
(The current patches don't provide any of that)

But if we rely on the fact that a running VM kernel correctly deals with
the memory poison it is informed about, marking the poisoned page as
inaccessible, we can count on the VM kernel to make sure that poisoned
pages are not used, even after a migration.
In this case, I suggest treating the poisoned pages as if they were
zero-pages for the migration copy.
This fix also works with underlying large pages, taking into account the
RAMBlock segment "page-size".
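
To illustrate the idea, here is a minimal standalone sketch (not the code
from the patches). All names in it (PoisonEntry, page_is_poisoned,
classify_page) are hypothetical, and it assumes the poisoned addresses are
kept in a simple array. The point is that a target page lying inside a
poisoned backing page is reported as a zero page without being read, and
that the match is done at the granularity of the block page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one poisoned location (not a QEMU structure). */
typedef struct {
    uint64_t addr;              /* any offset inside the poisoned page */
} PoisonEntry;

typedef enum { SEND_ZERO, SEND_DATA } SendKind;

/*
 * Return true if 'offset' lies in a backing page that holds a poisoned
 * address. 'block_page_size' is the page size backing the RAM block
 * (small page or huge page) and must be a power of two.
 */
static bool page_is_poisoned(const PoisonEntry *list, size_t n,
                             uint64_t offset, uint64_t block_page_size)
{
    assert(block_page_size && (block_page_size & (block_page_size - 1)) == 0);
    uint64_t mask = ~(block_page_size - 1);

    for (size_t i = 0; i < n; i++) {
        if ((list[i].addr & mask) == (offset & mask)) {
            return true;
        }
    }
    return false;
}

/*
 * Decide how one target page is sent. A poisoned page is classified as
 * a zero page before any byte of it is read, so the hole is never touched.
 */
static SendKind classify_page(const PoisonEntry *list, size_t n,
                              const uint8_t *ram, uint64_t offset,
                              uint64_t target_page_size,
                              uint64_t block_page_size)
{
    if (page_is_poisoned(list, n, offset, block_page_size)) {
        return SEND_ZERO;
    }
    for (uint64_t i = 0; i < target_page_size; i++) {
        if (ram[offset + i] != 0) {
            return SEND_DATA;
        }
    }
    return SEND_ZERO;
}
```

With a huge-page backed block, a single poisoned address makes every
target page of that huge page be treated as zero, which is why the block
page size is passed separately from the target page size.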

Now, this leaves one case that we have to deal with: a memory error that
is reported to qemu but not injected into the running kernel.
As the migration will go from a poisoned page to an all-zero page, if
the VM kernel doesn't prevent access to this page, a memory read
that would generate a BUS_MCEERR_AR error on the source platform could
read zeros on the destination. This is a memory corruption.

So we have to ensure that all poisoned pages we set to zero are known by
the running kernel. But we have a problem with platforms where BUS_MCEERR_AO
errors are ignored, which means that qemu knows about the poison but the VM
doesn't. For the moment this is only the case for ARM, but it could later
also apply to AMD VMs.
See https://lore.kernel.org/all/20230912211824.90952-3-john.allen@amd.com/

In order to avoid this possible silent data corruption, we should prevent
the migration when we know that a poisoned page is unknown to the VM.

This is, in my opinion, the smallest fix we need to avoid qemu crashes
on migration after a handled memory error, without introducing a possible
corruption situation.
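
As a rough standalone sketch of this blocking rule (again, not the code
from the patches): the names MemErrKind, PoisonStats, record_poison and
migration_allowed are hypothetical, and the sketch assumes that
"action required" errors are always relayed to the VM while "action
optional" errors are relayed only on some platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kind of memory error reported to the qemu process. */
typedef enum {
    ERR_ACTION_REQUIRED,        /* stands for BUS_MCEERR_AR */
    ERR_ACTION_OPTIONAL         /* stands for BUS_MCEERR_AO */
} MemErrKind;

/* Hypothetical per-VM poison bookkeeping. */
typedef struct {
    size_t known;               /* poisoned pages relayed to the VM kernel */
    size_t unknown;             /* poisoned pages the VM was never told about */
} PoisonStats;

/*
 * Record one poisoned page. 'platform_relays_ao' is an assumption of this
 * sketch: true where AO errors are injected into the VM, false where they
 * are ignored.
 */
static void record_poison(PoisonStats *s, MemErrKind kind,
                          bool platform_relays_ao)
{
    assert(s != NULL);
    bool relayed = (kind == ERR_ACTION_REQUIRED) || platform_relays_ao;

    if (relayed) {
        s->known++;
    } else {
        s->unknown++;
    }
}

/* Migration is only safe when every poisoned page is known by the VM. */
static bool migration_allowed(const PoisonStats *s)
{
    assert(s != NULL);
    return s->unknown == 0;
}
```

A migration request would call migration_allowed() before starting and
fail early when it returns false, rather than silently turning an
unreported poisoned page into zeros on the destination.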

This fix is scripts/checkpatch.pl clean.
Unit test: Migration blocking successfully tested on ARM -- an injected AO
error blocks it. On x86, the same type of error is relayed to the VM and
doesn't block the migration.

v2:
  - adding compressed transfer handling of poisoned pages

v3:
  - Included the Reviewed-by and Tested-by information on first patch
  - added a TODO comment above control_save_page()
    mentioning Zhijian's feedback about RDMA migration failure.

v4:
  - adding a patch to deal with unknown poison tracking
    (not using migrate_add_blocker as this is not device related and
    we want to avoid the interaction with the --only-migratable mechanism)


William Roche (2):
  migration: skip poisoned memory pages on "ram saving" phase
  migration: prevent migration when a poisoned page is unknown from the
    VM

 accel/kvm/kvm-all.c      | 41 +++++++++++++++++++++++++++++++++++++++-
 accel/stubs/kvm-stub.c   | 10 ++++++++++
 include/sysemu/kvm.h     | 16 ++++++++++++++++
 include/sysemu/kvm_int.h |  3 ++-
 migration/migration.c    |  6 ++++++
 migration/ram-compress.c |  3 ++-
 migration/ram.c          | 24 +++++++++++++++++++++--
 migration/ram.h          |  2 ++
 target/arm/kvm64.c       |  6 +++++-
 target/i386/kvm/kvm.c    |  2 +-
 10 files changed, 106 insertions(+), 7 deletions(-)

-- 
2.39.3