[Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd

Christian Pinto posted 4 patches 7 years ago
migration/migration.c    |  9 +++++----
migration/postcopy-ram.c | 25 ++++++++-----------------
migration/ram.c          | 18 ++++++++++++++----
3 files changed, 27 insertions(+), 25 deletions(-)
Posted by Christian Pinto 7 years ago
This patch series introduces a set of fixes to the previous work proposed by
Hailiang Zhang to enable live memory snapshots in QEMU based
on userfaultfd. See discussion here:
http://www.mail-archive.com/qemu-devel@nongnu.org/msg393118.html

These patches apply on top of: 
https://github.com/coloft/qemu/tree/snapshot-v2
that is the latest version of Hailiang's work, and rely on the latest work on
userfaultfd available on Andrea Arcangeli's Linux kernel tree:
https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault

The original work was mainly tested on x86 TCG machines and did not work on
ARM/ARM64 TCG.
The fixes presented in this series enable the live memory snapshot
to work for ARM64 TCG guests running on top of an ARM64 host.

The main problems encountered were:
    - For ARM, QEMU uses a memory page size of 1KB. Even though this size is
      not supported by the Linux kernel, it is kept for backward compatibility
      with older ARM CPU MMUs. The initial work was write-unprotecting pages
      with a granularity not always aligned with the host page size, causing
      userfaultfd to fail.
    - VM execution was resumed right before the migration status
      was switched from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_ACTIVE.
      This again caused the VM to trigger a "Bus error", due to the wrong
      status of some memory pages.
    - When unprotecting a memory page, the flag
      UFFDIO_WRITEPROTECT_MODE_DONTWAKE was used. As a result, after a page
      was copied into the snapshot file, VM execution was not resumed.
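
As an illustration of the first point, the alignment requirement can be
sketched as below. This is only a minimal sketch and not the code from the
patches: the page_range type and the align_to_host_page() helper are
hypothetical names, and the host page size is assumed to be a power of two.
It expands a range given in guest page units so that both ends fall on host
page boundaries, which is the kind of range a userfaultfd write-protect
request expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the byte range [start, start + len). */
typedef struct {
    uint64_t start;
    uint64_t len;
} page_range;

/*
 * Expand [addr, addr + len) so that both ends fall on host page
 * boundaries. host_page_size is assumed to be a power of two.
 */
static inline page_range align_to_host_page(uint64_t addr, uint64_t len,
                                            uint64_t host_page_size)
{
    assert(host_page_size != 0 &&
           (host_page_size & (host_page_size - 1)) == 0);

    uint64_t mask = host_page_size - 1;
    uint64_t start = addr & ~mask;               /* round start down */
    uint64_t end = (addr + len + mask) & ~mask;  /* round end up */

    page_range r = { start, end - start };
    return r;
}
```

For example, a 1KB guest page at offset 0x1400 on a host with 4KB pages
expands to the single host page starting at 0x1000. A side effect of this
approach is that neighbouring guest pages sharing the same host page are
covered by the same request.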


To test the patches on an ARM64 host, boot an ARM64 tcg machine:

qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
        -m 256 -kernel Image \
        -initrd rootfs.cpio.gz \
        -append "earlyprintk rw console=ttyAMA0" \
        -net nic -net user \
        -nographic -serial pty -monitor stdio

start migration from QEMU monitor:

    (qemu) migrate file:/root/test_snapshot


resume the VM from the snapshot:

qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
        -m 256 -kernel Image \
        -initrd rootfs.cpio.gz \
        -append "earlyprintk rw console=ttyAMA0" \
        -net nic -net user \
        -nographic -serial stdio -monitor pty \
        -incoming file:/root/test_snapshot

Christian Pinto (4):
  migration/postcopy-ram: check pagefault flags in userfaultfd thread
  migration/ram: Fix for ARM/ARM64 page size
  migration: snapshot thread
  migration/postcopy-ram: ram_set_pages_wp fix

 migration/migration.c    |  9 +++++----
 migration/postcopy-ram.c | 25 ++++++++-----------------
 migration/ram.c          | 18 ++++++++++++++----
 3 files changed, 27 insertions(+), 25 deletions(-)

-- 
2.11.0


Re: [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd
Posted by Dr. David Alan Gilbert 7 years ago
* Christian Pinto (c.pinto@virtualopensystems.com) wrote:
> This patch series introduces a set of fixes to the previous work proposed by
> Hailiang Zhang to enable live memory snapshots in QEMU based
> on userfaultfd. See discussion here:
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg393118.html

Thanks for posting this,

> These patches apply on top of: 
> https://github.com/coloft/qemu/tree/snapshot-v2
> that is the latest version of Hailiang's work, and rely on the latest work on
> userfaultfd available on Andrea Arcangeli's Linux kernel tree:
> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
> 
> The original work was mainly tested on x86 TCG machines and did not work on
> ARM/ARM64 TCG.
> The fixes presented in this series enable the live memory snapshot
> to work for ARM64 TCG guests running on top of an ARM64 host.
> 
> The main problems encountered were:
>     - For ARM, QEMU uses a memory page size of 1KB. Even though this size is
>       not supported by the Linux kernel, it is kept for backward compatibility
>       with older ARM CPU MMUs. The initial work was write-unprotecting pages
>       with a granularity not always aligned with the host page size, causing
>       userfaultfd to fail.

Yes, Power similarly uses a 4KB target page size even though
the host kernel normally uses a large page size.

>     - VM execution was resumed right before the migration status
>       was switched from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_ACTIVE.
>       This again caused the VM to trigger a "Bus error", due to the wrong
>       status of some memory pages.
>     - When unprotecting a memory page, the flag
>       UFFDIO_WRITEPROTECT_MODE_DONTWAKE was used. As a result, after a page
>       was copied into the snapshot file, VM execution was not resumed.
> 
> 
> To test the patches on an ARM64 host, boot an ARM64 tcg machine:
> 
> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>         -m 256 -kernel Image \
>         -initrd rootfs.cpio.gz \
>         -append "earlyprintk rw console=ttyAMA0" \
>         -net nic -net user \
>         -nographic -serial pty -monitor stdio
> 
> start migration from QEMU monitor:
> 
>     (qemu) migrate file:/root/test_snapshot
> 
> 
> resume the VM from the snapshot:
> 
> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>         -m 256 -kernel Image \
>         -initrd rootfs.cpio.gz \
>         -append "earlyprintk rw console=ttyAMA0" \
>         -net nic -net user \
>         -nographic -serial stdio -monitor pty \
>         -incoming file:/root/test_snapshot

Nice, what's your use case and how are you dealing with storage?

Dave

> Christian Pinto (4):
>   migration/postcopy-ram: check pagefault flags in userfaultfd thread
>   migration/ram: Fix for ARM/ARM64 page size
>   migration: snapshot thread
>   migration/postcopy-ram: ram_set_pages_wp fix
> 
>  migration/migration.c    |  9 +++++----
>  migration/postcopy-ram.c | 25 ++++++++-----------------
>  migration/ram.c          | 18 ++++++++++++++----
>  3 files changed, 27 insertions(+), 25 deletions(-)
> 
> -- 
> 2.11.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH 0/4] ARM/ARM64 fixes for live memory snapshot based on userfaultfd
Posted by Christian Pinto 7 years ago
Hello Alan,


On 09/03/2017 18:46, Dr. David Alan Gilbert wrote:
> * Christian Pinto (c.pinto@virtualopensystems.com) wrote:
>> This patch series introduces a set of fixes to the previous work proposed by
>> Hailiang Zhang to enable live memory snapshots in QEMU based
>> on userfaultfd. See discussion here:
>> http://www.mail-archive.com/qemu-devel@nongnu.org/msg393118.html
> Thanks for posting this,
>
>> These patches apply on top of:
>> https://github.com/coloft/qemu/tree/snapshot-v2
>> that is the latest version of Hailiang's work, and rely on the latest work on
>> userfaultfd available on Andrea Arcangeli's Linux kernel tree:
>> https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
>>
>> The original work was mainly tested on x86 TCG machines and did not work on
>> ARM/ARM64 TCG.
>> The fixes presented in this series enable the live memory snapshot
>> to work for ARM64 TCG guests running on top of an ARM64 host.
>>
>> The main problems encountered were:
>>      - For ARM, QEMU uses a memory page size of 1KB. Even though this size is
>>        not supported by the Linux kernel, it is kept for backward compatibility
>>        with older ARM CPU MMUs. The initial work was write-unprotecting pages
>>        with a granularity not always aligned with the host page size, causing
>>        userfaultfd to fail.
> Yes, Power similarly uses a 4KB target page size even though
> the host kernel normally uses a large page size.

The fix included in this series should solve the problem for Power as well,
since it makes sure the address passed to userfaultfd is aligned
to the host page size. So, if someone in the Power community is
interested in this functionality, this fix might come in handy.

>
>>      - VM execution was resumed right before the migration status
>>        was switched from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_ACTIVE.
>>        This again caused the VM to trigger a "Bus error", due to the wrong
>>        status of some memory pages.
>>      - When unprotecting a memory page, the flag
>>        UFFDIO_WRITEPROTECT_MODE_DONTWAKE was used. As a result, after a page
>>        was copied into the snapshot file, VM execution was not resumed.
>>
>>
>> To test the patches on an ARM64 host, boot an ARM64 tcg machine:
>>
>> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>>          -m 256 -kernel Image \
>>          -initrd rootfs.cpio.gz \
>>          -append "earlyprintk rw console=ttyAMA0" \
>>          -net nic -net user \
>>          -nographic -serial pty -monitor stdio
>>
>> start migration from QEMU monitor:
>>
>>      (qemu) migrate file:/root/test_snapshot
>>
>>
>> resume the VM from the snapshot:
>>
>> qemu-system-aarch64 -machine virt,accel=tcg -cpu cortex-a57\
>>          -m 256 -kernel Image \
>>          -initrd rootfs.cpio.gz \
>>          -append "earlyprintk rw console=ttyAMA0" \
>>          -net nic -net user \
>>          -nographic -serial stdio -monitor pty \
>>          -incoming file:/root/test_snapshot
> Nice, what's your use case and how are you dealing with storage?

This work was done in the context of an H2020 European Project
named ExaNoDe (http://exanode.eu) that is building a prototype ARM64-based
compute node for the exascale (computing capabilities on the order
of the Exaflop) domain. In this project, targeting HPC, scientific
applications using MPI will be executed in virtualized computing nodes
(KVM VMs), rather than directly on physical machines. This is mainly to
improve the manageability of the overall system and ease the task of
separating different workloads. The work done on live memory snapshot is
meant to tackle the problem of system resiliency, reducing the overall
impact on the virtualized software, and leading to higher availability of
the virtualized computing nodes.

For the time being we are focusing on memory, and storage has not yet
been taken into consideration. However, at first glance I would say that
storage in QEMU already uses CoW, which could be useful for this scenario
as well.


Thanks,

Christian

>
> Dave
>
>> Christian Pinto (4):
>>    migration/postcopy-ram: check pagefault flags in userfaultfd thread
>>    migration/ram: Fix for ARM/ARM64 page size
>>    migration: snapshot thread
>>    migration/postcopy-ram: ram_set_pages_wp fix
>>
>>   migration/migration.c    |  9 +++++----
>>   migration/postcopy-ram.c | 25 ++++++++-----------------
>>   migration/ram.c          | 18 ++++++++++++++----
>>   3 files changed, 27 insertions(+), 25 deletions(-)
>>
>> -- 
>> 2.11.0
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK