[RFC 0/3] add snapshot/restore fuzzing device

Richard Liu posted 3 patches 1 year, 9 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20220722192041.93006-1-richy.liu.2002@gmail.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Juan Quintela <quintela@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>
[RFC 0/3] add snapshot/restore fuzzing device
Posted by Richard Liu 1 year, 9 months ago
This RFC adds a virtual device for snapshot/restore within QEMU. I am working
on this as part of QEMU Google Summer of Code 2022. Fast snapshot/restore
within QEMU is helpful for code fuzzing.

I reused the migration code for saving and restoring virtual device and CPU
state. As for the RAM, I am using a simple copy-on-write (COW) mmapped file to
do restores.

The loadvm migration function I used for doing restores only worked after I
called it from a qemu_bh. I'm not sure if I should run the migration code in a
separate thread (see patch 3), since currently it runs as part of the device
code in the vCPU thread.
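
Roughly, the device's MMIO write handler defers the restore to a bottom half
along the following lines. This is only a simplified sketch: the SnapshotState
type and the snapshot_save_vmstate()/snapshot_load_vmstate() helpers are
placeholder names, not necessarily what the patches use.

    #include "qemu/osdep.h"
    #include "qemu/main-loop.h"
    #include "hw/pci/pci.h"

    typedef struct SnapshotState {
        PCIDevice parent_obj;
        MemoryRegion mmio;
        QEMUBH *restore_bh;
    } SnapshotState;

    /* Placeholders for the helpers added to migration/savevm.c. */
    void snapshot_save_vmstate(SnapshotState *s);
    void snapshot_load_vmstate(SnapshotState *s);

    static void snapshot_restore_bh(void *opaque)
    {
        SnapshotState *s = opaque;

        /* Runs from the main loop, outside the vCPU's MMIO dispatch; this is
         * the context in which the loadvm path worked for me. */
        snapshot_load_vmstate(s);
    }

    static void snapshot_mmio_write(void *opaque, hwaddr addr, uint64_t val,
                                    unsigned size)
    {
        SnapshotState *s = opaque;

        switch (val) {
        case 0x101:                          /* save snapshot */
            snapshot_save_vmstate(s);
            break;
        case 0x102:                          /* load snapshot */
            /* Defer the restore to a bottom half instead of calling the
             * migration code directly from the vCPU thread. */
            qemu_bh_schedule(s->restore_bh);
            break;
        }
    }

    /* In realize(): s->restore_bh = qemu_bh_new(snapshot_restore_bh, s); */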

This is a rough first revision, and feedback on the CPU and device state
restore approach is appreciated.

To test locally, boot any Linux distro. I used the following C program to
interact with the PCI snapshot device:

    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        /* BAR0 of the snapshot device; the PCI address may differ per guest. */
        int fd = open("/sys/bus/pci/devices/0000:00:04.0/resource0", O_RDWR | O_SYNC);
        if (fd < 0)
            return 1;

        size_t size = 1024 * 1024;
        volatile uint32_t *memory = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                         MAP_SHARED, fd, 0);
        if (memory == MAP_FAILED)
            return 1;

        printf("%x\n", memory[0]);

        int a = 0;
        memory[0] = 0x101; // save snapshot
        printf("before: value of a = %d\n", a);
        a = 1;
        printf("middle: value of a = %d\n", a);
        memory[0] = 0x102; // load snapshot
        printf("after: value of a = %d\n", a);

        return 0;
    }
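
Compile it inside the guest with gcc and run it as root so that resource0 is
accessible; depending on the machine configuration the snapshot device may
show up at a PCI address other than 0000:00:04.0.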

Richard Liu (3):
  create skeleton snapshot device and add docs
  implement ram save/restore
  use migration code for cpu and device save/restore

 docs/devel/snapshot.rst |  26 +++++++
 hw/i386/Kconfig         |   1 +
 hw/misc/Kconfig         |   3 +
 hw/misc/meson.build     |   1 +
 hw/misc/snapshot.c      | 164 ++++++++++++++++++++++++++++++++++++++++
 migration/savevm.c      |  84 ++++++++++++++++++++
 migration/savevm.h      |   3 +
 7 files changed, 282 insertions(+)
 create mode 100644 docs/devel/snapshot.rst
 create mode 100644 hw/misc/snapshot.c

-- 
2.35.1
Re: [RFC 0/3] add snapshot/restore fuzzing device
Posted by Claudio Fontana 1 year, 9 months ago
Hi Richard,

On 7/22/22 21:20, Richard Liu wrote:
> This RFC adds a virtual device for snapshot/restores within QEMU. I am working
> on this as a part of QEMU Google Summer of Code 2022. Fast snapshot/restores
> within QEMU is helpful for code fuzzing.
> 
> I reused the migration code for saving and restoring virtual device and CPU
> state. As for the RAM, I am using a simple COW mmaped file to do restores.
> 
> The loadvm migration function I used for doing restores only worked after I
> called it from a qemu_bh. I'm not sure if I should run the migration code in a
> separate thread (see patch 3), since currently it is running as a part of the
> device code in the vCPU thread.
> 
> This is a rough first revision and feedback on the cpu and device state restores
> is appreciated.

As I understand it, the save and restore of VM state in QEMU can usually best
be managed via the libvirt APIs, for example using the libvirt command line
tool virsh:

$ virsh save (or managedsave)

$ virsh restore (or start)

These commands start a QEMU migration, using the QMP protocol, to a file
descriptor previously opened by libvirt to contain the state file.

(getfd QMP command):
https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-2811

(migrate QMP command):
https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1947

This is unfortunately currently very slow.

Maybe you could help think through, or help implement, a solution?
I tried to push an approach that only involves libvirt, using the existing QEMU multifd migration to a socket:

https://listman.redhat.com/archives/libvir-list/2022-June/232252.html

Performance is very good compared with what is possible today, but it won't be upstreamable because it is not deemed optimal, and libvirt wants the code to live in QEMU.

What about helping to think out what the QEMU-based solution could look like?

The requirements for now in my view seem to be:

* avoiding the kernel file page thrashing for large transfers,
  which in my view currently requires changing QEMU to be able to migrate a stream to an fd that is open with O_DIRECT.
  In practice this means somehow making all QEMU migration stream writes block-friendly (adding some buffering? see the sketch after this list).

* allow concurrent parallel transfers,
  to be able to use extra CPU resources to speed up the transfer if such resources are available.

* we should be able to transfer multiple GB/s with modern NVMe drives for super fast VM state save and restore (a few seconds even for a 30 GB VM),
  and we should do no worse than the prototype fully implemented in libvirt, otherwise it would not make sense to implement it in QEMU.
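
To make the first point concrete, "block-friendly" means staging the migration
stream into aligned buffers before each write to an O_DIRECT fd, roughly like
this standalone userspace sketch (the 4096-byte alignment, the 1 MiB chunk
size and the file name are just example values, not a QEMU implementation):

    #define _GNU_SOURCE          /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    enum { CHUNK = 1 << 20 };    /* write in 1 MiB aligned chunks */

    int main(void)
    {
        int fd = open("state.img", O_CREAT | O_WRONLY | O_DIRECT, 0600);
        if (fd < 0)
            return 1;

        /* O_DIRECT requires the buffer, offset and length to be aligned, so
         * the migration stream has to be staged into buffers like this one. */
        void *buf;
        if (posix_memalign(&buf, 4096, CHUNK))
            return 1;
        memset(buf, 0, CHUNK);   /* stand-in for a chunk of stream data */

        if (write(fd, buf, CHUNK) != CHUNK)
            return 1;

        free(buf);
        close(fd);
        return 0;
    }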

What do you think?

Ciao,

Claudio

> 
> To test locally, boot up any linux distro. I used the following C file to
> interact with the PCI snapshot device:
> 
>     #include <stdio.h>
>     #include <stdint.h>
>     #include <fcntl.h>
>     #include <sys/mman.h>
>     #include <unistd.h>
> 
>     int main() {
>         int fd = open("/sys/bus/pci/devices/0000:00:04.0/resource0", O_RDWR | O_SYNC);
>         size_t size = 1024 * 1024;
>         uint32_t* memory = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> 
>         printf("%x\n", memory[0]);
> 
>         int a = 0;
>         memory[0] = 0x101; // save snapshot
>         printf("before: value of a = %d\n", a);
>         a = 1;
>         printf("middle: value of a = %d\n", a);
>         memory[0] = 0x102; // load snapshot
>         printf("after: value of a = %d\n", a);
> 
>         return 0;
>     }
> 
> Richard Liu (3):
>   create skeleton snapshot device and add docs
>   implement ram save/restore
>   use migration code for cpu and device save/restore
> 
>  docs/devel/snapshot.rst |  26 +++++++
>  hw/i386/Kconfig         |   1 +
>  hw/misc/Kconfig         |   3 +
>  hw/misc/meson.build     |   1 +
>  hw/misc/snapshot.c      | 164 ++++++++++++++++++++++++++++++++++++++++
>  migration/savevm.c      |  84 ++++++++++++++++++++
>  migration/savevm.h      |   3 +
>  7 files changed, 282 insertions(+)
>  create mode 100644 docs/devel/snapshot.rst
>  create mode 100644 hw/misc/snapshot.c
>
Re: [RFC 0/3] add snapshot/restore fuzzing device
Posted by Alexander Bulekov 1 year, 9 months ago
On 220722 2210, Claudio Fontana wrote:
> Hi Richard,
> 
> On 7/22/22 21:20, Richard Liu wrote:
> > This RFC adds a virtual device for snapshot/restores within QEMU. I am working
> > on this as a part of QEMU Google Summer of Code 2022. Fast snapshot/restores
> > within QEMU is helpful for code fuzzing.
> > 
> > I reused the migration code for saving and restoring virtual device and CPU
> > state. As for the RAM, I am using a simple COW mmaped file to do restores.
> > 
> > The loadvm migration function I used for doing restores only worked after I
> > called it from a qemu_bh. I'm not sure if I should run the migration code in a
> > separate thread (see patch 3), since currently it is running as a part of the
> > device code in the vCPU thread.
> > 
> > This is a rough first revision and feedback on the cpu and device state restores
> > is appreciated.
> 
> As I understand it, usually the save and restore of VM state in QEMU can best be
> managed by libvirt APIs, and for example using the libvirt command line tool virsh:
> 
> $ virsh save (or managedsave)
> 
> $ virsh restore (or start)
> 
> These commands start a QEMU migration using the QMP protocol to a file descriptor,
> previously opened by libvirt to contain the state file.
> 
> (getfd QMP command):
> https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-2811
> 
> (migrate QMP command):
> https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1947
> 
> This is unfortunately currently very slow.
> 
> Maybe you could help thinking out or with the implementation of the solution?
> I tried to push this approach that only involves libvirt, using the existing QEMU multifd migration to a socket:
> 
> https://listman.redhat.com/archives/libvir-list/2022-June/232252.html
> 
> performance is very good compared with what is possible today, but it won't be upstreamable because it is not deemed optimal, and libvirt wants the code to be in QEMU.
> 
> What about helping in thinking out how the QEMU-based solution could look like?
> 
> The requirements for now in my view seem to be:
> 
> * avoiding the kernel file page trashing for large transfers
>   which currently requires in my view changing QEMU to be able to migrate a stream to an fd that is open with O_DIRECT.
>   In practice this means somehow making all QEMU migration stream writes block-friendly (adding some buffering?).
> 
> * allow concurrent parallel transfers
>   to be able to use extra cpu resources to speed up the transfer if such resources are available.
> 
> * we should be able to transfer multiple GB/s with modern nvmes for super fast VM state save and restore (few seconds even for a 30GB VM),
>   and we should do no worse than the prototype fully implemented in libvirt, otherwise it would not make sense to implement it in QEMU.
> 
> What do you think?

Hi Claudio,
These changes aim to restore a VM hundreds to thousands of times per second
within the same process. Do you think that is achievable with the design of
QMP migrate? We want to avoid serializing/transferring all of memory over the
FD. So right now, this series only uses migration code for device state. In
3/3, the memory is "restored" simply by re-mmapping MAP_PRIVATE from
file-backed memory. However, future versions might use dirty-page tracking
with a shadow memory snapshot, to avoid the page faults that result from the
mmap + MAP_PRIVATE approach.
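
For illustration, the MAP_PRIVATE restore boils down to something like this
userspace sketch (not the code in the series; the backing file name is made
up). Writes land in private copies of the pages, and re-mapping the range with
MAP_FIXED discards them and exposes the original file contents again:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t size = 4096;
        int fd = open("ram-backing.img", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, size) < 0)
            return 1;

        /* "Guest RAM": writes hit private copies, the file stays pristine. */
        char *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (ram == MAP_FAILED)
            return 1;

        strcpy(ram, "dirtied by the guest");

        /* "Restore": a MAP_FIXED re-map over the same range throws away the
         * private pages; the file's contents (all zeroes here) are back. */
        ram = mmap(ram, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED, fd, 0);
        if (ram == MAP_FAILED)
            return 1;

        printf("after restore, ram[0] = %d\n", ram[0]);   /* prints 0 */
        close(fd);
        return 0;
    }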

In terms of the way the guest initiates snapshots/restores, maybe there is a
neater way to do this with QMP, by providing the guest with access to QMP via
a serial device. That way, we avoid the need for a custom virtual device.
Right now, the snapshots are requested/restored over MMIO, since we need to
take snapshots at precise locations in the guest's execution (i.e. a specific
program counter in a process running in the guest). I wonder if there is a way
to achieve that with QMP forwarded to the guest.

-Alex

> 
> Ciao,
> 
> Claudio
> 
> > 
> > To test locally, boot up any linux distro. I used the following C file to
> > interact with the PCI snapshot device:
> > 
> >     #include <stdio.h>
> >     #include <stdint.h>
> >     #include <fcntl.h>
> >     #include <sys/mman.h>
> >     #include <unistd.h>
> > 
> >     int main() {
> >         int fd = open("/sys/bus/pci/devices/0000:00:04.0/resource0", O_RDWR | O_SYNC);
> >         size_t size = 1024 * 1024;
> >         uint32_t* memory = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > 
> >         printf("%x\n", memory[0]);
> > 
> >         int a = 0;
> >         memory[0] = 0x101; // save snapshot
> >         printf("before: value of a = %d\n", a);
> >         a = 1;
> >         printf("middle: value of a = %d\n", a);
> >         memory[0] = 0x102; // load snapshot
> >         printf("after: value of a = %d\n", a);
> > 
> >         return 0;
> >     }
> > 
> > Richard Liu (3):
> >   create skeleton snapshot device and add docs
> >   implement ram save/restore
> >   use migration code for cpu and device save/restore
> > 
> >  docs/devel/snapshot.rst |  26 +++++++
> >  hw/i386/Kconfig         |   1 +
> >  hw/misc/Kconfig         |   3 +
> >  hw/misc/meson.build     |   1 +
> >  hw/misc/snapshot.c      | 164 ++++++++++++++++++++++++++++++++++++++++
> >  migration/savevm.c      |  84 ++++++++++++++++++++
> >  migration/savevm.h      |   3 +
> >  7 files changed, 282 insertions(+)
> >  create mode 100644 docs/devel/snapshot.rst
> >  create mode 100644 hw/misc/snapshot.c
> > 
>