 hw/core/qdev.c                 |  24 +++
 hw/net/virtio-net.c            | 282 +++++++++++++++++++++++++++++++++
 hw/pci/pci.c                   |  17 ++
 hw/pci/pcie.c                  |   6 +
 hw/vfio/pci.c                  |  31 +++-
 hw/vfio/pci.h                  |   1 +
 include/hw/pci/pci.h           |   4 +
 include/hw/qdev-core.h         |   9 ++
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h     |   1 +
 include/migration/vmstate.h    |   2 +
 migration/migration.c          |  21 +++
 migration/migration.h          |   3 +
 migration/savevm.c             |  36 +++++
 migration/savevm.h             |   2 +
 qapi/migration.json            |  24 ++-
 qapi/net.json                  |  16 ++
 qdev-monitor.c                 |  38 ++++-
 tests/libqos/libqos.c          |   3 +-
 vl.c                           |   6 +-
 20 files changed, 524 insertions(+), 14 deletions(-)
This implements the host side of the net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)

Changes since v3:

* Patch 1: make the return values of qdev_should_hide_device() clearer
* Patch 1: clarify the comment about the new should_be_hidden DeviceListener
* Patch 2: new patch; add net_failover_pair_id to PCIDevice and only allow
  PCI Express devices for now
* Patch 8: only enter the wait_unplug state when failover devices are present
* Patch 8: add the new state to migration_is_setup_or_active(); tested
  cancelling while migration is in this state
* Patch 8: simplify handling of the wait_unplug state: don't cancel migration
  after a timeout but let the upper layer do this, and get rid of the retry
  counter (dgilbert)
* Patch 11: move net_failover_pair_id to PCIDevice, move the check for the PCI
  class to the PCI code, only allow PCIe devices for now as we only support
  hotplugging these devices (aw)
* verified that the qemu make check tests ran and that docker-test-quick@centos7
  runs successfully; tested migration with and without failover, and without
  vfio-pci
* this now allows only PCIe devices because PCIe is the only hotplug controller
  that supports the partial unplug as of now. I'll work on making this
  discoverable for libvirt, or on support for the other hotplug controllers, in
  a follow-on patch set

The general idea is that we have a pair of devices, a vfio-pci and a virtio-net
device. Before migration the vfio device is unplugged and data flows through
the virtio-net device; on the target side another vfio-pci device is plugged in
to take over the data path. In the guest, the net_failover module pairs net
devices that have the same MAC address.
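On the guest side, pairing is done by MAC address. As a rough illustration only (the kernel's net_failover driver is not structured like this), the sketch below shows just the matching rule, using a hypothetical `struct guest_netif` and a hypothetical `find_failover_primary()`: a standby virtio-net interface is paired with the non-standby interface that carries the same MAC address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified view of a guest network interface. */
struct guest_netif {
    const char *name;
    unsigned char mac[ETH_ALEN];
    bool is_standby; /* true for the virtio-net device, false for a VF */
};

/*
 * Return the index of the primary (non-standby) interface whose MAC
 * address equals the standby interface's MAC, or -1 if there is none.
 */
static int find_failover_primary(const struct guest_netif *ifs, size_t n,
                                 const struct guest_netif *standby)
{
    assert(ifs != NULL && standby != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!ifs[i].is_standby &&
            memcmp(ifs[i].mac, standby->mac, ETH_ALEN) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With the MAC from the command line example (52:54:00:6f:55:cc) on both the virtio-net and the VF interface, the VF is returned as the primary; an interface with any other MAC is never paired.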
* Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
* Patch 2 adds checks to PCIDevice so that only Ethernet devices that are
  PCI Express capable are allowed as failover primary
* Patch 3 sets a new PCIDevice flag, 'partially_hotplugged', which we use to
  skip the unrealize code path when doing an unplug of the primary device
* Patch 4 sets pending_deleted_event before triggering the guest unplug request
* Patches 5 and 6 add new QMP events: one sends the device id of a device that
  was just requested to be unplugged from the guest, the other lets libvirt
  know whether VIRTIO_NET_F_STANDBY was negotiated
* Patch 7 makes sure that we can unplug the vfio device before migration starts
* Patch 8 adds a new migration state that is entered while we wait for devices
  to be unplugged by the guest OS
* Patch 9 just adds the new migration state to a check in libqos code
* Patch 10 makes virtio-net use the API to defer adding the vfio device until
  the VIRTIO_NET_F_STANDBY feature is acked. It also implements the migration
  handler to unplug the device from the guest and re-plug it in case of
  migration failure
* Patch 11 allows migration for failover vfio-pci devices

Previous discussion:
  RFC v1: https://patchwork.ozlabs.org/cover/989098/
  RFC v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
  v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
  v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html
  v3: https://patchew.org/QEMU/20191011112015.11785-1-jfreimann@redhat.com/

To summarize concerns/feedback from the previous discussion:

1. The guest OS can reject, or worse, _delay_ the unplug by any amount of time.
   Migration might get stuck for an unpredictable time for an unclear reason.
   This approach combines two tricky things: hot(un)plug and migration.
   -> We need to let libvirt know what's happening: add new QMP events and a
   new migration state.
   When the unplug of a primary device is requested (it is unplugged only from
   the guest, not from QEMU), we send a QMP event with the device id. Once the
   guest has unplugged it, the DEVICE_DELETED event is sent. Migration enters
   the wait-unplug state while waiting for the guest OS to unplug all primary
   devices and then moves on with migration.
2. PCI devices are a precious resource. Rather than being hidden in QEMU, the
   primary device should never be added to QEMU if it won't be used by the
   guest.
   -> We only hotplug the device when the standby feature bit was negotiated.
   We save the device command-line options until we need them for
   qdev_device_add().
   Hiding a device can be a useful concept to model. For example, a PCI device
   in a powered-off slot could be marked as hidden until the slot is powered
   on (mst).
3. Management layer software should handle this. OpenStack already has
   components/code to handle unplug/replug of VFIO devices and metadata to
   provide to the guest for detecting which devices should be paired.
   -> An approach that includes all software from firmware to higher-level
   management software hasn't been tried in recent years. This is an attempt to
   keep it simple and contained in QEMU as much as possible. One of the problems
   that stopped management software and libvirt from implementing this idea is
   that they can't be sure that it's possible to re-plug the primary device. By
   not freeing the device's resources in QEMU and only asking the guest OS to
   unplug it, it is possible to re-plug the device in case of a migration
   failure.
4. Hotplugging a device and then making it part of a failover setup is not
   possible.
   -> Addressed by extending the qdev hotplug functions to check for the hidden
   attribute, so e.g. device_add can be used to plug a device.

I have tested this with an mlx5 and an igb NIC and was able to migrate the VM.
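The answers to concerns 1 and 2 (plug the primary only after VIRTIO_NET_F_STANDBY is negotiated, keep it realized in QEMU after the guest unplug, and wait in wait-unplug with no timeout) can be modelled as a small state machine. This is a minimal sketch under stated assumptions: `struct failover_pair`, `failover_features_acked()`, `failover_guest_unplugged()`, `migration_step()` and the `MIG_*` states are hypothetical names, not QEMU's migration or virtio-net code. Only the feature bit number (62) comes from the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit number as defined by the virtio specification. */
#define VIRTIO_NET_F_STANDBY 62

/* Hypothetical subset of migration states. */
enum mig_state { MIG_SETUP, MIG_WAIT_UNPLUG, MIG_ACTIVE };

/* Hypothetical bookkeeping for one virtio-net/vfio-pci failover pair. */
struct failover_pair {
    const char *primary_opts; /* saved -device options of the primary */
    bool primary_realized;    /* primary device exists in QEMU */
    bool guest_sees_primary;  /* guest has not (yet) completed the unplug */
};

/*
 * Called when the guest acks virtio features. The primary is only plugged
 * if VIRTIO_NET_F_STANDBY was negotiated; otherwise it stays hidden and
 * its saved options are kept for later.
 */
static bool failover_features_acked(struct failover_pair *p, uint64_t features)
{
    assert(p != NULL);
    if (!(features & (UINT64_C(1) << VIRTIO_NET_F_STANDBY))) {
        return false;
    }
    /* In QEMU, the saved options would be handed to qdev_device_add() here. */
    p->primary_realized = true;
    p->guest_sees_primary = true;
    return true;
}

/*
 * The guest finished unplugging the primary. This is only a partial
 * unplug: QEMU keeps the device realized so it can be re-plugged if
 * migration fails.
 */
static void failover_guest_unplugged(struct failover_pair *p)
{
    assert(p != NULL);
    p->guest_sees_primary = false;
}

/* True while any primary is still visible to the guest. */
static bool any_primary_visible(const struct failover_pair *pairs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pairs[i].guest_sees_primary) {
            return true;
        }
    }
    return false;
}

/*
 * One step of the migration state machine. Wait-unplug is only entered
 * when failover primaries are present, and there is deliberately no
 * timeout: cancelling a stuck migration is left to the management layer.
 */
static enum mig_state migration_step(enum mig_state s,
                                     const struct failover_pair *pairs,
                                     size_t n)
{
    if (s == MIG_SETUP || s == MIG_WAIT_UNPLUG) {
        return any_primary_visible(pairs, n) ? MIG_WAIT_UNPLUG : MIG_ACTIVE;
    }
    return s;
}
```

The sketch leaves out the QMP events and the unplug request itself; it only shows why a guest that delays the unplug keeps migration in wait-unplug, and why the primary can still be re-plugged afterwards.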
Command line example:

qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
        -machine q35,kernel-irqchip=split -cpu host \
        -serial stdio \
        -net none \
        -qmp unix:/tmp/qmp.socket,server,nowait \
        -monitor telnet:127.0.0.1:5555,server,nowait \
        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,net_failover_pair_id=net1 \
        /root/rhel-guest-image-8.0-1781.x86_64.qcow2

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens

Jens Freimann (11):
  qdev/qbus: add hidden device support
  pci: add option for net failover
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

-- 
2.21.0
On Fri, 18 Oct 2019 22:20:29 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> This is implementing the host side of the net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)

(...)

> Jens Freimann (10):
>   qdev/qbus: add hidden device support
>   pci: mark devices partially unplugged
>   pci: mark device having guest unplug request pending
>   qapi: add unplug primary event
>   qapi: add failover negotiated event
>   migration: allow unplug during migration for failover devices
>   migration: add new migration state wait-unplug
>   libqos: tolerate wait-unplug migration state
>   net/virtio: add failover support
>   vfio: unplug failover primary device before migration

I have looked over the patches I have not commented on directly as well, and
they look sane to me (i.e. I didn't spot any obvious problems). Feel free to
add my ack if you like.
Patchew URL: https://patchew.org/QEMU/20191018202040.30349-1-jfreimann@redhat.com/

Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      hw/core/platform-bus.o
  CC      hw/core/generic-loader.o
/tmp/qemu-test/src/hw/core/qdev.c: In function 'qdev_should_hide_device':
/tmp/qemu-test/src/hw/core/qdev.c:235:15: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     return rc > 0;
            ~~~^~~
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: hw/core/qdev.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=1d57bd92e38c4da3a26a7b0d378b562e', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-ggapjx1d/src/docker-src.2019-10-19-11.13.01.17849:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=1d57bd92e38c4da3a26a7b0d378b562e
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ggapjx1d/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m33.496s
user    0m8.171s


The full log is available at
http://patchew.org/logs/20191018202040.30349-1-jfreimann@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Patchew URL: https://patchew.org/QEMU/20191018202040.30349-1-jfreimann@redhat.com/

Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      hw/core/or-irq.o
  CC      hw/core/split-irq.o
/tmp/qemu-test/src/hw/core/qdev.c: In function 'qdev_should_hide_device':
/tmp/qemu-test/src/hw/core/qdev.c:235:5: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     return rc > 0;
     ^
cc1: all warnings being treated as errors
make: *** [hw/core/qdev.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=45df366e5eaf45f6ac429142fe2cc309', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-ozx9dk_0/src/docker-src.2019-10-19-11.10.07.7972:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=45df366e5eaf45f6ac429142fe2cc309
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ozx9dk_0/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    2m19.619s
user    0m8.409s


The full log is available at
http://patchew.org/logs/20191018202040.30349-1-jfreimann@redhat.com/testing.docker-quick@centos7/?type=message.
On Sat, Oct 19, 2019 at 08:12:27AM -0700, no-reply@patchew.org wrote:
>Patchew URL: https://patchew.org/QEMU/20191018202040.30349-1-jfreimann@redhat.com/
>
>Hi,
>
>This series failed the docker-quick@centos7 build test. Please find the testing commands and
>their output below. If you have Docker installed, you can probably reproduce it
>locally.
>
>=== TEST SCRIPT BEGIN ===
>#!/bin/bash
>make docker-image-centos7 V=1 NETWORK=1
>time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
>=== TEST SCRIPT END ===
>
>  CC      hw/core/or-irq.o
>  CC      hw/core/split-irq.o
>/tmp/qemu-test/src/hw/core/qdev.c: In function 'qdev_should_hide_device':
>/tmp/qemu-test/src/hw/core/qdev.c:235:5: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>     return rc > 0;
>     ^

hmpf, always run all tests, especially after last-minute changes on a
Friday afternoon. I'll fix this.

regards,
Jens
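The `'rc' may be used uninitialized` failure reported by Patchew most likely means that `rc` is only assigned inside the loop over the registered listeners, so with no listener (or none providing a callback) `return rc > 0;` reads an indeterminate value. A minimal sketch of the usual fix, initializing `rc` before the loop, is shown below. It is not QEMU's code: `struct listener`, `should_hide_device()`, the return-value convention (>0 hide, 0 don't hide, <0 error) and the example listeners `hide_never()` and `hide_failover_primary()` are hypothetical, and a plain option string stands in for QEMU's QemuOpts. The actual fix in the next revision of the series may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: >0 hide the device, 0 don't hide, <0 error. */
typedef int (*should_hide_fn)(const char *opts);

/* Hypothetical singly linked list of device listeners. */
struct listener {
    should_hide_fn should_be_hidden;
    struct listener *next;
};

static bool should_hide_device(const struct listener *head, const char *opts)
{
    int rc = 0; /* initialized: with no listener, nothing is hidden */

    assert(opts != NULL);
    for (const struct listener *l = head; l != NULL; l = l->next) {
        if (l->should_be_hidden == NULL) {
            continue;
        }
        rc = l->should_be_hidden(opts);
        if (rc > 0) {
            break; /* one listener claiming the device is enough */
        }
    }
    return rc > 0;
}

/* Example listeners, for illustration only. */
static int hide_never(const char *opts)
{
    (void)opts;
    return 0;
}

static int hide_failover_primary(const char *opts)
{
    /* Hide devices whose options name a failover partner. */
    return strstr(opts, "net_failover_pair_id=") != NULL;
}
```

Stopping at the first listener that returns a positive value is a design choice of this sketch; whichever policy is used, the important part for the compiler warning is that `rc` has a defined value on every path to the `return`.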