[PATCH v6 0/11] add failover feature for assigned network devices

Jens Freimann posted 11 patches 4 years, 6 months ago
Test asan passed
Test checkpatch passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test docker-quick@centos7 passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20191025121930.6855-1-jfreimann@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Thomas Huth <thuth@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Jason Wang <jasowang@redhat.com>, Alex Williamson <alex.williamson@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Eric Blake <eblake@redhat.com>
There is a newer version of this series
[PATCH v6 0/11] add failover feature for assigned network devices
Posted by Jens Freimann 4 years, 6 months ago
This series implements the host side of the net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)

Changes since v5:
* rename net_failover_pair_id parameter/property to failover_pair_id
* in PCI code use pci_bus_is_express(). This won't fail on functions > 0
* make sure primary and standby can't be added to same PCI slot
* add documentation file in docs/ to virtio-net patch, add file to
   MAINTAINERS (added to networking devices section)
* add comment to QAPI event for failover negotiation, try to improve
   commit message 

The general idea is that we have a pair of devices, a vfio-pci and a
virtio-net device. Before migration the vfio device is unplugged and data
flows through the virtio-net device. On the target side another vfio-pci
device is plugged in to take over the data path. In the guest the
net_failover module pairs net devices that have the same MAC address.
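
To illustrate the pairing rule, here is a minimal sketch in Python. It is not
the kernel's net_failover code; the NetDev record, the driver strings and
pair_by_mac() are hypothetical names chosen for this example. It only shows
the idea that a virtio-net standby and a passthrough primary are matched when
their MAC addresses are equal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class NetDev:
    name: str    # guest interface name (hypothetical)
    driver: str  # "virtio_net" marks a standby, anything else a primary candidate
    mac: str     # MAC address as text


def pair_by_mac(devs: List[NetDev]) -> Dict[str, Tuple[NetDev, Optional[NetDev]]]:
    """Map each standby's MAC to (standby, primary-or-None)."""
    pairs: Dict[str, Tuple[NetDev, Optional[NetDev]]] = {}
    # First pass: every virtio-net device is a potential standby.
    for d in devs:
        if d.driver == "virtio_net":
            pairs[d.mac.lower()] = (d, None)
    # Second pass: attach the first non-virtio device with the same MAC.
    for d in devs:
        key = d.mac.lower()
        if d.driver != "virtio_net" and key in pairs and pairs[key][1] is None:
            pairs[key] = (pairs[key][0], d)
    return pairs
```

A standby without a matching primary simply stays unpaired, which corresponds
to the window during migration when only the virtio-net device is present.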

* Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs

* Patch 2 adds checks to PCIDevice so that only ethernet devices that are
  PCI Express capable can be used as failover primary

* Patch 3 sets a new flag for PCIDevice 'partially_hotplugged' which we
  use to skip the unrealize code path when doing an unplug of the primary
  device

* Patch 4 sets the pending_deleted_event before triggering the guest
  unplug request

* Patches 5 and 6 add new qmp events: one sends the device id of a device
  that was just requested to be unplugged from the guest, and the other
  lets libvirt know if VIRTIO_NET_F_STANDBY was negotiated

* Patch 7 makes sure that we can unplug the vfio-device before
  migration starts

* Patch 8 adds a new migration state that is entered while we wait for
  devices to be unplugged by guest OS

* Patch 9 just adds the new migration state to a check in libqos code

* Patch 10 makes virtio-net use the API to defer adding the vfio
  device until the VIRTIO_NET_F_STANDBY feature is acked. It also
  implements the migration handler to unplug the device from the guest and
  re-plug it in case of migration failure

* Patch 11 allows migration for failover vfio-pci devices

Previous discussion:
  RFC v1 https://patchwork.ozlabs.org/cover/989098/
  RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
  v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
  v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html
  v3: https://patchew.org/QEMU/20191011112015.11785-1-jfreimann@redhat.com/
  v4: https://patchew.org/QEMU/20191018202040.30349-1-jfreimann@redhat.com/
  v5: https://patchew.org/QEMU/20191023082711.16694-1-jfreimann@redhat.com/

To summarize concerns/feedback from previous discussion:
1. The guest OS can reject or, worse, _delay_ unplug by any amount of time.
  Migration might get stuck for an unpredictable time for an unclear reason.
  This approach combines two tricky things, hot/unplug and migration.
  -> We need to let libvirt know what's happening. Add new qmp events
     and a new migration state. When unplug of a primary device is requested
     from the guest (a partial unplug, the device stays in QEMU) we send a
     qmp event with the device id. When the guest has completed the unplug,
     the DEVICE_DELETED event is sent.
     Migration will enter the wait-unplug state while waiting for the guest
     OS to unplug all primary devices and then move on with migration.
2. PCI devices are a precious resource. The primary device should never
  be added to QEMU if it won't be used by the guest, instead of hiding it
  in QEMU.
  -> We only hotplug the device when the standby feature bit was
     negotiated. We save the device cmdline options until we need them for
     qdev_device_add().
     Hiding a device can be a useful concept to model. For example a
     pci device in a powered-off slot could be marked as hidden until the slot is
     powered on (mst).
3. Management layer software should handle this. OpenStack already has
  components/code to handle unplug/replug of VFIO devices and metadata to
  provide to the guest for detecting which devices should be paired.
  -> An approach that includes all software from firmware to
     higher-level management software hasn't been tried in recent years. This
     is an attempt to keep it simple and contained in QEMU as much as possible.
     One of the problems that stopped management software and libvirt from
     implementing this idea is that they can't be sure that it's possible to
     re-plug the primary device. By not freeing the device's resources in QEMU
     and only asking the guest OS to unplug, it is possible to re-plug the
     device in case of a migration failure.
4. Hotplugging a device and then making it part of a failover setup is
   not possible
  -> addressed by extending qdev hotplug functions to check for hidden
     attribute, so e.g. device_add can be used to plug a device.
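
The flow described in points 1 and 3 can be sketched as a toy model in
Python. This is only an illustration under stated assumptions, not the code
from the series: migrate(), its parameters and the "replug-primary" marker
are hypothetical, and max_polls stands in for a management-layer timeout or
cancel, which this cover letter does not specify. Only the wait-unplug state
name and the re-plug on failure come from the description above.

```python
from typing import Callable, List


def migrate(primaries_unplugged: Callable[[], bool],
            transfer_ok: bool,
            max_polls: int = 10) -> List[str]:
    """Return the sequence of steps visited by the toy migration."""
    steps = ["setup", "wait-unplug"]
    # Stay in wait-unplug until the guest has released all primary devices.
    for _ in range(max_polls):
        if primaries_unplugged():
            break
    else:
        # Assumed timeout: the guest never completed the unplug, so the
        # migration is abandoned and the primary is plugged back in.
        steps += ["failed", "replug-primary"]
        return steps
    steps.append("active")
    if transfer_ok:
        steps.append("completed")
    else:
        # Migration failed after the unplug: re-plug the primary, which is
        # possible because its resources were never freed on the host side.
        steps += ["failed", "replug-primary"]
    return steps
```

The point of the sketch is that every failure path ends with the primary
being offered to the guest again, so a failed migration leaves the source in
its original configuration.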


I have tested this with an mlx5 and an igb NIC and was able to migrate the VM.

Command line example:

qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
        -machine q35,kernel-irqchip=split -cpu host   \
        -serial stdio   \
        -net none \
        -qmp unix:/tmp/qmp.socket,server,nowait \
        -monitor telnet:127.0.0.1:5555,server,nowait \
        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1 \
        /root/rhel-guest-image-8.0-1781.x86_64.qcow2
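
A command line like this is easy to get subtly wrong, for example with a
stray space before '=' or a failover_pair_id that names no standby. The
following Python sketch checks the pairing between -device arguments. It is
not QEMU's option parser (it ignores ',,' escaping, for instance), and
parse_device() and check_failover_pairs() are hypothetical helpers written
for this example.

```python
from typing import Dict, List


def parse_device(arg: str) -> Dict[str, str]:
    """Split 'driver,key=value,...' into a dict; the driver goes under 'driver'."""
    driver, *props = arg.split(",")
    opts = {"driver": driver}
    for p in props:
        key, _, value = p.partition("=")
        opts[key] = value
    return opts


def check_failover_pairs(device_args: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the pairing looks sane."""
    devs = [parse_device(a) for a in device_args]
    # ids of virtio-net-pci devices that were started with failover=on
    standbys = {d["id"] for d in devs
                if d["driver"] == "virtio-net-pci"
                and d.get("failover") == "on" and "id" in d}
    problems = []
    for d in devs:
        for key in d:
            if key != key.strip():
                problems.append(f"stray whitespace in option name {key!r}")
        pair = d.get("failover_pair_id")
        if pair is not None and pair not in standbys:
            problems.append(
                f"{d.get('id', d['driver'])}: no failover=on virtio-net-pci "
                f"with id {pair!r}")
    return problems
```

Passing the values of the two -device options for the virtio-net-pci and
vfio-pci devices to check_failover_pairs() should return an empty list when
the primary's failover_pair_id matches the standby's id.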

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens 


Jens Freimann (11):
  qdev/qbus: add hidden device support
  pci: add option for net failover
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 MAINTAINERS                    |   1 +
 docs/virtio-net-failover.rst   |  68 ++++++++
 hw/core/qdev.c                 |  25 +++
 hw/net/virtio-net.c            | 302 +++++++++++++++++++++++++++++++++
 hw/pci/pci.c                   |  32 ++++
 hw/pci/pcie.c                  |   6 +
 hw/vfio/pci.c                  |  26 ++-
 hw/vfio/pci.h                  |   1 +
 include/hw/pci/pci.h           |   4 +
 include/hw/qdev-core.h         |  30 ++++
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h     |   1 +
 include/migration/vmstate.h    |   2 +
 migration/migration.c          |  21 +++
 migration/migration.h          |   3 +
 migration/savevm.c             |  36 ++++
 migration/savevm.h             |   2 +
 qapi/migration.json            |  24 ++-
 qapi/net.json                  |  19 +++
 qdev-monitor.c                 |  43 ++++-
 tests/libqos/libqos.c          |   3 +-
 vl.c                           |   6 +-
 22 files changed, 652 insertions(+), 15 deletions(-)
 create mode 100644 docs/virtio-net-failover.rst

-- 
2.21.0


Re: [PATCH v6 0/11] add failover feature for assigned network devices
Posted by Roman Kagan 4 years, 6 months ago
On Fri, Oct 25, 2019 at 02:19:19PM +0200, Jens Freimann wrote:
> To summarize concerns/feedback from previous discussion:
> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>   Migration might get stuck for unpredictable time with unclear reason.
>   This approach combines two tricky things, hot/unplug and migration.
>   -> We need to let libvirt know what's happening. Add new qmp events
>      and a new migration state. When a primary device is (partially)
>      unplugged (only from guest) we send a qmp event with the device id. When
>      it is unplugged from the guest the DEVICE_DELETED event is sent.
>      Migration will enter the wait-unplug state while waiting for the guest
>      os to unplug all primary devices and then move on with migration.
> 2. PCI devices are a precious resource. The primary device should never
>   be added to QEMU if it won't be used by guest instead of hiding it in
>   QEMU.
>   -> We only hotplug the device when the standby feature bit was
>      negotiated. We save the device cmdline options until we need it for
>      qdev_device_add()

The status of the feature support in the guest can change.  E.g. a guest
reboot will clear it for certain, and the guest may boot into another OS
that doesn't support pv-pt failover, and will become confused by two
network devices with the same MAC.  AFAICS from a brief skimming, the
patchset doesn't appear to address this scenario (which is probably not
so uncommon).

>      Hiding a device can be a useful concept to model. For example a
>      pci device in a powered-off slot could be marked as hidden until the slot is
>      powered on (mst).
> 3. Management layer software should handle this. OpenStack already has
>   components/code to handle unplug/replug VFIO devices and metadata to
>   provide to the guest for detecting which devices should be paired.
>   -> An approach that includes all software from firmware to
>      higher-level management software wasn't tried in the last years. This is
>      an attempt to keep it simple and contained in QEMU as much as possible.
>      One of the problems that stopped management software and libvirt from
>      implementing this idea is that it can't be sure that it's possible to
>      re-plug the primary device. By not freeing the device's resources in QEMU
>      and only asking the guest OS to unplug, it is possible to re-plug the
>      device in case of a migration failure.

Frankly I'm failing to see the point in requiring 100%-reliable re-plug
on migration rollback.  The whole idea of this failover is to allow
temporary QOS degradation; if this isn't allowed you don't even consider
migrating.  So if the migration fails, you can leave the guest in the
degraded state on the source host until a better migration target is
found or the conditions on the source host allow the re-plug to succeed.

Thanks,
Roman.

Re: [PATCH v6 0/11] add failover feature for assigned network devices
Posted by Jens Freimann 4 years, 6 months ago
Hi Michael,

I addressed all comments and feedback and think this can be merged but
I'm unclear about which tree it should go to. Will you merge it into
the virtio-tree?

regards,
Jens

Re: [PATCH v6 0/11] add failover feature for assigned network devices
Posted by Michael S. Tsirkin 4 years, 6 months ago
I see at least comments from Markus.
You answered but don't you need to also tweak the patch?


On Mon, Oct 28, 2019 at 11:27:23AM +0100, Jens Freimann wrote:
> Hi Michael,
> 
> I addressed all comments and feedback and think this can be merged but
> I'm unclear about which tree it should go to. Will you merge it into
> the virtio-tree?
> 
> regards,
> Jens

Re: [PATCH v6 0/11] add failover feature for assigned network devices
Posted by Jens Freimann 4 years, 6 months ago
On Mon, Oct 28, 2019 at 11:58:53AM -0400, Michael S. Tsirkin wrote:
>I see at least comments from Markus.
>You answered but don't you need to also tweak the patch?

That comment was addressed already IMO, but I'll change the patch
description as well and while I'm at it will also fix David's comment.

regards,
Jens


Re: [PATCH v6 0/11] add failover feature for assigned network devices
Posted by Michael S. Tsirkin 4 years, 6 months ago
On Fri, Oct 25, 2019 at 02:19:19PM +0200, Jens Freimann wrote:
> This is implementing the host side of the net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)

OK so I put this on a next branch for now, to merge after the release.
I plan to rebase it after rc1, and then stop rebasing.


> Changes since v5:
> * rename net_failover_pair_id parameter/property to failover_pair_id
> * in PCI code use pci_bus_is_express(). This won't fail on functions > 0
> * make sure primary and standby can't be added to same PCI slot
> * add documentation file in docs/ to virtio-net patch, add file to
>    MAINTAINERS (added to networking devices section)
> * add comment to QAPI event for failover negotiation, try to improve
>    commit message 
> 
> The general idea is that we have a pair of devices, a vfio-pci and a
> virtio-net device. Before migration the vfio device is unplugged and data
> flows to the virtio-net device; on the target side another vfio-pci device
> is plugged in to take over the data path. In the guest the net_failover
> module will pair net devices with the same MAC address.
> 
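To make the pairing rule above concrete, here is a rough sketch in Python.
It is only an illustration of "pair net devices with the same MAC address";
it is not the guest kernel's net_failover code, and the device tuples and
the pair_by_mac() helper are made up for this example.

```python
from collections import defaultdict


def pair_by_mac(devices):
    """Group (name, kind, mac) tuples into {mac: {"standby": ..., "primary": ...}}.

    Assumption for this sketch: a virtio-net device is the standby and
    anything else (e.g. a vfio-pci passthrough NIC) is the primary.
    """
    pairs = defaultdict(dict)
    for name, kind, mac in devices:
        role = "standby" if kind == "virtio-net" else "primary"
        # Compare MACs case-insensitively.
        pairs[mac.lower()][role] = name
    # Only MACs that have both a standby and a primary form a failover pair.
    return {mac: p for mac, p in pairs.items() if len(p) == 2}


devices = [
    ("eth0", "virtio-net", "52:54:00:6f:55:cc"),
    ("eth1", "vfio-pci", "52:54:00:6F:55:CC"),
    ("eth2", "vfio-pci", "52:54:00:aa:bb:cc"),  # no matching standby
]
pairs = pair_by_mac(devices)
```

Here eth0 and eth1 end up as one pair, while eth2 stays unpaired because no
virtio-net device shares its MAC.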
> * Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
> 
> * Patch 2 adds checks to PCIDevice for only allowing ethernet devices as
>   failover primary and only PCIExpress capable devices
> 
> * Patch 3 sets a new flag for PCIDevice 'partially_hotplugged' which we
>   use to skip the unrealize code path when doing an unplug of the primary
>   device
> 
> * Patch 4 sets the pending_deleted_event before triggering the guest
>   unplug request
> 
> * Patches 5 and 6 add new QMP events, one sends the device id of a device
>   that was just requested to be unplugged from the guest and another one
>   to let libvirt know if VIRTIO_NET_F_STANDBY was negotiated
> 
> * Patch 7 makes sure that we can unplug the vfio-device before
>   migration starts
> 
> * Patch 8 adds a new migration state that is entered while we wait for
>   devices to be unplugged by the guest OS
> 
> * Patch 9 just adds the new migration state to a check in libqos code
> 
> * Patch 10 makes virtio-net use the API to defer adding the vfio
>   device until the VIRTIO_NET_F_STANDBY feature is acked. It also
>   implements the migration handler to unplug the device from the guest and
>   re-plug it in case of migration failure
> 
> * Patch 11 allows migration for failover vfio-pci devices
> 
> Previous discussion:
>   RFC v1 https://patchwork.ozlabs.org/cover/989098/
>   RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
>   v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html
>   v3: https://patchew.org/QEMU/20191011112015.11785-1-jfreimann@redhat.com/
>   v4: https://patchew.org/QEMU/20191018202040.30349-1-jfreimann@redhat.com/
>   v5: https://patchew.org/QEMU/20191023082711.16694-1-jfreimann@redhat.com/
> 
> To summarize concerns/feedback from previous discussion:
> 1. The guest OS can reject or, worse, _delay_ unplug by any amount of time.
>   Migration might get stuck for an unpredictable time with no clear reason.
>   This approach combines two tricky things, hotplug/unplug and migration.
>   -> We need to let libvirt know what's happening. Add new qmp events
>      and a new migration state. When a primary device is (partially)
>      unplugged (only from guest) we send a qmp event with the device id. When
>      it is unplugged from the guest the DEVICE_DELETED event is sent.
>      Migration will enter the wait-unplug state while waiting for the guest
>      OS to unplug all primary devices and then move on with migration.
> 2. PCI devices are a precious resource. The primary device should never
>   be added to QEMU if it won't be used by the guest, instead of hiding it
>   in QEMU.
>   -> We only hotplug the device when the standby feature bit was
>      negotiated. We save the device cmdline options until we need them for
>      qdev_device_add().
>      Hiding a device can be a useful concept to model. For example a
>      pci device in a powered-off slot could be marked as hidden until the slot is
>      powered on (mst).
> 3. Management layer software should handle this. OpenStack already has
>   components/code to handle unplug/replug VFIO devices and metadata to
>   provide to the guest for detecting which devices should be paired.
>   -> An approach that includes all software from firmware to
>      higher-level management software hasn't been tried in recent years. This is
>      an attempt to keep it simple and contained in QEMU as much as possible.
>      One of the problems that stopped management software and libvirt from
>      implementing this idea is that it can't be sure that it's possible to
>      re-plug the primary device. By not freeing the device's resources in QEMU
>      and only asking the guest OS to unplug, it is possible to re-plug the
>      device in case of a migration failure.
> 4. Hotplugging a device and then making it part of a failover setup is
>    not possible
>   -> addressed by extending qdev hotplug functions to check for hidden
>      attribute, so e.g. device_add can be used to plug a device.
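For point 1, a management layer could follow the unplug by folding the QMP
event stream into a small state record. The sketch below is an assumption
about how that might look, not code from this series. DEVICE_DELETED and the
wait-unplug state are named above; the event names UNPLUG_PRIMARY and
FAILOVER_NEGOTIATED, their "device-id" field, and the MIGRATION event's
"status" field are my assumptions about the QAPI schema and should be
checked against qapi/migration.json and qapi/net.json.

```python
def track_failover(events):
    """Fold a list of already-decoded QMP event dicts into a summary.

    Event and field names are assumptions for this sketch (see lead-in).
    """
    state = {
        "negotiated": set(),        # standby devices that acked the feature
        "unplug_requested": set(),  # primaries the guest was asked to unplug
        "deleted": set(),           # devices reported via DEVICE_DELETED
        "migration": None,          # last migration status seen
    }
    for ev in events:
        name = ev.get("event")
        data = ev.get("data", {})
        if name == "FAILOVER_NEGOTIATED":
            state["negotiated"].add(data["device-id"])
        elif name == "UNPLUG_PRIMARY":
            state["unplug_requested"].add(data["device-id"])
        elif name == "DEVICE_DELETED" and "device" in data:
            state["deleted"].add(data["device"])
        elif name == "MIGRATION":
            state["migration"] = data["status"]
    return state


def pending_unplugs(state):
    """Primaries the guest has been asked to unplug but has not released yet."""
    return state["unplug_requested"] - state["deleted"]


events = [
    {"event": "FAILOVER_NEGOTIATED", "data": {"device-id": "net1"}},
    {"event": "MIGRATION", "data": {"status": "wait-unplug"}},
    {"event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"}},
]
state = track_failover(events)
```

While pending_unplugs(state) is non-empty and the status is wait-unplug, the
management layer at least knows why migration has not moved on yet.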
> 
> 
> I have tested this with an mlx5 and an igb NIC and was able to migrate the VM.
> 
> Command line example:
> 
> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>         -machine q35,kernel-irqchip=split -cpu host   \
>         -serial stdio   \
>         -net none \
>         -qmp unix:/tmp/qmp.socket,server,nowait \
>         -monitor telnet:127.0.0.1:5555,server,nowait \
>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
>         -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1 \
>         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
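Since the pairing in the command line above depends on failover_pair_id
naming a virtio-net-pci device that has failover=on, a hypothetical
pre-flight check could look like the sketch below. It is a helper for
whoever assembles the command line, not something QEMU does; parse_device()
and check_failover_pairs() are invented for this example and only handle the
simple "driver,key=value,..." form.

```python
def parse_device(opt):
    """Split 'driver,key=value,...' into (driver, {key: value})."""
    driver, *props = opt.split(",")
    parsed = {}
    for prop in props:
        key, _, value = prop.partition("=")
        parsed[key] = value
    return driver, parsed


def check_failover_pairs(device_opts):
    """Return a list of pairing problems; an empty list means none were found."""
    devices = [parse_device(opt) for opt in device_opts]
    # Ids of virtio-net-pci devices that were started with failover=on.
    standby_ids = {
        props["id"]
        for driver, props in devices
        if driver == "virtio-net-pci"
        and props.get("failover") == "on"
        and "id" in props
    }
    problems = []
    for driver, props in devices:
        pair_id = props.get("failover_pair_id")
        if pair_id is not None and pair_id not in standby_ids:
            problems.append(
                f"{props.get('id', driver)}: no failover=on virtio-net-pci "
                f"with id {pair_id!r}"
            )
    return problems


good = [
    "virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on",
    "vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1",
]
# Same pair, but the standby no longer advertises failover.
bad = [good[0].replace("failover=on", "failover=off"), good[1]]
```

check_failover_pairs(good) reports nothing, while check_failover_pairs(bad)
flags hostdev0 because its pair id no longer points at a failover standby.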
> 
> I'm grateful for any remarks or ideas!
> 
> Thanks!
> 
> regards,
> Jens 
> 
> 
> Jens Freimann (11):
>   qdev/qbus: add hidden device support
>   pci: add option for net failover
>   pci: mark devices partially unplugged
>   pci: mark device having guest unplug request pending
>   qapi: add unplug primary event
>   qapi: add failover negotiated event
>   migration: allow unplug during migration for failover devices
>   migration: add new migration state wait-unplug
>   libqos: tolerate wait-unplug migration state
>   net/virtio: add failover support
>   vfio: unplug failover primary device before migration
> 
>  MAINTAINERS                    |   1 +
>  docs/virtio-net-failover.rst   |  68 ++++++++
>  hw/core/qdev.c                 |  25 +++
>  hw/net/virtio-net.c            | 302 +++++++++++++++++++++++++++++++++
>  hw/pci/pci.c                   |  32 ++++
>  hw/pci/pcie.c                  |   6 +
>  hw/vfio/pci.c                  |  26 ++-
>  hw/vfio/pci.h                  |   1 +
>  include/hw/pci/pci.h           |   4 +
>  include/hw/qdev-core.h         |  30 ++++
>  include/hw/virtio/virtio-net.h |  12 ++
>  include/hw/virtio/virtio.h     |   1 +
>  include/migration/vmstate.h    |   2 +
>  migration/migration.c          |  21 +++
>  migration/migration.h          |   3 +
>  migration/savevm.c             |  36 ++++
>  migration/savevm.h             |   2 +
>  qapi/migration.json            |  24 ++-
>  qapi/net.json                  |  19 +++
>  qdev-monitor.c                 |  43 ++++-
>  tests/libqos/libqos.c          |   3 +-
>  vl.c                           |   6 +-
>  22 files changed, 652 insertions(+), 15 deletions(-)
>  create mode 100644 docs/virtio-net-failover.rst
> 
> -- 
> 2.21.0