[Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices

Jens Freimann posted 2 patches 5 years, 1 month ago
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test asan passed
Test checkpatch failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190322134447.14831-1-jfreimann@redhat.com
Maintainers: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Jason Wang <jasowang@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>
hw/core/qdev.c                 | 27 ++++++++++
hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
hw/pci/pci.c                   |  1 +
include/hw/pci/pci.h           |  2 +
include/hw/qdev-core.h         |  8 +++
include/hw/virtio/virtio-net.h |  7 +++
qdev-monitor.c                 | 48 +++++++++++++++--
vl.c                           |  7 ++-
8 files changed, 189 insertions(+), 6 deletions(-)
[Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Jens Freimann 5 years, 1 month ago
This is another attempt at implementing the host side of the
net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)

The general idea is that we have a pair of devices, a vfio-pci and a
emulated device. Before migration the vfio device is unplugged and data
flows to the emulated device, on the target side another vfio-pci device
is plugged in to take over the data-path. In the guest the net_failover
module will pair net devices with the same MAC address.

* In the first patch the infrastructure for hiding the device is added
  for the qbus and qdev APIs. A "hidden" boolean is added to the device
  state and it is set based on a callback to the standby device which
  registers itself for handling the assessment: "should the primary device
  be hidden?" by cross validating the ids of the devices.

* In the second patch the virtio-net uses the API to hide the vfio
  device and unhides it when the feature is acked.

Previous discussion: https://patchwork.ozlabs.org/cover/989098/

To summarize concerns/feedback from previous discussion:
1.- guest OS can reject or worse _delay_ unplug by any amount of time.
  Migration might get stuck for unpredictable time with unclear reason.
  This approach combines two tricky things, hot/unplug and migration. 
  -> We can surprise-remove the PCI device and in QEMU we can do all
     necessary rollbacks transparent to management software. Will it be
     easy, probably not.
2. PCI devices are a precious ressource. The primary device should never
  be added to QEMU if it won't be used by guest instead of hiding it in
  QEMU. 
  -> We only hotplug the device when the standby feature bit was
     negotiated. We save the device cmdline options until we need it for
     qdev_device_add()
     Hiding a device can be a useful concept to model. For example a
     pci device in a powered-off slot could be marked as hidden until the slot is
     powered on (mst).
3. Management layer software should handle this. Open Stack already has
  components/code to handle unplug/replug VFIO devices and metadata to
  provide to the guest for detecting which devices should be paired.
  -> An approach that includes all software from firmware to
     higher-level management software wasn't tried in the last years. This is
     an attempt to keep it simple and contained in QEMU as much as possible.
4. Hotplugging a device and then making it part of a failover setup is
   not possible
  -> addressed by extending qdev hotplug functions to check for hidden
     attribute, so e.g. device_add can be used to plug a device.

There are still some open issues:

Migration: I'm looking for something like a pre-migration hook that I
could use to unplug the vfio-pci device. I tried with a migration
notifier but it is called to late, i.e. after migration is aborted due
to vfio-pci marked unmigrateable. I worked around this by setting it
to migrateable and used a migration notifier on the virtio-net device.

Commandline: There is a dependency between vfio-pci and virtio-net
devices. One points to the other via new parameters
primar=<primary qdev id> and standby='<standby qdev id>'. This means
that the primary device needs to be specified after standby device on 
the qemu command line. Not sure how to solve this. 

Error handling: Patches don't cover all possible error scenarios yet.

I have tested this with a mlx5 NIC and was able to migrate the VM with
above mentioned workarounds for open problems.

Command line example:

qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
        -machine q35,kernel-irqchip=split -cpu host   \
        -k fr   \
        -serial stdio   \
        -net none \
        -qmp unix:/tmp/qmp.socket,server,nowait \
        -monitor telnet:127.0.0.1:5555,server,nowait \
        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \                                                                                     
        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
        /root/rhel-guest-image-8.0-1781.x86_64.qcow2

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens 

Sameeh Jubran (2):
  qdev/qbus: Add hidden device support
  net/virtio: add failover support

 hw/core/qdev.c                 | 27 ++++++++++
 hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
 hw/pci/pci.c                   |  1 +
 include/hw/pci/pci.h           |  2 +
 include/hw/qdev-core.h         |  8 +++
 include/hw/virtio/virtio-net.h |  7 +++
 qdev-monitor.c                 | 48 +++++++++++++++--
 vl.c                           |  7 ++-
 8 files changed, 189 insertions(+), 6 deletions(-)

-- 
2.20.1


Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Jens Freimann 5 years ago
ping 

FYI: I'm also working on a few related tools to detect driver behaviour when
assigning a MAC to the vf device. Code is at https://github.com/jensfr/netfailover_driver_detect

regards,
Jens 

On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
>This is another attempt at implementing the host side of the
>net_failover concept
>(https://www.kernel.org/doc/html/latest/networking/net_failover.html)
>
>The general idea is that we have a pair of devices, a vfio-pci and a
>emulated device. Before migration the vfio device is unplugged and data
>flows to the emulated device, on the target side another vfio-pci device
>is plugged in to take over the data-path. In the guest the net_failover
>module will pair net devices with the same MAC address.
>
>* In the first patch the infrastructure for hiding the device is added
>  for the qbus and qdev APIs. A "hidden" boolean is added to the device
>  state and it is set based on a callback to the standby device which
>  registers itself for handling the assessment: "should the primary device
>  be hidden?" by cross validating the ids of the devices.
>
>* In the second patch the virtio-net uses the API to hide the vfio
>  device and unhides it when the feature is acked.
>
>Previous discussion: https://patchwork.ozlabs.org/cover/989098/
>
>To summarize concerns/feedback from previous discussion:
>1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>  Migration might get stuck for unpredictable time with unclear reason.
>  This approach combines two tricky things, hot/unplug and migration.
>  -> We can surprise-remove the PCI device and in QEMU we can do all
>     necessary rollbacks transparent to management software. Will it be
>     easy, probably not.
>2. PCI devices are a precious ressource. The primary device should never
>  be added to QEMU if it won't be used by guest instead of hiding it in
>  QEMU.
>  -> We only hotplug the device when the standby feature bit was
>     negotiated. We save the device cmdline options until we need it for
>     qdev_device_add()
>     Hiding a device can be a useful concept to model. For example a
>     pci device in a powered-off slot could be marked as hidden until the slot is
>     powered on (mst).
>3. Management layer software should handle this. Open Stack already has
>  components/code to handle unplug/replug VFIO devices and metadata to
>  provide to the guest for detecting which devices should be paired.
>  -> An approach that includes all software from firmware to
>     higher-level management software wasn't tried in the last years. This is
>     an attempt to keep it simple and contained in QEMU as much as possible.
>4. Hotplugging a device and then making it part of a failover setup is
>   not possible
>  -> addressed by extending qdev hotplug functions to check for hidden
>     attribute, so e.g. device_add can be used to plug a device.
>
>There are still some open issues:
>
>Migration: I'm looking for something like a pre-migration hook that I
>could use to unplug the vfio-pci device. I tried with a migration
>notifier but it is called to late, i.e. after migration is aborted due
>to vfio-pci marked unmigrateable. I worked around this by setting it
>to migrateable and used a migration notifier on the virtio-net device.
>
>Commandline: There is a dependency between vfio-pci and virtio-net
>devices. One points to the other via new parameters
>primar=<primary qdev id> and standby='<standby qdev id>'. This means
>that the primary device needs to be specified after standby device on
>the qemu command line. Not sure how to solve this.
>
>Error handling: Patches don't cover all possible error scenarios yet.
>
>I have tested this with a mlx5 NIC and was able to migrate the VM with
>above mentioned workarounds for open problems.
>
>Command line example:
>
>qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>        -machine q35,kernel-irqchip=split -cpu host   \
>        -k fr   \
>        -serial stdio   \
>        -net none \
>        -qmp unix:/tmp/qmp.socket,server,nowait \
>        -monitor telnet:127.0.0.1:5555,server,nowait \
>        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
>        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
>        /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>
>I'm grateful for any remarks or ideas!
>
>Thanks!
>
>regards,
>Jens
>
>Sameeh Jubran (2):
>  qdev/qbus: Add hidden device support
>  net/virtio: add failover support
>
> hw/core/qdev.c                 | 27 ++++++++++
> hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
> hw/pci/pci.c                   |  1 +
> include/hw/pci/pci.h           |  2 +
> include/hw/qdev-core.h         |  8 +++
> include/hw/virtio/virtio-net.h |  7 +++
> qdev-monitor.c                 | 48 +++++++++++++++--
> vl.c                           |  7 ++-
> 8 files changed, 189 insertions(+), 6 deletions(-)
>
>-- 
>2.20.1
>
>

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Dr. David Alan Gilbert 5 years ago
* Jens Freimann (jfreimann@redhat.com) wrote:
> ping
> 
> FYI: I'm also working on a few related tools to detect driver behaviour when
> assigning a MAC to the vf device. Code is at https://github.com/jensfr/netfailover_driver_detect

Hi Jens,
  I've not been following this too uch, but:

> regards,
> Jens
> 
> On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > This is another attempt at implementing the host side of the
> > net_failover concept
> > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > 
> > The general idea is that we have a pair of devices, a vfio-pci and a
> > emulated device. Before migration the vfio device is unplugged and data
> > flows to the emulated device, on the target side another vfio-pci device
> > is plugged in to take over the data-path. In the guest the net_failover
> > module will pair net devices with the same MAC address.
> > 
> > * In the first patch the infrastructure for hiding the device is added
> >  for the qbus and qdev APIs. A "hidden" boolean is added to the device
> >  state and it is set based on a callback to the standby device which
> >  registers itself for handling the assessment: "should the primary device
> >  be hidden?" by cross validating the ids of the devices.
> > 
> > * In the second patch the virtio-net uses the API to hide the vfio
> >  device and unhides it when the feature is acked.
> > 
> > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
> > 
> > To summarize concerns/feedback from previous discussion:
> > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> >  Migration might get stuck for unpredictable time with unclear reason.
> >  This approach combines two tricky things, hot/unplug and migration.
> >  -> We can surprise-remove the PCI device and in QEMU we can do all
> >     necessary rollbacks transparent to management software. Will it be
> >     easy, probably not.

This sounds 'fun' - bonus cases are things like what happens if the
guest gets rebooted somewhere during the process or if it's currently
sitting in the bios/grub/etc

> > 2. PCI devices are a precious ressource. The primary device should never
> >  be added to QEMU if it won't be used by guest instead of hiding it in
> >  QEMU.
> >  -> We only hotplug the device when the standby feature bit was
> >     negotiated. We save the device cmdline options until we need it for
> >     qdev_device_add()
> >     Hiding a device can be a useful concept to model. For example a
> >     pci device in a powered-off slot could be marked as hidden until the slot is
> >     powered on (mst).

Are they really that precious? Personally it's not something I'd worry
about.

> > 3. Management layer software should handle this. Open Stack already has
> >  components/code to handle unplug/replug VFIO devices and metadata to
> >  provide to the guest for detecting which devices should be paired.
> >  -> An approach that includes all software from firmware to
> >     higher-level management software wasn't tried in the last years. This is
> >     an attempt to keep it simple and contained in QEMU as much as possible.
> > 4. Hotplugging a device and then making it part of a failover setup is
> >   not possible
> >  -> addressed by extending qdev hotplug functions to check for hidden
> >     attribute, so e.g. device_add can be used to plug a device.
> > 
> > There are still some open issues:
> > 
> > Migration: I'm looking for something like a pre-migration hook that I
> > could use to unplug the vfio-pci device. I tried with a migration
> > notifier but it is called to late, i.e. after migration is aborted due
> > to vfio-pci marked unmigrateable. I worked around this by setting it
> > to migrateable and used a migration notifier on the virtio-net device.

Why not just let this happen at the libvirt level; then you do the
hotunplug etc before you actually tell qemu anything about starting a
migration?

> > Commandline: There is a dependency between vfio-pci and virtio-net
> > devices. One points to the other via new parameters
> > primar=<primary qdev id> and standby='<standby qdev id>'. This means
> > that the primary device needs to be specified after standby device on
> > the qemu command line. Not sure how to solve this.
> > 
> > Error handling: Patches don't cover all possible error scenarios yet.
> > 
> > I have tested this with a mlx5 NIC and was able to migrate the VM with
> > above mentioned workarounds for open problems.
> > 
> > Command line example:
> > 
> > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> >        -machine q35,kernel-irqchip=split -cpu host   \
> >        -k fr   \
> >        -serial stdio   \
> >        -net none \
> >        -qmp unix:/tmp/qmp.socket,server,nowait \
> >        -monitor telnet:127.0.0.1:5555,server,nowait \
> >        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> >        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> >        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> >        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> >        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
> >        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \

Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
'net1' id's.  cc'ing in Markus.

Dave

> >        /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > 
> > I'm grateful for any remarks or ideas!
> > 
> > Thanks!
> > 
> > regards,
> > Jens
> > 
> > Sameeh Jubran (2):
> >  qdev/qbus: Add hidden device support
> >  net/virtio: add failover support
> > 
> > hw/core/qdev.c                 | 27 ++++++++++
> > hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
> > hw/pci/pci.c                   |  1 +
> > include/hw/pci/pci.h           |  2 +
> > include/hw/qdev-core.h         |  8 +++
> > include/hw/virtio/virtio-net.h |  7 +++
> > qdev-monitor.c                 | 48 +++++++++++++++--
> > vl.c                           |  7 ++-
> > 8 files changed, 189 insertions(+), 6 deletions(-)
> > 
> > -- 
> > 2.20.1
> > 
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Jens Freimann 5 years ago
On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
>* Jens Freimann (jfreimann@redhat.com) wrote:
[...]
>> > To summarize concerns/feedback from previous discussion:
>> > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>> >  Migration might get stuck for unpredictable time with unclear reason.
>> >  This approach combines two tricky things, hot/unplug and migration.
>> >  -> We can surprise-remove the PCI device and in QEMU we can do all
>> >     necessary rollbacks transparent to management software. Will it be
>> >     easy, probably not.
>
>This sounds 'fun' - bonus cases are things like what happens if the
>guest gets rebooted somewhere during the process or if it's currently
>sitting in the bios/grub/etc

Yeah, I have to think about this...

>> > 2. PCI devices are a precious ressource. The primary device should never
>> >  be added to QEMU if it won't be used by guest instead of hiding it in
>> >  QEMU.
>> >  -> We only hotplug the device when the standby feature bit was
>> >     negotiated. We save the device cmdline options until we need it for
>> >     qdev_device_add()
>> >     Hiding a device can be a useful concept to model. For example a
>> >     pci device in a powered-off slot could be marked as hidden until the slot is
>> >     powered on (mst).
>
>Are they really that precious? Personally it's not something I'd worry
>about.
>
>> > 3. Management layer software should handle this. Open Stack already has
>> >  components/code to handle unplug/replug VFIO devices and metadata to
>> >  provide to the guest for detecting which devices should be paired.
>> >  -> An approach that includes all software from firmware to
>> >     higher-level management software wasn't tried in the last years. This is
>> >     an attempt to keep it simple and contained in QEMU as much as possible.
>> > 4. Hotplugging a device and then making it part of a failover setup is
>> >   not possible
>> >  -> addressed by extending qdev hotplug functions to check for hidden
>> >     attribute, so e.g. device_add can be used to plug a device.
>> >
>> > There are still some open issues:
>> >
>> > Migration: I'm looking for something like a pre-migration hook that I
>> > could use to unplug the vfio-pci device. I tried with a migration
>> > notifier but it is called to late, i.e. after migration is aborted due
>> > to vfio-pci marked unmigrateable. I worked around this by setting it
>> > to migrateable and used a migration notifier on the virtio-net device.
>
>Why not just let this happen at the libvirt level; then you do the
>hotunplug etc before you actually tell qemu anything about starting a
>migration?

Yes...the goal was to see if we can contain changes to QEMU (to keep
it simple, although it seems that covering all error cases won't be that simple :).
But I don't see a mechanism to trigger the unplug at the right moment
yet.  So yes, maybe there's no way around involving libvirt at least
for this part...

>> > Commandline: There is a dependency between vfio-pci and virtio-net
>> > devices. One points to the other via new parameters
>> > primar=<primary qdev id> and standby='<standby qdev id>'. This means
>> > that the primary device needs to be specified after standby device on
>> > the qemu command line. Not sure how to solve this.
>> >
>> > Error handling: Patches don't cover all possible error scenarios yet.
>> >
>> > I have tested this with a mlx5 NIC and was able to migrate the VM with
>> > above mentioned workarounds for open problems.
>> >
>> > Command line example:
>> >
>> > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>> >        -machine q35,kernel-irqchip=split -cpu host   \
>> >        -k fr   \
>> >        -serial stdio   \
>> >        -net none \
>> >        -qmp unix:/tmp/qmp.socket,server,nowait \
>> >        -monitor telnet:127.0.0.1:5555,server,nowait \
>> >        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>> >        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>> >        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>> >        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>> >        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
>> >        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
>
>Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
>'net1' id's.  cc'ing in Markus.

Dan had an idea how to avoid having to specify the id for the
virtio-net device. I'm currently looking into it, but it seems like it
should work. 


regards,
Jens 

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Markus Armbruster 5 years ago
Jens Freimann <jfreimann@redhat.com> writes:

> On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
>>* Jens Freimann (jfreimann@redhat.com) wrote:
> [...]
>>> > Commandline: There is a dependency between vfio-pci and virtio-net
>>> > devices. One points to the other via new parameters
>>> > primar=<primary qdev id> and standby='<standby qdev id>'. This means
>>> > that the primary device needs to be specified after standby device on
>>> > the qemu command line. Not sure how to solve this.
>>> >
>>> > Error handling: Patches don't cover all possible error scenarios yet.
>>> >
>>> > I have tested this with a mlx5 NIC and was able to migrate the VM with
>>> > above mentioned workarounds for open problems.
>>> >
>>> > Command line example:
>>> >
>>> > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>>> >        -machine q35,kernel-irqchip=split -cpu host   \
>>> >        -k fr   \
>>> >        -serial stdio   \
>>> >        -net none \
>>> >        -qmp unix:/tmp/qmp.socket,server,nowait \
>>> >        -monitor telnet:127.0.0.1:5555,server,nowait \
>>> >        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>>> >        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>>> >        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>>> >        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>>> >        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
>>> >        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
>>
>>Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
>>'net1' id's.  cc'ing in Markus.
>
> Dan had an idea how to avoid having to specify the id for the
> virtio-net device. I'm currently looking into it, but it seems like it
> should work. 

Excellent.  A circular dependency between -device could only lead to
trouble.

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Michael S. Tsirkin 5 years ago
On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> * Jens Freimann (jfreimann@redhat.com) wrote:
> > ping
> > 
> > FYI: I'm also working on a few related tools to detect driver behaviour when
> > assigning a MAC to the vf device. Code is at https://github.com/jensfr/netfailover_driver_detect
> 
> Hi Jens,
>   I've not been following this too uch, but:
> 
> > regards,
> > Jens
> > 
> > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > > This is another attempt at implementing the host side of the
> > > net_failover concept
> > > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > > 
> > > The general idea is that we have a pair of devices, a vfio-pci and a
> > > emulated device. Before migration the vfio device is unplugged and data
> > > flows to the emulated device, on the target side another vfio-pci device
> > > is plugged in to take over the data-path. In the guest the net_failover
> > > module will pair net devices with the same MAC address.
> > > 
> > > * In the first patch the infrastructure for hiding the device is added
> > >  for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > >  state and it is set based on a callback to the standby device which
> > >  registers itself for handling the assessment: "should the primary device
> > >  be hidden?" by cross validating the ids of the devices.
> > > 
> > > * In the second patch the virtio-net uses the API to hide the vfio
> > >  device and unhides it when the feature is acked.
> > > 
> > > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
> > > 
> > > To summarize concerns/feedback from previous discussion:
> > > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> > >  Migration might get stuck for unpredictable time with unclear reason.
> > >  This approach combines two tricky things, hot/unplug and migration.
> > >  -> We can surprise-remove the PCI device and in QEMU we can do all
> > >     necessary rollbacks transparent to management software. Will it be
> > >     easy, probably not.
> 
> This sounds 'fun' - bonus cases are things like what happens if the
> guest gets rebooted somewhere during the process or if it's currently
> sitting in the bios/grub/etc

Um, during which process? Guests are gradually fixed to support
surprise removal well. Part of it is thunderbolt which makes
it incredibly easy. Yes - bios/grub will need to learn to
handle this well.

> > > 2. PCI devices are a precious ressource. The primary device should never
> > >  be added to QEMU if it won't be used by guest instead of hiding it in
> > >  QEMU.
> > >  -> We only hotplug the device when the standby feature bit was
> > >     negotiated. We save the device cmdline options until we need it for
> > >     qdev_device_add()
> > >     Hiding a device can be a useful concept to model. For example a
> > >     pci device in a powered-off slot could be marked as hidden until the slot is
> > >     powered on (mst).
> 
> Are they really that precious? Personally it's not something I'd worry
> about.
> 
> > > 3. Management layer software should handle this. Open Stack already has
> > >  components/code to handle unplug/replug VFIO devices and metadata to
> > >  provide to the guest for detecting which devices should be paired.
> > >  -> An approach that includes all software from firmware to
> > >     higher-level management software wasn't tried in the last years. This is
> > >     an attempt to keep it simple and contained in QEMU as much as possible.
> > > 4. Hotplugging a device and then making it part of a failover setup is
> > >   not possible
> > >  -> addressed by extending qdev hotplug functions to check for hidden
> > >     attribute, so e.g. device_add can be used to plug a device.
> > > 
> > > There are still some open issues:
> > > 
> > > Migration: I'm looking for something like a pre-migration hook that I
> > > could use to unplug the vfio-pci device. I tried with a migration
> > > notifier but it is called to late, i.e. after migration is aborted due
> > > to vfio-pci marked unmigrateable. I worked around this by setting it
> > > to migrateable and used a migration notifier on the virtio-net device.
> 
> Why not just let this happen at the libvirt level; then you do the
> hotunplug etc before you actually tell qemu anything about starting a
> migration?

If qemu frees up resources (as it does on unplug) then libvirt
is not guaranteed it can roll the change back on e.g.
migration failure.

But really another issue is simply that it's a mechanism,
there's no policy that management needs to decide on.
Doing it at lowest possible level ensures all
upper layers benefit with minimal pain.

> > > Commandline: There is a dependency between vfio-pci and virtio-net
> > > devices. One points to the other via new parameters
> > > primar=<primary qdev id> and standby='<standby qdev id>'. This means
> > > that the primary device needs to be specified after standby device on
> > > the qemu command line. Not sure how to solve this.
> > > 
> > > Error handling: Patches don't cover all possible error scenarios yet.
> > > 
> > > I have tested this with a mlx5 NIC and was able to migrate the VM with
> > > above mentioned workarounds for open problems.
> > > 
> > > Command line example:
> > > 
> > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> > >        -machine q35,kernel-irqchip=split -cpu host   \
> > >        -k fr   \
> > >        -serial stdio   \
> > >        -net none \
> > >        -qmp unix:/tmp/qmp.socket,server,nowait \
> > >        -monitor telnet:127.0.0.1:5555,server,nowait \
> > >        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> > >        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> > >        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> > >        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> > >        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
> > >        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
> 
> Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
> 'net1' id's.  cc'ing in Markus.
> 
> Dave
> 
> > >        /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > > 
> > > I'm grateful for any remarks or ideas!
> > > 
> > > Thanks!
> > > 
> > > regards,
> > > Jens
> > > 
> > > Sameeh Jubran (2):
> > >  qdev/qbus: Add hidden device support
> > >  net/virtio: add failover support
> > > 
> > > hw/core/qdev.c                 | 27 ++++++++++
> > > hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
> > > hw/pci/pci.c                   |  1 +
> > > include/hw/pci/pci.h           |  2 +
> > > include/hw/qdev-core.h         |  8 +++
> > > include/hw/virtio/virtio-net.h |  7 +++
> > > qdev-monitor.c                 | 48 +++++++++++++++--
> > > vl.c                           |  7 ++-
> > > 8 files changed, 189 insertions(+), 6 deletions(-)
> > > 
> > > -- 
> > > 2.20.1
> > > 
> > > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Eduardo Habkost 5 years ago
On Fri, Apr 05, 2019 at 07:22:35PM -0400, Michael S. Tsirkin wrote:
> On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
[...]
> > > > 3. Management layer software should handle this. Open Stack already has
> > > >  components/code to handle unplug/replug VFIO devices and metadata to
> > > >  provide to the guest for detecting which devices should be paired.
> > > >  -> An approach that includes all software from firmware to
> > > >     higher-level management software wasn't tried in the last years. This is
> > > >     an attempt to keep it simple and contained in QEMU as much as possible.
> > > > 4. Hotplugging a device and then making it part of a failover setup is
> > > >   not possible
> > > >  -> addressed by extending qdev hotplug functions to check for hidden
> > > >     attribute, so e.g. device_add can be used to plug a device.
> > > > 
> > > > There are still some open issues:
> > > > 
> > > > Migration: I'm looking for something like a pre-migration hook that I
> > > > could use to unplug the vfio-pci device. I tried with a migration
> > > > notifier but it is called to late, i.e. after migration is aborted due
> > > > to vfio-pci marked unmigrateable. I worked around this by setting it
> > > > to migrateable and used a migration notifier on the virtio-net device.
> > 
> > Why not just let this happen at the libvirt level; then you do the
> > hotunplug etc before you actually tell qemu anything about starting a
> > migration?
> 
> If qemu frees up resources (as it does on unplug) then libvirt
> is not guaranteed it can roll the change back on e.g.
> migration failure.

Why should we always free up resources on unplug?

Unplug of a disk device doesn't close the corresponding -blockdev.
Unplug of a serial device doesn't close the corresponding -chardev.
Unplug of a memory device doesn't close the corresponding memory backend.
Unplug of a crypto device doesn't close the corresponding crypto backend.

Why do we expect device_del of a passthrough PCI device to always
release the host side PCI device?  We can provide a better API
than that.


> 
> But really another issue is simply that it's a mechanism,
> there's no policy that management needs to decide on.
> Doing it at lowest possible level ensures all
> upper layers benefit with minimal pain.

I don't see a problem in trying to make this work with surprise
removal too.  But if it is also possible to make this work
without surprise removal support on the guest side, why not
provide the mechanisms for the management layer to implement it?

-- 
Eduardo

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Markus Armbruster 5 years ago
Eduardo Habkost <ehabkost@redhat.com> writes:

> On Fri, Apr 05, 2019 at 07:22:35PM -0400, Michael S. Tsirkin wrote:
>> On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
>> > * Jens Freimann (jfreimann@redhat.com) wrote:
>> > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> [...]
>> > > > 3. Management layer software should handle this. Open Stack already has
>> > > >  components/code to handle unplug/replug VFIO devices and metadata to
>> > > >  provide to the guest for detecting which devices should be paired.
>> > > >  -> An approach that includes all software from firmware to
>> > > >     higher-level management software wasn't tried in the last years. This is
>> > > >     an attempt to keep it simple and contained in QEMU as much as possible.
>> > > > 4. Hotplugging a device and then making it part of a failover setup is
>> > > >   not possible
>> > > >  -> addressed by extending qdev hotplug functions to check for hidden
>> > > >     attribute, so e.g. device_add can be used to plug a device.
>> > > > 
>> > > > There are still some open issues:
>> > > > 
>> > > > Migration: I'm looking for something like a pre-migration hook that I
>> > > > could use to unplug the vfio-pci device. I tried with a migration
>> > > > notifier but it is called to late, i.e. after migration is aborted due
>> > > > to vfio-pci marked unmigrateable. I worked around this by setting it
>> > > > to migrateable and used a migration notifier on the virtio-net device.
>> > 
>> > Why not just let this happen at the libvirt level; then you do the
>> > hotunplug etc before you actually tell qemu anything about starting a
>> > migration?
>> 
>> If qemu frees up resources (as it does on unplug) then libvirt
>> is not guaranteed it can roll the change back on e.g.
>> migration failure.
>
> Why should we always free up resources on unplug?
>
> Unplug of a disk device doesn't close the corresponding -blockdev.

It does for block backends created with -drive, and that was a mistake
we corrected with -blockdev.

> Unplug of a serial device doesn't close the corresponding -chardev.
> Unplug of a memory device doesn't close the corresponding memory backend.
> Unplug of a crypto device doesn't close the corresponding crypto backend.
>
> Why do we expect device_del of a passthrough PCI device to always
> release the host side PCI device?  We can provide a better API
> than that.

device_del should free what device_add allocates.

Does device_add allocate the host side PCI device?  If yes, should it?

[...]

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Eduardo Habkost 5 years ago
On Mon, Apr 08, 2019 at 07:26:16AM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Fri, Apr 05, 2019 at 07:22:35PM -0400, Michael S. Tsirkin wrote:
> >> On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> >> > * Jens Freimann (jfreimann@redhat.com) wrote:
> >> > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > [...]
> >> > > > 3. Management layer software should handle this. Open Stack already has
> >> > > >  components/code to handle unplug/replug VFIO devices and metadata to
> >> > > >  provide to the guest for detecting which devices should be paired.
> >> > > >  -> An approach that includes all software from firmware to
> >> > > >     higher-level management software wasn't tried in the last years. This is
> >> > > >     an attempt to keep it simple and contained in QEMU as much as possible.
> >> > > > 4. Hotplugging a device and then making it part of a failover setup is
> >> > > >   not possible
> >> > > >  -> addressed by extending qdev hotplug functions to check for hidden
> >> > > >     attribute, so e.g. device_add can be used to plug a device.
> >> > > > 
> >> > > > There are still some open issues:
> >> > > > 
> >> > > > Migration: I'm looking for something like a pre-migration hook that I
> >> > > > could use to unplug the vfio-pci device. I tried with a migration
> >> > > > notifier but it is called to late, i.e. after migration is aborted due
> >> > > > to vfio-pci marked unmigrateable. I worked around this by setting it
> >> > > > to migrateable and used a migration notifier on the virtio-net device.
> >> > 
> >> > Why not just let this happen at the libvirt level; then you do the
> >> > hotunplug etc before you actually tell qemu anything about starting a
> >> > migration?
> >> 
> >> If qemu frees up resources (as it does on unplug) then libvirt
> >> is not guaranteed it can roll the change back on e.g.
> >> migration failure.
> >
> > Why should we always free up resources on unplug?
> >
> > Unplug of a disk device doesn't close the corresponding -blockdev.
> 
> It does for block backends created with -drive, and that was a mistake
> we corrected with -blockdev.
> 
> > Unplug of a serial device doesn't close the corresponding -chardev.
> > Unplug of a memory device doesn't close the corresponding memory backend.
> > Unplug of a crypto device doesn't close the corresponding crypto backend.
> >
> > Why do we expect device_del of a passthrough PCI device to always
> > release the host side PCI device?  We can provide a better API
> > than that.
> 
> device_del should free what device_add allocates.

Absolutely.  Making unplug of the guest device not close the host
device implies in having the host device not being opened by
device_add (it would be opened by -object/object_add, I assume).

> 
> Does device_add allocate the host side PCI device?  If yes, should it?

I don't see any reason it should.

-- 
Eduardo

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Dr. David Alan Gilbert 5 years ago
* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > ping
> > > 
> > > FYI: I'm also working on a few related tools to detect driver behaviour when
> > > assigning a MAC to the vf device. Code is at https://github.com/jensfr/netfailover_driver_detect
> > 
> > Hi Jens,
> >   I've not been following this too uch, but:
> > 
> > > regards,
> > > Jens
> > > 
> > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > > > This is another attempt at implementing the host side of the
> > > > net_failover concept
> > > > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > > > 
> > > > The general idea is that we have a pair of devices, a vfio-pci and a
> > > > emulated device. Before migration the vfio device is unplugged and data
> > > > flows to the emulated device, on the target side another vfio-pci device
> > > > is plugged in to take over the data-path. In the guest the net_failover
> > > > module will pair net devices with the same MAC address.
> > > > 
> > > > * In the first patch the infrastructure for hiding the device is added
> > > >  for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > > >  state and it is set based on a callback to the standby device which
> > > >  registers itself for handling the assessment: "should the primary device
> > > >  be hidden?" by cross validating the ids of the devices.
> > > > 
> > > > * In the second patch the virtio-net uses the API to hide the vfio
> > > >  device and unhides it when the feature is acked.
> > > > 
> > > > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
> > > > 
> > > > To summarize concerns/feedback from previous discussion:
> > > > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> > > >  Migration might get stuck for unpredictable time with unclear reason.
> > > >  This approach combines two tricky things, hot/unplug and migration.
> > > >  -> We can surprise-remove the PCI device and in QEMU we can do all
> > > >     necessary rollbacks transparent to management software. Will it be
> > > >     easy, probably not.
> > 
> > This sounds 'fun' - bonus cases are things like what happens if the
> > guest gets rebooted somewhere during the process or if it's currently
> > sitting in the bios/grub/etc
> 
> Um, during which process? Guests are gradually fixed to support
> surprise removal well. Part of it is thunderbolt which makes
> it incredibly easy. Yes - bios/grub will need to learn to
> handle this well.

Ignoring the actual mechanism of the unplug itself; there are probably
loads of cases; e.g.

      running with both cards
      hot unplug real card
      start migration
      guest reboots
        Kernel sees only the virtio card
      migration completes
      hotadd the real card back

so the guest has to know to pair the real card even though it booted
with only the virtio card.

I'm sure there are loads of other corners.

> > > > 2. PCI devices are a precious ressource. The primary device should never
> > > >  be added to QEMU if it won't be used by guest instead of hiding it in
> > > >  QEMU.
> > > >  -> We only hotplug the device when the standby feature bit was
> > > >     negotiated. We save the device cmdline options until we need it for
> > > >     qdev_device_add()
> > > >     Hiding a device can be a useful concept to model. For example a
> > > >     pci device in a powered-off slot could be marked as hidden until the slot is
> > > >     powered on (mst).
> > 
> > Are they really that precious? Personally it's not something I'd worry
> > about.
> > 
> > > > 3. Management layer software should handle this. Open Stack already has
> > > >  components/code to handle unplug/replug VFIO devices and metadata to
> > > >  provide to the guest for detecting which devices should be paired.
> > > >  -> An approach that includes all software from firmware to
> > > >     higher-level management software wasn't tried in the last years. This is
> > > >     an attempt to keep it simple and contained in QEMU as much as possible.
> > > > 4. Hotplugging a device and then making it part of a failover setup is
> > > >   not possible
> > > >  -> addressed by extending qdev hotplug functions to check for hidden
> > > >     attribute, so e.g. device_add can be used to plug a device.
> > > > 
> > > > There are still some open issues:
> > > > 
> > > > Migration: I'm looking for something like a pre-migration hook that I
> > > > could use to unplug the vfio-pci device. I tried with a migration
> > > > notifier but it is called to late, i.e. after migration is aborted due
> > > > to vfio-pci marked unmigrateable. I worked around this by setting it
> > > > to migrateable and used a migration notifier on the virtio-net device.
> > 
> > Why not just let this happen at the libvirt level; then you do the
> > hotunplug etc before you actually tell qemu anything about starting a
> > migration?
> 
> If qemu frees up resources (as it does on unplug) then libvirt
> is not guaranteed it can roll the change back on e.g.
> migration failure.

Can you explain this in a bit more detail please; do you mean if it
frees the netdev?

> But really another issue is simply that it's a mechanism,
> there's no policy that management needs to decide on.
> Doing it at lowest possible level ensures all
> upper layers benefit with minimal pain.

Network setups in things like OpenStack can be really complex and
involve interacting with switches etc - something somewhere might
have to reconfigure a switch when you pull the real card.

Dave

> > > > Commandline: There is a dependency between vfio-pci and virtio-net
> > > > devices. One points to the other via new parameters
> > > > primar=<primary qdev id> and standby='<standby qdev id>'. This means
> > > > that the primary device needs to be specified after standby device on
> > > > the qemu command line. Not sure how to solve this.
> > > > 
> > > > Error handling: Patches don't cover all possible error scenarios yet.
> > > > 
> > > > I have tested this with a mlx5 NIC and was able to migrate the VM with
> > > > above mentioned workarounds for open problems.
> > > > 
> > > > Command line example:
> > > > 
> > > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> > > >        -machine q35,kernel-irqchip=split -cpu host   \
> > > >        -k fr   \
> > > >        -serial stdio   \
> > > >        -net none \
> > > >        -qmp unix:/tmp/qmp.socket,server,nowait \
> > > >        -monitor telnet:127.0.0.1:5555,server,nowait \
> > > >        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> > > >        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> > > >        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> > > >        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> > > >        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
> > > >        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
> > 
> > Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
> > 'net1' id's.  cc'ing in Markus.
> > 
> > Dave
> > 
> > > >        /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > > > 
> > > > I'm grateful for any remarks or ideas!
> > > > 
> > > > Thanks!
> > > > 
> > > > regards,
> > > > Jens
> > > > 
> > > > Sameeh Jubran (2):
> > > >  qdev/qbus: Add hidden device support
> > > >  net/virtio: add failover support
> > > > 
> > > > hw/core/qdev.c                 | 27 ++++++++++
> > > > hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
> > > > hw/pci/pci.c                   |  1 +
> > > > include/hw/pci/pci.h           |  2 +
> > > > include/hw/qdev-core.h         |  8 +++
> > > > include/hw/virtio/virtio-net.h |  7 +++
> > > > qdev-monitor.c                 | 48 +++++++++++++++--
> > > > vl.c                           |  7 ++-
> > > > 8 files changed, 189 insertions(+), 6 deletions(-)
> > > > 
> > > > -- 
> > > > 2.20.1
> > > > 
> > > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Jens Freimann 5 years ago
On Mon, Apr 08, 2019 at 10:16:50AM +0100, Dr. David Alan Gilbert wrote:
>* Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
>> > * Jens Freimann (jfreimann@redhat.com) wrote:
>> > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
>> > > > This is another attempt at implementing the host side of the
>> > > > net_failover concept
>> > > > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
>> > > >
>> > > > The general idea is that we have a pair of devices, a vfio-pci and a
>> > > > emulated device. Before migration the vfio device is unplugged and data
>> > > > flows to the emulated device, on the target side another vfio-pci device
>> > > > is plugged in to take over the data-path. In the guest the net_failover
>> > > > module will pair net devices with the same MAC address.
>> > > >
>> > > > * In the first patch the infrastructure for hiding the device is added
>> > > >  for the qbus and qdev APIs. A "hidden" boolean is added to the device
>> > > >  state and it is set based on a callback to the standby device which
>> > > >  registers itself for handling the assessment: "should the primary device
>> > > >  be hidden?" by cross validating the ids of the devices.
>> > > >
>> > > > * In the second patch the virtio-net uses the API to hide the vfio
>> > > >  device and unhides it when the feature is acked.
>> > > >
>> > > > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
>> > > >
>> > > > To summarize concerns/feedback from previous discussion:
>> > > > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>> > > >  Migration might get stuck for unpredictable time with unclear reason.
>> > > >  This approach combines two tricky things, hot/unplug and migration.
>> > > >  -> We can surprise-remove the PCI device and in QEMU we can do all
>> > > >     necessary rollbacks transparent to management software. Will it be
>> > > >     easy, probably not.
>> >
>> > This sounds 'fun' - bonus cases are things like what happens if the
>> > guest gets rebooted somewhere during the process or if it's currently
>> > sitting in the bios/grub/etc
>>
>> Um, during which process? Guests are gradually fixed to support
>> surprise removal well. Part of it is thunderbolt which makes
>> it incredibly easy. Yes - bios/grub will need to learn to
>> handle this well.
>
>Ignoring the actual mechanism of the unplug itself; there are probably
>loads of cases; e.g.
>
>      running with both cards
>      hot unplug real card
>      start migration
>      guest reboots
>        Kernel sees only the virtio card
>      migration completes
>      hotadd the real card back
>
>so the guest has to know to pair the real card even though it booted
>with only the virtio card.

Maybe I misunderstand, but, when the 'real card' is added back after
migration the net_failover driver in the guest will know to pair it
with the virtio card because they have the same MAC address. Did you
mean something else?

>I'm sure there are loads of other corners.

Probably yes.

regards,
Jens 

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Dr. David Alan Gilbert 5 years ago
* Jens Freimann (jfreimann@redhat.com) wrote:
> On Mon, Apr 08, 2019 at 10:16:50AM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > > > > > This is another attempt at implementing the host side of the
> > > > > > net_failover concept
> > > > > > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > > > > >
> > > > > > The general idea is that we have a pair of devices, a vfio-pci and a
> > > > > > emulated device. Before migration the vfio device is unplugged and data
> > > > > > flows to the emulated device, on the target side another vfio-pci device
> > > > > > is plugged in to take over the data-path. In the guest the net_failover
> > > > > > module will pair net devices with the same MAC address.
> > > > > >
> > > > > > * In the first patch the infrastructure for hiding the device is added
> > > > > >  for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > > > > >  state and it is set based on a callback to the standby device which
> > > > > >  registers itself for handling the assessment: "should the primary device
> > > > > >  be hidden?" by cross validating the ids of the devices.
> > > > > >
> > > > > > * In the second patch the virtio-net uses the API to hide the vfio
> > > > > >  device and unhides it when the feature is acked.
> > > > > >
> > > > > > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
> > > > > >
> > > > > > To summarize concerns/feedback from previous discussion:
> > > > > > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> > > > > >  Migration might get stuck for unpredictable time with unclear reason.
> > > > > >  This approach combines two tricky things, hot/unplug and migration.
> > > > > >  -> We can surprise-remove the PCI device and in QEMU we can do all
> > > > > >     necessary rollbacks transparent to management software. Will it be
> > > > > >     easy, probably not.
> > > >
> > > > This sounds 'fun' - bonus cases are things like what happens if the
> > > > guest gets rebooted somewhere during the process or if it's currently
> > > > sitting in the bios/grub/etc
> > > 
> > > Um, during which process? Guests are gradually fixed to support
> > > surprise removal well. Part of it is thunderbolt which makes
> > > it incredibly easy. Yes - bios/grub will need to learn to
> > > handle this well.
> > 
> > Ignoring the actual mechanism of the unplug itself; there are probably
> > loads of cases; e.g.
> > 
> >      running with both cards
> >      hot unplug real card
> >      start migration
> >      guest reboots
> >        Kernel sees only the virtio card
> >      migration completes
> >      hotadd the real card back
> > 
> > so the guest has to know to pair the real card even though it booted
> > with only the virtio card.
> 
> Maybe I misunderstand, but, when the 'real card' is added back after
> migration the net_failover driver in the guest will know to pair it
> with the virtio card because they have the same MAC address. Did you
> mean something else?

OK if it knows to do that.

> > I'm sure there are loads of other corners.
> 
> Probably yes.

Yeh, that was just my worry - just there's loads of this type of corner
around reboots.

Dave

> regards,
> Jens
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Michael S. Tsirkin 5 years ago
On Mon, Apr 08, 2019 at 10:16:50AM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > ping
> > > > 
> > > > FYI: I'm also working on a few related tools to detect driver behaviour when
> > > > assigning a MAC to the vf device. Code is at https://github.com/jensfr/netfailover_driver_detect
> > > 
> > > Hi Jens,
> > >   I've not been following this too uch, but:
> > > 
> > > > regards,
> > > > Jens
> > > > 
> > > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > > > > This is another attempt at implementing the host side of the
> > > > > net_failover concept
> > > > > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > > > > 
> > > > > The general idea is that we have a pair of devices, a vfio-pci and a
> > > > > emulated device. Before migration the vfio device is unplugged and data
> > > > > flows to the emulated device, on the target side another vfio-pci device
> > > > > is plugged in to take over the data-path. In the guest the net_failover
> > > > > module will pair net devices with the same MAC address.
> > > > > 
> > > > > * In the first patch the infrastructure for hiding the device is added
> > > > >  for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > > > >  state and it is set based on a callback to the standby device which
> > > > >  registers itself for handling the assessment: "should the primary device
> > > > >  be hidden?" by cross validating the ids of the devices.
> > > > > 
> > > > > * In the second patch the virtio-net uses the API to hide the vfio
> > > > >  device and unhides it when the feature is acked.
> > > > > 
> > > > > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
> > > > > 
> > > > > To summarize concerns/feedback from previous discussion:
> > > > > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> > > > >  Migration might get stuck for unpredictable time with unclear reason.
> > > > >  This approach combines two tricky things, hot/unplug and migration.
> > > > >  -> We can surprise-remove the PCI device and in QEMU we can do all
> > > > >     necessary rollbacks transparent to management software. Will it be
> > > > >     easy, probably not.
> > > 
> > > This sounds 'fun' - bonus cases are things like what happens if the
> > > guest gets rebooted somewhere during the process or if it's currently
> > > sitting in the bios/grub/etc
> > 
> > Um, during which process? Guests are gradually fixed to support
> > surprise removal well. Part of it is thunderbolt which makes
> > it incredibly easy. Yes - bios/grub will need to learn to
> > handle this well.
> 
> Ignoring the actual mechanism of the unplug itself; there are probably
> loads of cases; e.g.
> 
>       running with both cards
>       hot unplug real card
>       start migration
>       guest reboots
>         Kernel sees only the virtio card
>       migration completes
>       hotadd the real card back
> 
> so the guest has to know to pair the real card even though it booted
> with only the virtio card.

Yes - that's why virtio has a flag to notify guest there
will be a pair even if it's not there at the moment.


> I'm sure there are loads of other corners.
> 
> > > > > 2. PCI devices are a precious ressource. The primary device should never
> > > > >  be added to QEMU if it won't be used by guest instead of hiding it in
> > > > >  QEMU.
> > > > >  -> We only hotplug the device when the standby feature bit was
> > > > >     negotiated. We save the device cmdline options until we need it for
> > > > >     qdev_device_add()
> > > > >     Hiding a device can be a useful concept to model. For example a
> > > > >     pci device in a powered-off slot could be marked as hidden until the slot is
> > > > >     powered on (mst).
> > > 
> > > Are they really that precious? Personally it's not something I'd worry
> > > about.
> > > 
> > > > > 3. Management layer software should handle this. Open Stack already has
> > > > >  components/code to handle unplug/replug VFIO devices and metadata to
> > > > >  provide to the guest for detecting which devices should be paired.
> > > > >  -> An approach that includes all software from firmware to
> > > > >     higher-level management software wasn't tried in the last years. This is
> > > > >     an attempt to keep it simple and contained in QEMU as much as possible.
> > > > > 4. Hotplugging a device and then making it part of a failover setup is
> > > > >   not possible
> > > > >  -> addressed by extending qdev hotplug functions to check for hidden
> > > > >     attribute, so e.g. device_add can be used to plug a device.
> > > > > 
> > > > > There are still some open issues:
> > > > > 
> > > > > Migration: I'm looking for something like a pre-migration hook that I
> > > > > could use to unplug the vfio-pci device. I tried with a migration
> > > > > notifier but it is called to late, i.e. after migration is aborted due
> > > > > to vfio-pci marked unmigrateable. I worked around this by setting it
> > > > > to migrateable and used a migration notifier on the virtio-net device.
> > > 
> > > Why not just let this happen at the libvirt level; then you do the
> > > hotunplug etc before you actually tell qemu anything about starting a
> > > migration?
> > 
> > If qemu frees up resources (as it does on unplug) then libvirt
> > is not guaranteed it can roll the change back on e.g.
> > migration failure.
> 
> Can you explain this in a bit more detail please; do you mean if it
> frees the netdev?
> 
> > But really another issue is simply that it's a mechanism,
> > there's no policy that management needs to decide on.
> > Doing it at lowest possible level ensures all
> > upper layers benefit with minimal pain.
> 
> Network setups in things like OpenStack can be really complex and
> involve interacting with switches etc - something somewhere might
> have to reconfigure a switch when you pull the real card.
> 
> Dave
> 
> > > > > Commandline: There is a dependency between vfio-pci and virtio-net
> > > > > devices. One points to the other via new parameters
> > > > > primar=<primary qdev id> and standby='<standby qdev id>'. This means
> > > > > that the primary device needs to be specified after standby device on
> > > > > the qemu command line. Not sure how to solve this.
> > > > > 
> > > > > Error handling: Patches don't cover all possible error scenarios yet.
> > > > > 
> > > > > I have tested this with a mlx5 NIC and was able to migrate the VM with
> > > > > above mentioned workarounds for open problems.
> > > > > 
> > > > > Command line example:
> > > > > 
> > > > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> > > > >        -machine q35,kernel-irqchip=split -cpu host   \
> > > > >        -k fr   \
> > > > >        -serial stdio   \
> > > > >        -net none \
> > > > >        -qmp unix:/tmp/qmp.socket,server,nowait \
> > > > >        -monitor telnet:127.0.0.1:5555,server,nowait \
> > > > >        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> > > > >        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> > > > >        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> > > > >        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> > > > >        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
> > > > >        -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
> > > 
> > > Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
> > > 'net1' id's.  cc'ing in Markus.
> > > 
> > > Dave
> > > 
> > > > >        /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > > > > 
> > > > > I'm grateful for any remarks or ideas!
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > > regards,
> > > > > Jens
> > > > > 
> > > > > Sameeh Jubran (2):
> > > > >  qdev/qbus: Add hidden device support
> > > > >  net/virtio: add failover support
> > > > > 
> > > > > hw/core/qdev.c                 | 27 ++++++++++
> > > > > hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
> > > > > hw/pci/pci.c                   |  1 +
> > > > > include/hw/pci/pci.h           |  2 +
> > > > > include/hw/qdev-core.h         |  8 +++
> > > > > include/hw/virtio/virtio-net.h |  7 +++
> > > > > qdev-monitor.c                 | 48 +++++++++++++++--
> > > > > vl.c                           |  7 ++-
> > > > > 8 files changed, 189 insertions(+), 6 deletions(-)
> > > > > 
> > > > > -- 
> > > > > 2.20.1
> > > > > 
> > > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by si-wei liu 4 years, 11 months ago

On 4/5/2019 4:22 PM, Michael S. Tsirkin wrote:
> On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
>> * Jens Freimann (jfreimann@redhat.com) wrote:
>>> ping
>>>
>>> FYI: I'm also working on a few related tools to detect driver behaviour when
>>> assigning a MAC to the vf device. Code is at https://github.com/jensfr/netfailover_driver_detect
>> Hi Jens,
>>    I've not been following this too uch, but:
>>
>>> regards,
>>> Jens
>>>
>>> On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
>>>> This is another attempt at implementing the host side of the
>>>> net_failover concept
>>>> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
>>>>
>>>> The general idea is that we have a pair of devices, a vfio-pci and a
>>>> emulated device. Before migration the vfio device is unplugged and data
>>>> flows to the emulated device, on the target side another vfio-pci device
>>>> is plugged in to take over the data-path. In the guest the net_failover
>>>> module will pair net devices with the same MAC address.
>>>>
>>>> * In the first patch the infrastructure for hiding the device is added
>>>>   for the qbus and qdev APIs. A "hidden" boolean is added to the device
>>>>   state and it is set based on a callback to the standby device which
>>>>   registers itself for handling the assessment: "should the primary device
>>>>   be hidden?" by cross validating the ids of the devices.
>>>>
>>>> * In the second patch the virtio-net uses the API to hide the vfio
>>>>   device and unhides it when the feature is acked.
>>>>
>>>> Previous discussion: https://patchwork.ozlabs.org/cover/989098/
>>>>
>>>> To summarize concerns/feedback from previous discussion:
>>>> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>>>>   Migration might get stuck for unpredictable time with unclear reason.
>>>>   This approach combines two tricky things, hot/unplug and migration.
>>>>   -> We can surprise-remove the PCI device and in QEMU we can do all
>>>>      necessary rollbacks transparent to management software. Will it be
>>>>      easy, probably not.
>> This sounds 'fun' - bonus cases are things like what happens if the
>> guest gets rebooted somewhere during the process or if it's currently
>> sitting in the bios/grub/etc
> Um, during which process? Guests are gradually fixed to support
> surprise removal well. Part of it is thunderbolt which makes
> it incredibly easy. Yes - bios/grub will need to learn to
> handle this well.
I shared the same concern. As device emulator (QEMU), you know where 
guest would reject or delay - it's even agnostic bios/grub should 
respond to hot plug or not. You don't even know whether guest has the 
support for ACPI hotplug, surprise removal, do you? How QEMU infer what 
is the right disposition by telling apart these guest states?

-Siwei
>
>>>> 2. PCI devices are a precious ressource. The primary device should never
>>>>   be added to QEMU if it won't be used by guest instead of hiding it in
>>>>   QEMU.
>>>>   -> We only hotplug the device when the standby feature bit was
>>>>      negotiated. We save the device cmdline options until we need it for
>>>>      qdev_device_add()
>>>>      Hiding a device can be a useful concept to model. For example a
>>>>      pci device in a powered-off slot could be marked as hidden until the slot is
>>>>      powered on (mst).
>> Are they really that precious? Personally it's not something I'd worry
>> about.
>>
>>>> 3. Management layer software should handle this. Open Stack already has
>>>>   components/code to handle unplug/replug VFIO devices and metadata to
>>>>   provide to the guest for detecting which devices should be paired.
>>>>   -> An approach that includes all software from firmware to
>>>>      higher-level management software wasn't tried in the last years. This is
>>>>      an attempt to keep it simple and contained in QEMU as much as possible.
>>>> 4. Hotplugging a device and then making it part of a failover setup is
>>>>    not possible
>>>>   -> addressed by extending qdev hotplug functions to check for hidden
>>>>      attribute, so e.g. device_add can be used to plug a device.
>>>>
>>>> There are still some open issues:
>>>>
>>>> Migration: I'm looking for something like a pre-migration hook that I
>>>> could use to unplug the vfio-pci device. I tried with a migration
>>>> notifier but it is called to late, i.e. after migration is aborted due
>>>> to vfio-pci marked unmigrateable. I worked around this by setting it
>>>> to migrateable and used a migration notifier on the virtio-net device.
>> Why not just let this happen at the libvirt level; then you do the
>> hotunplug etc before you actually tell qemu anything about starting a
>> migration?
> If qemu frees up resources (as it does on unplug) then libvirt
> is not guaranteed it can roll the change back on e.g.
> migration failure.
>
> But really another issue is simply that it's a mechanism,
> there's no policy that management needs to decide on.
> Doing it at lowest possible level ensures all
> upper layers benefit with minimal pain.
>
>>>> Commandline: There is a dependency between vfio-pci and virtio-net
>>>> devices. One points to the other via new parameters
>>>> primar=<primary qdev id> and standby='<standby qdev id>'. This means
>>>> that the primary device needs to be specified after standby device on
>>>> the qemu command line. Not sure how to solve this.
>>>>
>>>> Error handling: Patches don't cover all possible error scenarios yet.
>>>>
>>>> I have tested this with a mlx5 NIC and was able to migrate the VM with
>>>> above mentioned workarounds for open problems.
>>>>
>>>> Command line example:
>>>>
>>>> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>>>>         -machine q35,kernel-irqchip=split -cpu host   \
>>>>         -k fr   \
>>>>         -serial stdio   \
>>>>         -net none \
>>>>         -qmp unix:/tmp/qmp.socket,server,nowait \
>>>>         -monitor telnet:127.0.0.1:5555,server,nowait \
>>>>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>>>>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>>>>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>>>>         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>>>>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
>>>>         -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
>> Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
>> 'net1' id's.  cc'ing in Markus.
>>
>> Dave
>>
>>>>         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>>>>
>>>> I'm grateful for any remarks or ideas!
>>>>
>>>> Thanks!
>>>>
>>>> regards,
>>>> Jens
>>>>
>>>> Sameeh Jubran (2):
>>>>   qdev/qbus: Add hidden device support
>>>>   net/virtio: add failover support
>>>>
>>>> hw/core/qdev.c                 | 27 ++++++++++
>>>> hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
>>>> hw/pci/pci.c                   |  1 +
>>>> include/hw/pci/pci.h           |  2 +
>>>> include/hw/qdev-core.h         |  8 +++
>>>> include/hw/virtio/virtio-net.h |  7 +++
>>>> qdev-monitor.c                 | 48 +++++++++++++++--
>>>> vl.c                           |  7 ++-
>>>> 8 files changed, 189 insertions(+), 6 deletions(-)
>>>>
>>>> -- 
>>>> 2.20.1
>>>>
>>>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Michael S. Tsirkin 4 years, 11 months ago
On Tue, May 28, 2019 at 05:35:26PM -0700, si-wei liu wrote:
> 
> 
> On 4/5/2019 4:22 PM, Michael S. Tsirkin wrote:
> > On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > ping
> > > > 
> > > > FYI: I'm also working on a few related tools to detect driver behaviour when
> > > > assigning a MAC to the vf device. Code is at https://github.com/jensfr/netfailover_driver_detect
> > > Hi Jens,
> > >    I've not been following this too uch, but:
> > > 
> > > > regards,
> > > > Jens
> > > > 
> > > > On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> > > > > This is another attempt at implementing the host side of the
> > > > > net_failover concept
> > > > > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > > > > 
> > > > > The general idea is that we have a pair of devices, a vfio-pci and a
> > > > > emulated device. Before migration the vfio device is unplugged and data
> > > > > flows to the emulated device, on the target side another vfio-pci device
> > > > > is plugged in to take over the data-path. In the guest the net_failover
> > > > > module will pair net devices with the same MAC address.
> > > > > 
> > > > > * In the first patch the infrastructure for hiding the device is added
> > > > >   for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > > > >   state and it is set based on a callback to the standby device which
> > > > >   registers itself for handling the assessment: "should the primary device
> > > > >   be hidden?" by cross validating the ids of the devices.
> > > > > 
> > > > > * In the second patch the virtio-net uses the API to hide the vfio
> > > > >   device and unhides it when the feature is acked.
> > > > > 
> > > > > Previous discussion: https://patchwork.ozlabs.org/cover/989098/
> > > > > 
> > > > > To summarize concerns/feedback from previous discussion:
> > > > > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> > > > >   Migration might get stuck for unpredictable time with unclear reason.
> > > > >   This approach combines two tricky things, hot/unplug and migration.
> > > > >   -> We can surprise-remove the PCI device and in QEMU we can do all
> > > > >      necessary rollbacks transparent to management software. Will it be
> > > > >      easy, probably not.
> > > This sounds 'fun' - bonus cases are things like what happens if the
> > > guest gets rebooted somewhere during the process or if it's currently
> > > sitting in the bios/grub/etc
> > Um, during which process? Guests are gradually fixed to support
> > surprise removal well. Part of it is thunderbolt which makes
> > it incredibly easy. Yes - bios/grub will need to learn to
> > handle this well.
> I shared the same concern. As device emulator (QEMU), you know where guest
> would reject or delay - it's even agnostic bios/grub should respond to hot
> plug or not. You don't even know whether guest has the support for ACPI
> hotplug, surprise removal, do you? How QEMU infer what is the right
> disposition by telling apart these guest states?
> 
> -Siwei

We can always add a feature bit for that :)
No feature bit -> primary stays hidden.


> > 
> > > > > 2. PCI devices are a precious ressource. The primary device should never
> > > > >   be added to QEMU if it won't be used by guest instead of hiding it in
> > > > >   QEMU.
> > > > >   -> We only hotplug the device when the standby feature bit was
> > > > >      negotiated. We save the device cmdline options until we need it for
> > > > >      qdev_device_add()
> > > > >      Hiding a device can be a useful concept to model. For example a
> > > > >      pci device in a powered-off slot could be marked as hidden until the slot is
> > > > >      powered on (mst).
> > > Are they really that precious? Personally it's not something I'd worry
> > > about.
> > > 
> > > > > 3. Management layer software should handle this. Open Stack already has
> > > > >   components/code to handle unplug/replug VFIO devices and metadata to
> > > > >   provide to the guest for detecting which devices should be paired.
> > > > >   -> An approach that includes all software from firmware to
> > > > >      higher-level management software wasn't tried in the last years. This is
> > > > >      an attempt to keep it simple and contained in QEMU as much as possible.
> > > > > 4. Hotplugging a device and then making it part of a failover setup is
> > > > >    not possible
> > > > >   -> addressed by extending qdev hotplug functions to check for hidden
> > > > >      attribute, so e.g. device_add can be used to plug a device.
> > > > > 
> > > > > There are still some open issues:
> > > > > 
> > > > > Migration: I'm looking for something like a pre-migration hook that I
> > > > > could use to unplug the vfio-pci device. I tried with a migration
> > > > > notifier but it is called to late, i.e. after migration is aborted due
> > > > > to vfio-pci marked unmigrateable. I worked around this by setting it
> > > > > to migrateable and used a migration notifier on the virtio-net device.
> > > Why not just let this happen at the libvirt level; then you do the
> > > hotunplug etc before you actually tell qemu anything about starting a
> > > migration?
> > If qemu frees up resources (as it does on unplug) then libvirt
> > is not guaranteed it can roll the change back on e.g.
> > migration failure.
> > 
> > But really another issue is simply that it's a mechanism,
> > there's no policy that management needs to decide on.
> > Doing it at lowest possible level ensures all
> > upper layers benefit with minimal pain.
> > 
> > > > > Commandline: There is a dependency between vfio-pci and virtio-net
> > > > > devices. One points to the other via new parameters
> > > > > primar=<primary qdev id> and standby='<standby qdev id>'. This means
> > > > > that the primary device needs to be specified after standby device on
> > > > > the qemu command line. Not sure how to solve this.
> > > > > 
> > > > > Error handling: Patches don't cover all possible error scenarios yet.
> > > > > 
> > > > > I have tested this with a mlx5 NIC and was able to migrate the VM with
> > > > > above mentioned workarounds for open problems.
> > > > > 
> > > > > Command line example:
> > > > > 
> > > > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> > > > >         -machine q35,kernel-irqchip=split -cpu host   \
> > > > >         -k fr   \
> > > > >         -serial stdio   \
> > > > >         -net none \
> > > > >         -qmp unix:/tmp/qmp.socket,server,nowait \
> > > > >         -monitor telnet:127.0.0.1:5555,server,nowait \
> > > > >         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> > > > >         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> > > > >         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> > > > >         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> > > > >         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \
> > > > >         -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \
> > > Yes, that's a bit grim; it's circular dependency on the 'hostdev0' and
> > > 'net1' id's.  cc'ing in Markus.
> > > 
> > > Dave
> > > 
> > > > >         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > > > > 
> > > > > I'm grateful for any remarks or ideas!
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > > regards,
> > > > > Jens
> > > > > 
> > > > > Sameeh Jubran (2):
> > > > >   qdev/qbus: Add hidden device support
> > > > >   net/virtio: add failover support
> > > > > 
> > > > > hw/core/qdev.c                 | 27 ++++++++++
> > > > > hw/net/virtio-net.c            | 95 ++++++++++++++++++++++++++++++++++
> > > > > hw/pci/pci.c                   |  1 +
> > > > > include/hw/pci/pci.h           |  2 +
> > > > > include/hw/qdev-core.h         |  8 +++
> > > > > include/hw/virtio/virtio-net.h |  7 +++
> > > > > qdev-monitor.c                 | 48 +++++++++++++++--
> > > > > vl.c                           |  7 ++-
> > > > > 8 files changed, 189 insertions(+), 6 deletions(-)
> > > > > 
> > > > > -- 
> > > > > 2.20.1
> > > > > 
> > > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by Daniel P. Berrangé 5 years ago
On Fri, Mar 22, 2019 at 02:44:45PM +0100, Jens Freimann wrote:
> Commandline: There is a dependency between vfio-pci and virtio-net
> devices. One points to the other via new parameters
> primar=<primary qdev id> and standby='<standby qdev id>'. This means
> that the primary device needs to be specified after standby device on 
> the qemu command line. Not sure how to solve this.

So we hae this pair of args

>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,primary=hostdev0 \                                                                                     
>         -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1 \

In this you have denoted the "vfio-pci" device as the primary and
"virtio-net-pci" as the standby.

There's no need for the user to specify the "primary" property for for
the "virtio-net-pci" NIC though. We only need to tell QEMU the relationship
in one direction and it can set the relationship in the reverse direction
automatically.

In fact it is undesirable for the user to specify "primary" property, as
they should be able to start QEMU with only the virtio-net-pci device
present and then hot-plug a vfio-pci on the fly when it is available.

So I think the "virtio-net-pci" merely needs a flag to indicate that it
should be prepared to take part in a failover pair, but *not* take any
device ID from the user.

eg we should be able to start with just

  -device virtio-net-pci,netdev=hostnet1,id=net1,\
          mac=52:54:00:6f:55:cc,bus=root2,failover=on

When vfio-pci is then created, either via a further -device arg on later
in QMP via device_add, it only needs to specify standby=net1.

When vfio-pci is realized it can lookup the virtio-net-pci device
in the QOM tree and set its "primary" property to point back to its
own device ID. There should never be any need for the user to tell
virtio-net-pci what the device ID of the vfio-pci is.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices
Posted by no-reply@patchew.org 5 years, 1 month ago
Patchew URL: https://patchew.org/QEMU/20190322134447.14831-1-jfreimann@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20190322134447.14831-1-jfreimann@redhat.com
Subject: [Qemu-devel] [RFC PATCH 0/2] implement the failover feature for assigned network devices

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20190322134447.14831-1-jfreimann@redhat.com -> patchew/20190322134447.14831-1-jfreimann@redhat.com
Switched to a new branch 'test'
4aa7f06 net/virtio: add failover support
5d2911f qdev/qbus: Add hidden device support

=== OUTPUT BEGIN ===
1/2 Checking commit 5d2911f6d6d4 (qdev/qbus: Add hidden device support)
ERROR: trailing whitespace
#22: FILE: hw/core/qdev.c:218:
+    $

ERROR: that open brace { should be on the previous line
#30: FILE: hw/core/qdev.c:226:
+        if (match_found)
+        {

ERROR: that open brace { should be on the previous line
#36: FILE: hw/core/qdev.c:232:
+    if (!match_found)
+    {

WARNING: Block comments use a leading /* on a separate line
#89: FILE: include/hw/qdev-core.h:161:
+    /* This callback is called just upon init of the DeviceState

ERROR: line over 90 characters
#93: FILE: include/hw/qdev-core.h:165:
+    void (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts,bool *match_found, bool *res);

ERROR: space required after that ',' (ctx:VxV)
#93: FILE: include/hw/qdev-core.h:165:
+    void (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts,bool *match_found, bool *res);
                                                                             ^

ERROR: that open brace { should be on the previous line
#123: FILE: qdev-monitor.c:576:
+    if (strcmp(name, "standby") == 0)
+    {

ERROR: that open brace { should be on the previous line
#126: FILE: qdev-monitor.c:579:
+        if (qdev_should_hide_device(opts, errp) && errp && !*errp)
+        {

ERROR: that open brace { should be on the previous line
#130: FILE: qdev-monitor.c:583:
+        else if (errp && *errp)
+        {

ERROR: else should follow close brace '}'
#130: FILE: qdev-monitor.c:583:
+        }
+        else if (errp && *errp)

ERROR: that open brace { should be on the previous line
#140: FILE: qdev-monitor.c:593:
+    if (qemu_opt_foreach(opts, has_standby_device, opts, err) == 0)
+    {

ERROR: trailing whitespace
#155: FILE: qdev-monitor.c:607:
+    $

ERROR: that open brace { should be on the previous line
#156: FILE: qdev-monitor.c:608:
+    if (opts && should_hide_device(opts, &err))
+    {

ERROR: that open brace { should be on the previous line
#158: FILE: qdev-monitor.c:610:
+        if(err)
+        {

ERROR: space required before the open parenthesis '('
#158: FILE: qdev-monitor.c:610:
+        if(err)

ERROR: that open brace { should be on the previous line
#173: FILE: qdev-monitor.c:688:
+    if (dev)
+    {

ERROR: that open brace { should be on the previous line
#202: FILE: vl.c:2343:
+    } else if (dev)
+    {

total: 16 errors, 1 warnings, 165 lines checked

Patch 1/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/2 Checking commit 4aa7f06bed23 (net/virtio: add failover support)
ERROR: that open brace { should be on the previous line
#73: FILE: hw/net/virtio-net.c:2650:
+    if (!n->primary_dev && err)
+    {

WARNING: line over 80 characters
#82: FILE: hw/net/virtio-net.c:2659:
+static void virtio_net_handle_migration_primary(VirtIONet *n, MigrationState * s)

WARNING: Block comments use a leading /* on a separate line
#88: FILE: hw/net/virtio-net.c:2665:
+        /* Request unplug

WARNING: Block comments should align the * on each line
#91: FILE: hw/net/virtio-net.c:2668:
+         *
+        */

ERROR: that open brace { should be on the previous line
#93: FILE: hw/net/virtio-net.c:2670:
+        if (!err)
+        {

ERROR: that open brace { should be on the previous line
#99: FILE: hw/net/virtio-net.c:2676:
+        if (should_be_hidden && !n->primary_dev)
+        {

ERROR: that open brace { should be on the previous line
#104: FILE: hw/net/virtio-net.c:2681:
+        else
+        {

ERROR: else should follow close brace '}'
#104: FILE: hw/net/virtio-net.c:2681:
+        }
+        else

ERROR: space required after that ',' (ctx:VxV)
#114: FILE: hw/net/virtio-net.c:2691:
+    virtio_net_handle_migration_primary(n,s);
                                          ^

ERROR: line over 90 characters
#117: FILE: hw/net/virtio-net.c:2694:
+static void virtio_net_primary_should_be_hidden(DeviceListener *listener, QemuOpts *device_opts,

ERROR: "foo * bar" should be "foo *bar"
#121: FILE: hw/net/virtio-net.c:2698:
+   const char * dev_id = qemu_opts_id(device_opts);

ERROR: space required after that ',' (ctx:WxV)
#123: FILE: hw/net/virtio-net.c:2700:
+    *match_found = !strcmp(n->net_conf.primary_id_str ,dev_id);
                                                       ^

ERROR: line over 90 characters
#124: FILE: hw/net/virtio-net.c:2701:
+    if (atomic_read(&n->primary_should_be_hidden) && !strcmp(qemu_opt_get(device_opts, "driver"), "vfio-pci")

WARNING: line over 80 characters
#143: FILE: hw/net/virtio-net.c:2743:
+        n->primary_listener.should_be_hidden = virtio_net_primary_should_be_hidden;

ERROR: Missing Signed-off-by: line(s)

total: 11 errors, 4 warnings, 162 lines checked

Patch 2/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190322134447.14831-1-jfreimann@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com