[Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...

Venu Busireddy posted 3 patches 5 years, 10 months ago
Failed in applying to current master (apply log)
[Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Venu Busireddy 5 years, 10 months ago
The patch set "Enable virtio_net to act as a standby for a passthru
device" [1] deals with live migration of guests that use passthrough
devices. However, that scheme uses the MAC address for pairing
the virtio device and the passthrough device. The thread "netvsc:
refactor notifier/event handling code to use the failover framework"
[2] discusses an alternate mechanism, such as using a UUID, for pairing
the devices. Based on that discussion, proposals "Add "Group Identifier"
to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
store pairing information..." [4] were made.

The current patch set incorporates all the feedback received on proposals [3]
and [4]. For the sake of completeness, the patch for the virtio specification
is also included here. The updated proposal follows.

1. Extend the virtio specification to include a new virtio PCI capability
   "VIRTIO_PCI_CAP_GROUP_ID_CFG".

2. Enhance the QEMU CLI to include a "failover-group-id" option to the
   virtio device. The "failover-group-id" is a 64 bit value.

3. Enhance the QEMU CLI to include a "failover-group-id" option to the
   Red Hat PCI bridge device (support for the i440FX model).

4. Add a new "pcie-downstream" device, with the option
   "failover-group-id" (support for the Q35 model).

5. The operator creates a 64 bit unique identifier, failover-group-id.

6. When the virtio device is created, the operator uses the
   "failover-group-id" option (for example, '-device
   virtio-net-pci,failover-group-id=<identifier>') and specifies the
   failover-group-id created in step 5.

   QEMU stores the failover-group-id in the virtio device's configuration
   space in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".

7. When assigning a PCI device to the guest in passthrough mode, the
   operator first creates a bridge using the "failover-group-id" option
   (for example, '-device pcie-downstream,failover-group-id=<identifier>')
   to specify the failover-group-id created in step 5, and then attaches
   the passthrough device to the bridge.

   QEMU stores the failover-group-id in the configuration space of the
   bridge as a Vendor-Specific capability (0x09). The "Vendor" here is
   not to be confused with a specific hardware organization; the vendor
   of the bridge is QEMU.

8. Patch 4 in the patch series "Enable virtio_net to act as a standby for
   a passthru device" [1] needs to be modified to pair the devices using
   the failover-group-id values present in the bridge's configuration
   space and the virtio device's configuration space, instead of the
   MAC address.
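To make steps 6 and 7 concrete, the sketch below shows how a guest (or a test) might locate the bridge's Vendor-Specific capability (ID 0x09) in a config-space image and pull out a 64-bit failover-group-id. The capability-list walk follows standard PCI; the placement of the id immediately after the 3-byte capability header is an assumption made here for illustration, not necessarily the layout the patches define.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CAP_ID_VNDR      0x09  /* Vendor-Specific capability ID */
#define PCI_STATUS           0x06  /* status register offset */
#define PCI_STATUS_CAP_LIST  0x10  /* capability list present */
#define PCI_CAPABILITY_LIST  0x34  /* head of the capability list */

/* Walk the capability list in a 256-byte config-space image and return
 * the offset of the first capability with the given ID, or 0 if absent. */
static uint8_t find_capability(const uint8_t *cfg, uint8_t cap_id)
{
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST)) {
        return 0;
    }
    uint8_t pos = cfg[PCI_CAPABILITY_LIST];
    while (pos != 0) {
        if (cfg[pos] == cap_id) {
            return pos;
        }
        pos = cfg[pos + 1];  /* next-capability pointer */
    }
    return 0;
}

/* Assumed layout: the 64-bit id stored little-endian immediately after
 * a 3-byte header (capability ID, next pointer, length). */
static uint64_t read_group_id(const uint8_t *cfg, uint8_t cap)
{
    uint64_t id = 0;
    for (int i = 7; i >= 0; i--) {
        id = (id << 8) | cfg[cap + 3 + i];
    }
    return id;
}
```

On real hardware the walk goes through config-space accessors rather than a flat buffer; the flat image just keeps the sketch self-contained.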

Thanks!

Venu

[1] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00156.html
[2] https://www.spinics.net/lists/netdev/msg499011.html
[3] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00118.html
[4] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00204.html

Changes in v3:
  - As Michael Tsirkin suggested:
    * Changed the failover group identifier from UUID to a 64-bit value.
    * Changed the command line option from "uuid" to "failover-group-id".
    * Tweaked the "pci-bridge" device's Device ID to
      PCI_DEVICE_ID_REDHAT_BRIDGE_FAILOVER.
    * Added a new device "pcie-downstream" with Device ID
      PCI_DEVICE_ID_REDHAT_DOWNPORT_FAILOVER (to support the PCIe case).
  - Changed the patch for virtio specification to reflect the above
    changes.

Changes in v2:
  - As Michael Tsirkin suggested, changed the virtio specification
    to restrict the group identifier to be a 16-byte field, presented
    entirely in the virtio device's configuration space.
  - As Michael Tsirkin suggested, instead of tweaking the ioh3420
    device with Red Hat vendor ID, create a new PCIe bridge device
    named "pcie-downstream" with Red Hat Vendor ID, and include the
    group identifier in this device.
  - Added a new patch to enhance the "pci-bridge" device to support
    the group identifier (for the i440FX model).

Venu Busireddy (3):
  Add "Group Identifier" support to virtio devices.
  Add "Group Identifier" support to Red Hat PCI bridge.
  Add "Group Identifier" support to Red Hat PCI Express bridge.

 default-configs/arm-softmmu.mak             |   1 +
 default-configs/i386-softmmu.mak            |   1 +
 default-configs/x86_64-softmmu.mak          |   1 +
 hw/pci-bridge/Makefile.objs                 |   1 +
 hw/pci-bridge/pci_bridge_dev.c              |  10 +
 hw/pci-bridge/pcie_downstream.c             | 220 ++++++++++++++++++++
 hw/pci-bridge/pcie_downstream.h             |  10 +
 hw/pci/pci_bridge.c                         |  34 +++
 hw/virtio/virtio-pci.c                      |  15 ++
 hw/virtio/virtio-pci.h                      |   3 +-
 include/hw/pci/pci.h                        |  37 ++--
 include/hw/pci/pci_bridge.h                 |   2 +
 include/hw/pci/pcie.h                       |   1 +
 include/standard-headers/linux/virtio_pci.h |   8 +
 14 files changed, 326 insertions(+), 18 deletions(-)
 create mode 100644 hw/pci-bridge/pcie_downstream.c
 create mode 100644 hw/pci-bridge/pcie_downstream.h


Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Roman Kagan 5 years, 10 months ago
On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> The patch set "Enable virtio_net to act as a standby for a passthru
> device" [1] deals with live migration of guests that use passthrough
> devices. However, that scheme uses the MAC address for pairing
> the virtio device and the passthrough device. [...]

I'm still missing a few bits in the overall scheme.

Is the guest supposed to acknowledge the support for PT-PV failover?

Should the PT device be visible to the guest before it acknowledges the
support for failover?  How is this supposed to work with legacy guests
that don't support it?

Is the guest supposed to signal the datapath switch to the host?

Is the scheme going to be applied/extended to other transports (vmbus,
virtio-ccw, etc.)?

Is the failover group concept going to be used beyond PT-PV network
device failover?

Thanks,
Roman.

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by si-wei liu 5 years, 10 months ago

On 7/2/2018 9:14 AM, Roman Kagan wrote:
> On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
>> The patch set "Enable virtio_net to act as a standby for a passthru
>> device" [1] deals with live migration of guests that use passthrough
>> devices. However, that scheme uses the MAC address for pairing
>> the virtio device and the passthrough device. [...]
> I'm still missing a few bits in the overall scheme.
>
> Is the guest supposed to acknowledge the support for PT-PV failover?

Yes. We are leveraging virtio's feature negotiation mechanism for that.
A guest that does not acknowledge the support will not have the PT
device plugged in.

> Should the PT device be visibile to the guest before it acknowledges the
> support for failover?
No. QEMU will only expose the PT device after the guest acknowledges
the support through virtio's feature negotiation.
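That gating can be modelled on the host side as below, under the assumption that the failover ack is a single virtio feature bit. The bit number is a placeholder: the related Linux failover work calls the feature VIRTIO_NET_F_STANDBY, but the value the spec change assigns is not fixed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit number; illustrative only. */
#define F_STANDBY_BIT 62

/* Host-side model of the gating: the PT device may be plugged in only
 * once the guest driver has acknowledged the failover feature. */
static bool may_plug_pt_device(uint64_t host_features,
                               uint64_t guest_acked_features)
{
    uint64_t standby = 1ULL << F_STANDBY_BIT;
    return (host_features & standby) && (guest_acked_features & standby);
}
```

A legacy guest never sets the bit in its acked features, so the check stays false and only the PV device is ever exposed.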

>    How is this supposed to work with legacy guests that don't support it?
Only the PV device will be exposed to a legacy guest.

>
> Is the guest supposed to signal the datapath switch to the host?
No, the guest doesn't need to initiate the datapath switch at all.
However, QMP events may be generated when exposing or hiding the PT
device through hot plug/unplug, to help the host switch the datapath.

>
> Is the scheme going to be applied/extended to other transports (vmbus,
> virtio-ccw, etc.)?
Well, it depends on the use case, and on how feasibly the scheme can be
extended to other transports, given their constraints and specifics.

> Is the failover group concept going to be used beyond PT-PV network
> device failover?
Although the concept of a failover group is generic, the implementation
itself may vary.

-Siwei


>
> Thanks,
> Roman.


Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Roman Kagan 5 years, 10 months ago
On Mon, Jul 02, 2018 at 02:14:52PM -0700, si-wei liu wrote:
> On 7/2/2018 9:14 AM, Roman Kagan wrote:
> > On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> > > The patch set "Enable virtio_net to act as a standby for a passthru
> > > device" [1] deals with live migration of guests that use passthrough
> > > devices. However, that scheme uses the MAC address for pairing
> > > the virtio device and the passthrough device. [...]
> > I'm still missing a few bits in the overall scheme.
> > 
> > Is the guest supposed to acknowledge the support for PT-PV failover?
> 
> Yes. We are leveraging virtio's feature negotiation mechanism for that.
> Guest which does not acknowledge the support will not have PT plugged in.
> 
> > Should the PT device be visibile to the guest before it acknowledges the
> > support for failover?
> No. QEMU will only expose PT device after guest acknowledges the support
> through virtio's feature negotiation.
> 
> >    How is this supposed to work with legacy guests that don't support it?
> Only PV device will be exposed on legacy guest.

So how is this coordination going to work?  One possibility is that the
PV device emits a QMP event upon the guest driver confirming the support
for failover, the management layer intercepts the event and performs
device_add of the PT device.  Another is that the PT device is added
from the very beginning (e.g. on the QEMU command line) but its parent
PCI bridge registers a callback with the PV device to "activate" the PT
device upon negotiation of the failover feature.

I think this needs to be decided within the scope of this patchset.
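The first option Roman describes can be reduced to a toy model: the PV device raises an event once the guest acks the failover feature, and the management layer reacts with the equivalent of a device_add. All names and types below are illustrative, not QEMU or libvirt code.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PT_ABSENT, PT_PLUGGED } PtState;

typedef struct {
    bool guest_acked_failover;  /* set when the PV device's event arrives */
    PtState pt;
} MgmtModel;

/* Management-layer handler for the PV device's "feature acked" event. */
static void on_failover_feature_acked(MgmtModel *m)
{
    m->guest_acked_failover = true;
    if (m->pt == PT_ABSENT) {
        m->pt = PT_PLUGGED;  /* stands in for a QMP device_add */
    }
}
```

The second option would move this handler out of the management layer and into the parent PCI bridge, with the same trigger.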

> > Is the guest supposed to signal the datapath switch to the host?
> No, guest doesn't need to be initiating datapath switch at all.

What happens if the guest supports failover in its PV driver, but lacks
the driver for the PT device?

> However, QMP
> events may be generated when exposing or hiding the PT device through hot
> plug/unplug to facilitate host to switch datapath.

The PT device hot plug/unplug are initiated by the host, aren't they?  Why
would it also need QMP events for them?

> > Is the scheme going to be applied/extended to other transports (vmbus,
> > virtio-ccw, etc.)?
> Well, it depends on the use case, and how feasible it can be extended to
> other transport due to constraints and transport specifics.
> 
> > Is the failover group concept going to be used beyond PT-PV network
> > device failover?
> Although the concept of failover group is generic, the implementation itself
> may vary.

My point with these two questions is that since this patchset is
defining external interfaces -- with guest OS, with management layer --
which are not easy to change later, it might make sense to try and see
if the interfaces map to other usecases.  E.g. I think we can get enough
information on how Hyper-V handles PT-PV network device failover from
the current Linux implementation; it may be a good idea to share some
concepts and workflows with virtio-pci.

Thanks,
Roman.

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Venu Busireddy 5 years, 10 months ago
On 2018-07-03 12:58:25 +0300, Roman Kagan wrote:
> On Mon, Jul 02, 2018 at 02:14:52PM -0700, si-wei liu wrote:
> > On 7/2/2018 9:14 AM, Roman Kagan wrote:
> > > On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> > > > The patch set "Enable virtio_net to act as a standby for a passthru
> > > > device" [1] deals with live migration of guests that use passthrough
> > > > devices. However, that scheme uses the MAC address for pairing
> > > > the virtio device and the passthrough device. [...]
> > > I'm still missing a few bits in the overall scheme.
> > > 
> > > Is the guest supposed to acknowledge the support for PT-PV failover?
> > 
> > Yes. We are leveraging virtio's feature negotiation mechanism for that.
> > Guest which does not acknowledge the support will not have PT plugged in.
> > 
> > > Should the PT device be visibile to the guest before it acknowledges the
> > > support for failover?
> > No. QEMU will only expose PT device after guest acknowledges the support
> > through virtio's feature negotiation.
> > 
> > >    How is this supposed to work with legacy guests that don't support it?
> > Only PV device will be exposed on legacy guest.
> 
> So how is this coordination going to work?  One possibility is that the
> PV device emits a QMP event upon the guest driver confirming the support
> for failover, the management layer intercepts the event and performs
> device_add of the PT device.  Another is that the PT device is added
> from the very beginning (e.g. on the QEMU command line) but its parent
> PCI bridge subscribes a callback with the PV device to "activate" the PT
> device upon negotiating the failover feature.
> 
> I think this needs to be decided within the scope of this patchset.
> 
> > > Is the guest supposed to signal the datapath switch to the host?
> > No, guest doesn't need to be initiating datapath switch at all.
> 
> What happens if the guest supports failover in its PV driver, but lacks
> the driver for the PT device?
> 
> > However, QMP
> > events may be generated when exposing or hiding the PT device through hot
> > plug/unplug to facilitate host to switch datapath.
> 
> The PT device hot plug/unplug are initiated by the host, aren't they?  Why
> would it also need QMP events for them?
> 
> > > Is the scheme going to be applied/extended to other transports (vmbus,
> > > virtio-ccw, etc.)?
> > Well, it depends on the use case, and how feasible it can be extended to
> > other transport due to constraints and transport specifics.
> > 
> > > Is the failover group concept going to be used beyond PT-PV network
> > > device failover?
> > Although the concept of failover group is generic, the implementation itself
> > may vary.
> 
> My point with these two questions is that since this patchset is
> defining external interfaces -- with guest OS, with management layer --

This patch set is not defining any external interfaces. All it does
is provide the means and locations to store the "group identifier". How
that information will be used should, I thought, be another patch set.

Venu

> which are not easy to change later, it might make sense to try and see
> if the interfaces map to other usecases.  E.g. I think we can get enough
> information on how Hyper-V handles PT-PV network device failover from
> the current Linux implementation; it may be a good idea to share some
> concepts and workflows with virtio-pci.
> 
> Thanks,
> Roman.

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Roman Kagan 5 years, 10 months ago
On Tue, Jul 03, 2018 at 09:28:17AM -0500, Venu Busireddy wrote:
> On 2018-07-03 12:58:25 +0300, Roman Kagan wrote:
> > My point with these two questions is that since this patchset is
> > defining external interfaces -- with guest OS, with management layer --
> 
> This patch set is not defining any external interfaces. All this is doing
> is provide the means and locations to store the "group identifier". How
> that info will be used, I thought, should be another patch set.

Device properties are a part of the interface to the management layer.
PCI vendor capabilities are a part of the interface to the guest OS.

So yes, I think it makes sense to have at least a rough plan on how it's
going to be used.

Thanks,
Roman.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 10 months ago
On Tue, 3 Jul 2018 09:28:17 -0500
Venu Busireddy <venu.busireddy@oracle.com> wrote:

> On 2018-07-03 12:58:25 +0300, Roman Kagan wrote:
> > On Mon, Jul 02, 2018 at 02:14:52PM -0700, si-wei liu wrote:  
> > > On 7/2/2018 9:14 AM, Roman Kagan wrote:  
> > > > Is the scheme going to be applied/extended to other transports (vmbus,
> > > > virtio-ccw, etc.)?  
> > > Well, it depends on the use case, and how feasible it can be extended to
> > > other transport due to constraints and transport specifics.
> > >   
> > > > Is the failover group concept going to be used beyond PT-PV network
> > > > device failover?  
> > > Although the concept of failover group is generic, the implementation itself
> > > may vary.  
> > 
> > My point with these two questions is that since this patchset is
> > defining external interfaces -- with guest OS, with management layer --  
> 
> This patch set is not defining any external interfaces. All this is doing
> is provide the means and locations to store the "group identifier". How
> that info will be used, I thought, should be another patch set.
> 
> Venu
> 
> > which are not easy to change later, it might make sense to try and see
> > if the interfaces map to other usecases.  E.g. I think we can get enough
> > information on how Hyper-V handles PT-PV network device failover from
> > the current Linux implementation; it may be a good idea to share some
> > concepts and workflows with virtio-pci.

But this patch set defines a host<->guest interface to make pairing
information on the host available to the guest, no?

From my point of view, there are several concerns:
- This approach assumes a homogeneous pairing (same transport), and
  even more, it assumes that this transport is pci.
- It won't work for zPCI (although zPCI is really strange) -- this
  means it will be completely unusable on s390x.
- It is too focused on a narrow use case. How is it supposed to be
  extended?

What I would prefer:
- Implement a pairing id support that does not rely on a certain
  transport, but leverages virtio (which is in the game anyway). We'd
  get at least the "virtio-net device paired with vfio" use case, which
  is what is currently implemented in the Linux kernel.
- Think about a more generic way to relay configuration metadata to the
  host.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 10 months ago
On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> On Tue, 3 Jul 2018 09:28:17 -0500
> Venu Busireddy <venu.busireddy@oracle.com> wrote:
>
>> On 2018-07-03 12:58:25 +0300, Roman Kagan wrote:
>> > On Mon, Jul 02, 2018 at 02:14:52PM -0700, si-wei liu wrote:
>> > > On 7/2/2018 9:14 AM, Roman Kagan wrote:
>> > > > Is the scheme going to be applied/extended to other transports (vmbus,
>> > > > virtio-ccw, etc.)?
>> > > Well, it depends on the use case, and how feasible it can be extended to
>> > > other transport due to constraints and transport specifics.
>> > >
>> > > > Is the failover group concept going to be used beyond PT-PV network
>> > > > device failover?
>> > > Although the concept of failover group is generic, the implementation itself
>> > > may vary.
>> >
>> > My point with these two questions is that since this patchset is
>> > defining external interfaces -- with guest OS, with management layer --
>>
>> This patch set is not defining any external interfaces. All this is doing
>> is provide the means and locations to store the "group identifier". How
>> that info will be used, I thought, should be another patch set.
>>
>> Venu
>>
>> > which are not easy to change later, it might make sense to try and see
>> > if the interfaces map to other usecases.  E.g. I think we can get enough
>> > information on how Hyper-V handles PT-PV network device failover from
>> > the current Linux implementation; it may be a good idea to share some
>> > concepts and workflows with virtio-pci.
>
> But this patch set defines a host<->guest interface to make pairing
> information on the host available to the guest, no?
>
> From my point of view, there are several concerns:
> - This approach assumes a homogeneous pairing (same transport), and
>   even more, it assumes that this transport is pci.

Not really.

There could be some other place to define a generic (transport-
independent) virtio feature, whereas the data (the group ID) can be
stored in a transport-specific way. That generic virtio feature, and
the way to specify the target transport to group with, are yet to be
defined. I don't see how this patch conflicts with that direction.


> - It won't work for zPCI (although zPCI is really strange) -- this
>   means it will be completely unusable on s390x.
I still need more information about this use case. First off, does
zPCI support all the hotplug semantics and functionality the same way
as PCI? Or does it need some platform or firmware support, like ACPI
hotplug? Does QEMU have all the pieces ready for s390 zPCI hotplug?

Does the s390x use case in your mind concerns with VFIO migraition, or
replacement of a PT device with backup virtio-ccw path? Or something
else?

As the assumption of SR-IOV migration is that, hotplug is used as an
inidicator for datapath switch - which maps to moving MAC address or
VLAN filter around between PV and VF. I am not sure how that maps to
s390x and zPCI with regard to host coordination.

-Siwei

> - It is too focused on a narrow use case. How is it supposed to be
>   extended?
>
> What I would prefer:
> - Implement a pairing id support that does not rely on a certain
>   transport, but leverages virtio (which is in the game anyway). We'd
>   get at least the "virtio-net device paired with vfio" use case, which
>   is what is currently implemented in the Linux kernel.
> - Think about a more generic way to relay configuration metadata to the
>   host.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 10 months ago
On Tue, 3 Jul 2018 16:31:03 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> > From my point of view, there are several concerns:
> > - This approach assumes a homogeneous pairing (same transport), and
> >   even more, it assumes that this transport is pci.  
> 
> Not really.
> 
> There could be some other place to define a generic (transport
> independent) virtio feature, whereas the data (group ID) can be stored
> in transport specific way. That generic virtio feature and the way to
> specify target transport to group with is yet to be defined. I don't
> see this patch is in conflict with that direction.

Sorry, but I really do not see how this is not pci-specific.

- One of your components is a bridge. A transport does not necessarily
  have that concept, at least not in a way meaningful for this approach
  to work.
- Even if we can introduce transport-specific ways for other
  transports, the bridge concept still means that the pairing cannot be
  cross-transport.

I think we should be clear what we want from a generic
(transport-agnostic) virtio feature first. Probably some way to relay
an identifier of the to-be-paired device (transport-specific +
information on what the transport is?)

> > - It won't work for zPCI (although zPCI is really strange) -- this
> >   means it will be completely unusable on s390x.  
> I still need more information about this use case. First off, does
> zPCI support all the hot plug semantics or functionality the same way
> as PCI? Or there has to be some platform or firmware support like
> ACPI hotplug? Does QEMU have all the pieces ready for s390 zPCI
> hotplug?

zPCI is a strange beast, so first a pointer to a writeup I did:
https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html

It does support hotplug, within the s390 architectural context, but
that should be fine for our needs here.

My concern comes from the 'no topology' issue. We build a fake topology
in QEMU (to use the generic pci infrastructure), but the guest does not
see any of this. It issues an instruction and gets a list of functions.
This means any bridge information is not accessible to the guest.

> 
> Does the s390x use case in your mind concerns with VFIO migration, or
> replacement of a PT device with backup virtio-ccw path? Or something
> else?

Migration with vfio is the case most likely to be relevant. I'm mostly
concerned with not needlessly closing doors, though.

> 
> As the assumption of SR-IOV migration is that, hotplug is used as an
> indicator for datapath switch - which maps to moving MAC address or
> VLAN filter around between PV and VF. I am not sure how that maps to
> s390x and zPCI with regard to host coordination.

If we can move MAC addresses or VLAN filters on Linux/QEMU/libvirt in
general, there's no inherent reason why we shouldn't be able to do so
on s390 as well. What matters more is probably which pci network cards
are supported (currently Mellanox AFAIK, not sure if there are others).

> 
> -Siwei
> 
> > - It is too focused on a narrow use case. How is it supposed to be
> >   extended?
> >
> > What I would prefer:
> > - Implement a pairing id support that does not rely on a certain
> >   transport, but leverages virtio (which is in the game anyway). We'd
> >   get at least the "virtio-net device paired with vfio" use case, which
> >   is what is currently implemented in the Linux kernel.
> > - Think about a more generic way to relay configuration metadata to the
> >   host.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >  


Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 10 months ago
On Wed, Jul 4, 2018 at 5:15 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> On Tue, 3 Jul 2018 16:31:03 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
>
>> On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>> > From my point of view, there are several concerns:
>> > - This approach assumes a homogeneous pairing (same transport), and
>> >   even more, it assumes that this transport is pci.
>>
>> Not really.
>>
>> There could be some other place to define a generic (transport
>> independent) virtio feature, whereas the data (group ID) can be stored
>> in transport specific way. That generic virtio feature and the way to
>> specify target transport to group with is yet to be defined. I don't
>> see this patch is in conflict with that direction.
>
> Sorry, but I really do not see how this is not pci-specific.
>
> - One of your components is a bridge. A transport does not necessarily
>   have that concept, at least not in a way meaningful for this approach
>   to work.

Assuming we have a transport-agnostic high-level virtio feature to
indicate that the target type for the to-be-paired device is PCI, then
we have to use the data stored in a PCI bridge to pair the device. It
does not associate the transport of virtio itself with the type of the
target device, right? The introduction of the PCI bridge is just for
storing the group ID data in the PCI passthrough device case, while one
may define and introduce other means to retrieve the info if the target
device type is not PCI, e.g. a special instruction to get the group ID
on s390x zPCI.

> - Even if we can introduce transport-specific ways for other
>   transports, the bridge concept still means that the pairing cannot be
>   cross-transport.

Sorry, I did not get this. A bridge is indeed a PCI-specific concept,
but the virtio device itself can use another transport. Why does that
matter? I guess a higher-level feature is needed to define the target
transport for the to-be-paired device. The group ID info can reside in
the transport-specific area of the virtio device itself, though. E.g.
for virtio-pci, get it from a vendor-specific capability in the PCI
configuration space.
For virtio-ccw, you may come up with a CCW-specific way to get the
group ID data. Is this not what you had in mind?
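To make the virtio-pci variant of this concrete, retrieving a group ID from a vendor-specific capability would amount to an ordinary walk of the PCI capability chain. The sketch below is illustrative only: the capability layout (`struct vndr_group_cap`) and the idea of carrying a 64-bit group ID in it are assumptions, not anything defined by this patch set or the virtio spec, and the config space is simulated as a plain byte array.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34   /* offset of the capability list pointer */
#define PCI_CAP_ID_VNDR     0x09   /* vendor-specific capability ID */

/* Hypothetical layout of a vendor-specific capability carrying the
 * group ID; the real layout would have to be defined by the spec. */
struct vndr_group_cap {
    uint8_t  cap_id;    /* PCI_CAP_ID_VNDR */
    uint8_t  cap_next;  /* offset of the next capability, 0 = end of chain */
    uint8_t  cap_len;
    uint8_t  pad;
    uint64_t group_id;  /* the pairing identifier */
};

/* Walk the capability chain in a (simulated) 256-byte config space and
 * return the offset of the first vendor-specific capability, or 0. */
static uint8_t find_vndr_cap(const uint8_t *cfg)
{
    uint8_t pos = cfg[PCI_CAPABILITY_LIST];

    while (pos != 0) {
        if (cfg[pos] == PCI_CAP_ID_VNDR) {
            return pos;
        }
        pos = cfg[pos + 1];  /* follow cap_next */
    }
    return 0;  /* no vendor-specific capability found */
}
```

The point is that nothing beyond standard config space access is needed on the guest side for this transport; other transports would need their own retrieval mechanism.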

>
> I think we should be clear what we want from a generic
> (transport-agnostic) virtio feature first. Probably some way to relay
> an identifier of the to-be-paired device (transport-specific +
> information what the transport is?)
>
Are we talking about the transport of the virtio device itself, or of
the target device? Note that virtio features must always be retrieved
in a transport-specific way, although the semantics of a feature can be
transport independent/agnostic. Once a virtio driver instance gets to
the point of reading feature bits from the device, the transport of
that virtio device is determined and cannot be changed later.


>> > - It won't work for zPCI (although zPCI is really strange) -- this
>> >   means it will be completely unusable on s390x.
>> I still need more information about this use case. First off, does
>> zPCI support all the hot plug semantics or functionality the same way
>> as PCI? Or there has to be some platform or firmware support like
>> ACPI hotplug? Does QEMU have all the pieces ready for s390 zPCI
>> hotplug?
>
> zPCI is a strange beast, so first a pointer to a writeup I did:
> https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html
>
> It does support hotplug, within the s390 architectural context, but
> that should be fine for our needs here.
>
> My concern comes from the 'no topology' issue. We build a fake topology
> in QEMU (to use the generic pci infrastructure), but the guest does not
> see any of this. It issues an instruction and gets a list of functions.
> This means any bridge information is not accessible to the guest.

That is the reason why I prefer leaving it to the specific transport
(zPCI) to define its own means to present and retrieve the group ID
information. The high-level feature bit just provides the indirection
to the target transport (for the to-be-paired device), while each
separate transport can have a way of its own to present/retrieve the
group ID data.

I don't know where to present that group ID data on s390 zPCI, though,
to be honest. But I doubt we can come up with a very general
transport-agnostic way to present the data for both (and potentially
ALL) bus architectures.

-Siwei

>
>>
>> Does the s390x use case in your mind concerns with VFIO migration, or
>> replacement of a PT device with backup virtio-ccw path? Or something
>> else?
>
> Migration with vfio is the case most likely to be relevant. I'm mostly
> concerned with not needlessly closing doors, though.
>
>>
>> As the assumption of SR-IOV migration is that, hotplug is used as an
>> indicator for datapath switch - which maps to moving MAC address or
>> VLAN filter around between PV and VF. I am not sure how that maps to
>> s390x and zPCI with regard to host coordination.
>
> If we can move MAC addresses or VLAN filters on Linux/QEMU/libvirt in
> general, there's no inherent reason why we shouldn't be able to do so
> on s390 as well. What matters more is probably which pci network cards
> are supported (currently Mellanox AFAIK, not sure if there are others).
>
>>
>> -Siwei
>>
>> > - It is too focused on a narrow use case. How is it supposed to be
>> >   extended?
>> >
>> > What I would prefer:
>> > - Implement a pairing id support that does not rely on a certain
>> >   transport, but leverages virtio (which is in the game anyway). We'd
>> >   get at least the "virtio-net device paired with vfio" use case, which
>> >   is what is currently implemented in the Linux kernel.
>> > - Think about a more generic way to relay configuration metadata to the
>> >   host.
>> >
>

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 10 months ago
On Thu, 5 Jul 2018 17:49:10 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Wed, Jul 4, 2018 at 5:15 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> > On Tue, 3 Jul 2018 16:31:03 -0700
> > Siwei Liu <loseweigh@gmail.com> wrote:
> >  
> >> On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:  
> >> > From my point of view, there are several concerns:
> >> > - This approach assumes a homogeneous pairing (same transport), and
> >> >   even more, it assumes that this transport is pci.  
> >>
> >> Not really.
> >>
> >> There could be some other place to define a generic (transport
> >> independent) virtio feature, whereas the data (group ID) can be stored
> >> in transport specific way. That generic virtio feature and the way to
> >> specify target transport to group with is yet to be defined. I don't
> >> see this patch is in conflict with that direction.  
> >
> > Sorry, but I really do not see how this is not pci-specific.
> >
> > - One of your components is a bridge. A transport does not necessarily
> >   have that concept, at least not in a way meaningful for this approach
> >   to work.  
> 
> Assuming we have a transport agnostic high-level virtio feature to
> indicate that the target type for the to-be-paired device is PCI, then
> > we have to use the data stored in a PCI bridge to pair the device. It
> does not associate transport of virtio itself with the type of target
> device, right. The introduction of PCI bridge is just for storing the
> group ID data for the PCI passthrough device case, while one may
> define and introduce other means to retrieve the info if target device
> type is not PCI e.g. a special instruction to get group ID on s390x
> zPCI.

Well, zPCI *is* PCI, just a very quirky one. I'm not sure how upper
layers are supposed to distinguish that.

Is there really no existing PCI identifier that (a) the host admin
already knows and (b) the guest can retrieve easily?

> 
> > - Even if we can introduce transport-specific ways for other
> >   transports, the bridge concept still means that the pairing cannot be
> >   cross-transport.  
> 
> Sorry, I did not get this. A bridge is PCI specific concept indeed,
> but the virtio itself can be of other transport. Why it matters? I
> guess a higher-level feature is needed to define the target transport
> for the to-be-paired device. The group ID info can reside on the
> transport specific area on virtio itself though. E.g. for virtio-pci,
> get it on the vendor specific capability in the PCI configuration
> space.
> For virtio-ccw, you may come up with CCW specific way to get the group
> ID data. Is this not what you had in mind?

No, my idea was to add it as a field in the virtio-net config space,
making it transport-agnostic. (And making this a typed value, depending
on the vfio device we want to pair the virtio device with.)

If we want to extend that to other device types, we can add the field
in their config space; but I'd rather prefer a more generic "host
relays config metainformation" approach.
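A minimal sketch of what such a typed config-space field could look like. All names here (`pairing_id_type`, `PAIRING_ID_PCI`, the guarding feature bit) are invented for illustration; nothing like this is defined in the virtio spec or the patch set under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension of the virtio-net config space carrying a
 * typed pairing identifier, guarded by a made-up feature bit. */
#define VIRTIO_NET_F_PAIRING_ID 62  /* invented feature bit */

enum pairing_id_type {
    PAIRING_ID_NONE = 0,
    PAIRING_ID_PCI  = 1,  /* identifier names a PCI (vfio) device */
    PAIRING_ID_CCW  = 2,  /* identifier names a CCW device */
};

struct virtio_net_config_pairing {
    /* ... the existing virtio_net_config fields (mac, status, ...)
     * would precede these ... */
    uint16_t pairing_id_type;  /* one of enum pairing_id_type */
    uint8_t  pairing_id[16];   /* transport-specific identifier */
};
```

Because the field lives in device config space rather than in transport structures, the same layout would be visible through virtio-pci, virtio-ccw, or virtio-mmio, which is the transport-agnostic property being argued for here.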

> 
> >
> > I think we should be clear what we want from a generic
> > (transport-agnostic) virtio feature first. Probably some way to relay
> > an identifier of the to-be-paired device (transport-specific +
> > information what the transport is?)
> >  
> Are we talking about the transport of the virtio itself or the target
> device? Note the virtio feature must always be retrieved using
> transport specific way, although the semantics of the feature can be
> transport independent/agnostic. Once a virtio driver instance gets to
> the point to read feature bits from the device, the transport of that
> virtio device is determined and cannot be changed later.

No, I don't want to introduce a new, per-transport mechanism, but
rather put it into the config space.

> 
> 
> >> > - It won't work for zPCI (although zPCI is really strange) -- this
> >> >   means it will be completely unusable on s390x.  
> >> I still need more information about this use case. First off, does
> >> zPCI support all the hot plug semantics or functionality the same way
> >> as PCI? Or there has to be some platform or firmware support like
> >> ACPI hotplug? Does QEMU have all the pieces ready for s390 zPCI
> >> hotplug?  
> >
> > zPCI is a strange beast, so first a pointer to a writeup I did:
> > https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html
> >
> > It does support hotplug, within the s390 architectural context, but
> > that should be fine for our needs here.
> >
> > My concern comes from the 'no topology' issue. We build a fake topology
> > in QEMU (to use the generic pci infrastructure), but the guest does not
> > see any of this. It issues an instruction and gets a list of functions.
> > This means any bridge information is not accessible to the guest.  
> 
> That is the reason why I prefer leaving it to specific transport
> (zPCI) to define its own means to present and retrieve the group ID
> information. The high-level feature bit just provides the indirection
> to the target transport (for the to-be-paired device), while separate
> transport can have a way of its own to present/retrieve the group ID
> data.
> 
> I don't know where to present that group ID data on s390 zPCI though
> to be honest. But I doubt we can come up with a very general
> transport-agnostic way to present the data for both (and potentially
> ALL) bus architectures.

I don't want to establish zPCI as a different transport than 'normal'
PCI; that would just make life difficult for everyone. The 'put it into
virtio config space' approach would mean a second feature bit, but
should just work.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 10 months ago
On Fri, Jul 06, 2018 at 03:54:06PM +0200, Cornelia Huck wrote:
> On Thu, 5 Jul 2018 17:49:10 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
> 
> > On Wed, Jul 4, 2018 at 5:15 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> > > On Tue, 3 Jul 2018 16:31:03 -0700
> > > Siwei Liu <loseweigh@gmail.com> wrote:
> > >  
> > >> On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:  
> > >> > From my point of view, there are several concerns:
> > >> > - This approach assumes a homogeneous pairing (same transport), and
> > >> >   even more, it assumes that this transport is pci.  
> > >>
> > >> Not really.
> > >>
> > >> There could be some other place to define a generic (transport
> > >> independent) virtio feature, whereas the data (group ID) can be stored
> > >> in transport specific way. That generic virtio feature and the way to
> > >> specify target transport to group with is yet to be defined. I don't
> > >> see this patch is in conflict with that direction.  
> > >
> > > Sorry, but I really do not see how this is not pci-specific.
> > >
> > > - One of your components is a bridge. A transport does not necessarily
> > >   have that concept, at least not in a way meaningful for this approach
> > >   to work.  
> > 
> > Assuming we have a transport agnostic high-level virtio feature to
> > indicate that the target type for the to-be-paired device is PCI, then
> > we have to use the data stored in a PCI bridge to pair the device. It
> > does not associate transport of virtio itself with the type of target
> > device, right. The introduction of PCI bridge is just for storing the
> > group ID data for the PCI passthrough device case, while one may
> > define and introduce other means to retrieve the info if target device
> > type is not PCI e.g. a special instruction to get group ID on s390x
> > zPCI.
> 
> Well, zPCI *is* PCI, just a very quirky one. I'm not sure how upper
> layers are supposed to distinguish that.
> 
> Is there really no existing PCI identifier that (a) the host admin
> already knows and (b) the guest can retrieve easily?

There is e.g. the PCI Express serial number. However, given that
one use of the feature is live migration, we really
shouldn't use any ID from a real card.

> > 
> > > - Even if we can introduce transport-specific ways for other
> > >   transports, the bridge concept still means that the pairing cannot be
> > >   cross-transport.  
> > 
> > Sorry, I did not get this. A bridge is PCI specific concept indeed,
> > but the virtio itself can be of other transport. Why it matters? I
> > guess a higher-level feature is needed to define the target transport
> > for the to-be-paired device. The group ID info can reside on the
> > transport specific area on virtio itself though. E.g. for virtio-pci,
> > get it on the vendor specific capability in the PCI configuration
> > space.
> > For virtio-ccw, you may come up with CCW specific way to get the group
> > ID data. Is this not what you had in mind?
> 
> > No, my idea was to add it as a field in the virtio-net config space,
> making it transport-agnostic. (And making this a typed value, depending
> on the vfio device we want to pair the virtio device with.)
> 
> If we want to extend that to other device types, we can add the field
> in their config space; but I'd rather prefer a more generic "host
> relays config metainformation" approach.
> 
> > 
> > >
> > > I think we should be clear what we want from a generic
> > > (transport-agnostic) virtio feature first. Probably some way to relay
> > > an identifier of the to-be-paired device (transport-specific +
> > > information what the transport is?)
> > >  
> > Are we talking about the transport of the virtio itself or the target
> > device? Note the virtio feature must always be retrieved using
> > transport specific way, although the semantics of the feature can be
> > transport independent/agnostic. Once a virtio driver instance gets to
> > the point to read feature bits from the device, the transport of that
> > virtio device is determined and cannot be changed later.
> 
> No, I don't want to introduce a new, per-transport mechanism, but
> rather put it into the config space.
> > 
> > 
> > >> > - It won't work for zPCI (although zPCI is really strange) -- this
> > >> >   means it will be completely unusable on s390x.  
> > >> I still need more information about this use case. First off, does
> > >> zPCI support all the hot plug semantics or functionality the same way
> > >> as PCI? Or there has to be some platform or firmware support like
> > >> ACPI hotplug? Does QEMU have all the pieces ready for s390 zPCI
> > >> hotplug?  
> > >
> > > zPCI is a strange beast, so first a pointer to a writeup I did:
> > > https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html
> > >
> > > It does support hotplug, within the s390 architectural context, but
> > > that should be fine for our needs here.
> > >
> > > My concern comes from the 'no topology' issue. We build a fake topology
> > > in QEMU (to use the generic pci infrastructure), but the guest does not
> > > see any of this. It issues an instruction and gets a list of functions.
> > > This means any bridge information is not accessible to the guest.  
> > 
> > That is the reason why I prefer leaving it to specific transport
> > (zPCI) to define its own means to present and retrieve the group ID
> > information. The high-level feature bit just provides the indirection
> > to the target transport (for the to-be-paired device), while separate
> > transport can have a way of its own to present/retrieve the group ID
> > data.
> > 
> > I don't know where to present that group ID data on s390 zPCI though
> > to be honest. But I doubt we can come up with a very general
> > transport-agnostic way to present the data for both (and potentially
> > ALL) bus architectures.
> 
> I don't want to establish zPCI as a different transport than 'normal'
> PCI; that would just make life difficult for everyone. The 'put it into
> virtio config space' approach would mean a second feature bit, but
> should just work.

So how about this: each transport will gain a new field which is the
offset, within config space, of a generic config area.
Each field within that area will be guarded by a feature bit.

E.g. add CCW_CMD_READ_CONF_OFFSET.

Is this better than a separate new generic config space?
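A rough sketch of how a driver might consume such a feature-guarded generic config area. All identifiers below are invented for illustration; the actual feature bit numbers and the layout of the area would have to be defined by the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits: one says the generic config area exists at all,
 * one guards the group_id field inside it. */
#define VIRTIO_F_GENERIC_CFG 56  /* invented: generic config area present */
#define VIRTIO_F_GROUP_ID    57  /* invented: group_id field is valid */

struct virtio_generic_cfg {
    uint64_t group_id;  /* valid iff VIRTIO_F_GROUP_ID was negotiated */
    /* further feature-guarded fields would follow */
};

static bool feature_negotiated(uint64_t features, unsigned int bit)
{
    return (features & (1ULL << bit)) != 0;
}

/* Return the pairing group ID, or 0 if the device does not offer one. */
static uint64_t read_group_id(uint64_t features,
                              const struct virtio_generic_cfg *cfg)
{
    if (!feature_negotiated(features, VIRTIO_F_GENERIC_CFG) ||
        !feature_negotiated(features, VIRTIO_F_GROUP_ID)) {
        return 0;
    }
    return cfg->group_id;
}
```

The transport-specific part is only how the driver learns the offset of the area (a new capability for PCI, a new command such as the proposed CCW_CMD_READ_CONF_OFFSET for ccw); the contents and feature guards stay transport-agnostic.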

-- 
MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 9 months ago
On Fri, 6 Jul 2018 18:07:00 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Jul 06, 2018 at 03:54:06PM +0200, Cornelia Huck wrote:
> > On Thu, 5 Jul 2018 17:49:10 -0700
> > Siwei Liu <loseweigh@gmail.com> wrote:
> >   
> > > On Wed, Jul 4, 2018 at 5:15 AM, Cornelia Huck <cohuck@redhat.com> wrote:  
> > > > On Tue, 3 Jul 2018 16:31:03 -0700
> > > > Siwei Liu <loseweigh@gmail.com> wrote:
> > > >    
> > > >> On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:    
> > > >> > From my point of view, there are several concerns:
> > > >> > - This approach assumes a homogeneous pairing (same transport), and
> > > >> >   even more, it assumes that this transport is pci.    
> > > >>
> > > >> Not really.
> > > >>
> > > >> There could be some other place to define a generic (transport
> > > >> independent) virtio feature, whereas the data (group ID) can be stored
> > > >> in transport specific way. That generic virtio feature and the way to
> > > >> specify target transport to group with is yet to be defined. I don't
> > > >> see this patch is in conflict with that direction.    
> > > >
> > > > Sorry, but I really do not see how this is not pci-specific.
> > > >
> > > > - One of your components is a bridge. A transport does not necessarily
> > > >   have that concept, at least not in a way meaningful for this approach
> > > >   to work.    
> > > 
> > > Assuming we have a transport agnostic high-level virtio feature to
> > > indicate that the target type for the to-be-paired device is PCI, then
> > > we have to use the data stored in a PCI bridge to pair the device. It
> > > does not associate transport of virtio itself with the type of target
> > > device, right. The introduction of PCI bridge is just for storing the
> > > group ID data for the PCI passthrough device case, while one may
> > > define and introduce other means to retrieve the info if target device
> > > type is not PCI e.g. a special instruction to get group ID on s390x
> > > zPCI.  
> > 
> > Well, zPCI *is* PCI, just a very quirky one. I'm not sure how upper
> > layers are supposed to distinguish that.
> > 
> > Is there really no existing PCI identifier that (a) the host admin
> > already knows and (b) the guest can retrieve easily?  
> 
> There is e.g. PCI Express serial number. However given that
> one use of the feature is for live migration, we really
> shouldn't use any ID from a real card.

With any "vfio vs. live migration" setup, we basically have two
different cases:
- Both source and target use the same physical device (device shared
  between logical partitions, device accessible from two different
  machines).
- We are dealing with two different devices that can be configured to
  look/act the same.

I have been mostly thinking about the first case, but the serial number
probably won't work with the second case.

> 
> > >   
> > > > - Even if we can introduce transport-specific ways for other
> > > >   transports, the bridge concept still means that the pairing cannot be
> > > >   cross-transport.    
> > > 
> > > Sorry, I did not get this. A bridge is PCI specific concept indeed,
> > > but the virtio itself can be of other transport. Why it matters? I
> > > guess a higher-level feature is needed to define the target transport
> > > for the to-be-paired device. The group ID info can reside on the
> > > transport specific area on virtio itself though. E.g. for virtio-pci,
> > > get it on the vendor specific capability in the PCI configuration
> > > space.
> > > For virtio-ccw, you may come up with CCW specific way to get the group
> > > ID data. Is this not what you had in mind?  
> > 
> > No, my idea was to add it as a field in the virtio-net config space,
> > making it transport-agnostic. (And making this a typed value, depending
> > on the vfio device we want to pair the virtio device with.)
> > 
> > If we want to extend that to other device types, we can add the field
> > in their config space; but I'd rather prefer a more generic "host
> > relays config metainformation" approach.
> >   
> > >   
> > > >
> > > > I think we should be clear what we want from a generic
> > > > (transport-agnostic) virtio feature first. Probably some way to relay
> > > > an identifier of the to-be-paired device (transport-specific +
> > > > information what the transport is?)
> > > >    
> > > Are we talking about the transport of the virtio itself or the target
> > > device? Note the virtio feature must always be retrieved using
> > > transport specific way, although the semantics of the feature can be
> > > transport independent/agnostic. Once a virtio driver instance gets to
> > > the point to read feature bits from the device, the transport of that
> > > virtio device is determined and cannot be changed later.  
> > 
> > No, I don't want to introduce a new, per-transport mechanism, but
> > rather put it into the config space.  
> > > 
> > >   
> > > >> > - It won't work for zPCI (although zPCI is really strange) -- this
> > > >> >   means it will be completely unusable on s390x.    
> > > >> I still need more information about this use case. First off, does
> > > >> zPCI support all the hot plug semantics or functionality the same way
> > > >> as PCI? Or there has to be some platform or firmware support like
> > > >> ACPI hotplug? Does QEMU have all the pieces ready for s390 zPCI
> > > >> hotplug?    
> > > >
> > > > zPCI is a strange beast, so first a pointer to a writeup I did:
> > > > https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html
> > > >
> > > > It does support hotplug, within the s390 architectural context, but
> > > > that should be fine for our needs here.
> > > >
> > > > My concern comes from the 'no topology' issue. We build a fake topology
> > > > in QEMU (to use the generic pci infrastructure), but the guest does not
> > > > see any of this. It issues an instruction and gets a list of functions.
> > > > This means any bridge information is not accessible to the guest.    
> > > 
> > > That is the reason why I prefer leaving it to specific transport
> > > (zPCI) to define its own means to present and retrieve the group ID
> > > information. The high-level feature bit just provides the indirection
> > > to the target transport (for the to-be-paired device), while separate
> > > transport can have a way of its own to present/retrieve the group ID
> > > data.
> > > 
> > > I don't know where to present that group ID data on s390 zPCI though
> > > to be honest. But I doubt we can come up with a very general
> > > transport-agnostic way to present the data for both (and potentially
> > > ALL) bus architectures.  
> > 
> > I don't want to establish zPCI as a different transport than 'normal'
> > PCI; that would just make life difficult for everyone. The 'put it into
> > virtio config space' approach would mean a second feature bit, but
> > should just work.  
> 
> So how about that. Each transport will gain a new field
> which is offset within config space of a generic config.
> Each field within that will be guarded by a feature bit.
> 
> E.g. add CCW_CMD_READ_CONF_OFFSET.
> 
> Is this better than a separate new generic config space?

From a ccw view, it would amount to the same (i.e., a new command is
needed in any case). Currently, I'm actually not quite sure how much
more designing of this makes sense (I'll reply elsewhere in this
thread).

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 10 months ago
On Fri, Jul 6, 2018 at 6:54 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> On Thu, 5 Jul 2018 17:49:10 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
>
>> On Wed, Jul 4, 2018 at 5:15 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>> > On Tue, 3 Jul 2018 16:31:03 -0700
>> > Siwei Liu <loseweigh@gmail.com> wrote:
>> >
>> >> On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>> >> > From my point of view, there are several concerns:
>> >> > - This approach assumes a homogeneous pairing (same transport), and
>> >> >   even more, it assumes that this transport is pci.
>> >>
>> >> Not really.
>> >>
>> >> There could be some other place to define a generic (transport
>> >> independent) virtio feature, whereas the data (group ID) can be stored
>> >> in transport specific way. That generic virtio feature and the way to
>> >> specify target transport to group with is yet to be defined. I don't
>> >> see this patch is in conflict with that direction.
>> >
>> > Sorry, but I really do not see how this is not pci-specific.
>> >
>> > - One of your components is a bridge. A transport does not necessarily
>> >   have that concept, at least not in a way meaningful for this approach
>> >   to work.
>>
>> Assuming we have a transport agnostic high-level virtio feature to
>> indicate that the target type for the to-be-paired device is PCI, then
>> we have to use the data stored in a PC bridge to pair the device. It
>> does not associate transport of virtio itself with the type of target
>> device, right. The introduction of PCI bridge is just for storing the
>> group ID data for the PCI passthrough device case, while one may
>> define and introduce other means to retrieve the info if target device
>> type is not PCI e.g. a special instruction to get group ID on s390x
>> zPCI.
>
> Well, zPCI *is* PCI, just a very quirky one. I'm not sure how upper
> layers are supposed to distinguish that.
>
> Is there really no existing PCI identifier that (a) the host admin
> already knows and (b) the guest can retrieve easily?

I'm not sure zPCI should be categorized as real PCI, given its lack of
topology. Some other virtual or para-virtual PCI buses behave like PCI
without being full-blown implementations, and those are not regarded
as (native) PCI emulation either.

Can an s390x guest use the UID and FID described in the blog post you
wrote to identify a zPCI device? I am wondering whether we can define
a grouping mechanism other than the 64-bit group ID (formerly a
128-bit UUID). The high-level indirection would then specify not just
the target transport, but perhaps the grouping mechanism as well.
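One way the grouping mechanism could ride along with the indirection
is a typed identifier. A minimal C sketch follows; every name here is
hypothetical, and the zPCI UID/FID variant in particular is purely
speculative, not something either side of this thread has specified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical typed pairing identifier: the type tag says which
 * grouping mechanism is in use, the union carries the value. */
enum pair_id_type {
    PAIR_ID_NONE = 0,
    PAIR_ID_GROUP64,       /* 64-bit failover group ID, as proposed */
    PAIR_ID_ZPCI_UID_FID,  /* s390x zPCI UID + FID, speculative */
};

struct pair_id {
    uint32_t type;         /* enum pair_id_type */
    union {
        uint64_t group64;
        struct { uint32_t uid; uint32_t fid; } zpci;
    } u;
};

/* Two devices pair only if both the mechanism and the value match. */
static int pair_id_match(const struct pair_id *a, const struct pair_id *b)
{
    if (a->type != b->type)
        return 0;
    switch (a->type) {
    case PAIR_ID_GROUP64:
        return a->u.group64 == b->u.group64;
    case PAIR_ID_ZPCI_UID_FID:
        return a->u.zpci.uid == b->u.zpci.uid &&
               a->u.zpci.fid == b->u.zpci.fid;
    default:
        return 0;
    }
}
```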

>
>>
>> > - Even if we can introduce transport-specific ways for other
>> >   transports, the bridge concept still means that the pairing cannot be
>> >   cross-transport.
>>
>> Sorry, I did not get this. A bridge is PCI specific concept indeed,
>> but the virtio iteself can be of other transport. Why it matters? I
>> guess a higher-level feature is needed to define the target transport
>> for the to-be-paired device. The group ID info can reside on the
>> transport specific area on virtio itself though. E.g. for virtio-pci,
>> get it on the vendor specific capability in the PCI configuration
>> space.
>> For virtio-ccw, you may come up with CCW specific way to get the group
>> ID data. Is this not what you had in mind?
>
> No, my idea was it to add it as a field in the virtio-net config space,
> making it transport-agnostic. (And making this a typed value, depending
> on the vfio device we want to pair the virtio device with.)
>
> If we want to extend that to other device types, we can add the field
> in their config space; but I'd rather prefer a more generic "host
> relays config metainformation" approach.
>
>>
>> >
>> > I think we should be clear what we want from a generic
>> > (transport-agnostic) virtio feature first. Probably some way to relay
>> > an identifier of the to-be-paired device (transport-specific +
>> > information what the transport is?)
>> >
>> Are we talking about the transport of the virtio itself or the target
>> device? Note the virtio feature must always be retrieved using
>> transport specific way, although the semantics of the feauture can be
>> transport independent/agnostic. Once a virtio driver instace gets to
>> the point to read feature bits from the device, the transport of that
>> virtio device is determined and cannot be changed later.
>
> No, I don't want to introduce a new, per-transport mechanism, but
> rather put it into the config space.

OK.

I think the common understanding is that such a field shouldn't live
in per-device-type config space. MST's suggestion in the other mail to
carve out some space from the device config space for generic use
seems viable to me.

>
>>
>>
>> >> > - It won't work for zPCI (although zPCI is really strange) -- this
>> >> >   means it will be completely unusable on s390x.
>> >> I still need more information about this use case. First off, does
>> >> zPCI support all the hot plug semantics or functionality the same way
>> >> as PCI? Or there has to be some platform or firmeware support like
>> >> ACPI hotplug? Does QEMU have all the pieces ready for s390 zPCI
>> >> hotplug?
>> >
>> > zPCI is a strange beast, so first a pointer to a writeup I did:
>> > https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html
>> >
>> > It does support hotplug, within the s390 architectural context, but
>> > that should be fine for our needs here.
>> >
>> > My concern comes from the 'no topology' issue. We build a fake topology
>> > in QEMU (to use the generic pci infrastructure), but the guest does not
>> > see any of this. It issues an instruction and gets a list of functions.
>> > This means any bridge information is not accessible to the guest.
>>
>> That is the reason why I prefer leaving it to specific transport
>> (zPCI) to define its own means to present and retrieve the group ID
>> information. The high-level feature bit just provides the indirection
>> to the target transport (for the to-be-paired device), while seperate
>> transport can have a way of its own to present/retrieve the group ID
>> data.
>>
>> I don't know where to present that group ID data on s390 zPCI though
>> to be honest. But I doubt we can come up with a very general
>> transport-agnostic way to present the data for both (and pontentially
>> ALL) bus architectures.
>
> I don't want to establish zPCI as a different transport than 'normal'
> PCI; that would just make life difficult for everyone. The 'put it into
> virtio config space' approach would mean a second feature bit, but
> should just work.

Sorry, but we still need an ID to match the VFIO passthrough device. I
don't see how putting the info into virtio config space can address
the problem.

-Siwei
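
For reference, matching against the ID stored in the bridge comes down
to a PCI capability-list walk on the guest side. The sketch below is a
plausible shape only: the offset of the group ID inside the
vendor-specific capability is assumed here, not taken from the
patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAP_ID_VNDR  0x09  /* Vendor-Specific capability ID */
#define PCI_CAP_LIST_PTR 0x34  /* offset of the capability list head */

/* Walk the bridge's config space looking for the vendor-specific
 * capability and read the 64-bit group ID assumed to follow the
 * capability header (ID, next, length, pad). */
static int find_group_id(const uint8_t cfg[256], uint64_t *out)
{
    uint8_t pos = cfg[PCI_CAP_LIST_PTR];

    while (pos) {
        if (cfg[pos] == PCI_CAP_ID_VNDR) {
            memcpy(out, &cfg[pos + 4], sizeof(*out)); /* assumed offset */
            return 0;
        }
        pos = cfg[pos + 1];  /* follow the 'next capability' pointer */
    }
    return -1;               /* no group ID present */
}
```

On zPCI this walk is exactly what is unavailable to the guest, which
is the crux of the disagreement above.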

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 9 months ago
On Fri, 6 Jul 2018 16:37:10 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Fri, Jul 6, 2018 at 6:54 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> > On Thu, 5 Jul 2018 17:49:10 -0700
> > Siwei Liu <loseweigh@gmail.com> wrote:
> >  
> >> On Wed, Jul 4, 2018 at 5:15 AM, Cornelia Huck <cohuck@redhat.com> wrote:  
> >> > On Tue, 3 Jul 2018 16:31:03 -0700
> >> > Siwei Liu <loseweigh@gmail.com> wrote:
> >> >  
> >> >> On Tue, Jul 3, 2018 at 7:52 AM, Cornelia Huck <cohuck@redhat.com> wrote:  
> >> >> > From my point of view, there are several concerns:
> >> >> > - This approach assumes a homogeneous pairing (same transport), and
> >> >> >   even more, it assumes that this transport is pci.  
> >> >>
> >> >> Not really.
> >> >>
> >> >> There could be some other place to define a generic (transport
> >> >> independent) virtio feature, whereas the data (group ID) can be stored
> >> >> in transport specific way. That generic virtio feature and the way to
> >> >> specify target transport to group with is yet to be defined. I don't
> >> >> see this patch is in conflict with that direction.  
> >> >
> >> > Sorry, but I really do not see how this is not pci-specific.
> >> >
> >> > - One of your components is a bridge. A transport does not necessarily
> >> >   have that concept, at least not in a way meaningful for this approach
> >> >   to work.  
> >>
> >> Assuming we have a transport agnostic high-level virtio feature to
> >> indicate that the target type for the to-be-paired device is PCI, then
> >> we have to use the data stored in a PC bridge to pair the device. It
> >> does not associate transport of virtio itself with the type of target
> >> device, right. The introduction of PCI bridge is just for storing the
> >> group ID data for the PCI passthrough device case, while one may
> >> define and introduce other means to retrieve the info if target device
> >> type is not PCI e.g. a special instruction to get group ID on s390x
> >> zPCI.  
> >
> > Well, zPCI *is* PCI, just a very quirky one. I'm not sure how upper
> > layers are supposed to distinguish that.
> >
> > Is there really no existing PCI identifier that (a) the host admin
> > already knows and (b) the guest can retrieve easily?  
> 
> Not sure if zPCI should be categorized as real PCI due to lack of
> topology. Some other virtual PCI or para-virtual PCI buses behave like
> PCI but not a full blown one. Those are not regarded as (native) PCI
> emulation either.

I'm actually close to giving up on zPCI... it is very odd, not
publicly documented, and I see disk passthrough via vfio-ccw as the
more interesting case anyway.

> 
> Can s390x guest use UID and FID described in the blog post you wrote
> to identify zPCI device? I am thinking if we can define other grouping
> mechanism than the 64bit long group ID (formerly 128 bit UUID). The
> high-level indirection not just specifies target transport, but maybe
> grouping mechanism also.

Not sure if this could work with how they are architected
(semantics-wise)... no public documentation is available, so I'll
assume we can't use them for another purpose.

> 
> >  
> >>  
> >> > - Even if we can introduce transport-specific ways for other
> >> >   transports, the bridge concept still means that the pairing cannot be
> >> >   cross-transport.  
> >>
> >> Sorry, I did not get this. A bridge is PCI specific concept indeed,
> >> but the virtio iteself can be of other transport. Why it matters? I
> >> guess a higher-level feature is needed to define the target transport
> >> for the to-be-paired device. The group ID info can reside on the
> >> transport specific area on virtio itself though. E.g. for virtio-pci,
> >> get it on the vendor specific capability in the PCI configuration
> >> space.
> >> For virtio-ccw, you may come up with CCW specific way to get the group
> >> ID data. Is this not what you had in mind?  
> >
> > No, my idea was it to add it as a field in the virtio-net config space,
> > making it transport-agnostic. (And making this a typed value, depending
> > on the vfio device we want to pair the virtio device with.)
> >
> > If we want to extend that to other device types, we can add the field
> > in their config space; but I'd rather prefer a more generic "host
> > relays config metainformation" approach.
> >  
> >>  
> >> >
> >> > I think we should be clear what we want from a generic
> >> > (transport-agnostic) virtio feature first. Probably some way to relay
> >> > an identifier of the to-be-paired device (transport-specific +
> >> > information what the transport is?)
> >> >  
> >> Are we talking about the transport of the virtio itself or the target
> >> device? Note the virtio feature must always be retrieved using
> >> transport specific way, although the semantics of the feauture can be
> >> transport independent/agnostic. Once a virtio driver instace gets to
> >> the point to read feature bits from the device, the transport of that
> >> virtio device is determined and cannot be changed later.  
> >
> > No, I don't want to introduce a new, per-transport mechanism, but
> > rather put it into the config space.  
> 
> OK.
> 
> I think the common understanding is that config space shouldn't be per
> device type. MST's suggestion in the other mail to carve some space
> from the device config space for generic use seems viable to me.
> 
> >  
> >>
> >>  
> >> >> > - It won't work for zPCI (although zPCI is really strange) -- this
> >> >> >   means it will be completely unusable on s390x.  
> >> >> I still need more information about this use case. First off, does
> >> >> zPCI support all the hot plug semantics or functionality the same way
> >> >> as PCI? Or there has to be some platform or firmeware support like
> >> >> ACPI hotplug? Does QEMU have all the pieces ready for s390 zPCI
> >> >> hotplug?  
> >> >
> >> > zPCI is a strange beast, so first a pointer to a writeup I did:
> >> > https://virtualpenguins.blogspot.de/2018/02/notes-on-pci-on-s390x.html
> >> >
> >> > It does support hotplug, within the s390 architectural context, but
> >> > that should be fine for our needs here.
> >> >
> >> > My concern comes from the 'no topology' issue. We build a fake topology
> >> > in QEMU (to use the generic pci infrastructure), but the guest does not
> >> > see any of this. It issues an instruction and gets a list of functions.
> >> > This means any bridge information is not accessible to the guest.  
> >>
> >> That is the reason why I prefer leaving it to specific transport
> >> (zPCI) to define its own means to present and retrieve the group ID
> >> information. The high-level feature bit just provides the indirection
> >> to the target transport (for the to-be-paired device), while seperate
> >> transport can have a way of its own to present/retrieve the group ID
> >> data.
> >>
> >> I don't know where to present that group ID data on s390 zPCI though
> >> to be honest. But I doubt we can come up with a very general
> >> transport-agnostic way to present the data for both (and pontentially
> >> ALL) bus architectures.  
> >
> > I don't want to establish zPCI as a different transport than 'normal'
> > PCI; that would just make life difficult for everyone. The 'put it into
> > virtio config space' approach would mean a second feature bit, but
> > should just work.  
> 
> Sorry, but we still need an ID to match the VFIO passthrough device. I
> don't see how putting the info into virtio config space can address
> the problem.

As said, it is probably better to just give up on zPCI and focus on
'normal' PCI setups. I think we could handle dasd-via-vfio-ccw without
a paired device, and PCI is probably not the prime use case on s390x.
If any IBM folks (with access to documentation etc.) have an opinion,
they should speak up.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Roman Kagan 5 years, 9 months ago
On Fri, Jul 06, 2018 at 03:54:06PM +0200, Cornelia Huck wrote:
> If we want to extend that to other device types, we can add the field
> in their config space; but I'd rather prefer a more generic "host
> relays config metainformation" approach.

The problem with this approach is that it's too generic and poorly
scoped.  As a result I guess it's likely to take ages to converge.

See for instance the discussion on including vm config metadata in qcow2
images: I think it's similar in scope (or the absence thereof).

Roman.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 9 months ago
On Mon, 9 Jul 2018 16:14:19 +0300
Roman Kagan <rkagan@virtuozzo.com> wrote:

> On Fri, Jul 06, 2018 at 03:54:06PM +0200, Cornelia Huck wrote:
> > If we want to extend that to other device types, we can add the field
> > in their config space; but I'd rather prefer a more generic "host
> > relays config metainformation" approach.  
> 
> The problem with this approach is that it's too generic and poorly
> scoped.  As a result I guess it's likely to take ages to converge.

I'd certainly not want to wait for it for this particular issue;
however, I think there is value in having this (seeing that at least
OpenStack and s390x DPM already make use of such a mechanism). But we
can still revisit this later on.

(I'm not too sold on branching out to other device types in general,
but there seems to be demand for it.)

> 
> See for instance the discussion on including vm config metadata in qcow2
> images: I think it's similar in scope (or the absense thereof).
> 
> Roman.


Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by si-wei liu 5 years, 10 months ago

On 7/3/2018 2:58 AM, Roman Kagan wrote:
> On Mon, Jul 02, 2018 at 02:14:52PM -0700, si-wei liu wrote:
>> On 7/2/2018 9:14 AM, Roman Kagan wrote:
>>> On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
>>>> The patch set "Enable virtio_net to act as a standby for a passthru
>>>> device" [1] deals with live migration of guests that use passthrough
>>>> devices. However, that scheme uses the MAC address for pairing
>>>> the virtio device and the passthrough device. The thread "netvsc:
>>>> refactor notifier/event handling code to use the failover framework"
>>>> [2] discusses an alternate mechanism, such as using an UUID, for pairing
>>>> the devices. Based on that discussion, proposals "Add "Group Identifier"
>>>> to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
>>>> store pairing information..." [4] were made.
>>>>
>>>> The current patch set includes all the feedback received for proposals [3]
>>>> and [4]. For the sake of completeness, patch for the virtio specification
>>>> is also included here. Following is the updated proposal.
>>>>
>>>> 1. Extend the virtio specification to include a new virtio PCI capability
>>>>      "VIRTIO_PCI_CAP_GROUP_ID_CFG".
>>>>
>>>> 2. Enhance the QEMU CLI to include a "failover-group-id" option to the
>>>>      virtio device. The "failover-group-id" is a 64 bit value.
>>>>
>>>> 3. Enhance the QEMU CLI to include a "failover-group-id" option to the
>>>>      Red Hat PCI bridge device (support for the i440FX model).
>>>>
>>>> 4. Add a new "pcie-downstream" device, with the option
>>>>      "failover-group-id" (support for the Q35 model).
>>>>
>>>> 5. The operator creates a 64 bit unique identifier, failover-group-id.
>>>>
>>>> 6. When the virtio device is created, the operator uses the
>>>>      "failover-group-id" option (for example, '-device
>>>>      virtio-net-pci,failover-group-id=<identifier>') and specifies the
>>>>      failover-group-id created in step 4.
>>>>
>>>>      QEMU stores the failover-group-id in the virtio device's configuration
>>>>      space in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".
>>>>
>>>> 7. When assigning a PCI device to the guest in passthrough mode, the
>>>>      operator first creates a bridge using the "failover-group-id" option
>>>>      (for example, '-device pcie-downstream,failover-group-id=<identifier>')
>>>>      to specify the failover-group-id created in step 4, and then attaches
>>>>      the passthrough device to the bridge.
>>>>
>>>>      QEMU stores the failover-group-id in the configuration space of the
>>>>      bridge as Vendor-Specific capability (0x09). The "Vendor" here is
>>>>      not to be confused with a specific organization. Instead, the vendor
>>>>      of the bridge is QEMU.
>>>>
>>>> 8. Patch 4 in patch series "Enable virtio_net to act as a standby for
>>>>      a passthru device" [1] needs to be modified to use the UUID values
>>>>      present in the bridge's configuration space and the virtio device's
>>>>      configuration space instead of the MAC address for pairing the devices.
>>> I'm still missing a few bits in the overall scheme.
>>>
>>> Is the guest supposed to acknowledge the support for PT-PV failover?
>> Yes. We are leveraging virtio's feature negotiation mechanism for that.
>> Guest which does not acknowledge the support will not have PT plugged in.
>>
>>> Should the PT device be visibile to the guest before it acknowledges the
>>> support for failover?
>> No. QEMU will only expose PT device after guest acknowledges the support
>> through virtio's feature negotiation.
>>
>>>     How is this supposed to work with legacy guests that don't support it?
>> Only PV device will be exposed on legacy guest.
> So how is this coordination going to work?  One possibility is that the
> PV device emits a QMP event upon the guest driver confirming the support
> for failover, the management layer intercepts the event and performs
> device_add of the PT device.  Another is that the PT device is added
> from the very beginning (e.g. on the QEMU command line) but its parent
> PCI bridge subscribes a callback with the PV device to "activate" the PT
> device upon negotiating the failover feature.
>
> I think this needs to be decided within the scope of this patchset.
As discussed in the previous thread below, we would go with the
approach that QEMU manages the visibility of the PT device
automatically. The management layer supplies the PT device to QEMU
from the very beginning, but the device is not exposed to the guest
unless and until the guest virtio driver acknowledges the backup
feature. Once the virtio driver in the guest initiates a device reset,
the corresponding PT device must be taken out of the guest, and added
back later once the guest virtio driver completes negotiation of the
backup feature.

https://patchwork.ozlabs.org/patch/909976/
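
The visibility rule described above reduces to a small state machine.
This is only an illustrative sketch, with state and event names
invented here rather than taken from the patch set:

```c
#include <assert.h>

/* PT device visibility, driven by events observed on the PV device. */
enum pt_state { PT_HIDDEN, PT_EXPOSED };
enum pv_event { PV_FEATURE_ACKED, PV_DEVICE_RESET };

static enum pt_state pt_next_state(enum pt_state s, enum pv_event e)
{
    switch (e) {
    case PV_FEATURE_ACKED:
        return PT_EXPOSED;  /* QEMU hot-plugs the PT device into the guest */
    case PV_DEVICE_RESET:
        return PT_HIDDEN;   /* QEMU hot-unplugs it until features re-acked */
    }
    return s;
}
```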

>
>>> Is the guest supposed to signal the datapath switch to the host?
>> No, guest doesn't need to be initiating datapath switch at all.
> What happens if the guest supports failover in its PV driver, but lacks
> the driver for the PT device?
The assumption of the failover driver is that the primary (PT device)
will be able to get a datapath once it shows up in the guest. If a PT
device is added to an unsupported guest, the result is the same as
without a standby PV driver: no networking, because there is no
working driver.

Then perhaps don't add the PT device in the first place if the guest
lacks driver support?

>
>> However, QMP
>> events may be generated when exposing or hiding the PT device through hot
>> plug/unplug to facilitate host to switch datapath.
> The PT device hot plug/unplug are initiated by the host, aren't they?  Why
> would it also need QMP events for them?
As indicated above, the hot plug/unplug are initiated by QEMU, not the
management layer. Hence the QMP hot plug event is used as an indicator
to switch the host datapath. Unlike the Windows Hyper-V SR-IOV driver
model, the Linux host network stack does not offer a fine-grained PF
driver API to move the MAC/VLAN filter, and the VF driver has to start
with some initial MAC address filter programmed in when present in the
guest. The QMP event serves as a checkpoint at which to switch the MAC
filter and/or VLAN filter between the PV device and the VF.
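
On the management side, the checkpoint idea amounts to a trivial event
dispatcher. In this sketch the event names are stand-ins (not real QMP
event names) and the actual PF filter programming is elided to a
datapath flag:

```c
#include <assert.h>
#include <string.h>

/* Which backend currently carries the guest's traffic. */
enum datapath { DP_PV, DP_VF };

/* React to QEMU's hot-plug/unplug notifications for the VF by moving
 * the MAC/VLAN filter (reduced here to flipping the datapath). */
static enum datapath on_qmp_event(enum datapath cur, const char *event)
{
    if (strcmp(event, "VF_HOTPLUGGED") == 0)
        return DP_VF;   /* move the MAC filter from the PV backend to the VF */
    if (strcmp(event, "VF_HOTUNPLUGGED") == 0)
        return DP_PV;   /* fall back to the PV datapath */
    return cur;         /* unrelated event: keep the current datapath */
}
```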

>
>>> Is the scheme going to be applied/extended to other transports (vmbus,
>>> virtio-ccw, etc.)?
>> Well, it depends on the use case, and how feasible it can be extended to
>> other transport due to constraints and transport specifics.
>>
>>> Is the failover group concept going to be used beyond PT-PV network
>>> device failover?
>> Although the concept of failover group is generic, the implementation itself
>> may vary.
> My point with these two questions is that since this patchset is
> defining external interfaces -- with guest OS, with management layer --
> which are not easy to change later, it might make sense to try and see
> if the interfaces map to other usecases.  E.g. I think we can get enough
> information on how Hyper-V handles PT-PV network device failover from
> the current Linux implementation; it may be a good idea to share some
> concepts and workflows with virtio-pci.
As you may see from the above, the handshake of virtio failover
depends on hot plug (PCI or ACPI) and virtio specifics (feature
negotiation). As far as I can see, Hyper-V uses a completely different
handshake protocol of its own (guest-initiated datapath switch, serial
number in the VMBus PCI bridge) than virtio does. I can barely imagine
how the code could be implemented in a shared manner, although I agree
that conceptually the failover group is similar or the same between
the two.

-Siwei

>
> Thanks,
> Roman.
>
>


Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Roman Kagan 5 years, 9 months ago
On Tue, Jul 03, 2018 at 03:27:23PM -0700, si-wei liu wrote:
> On 7/3/2018 2:58 AM, Roman Kagan wrote:
> > So how is this coordination going to work?  One possibility is that the
> > PV device emits a QMP event upon the guest driver confirming the support
> > for failover, the management layer intercepts the event and performs
> > device_add of the PT device.  Another is that the PT device is added
> > from the very beginning (e.g. on the QEMU command line) but its parent
> > PCI bridge subscribes a callback with the PV device to "activate" the PT
> > device upon negotiating the failover feature.
> > 
> > I think this needs to be decided within the scope of this patchset.
> As what had been discussed in previous thread below, we would go with the
> approach that QEMU manages the visibility of the PT device automatically.
> Management layer supplies PT device to QEMU from the very beginning. This PT
> device won't be exposed to guest immediately, unless or until the guest
> virtio driver acknowledges the backup feature already. Once virtio driver in
> the guest initiates a device reset, the corresponding PT device must be
> taken out from guest. Then add it back later on after guest virtio completes
> negotiation for the backup feature.

This means that the parent bridge of the PT device (or whatever else can
control the visibility of the PT device to the guest) will need to
cooperate with the PV device *within* QEMU.  The most natural way to
specify this connection is a property on one device that refers to the
other by device ID.

Another benefit of this approach is that it makes it possible to hide
the (possibly transport-specific) device matching identifiers from the
QEMU caller, as they won't need to be persistent or visible to the
management layer.  In particular, this allows the implementation of
this PT-PV cooperation to move forward while the discussion of the
matching scheme is still ongoing, because matching by MAC will certainly
work as a first approximation.

> > > > Is the guest supposed to signal the datapath switch to the host?
> > > No, guest doesn't need to be initiating datapath switch at all.
> > What happens if the guest supports failover in its PV driver, but lacks
> > the driver for the PT device?
> The assumption of failover driver is that the primary (PT device) will be
> able to get a datapath once it shows up in the guest .

I wonder how universal this assumption is, given the variety of possible
network configurations, including filters, VLANs, etc.  For whatever
reason Hyper-V defines a control message over the PV device from guest
to host for that.

> If adding a PT device
> to an unsupported guest, the result will be same as that without a standby
> PV driver - basically got no networking as you don't get a working driver.
> 
> Then perhaps don't add the PT device in the first place if guest lacks
> driver support?

You don't know this in advance.

> > > However, QMP
> > > events may be generated when exposing or hiding the PT device through hot
> > > plug/unplug to facilitate host to switch datapath.
> > The PT device hot plug/unplug are initiated by the host, aren't they?  Why
> > would it also need QMP events for them?
> As indicated above, the hot plug/unplug are initiated by QEMU not the
> management layer. Hence the QMP hot plug event is used as an indicator to
> switch host datapath. Unlike Windows Hyper-V SR-IOV driver model, the Linux
> host network stack does not offer a fine grained PF driver API to move
> MAC/VLAN filter, and the VF driver has to start with some initial MAC
> address filter programmed in when present in the guest. The QMP event is
> served as a checkpoint to switch MAC filter and/or VLAN filter between the
> PV and the VF.
> 

I'd appreciate something like a sequence diagram to better understand
the whole picture...

> > > > Is the scheme going to be applied/extended to other transports (vmbus,
> > > > virtio-ccw, etc.)?
> > > Well, it depends on the use case, and how feasible it can be extended to
> > > other transport due to constraints and transport specifics.
> > > 
> > > > Is the failover group concept going to be used beyond PT-PV network
> > > > device failover?
> > > Although the concept of failover group is generic, the implementation itself
> > > may vary.
> > My point with these two questions is that since this patchset is
> > defining external interfaces -- with guest OS, with management layer --
> > which are not easy to change later, it might make sense to try and see
> > if the interfaces map to other usecases.  E.g. I think we can get enough
> > information on how Hyper-V handles PT-PV network device failover from
> > the current Linux implementation; it may be a good idea to share some
> > concepts and workflows with virtio-pci.
> As you may see from above, the handshake of virtio failover depends on hot
> plug (PCI or ACPI) and virtio specifics (feature negotiation). So far as I
> see the Hyper-V uses a completely different handshake protocol of its own
> (guest initiated datapath switch, Serial number in VMBus PCI bridge) than
> that of virtio. I can barely imagine how code could be implemented in a
> shared manner, although I agree conceptually failover group between these
> two is similar or the same.

I actually think there must be a lot in common: the way for the
management layer to specify the binding between the PT and PV devices;
the overall sequence of state transitions of every component, the QMP
events and the points in time when they are emitted, the way to adjust
host-side network configuration and the time when to do it, and so on.
It's unfortunate that the implementation of PV-PT failover in guest
Linux happens to have diverged between virtio and hyperv, but I don't
see any fundamental difference and I wouldn't be surprised if they
eventually converged sooner rather than later.

There are a few things that need to be specific for PT and/or PV
transport, the matching identifier among them, but I guess a lot can
still be in common.

Roman.

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Mon, Jul 09, 2018 at 04:00:36PM +0300, Roman Kagan wrote:
> > The assumption of failover driver is that the primary (PT device) will be
> > able to get a datapath once it shows up in the guest .
> 
> I wonder how universal this assumption is

Not universal at all. It is however the simplest
use-case we could think of, and that is as a result already
supported by Linux.

I for one will be happy if we spec out a fuller support
for more use-cases but it's probably orthogonal to
grouping.

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by si-wei liu 5 years, 9 months ago

On 7/9/2018 6:00 AM, Roman Kagan wrote:
> On Tue, Jul 03, 2018 at 03:27:23PM -0700, si-wei liu wrote:
>> On 7/3/2018 2:58 AM, Roman Kagan wrote:
>>> So how is this coordination going to work?  One possibility is that the
>>> PV device emits a QMP event upon the guest driver confirming the support
>>> for failover, the management layer intercepts the event and performs
>>> device_add of the PT device.  Another is that the PT device is added
>>> from the very beginning (e.g. on the QEMU command line) but its parent
>>> PCI bridge subscribes a callback with the PV device to "activate" the PT
>>> device upon negotiating the failover feature.
>>>
>>> I think this needs to be decided within the scope of this patchset.
>> As what had been discussed in previous thread below, we would go with the
>> approach that QEMU manages the visibility of the PT device automatically.
>> Management layer supplies PT device to QEMU from the very beginning. This PT
>> device won't be exposed to guest immediately, unless or until the guest
>> virtio driver acknowledges the backup feature already. Once virtio driver in
>> the guest initiates a device reset, the corresponding PT device must be
>> taken out from guest. Then add it back later on after guest virtio completes
>> negotiation for the backup feature.
> This means that the parent bridge of the PT device (or whatever else can
> control the visibility of the PT device to the guest) will need to
> cooperate with the PV device *within* QEMU.  The most natural way to
> specify this connection is to have a property of one device to refer to
> the other by device-id.
This scheme has the problem that one device then depends on the 
presence of the other: QEMU's enumeration order comes into play if the 
two are not placed in the same bridge or PCI hierarchy. You can't get 
it working reliably if the bridge is realized before the dependent PV 
device has been, or vice versa.
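
Purely as an illustration of the visibility policy quoted above (this is not code from this series; the type and function names are invented), the PT device's lifecycle can be sketched as a tiny state machine: hidden until the guest acks the standby feature, hidden again on a virtio reset until re-negotiation.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch, not QEMU code: the PT device is supplied on the
 * command line but stays hidden until the guest virtio driver
 * acknowledges the standby feature; a virtio device reset hides it
 * again until the feature is re-negotiated. */
enum pt_visibility { PT_HIDDEN, PT_EXPOSED };

struct pt_device {
    enum pt_visibility vis;
};

/* Guest completed virtio feature negotiation. */
static void pt_on_features(struct pt_device *pt, bool standby_acked)
{
    pt->vis = standby_acked ? PT_EXPOSED   /* QEMU-initiated hot-plug  */
                            : PT_HIDDEN;   /* keep (or take) it hidden */
}

/* Guest virtio driver initiated a device reset. */
static void pt_on_virtio_reset(struct pt_device *pt)
{
    pt->vis = PT_HIDDEN;                   /* QEMU-initiated hot-unplug */
}
```

The enumeration-order concern above is exactly about who triggers these transitions: they only work if the bridge controlling visibility and the PV device can find each other regardless of realize order.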

> Another benefit of this approach is that it will allow to hide the
> (possibly transport-specific) device matching identifiers from the QEMU
> caller, as it won't need to be persistent nor visible to the management
> layer.  In particular, this will allow to move forward with the
> implementation of this PT-PV cooperation while the discussion of the
> matching scheme is still ongoing, because matching by MAC will certainly
> work as a first approximation.
The plan is to enable group ID based matching in the first place rather 
than matching by MAC, the latter of which is fragile and problematic. I 
have made the Linux side changes and will get them posted once the QEMU 
discussion on grouping is finalized.

>
>>>>> Is the guest supposed to signal the datapath switch to the host?
>>>> No, guest doesn't need to be initiating datapath switch at all.
>>> What happens if the guest supports failover in its PV driver, but lacks
>>> the driver for the PT device?
>> The assumption of failover driver is that the primary (PT device) will be
>> able to get a datapath once it shows up in the guest .
> I wonder how universal this assumption is, given the variety of possible
> network configurations, including filters, VLANs, etc.  For whatever
> reason Hyper-V defines a control message over the PV device from guest
> to host for that.
These scenarios are different from the case of no driver support at 
all within the guest. I think I raised this specifically as part of doing 
proper error handling when reviewing the original virtio failover patch: 
if the failover module fails to enslave the VF due to guest network 
configuration, it has to signal virtio-net to propagate the error back 
to the host. One way to handle that is to have virtio-net initiate a device 
reset and clear the feature bit upon re-negotiation, such that the VF will 
be unplugged and won't get plugged in again. I don't know why 
the patch submitter did not incorporate that change, but it's in our 
plan to enhance that part, no worries.

>> If adding a PT device
>> to an unsupported guest, the result will be same as that without a standby
>> PV driver - basically got no networking as you don't get a working driver.
>>
>> Then perhaps don't add the PT device in the first place if guest lacks
>> driver support?
> You don't know this in advance.
From the migration point of view, it does not matter whether the guest 
lacks driver support for the VF. I'd like to avoid duplicating Hyper-V 
concepts if at all possible; what makes sense for Hyper-V's accelerated 
networking doesn't have to work for KVM/QEMU SR-IOV live migration. Are 
you sure the Hyper-V control message was added for this sole purpose? It 
seems to me overkill for such an edge scenario.

>
>>>> However, QMP
>>>> events may be generated when exposing or hiding the PT device through hot
>>>> plug/unplug to facilitate host to switch datapath.
>>> The PT device hot plug/unplug are initiated by the host, aren't they?  Why
>>> would it also need QMP events for them?
>> As indicated above, the hot plug/unplug are initiated by QEMU not the
>> management layer. Hence the QMP hot plug event is used as an indicator to
>> switch host datapath. Unlike Windows Hyper-V SR-IOV driver model, the Linux
>> host network stack does not offer a fine grained PF driver API to move
>> MAC/VLAN filter, and the VF driver has to start with some initial MAC
>> address filter programmed in when present in the guest. The QMP event is
>> served as a checkpoint to switch MAC filter and/or VLAN filter between the
>> PV and the VF.
>>
> I'd appreciate something like a sequence diagram to better understand
> the whole picture...
>
>>>>> Is the scheme going to be applied/extended to other transports (vmbus,
>>>>> virtio-ccw, etc.)?
>>>> Well, it depends on the use case, and how feasible it can be extended to
>>>> other transport due to constraints and transport specifics.
>>>>
>>>>> Is the failover group concept going to be used beyond PT-PV network
>>>>> device failover?
>>>> Although the concept of failover group is generic, the implementation itself
>>>> may vary.
>>> My point with these two questions is that since this patchset is
>>> defining external interfaces -- with guest OS, with management layer --
>>> which are not easy to change later, it might make sense to try and see
>>> if the interfaces map to other usecases.  E.g. I think we can get enough
>>> information on how Hyper-V handles PT-PV network device failover from
>>> the current Linux implementation; it may be a good idea to share some
>>> concepts and workflows with virtio-pci.
>> As you may see from above, the handshake of virtio failover depends on hot
>> plug (PCI or ACPI) and virtio specifics (feature negotiation). So far as I
>> see the Hyper-V uses a completely different handshake protocol of its own
>> (guest initiated datapath switch, Serial number in VMBus PCI bridge) than
>> that of virtio. I can barely imagine how code could be implemented in a
>> shared manner, although I agree conceptually failover group between these
>> two is similar or the same.
> I actually think there must be a lot in common: the way for the
> management layer to specify the binding between the PT and PV devices;
> the overall sequence of state transitions of every component, the QMP
> events and the points in time when they are emitted, the way to adjust
> host-side network configuration and the time when to do it, and so on.
> It's unfortunate that the implementation of PV-PT failover in guest
> Linux happens to have diverged between virtio and hyperv, but I don't
> see any fundamental difference and I wouldn't be surprised if they
> eventually converged sooner rather than later.
(looping in Intel folks and the Linux netdev list)

Actually it's not without reason that Linux/virtio has to diverge from 
Hyper-V. The Hyper-V SR-IOV driver model allows the VF to be plugged in 
and registered with the stack without a MAC address filter programmed into 
the NIC, while Windows Hyper-V on the host side is able to move a MAC 
filter from the NIC PV backend to the VF upon receiving the DATAPATH_SWITCH 
control message initiated by the guest. Windows NDIS has the 
OID_RECEIVE_FILTER_MOVE_FILTER API to have the PF update the MAC filter for 
the VF without involving the guest or the VF. There is no equivalent for 
any of this in the Linux SR-IOV driver model. How do you propose to have 
the guest initiate the datapath switch at an arbitrary point in time when 
you're dealing with the Linux host network stack?

One may say we can plug in a VF with a random MAC filter programmed in 
beforehand, and initially use that random MAC within the guest. This would require:

a) not relying on the permanent MAC address for pairing during the initial 
discovery, e.g. using the failover group ID as in this discussion
b) the host toggling the MAC address filter: taking down the 
tap device to return the MAC to the PF, followed by assigning that MAC 
to the VF using "ip link ... set vf ..."
c) notifying the guest to reload/reset the VF driver for the change of 
hardware MAC address
d) until the VF reloads its driver it won't be able to use the datapath, so 
a short period of network outage is (still) expected
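
For illustration only, the four steps above can be written down as an explicit sequence (the step names are invented for this sketch); the point is simply that the VF datapath stays unusable until the guest driver reload in step (c) has happened:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: the (a)-(d) handoff above as ordered steps. */
enum handoff_step {
    STEP_A_PAIR_BY_GROUP_ID, /* discovery without the permanent MAC      */
    STEP_B_HOST_MOVES_MAC,   /* tap down, MAC back to PF, then to the VF */
    STEP_C_GUEST_RELOADS_VF, /* guest reloads VF driver for the new MAC  */
    STEP_D_DONE              /* datapath usable again                    */
};

/* The outage in (d): the VF datapath only works once the guest has
 * reloaded the VF driver and picked up the new hardware MAC. */
static bool vf_datapath_usable(enum handoff_step step)
{
    return step >= STEP_C_GUEST_RELOADS_VF;
}
```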

But as you see, this still diverges from the Hyper-V model. What do we 
buy by using a random address during initial discovery and requiring the VF 
to complete the handshake? Less network downtime during the datapath 
switch? Sorry, but that's not a key factor at all for our main goal - 
live migration.

Regards,
-Siwei

>
> There are a few things that need to be specific for PT and/or PV
> transport, the matching identifier among them, but I guess a lot can
> still be in common.
>
> Roman.
>


Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:
> The plan is to enable group ID based matching in the first place rather than
> match by MAC, the latter of which is fragile and problematic.

It isn't all that fragile - hyperv used same for a while, so if someone
posts working patches with QEMU support but before this grouping stuff,
I'll happily apply them.

> I have made
> the Linux side changes and will get it posted once the QEMU discussion for
> grouping is finalized.

Go ahead but at this point I think we need to see actual QEMU support at
least for the basic functionality before we merge more code Linux-side.

-- 
MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 9 months ago
On Mon, Jul 9, 2018 at 6:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:
>> The plan is to enable group ID based matching in the first place rather than
>> match by MAC, the latter of which is fragile and problematic.
>
> It isn't all that fragile - hyperv used same for a while, so if someone
> posts working patches with QEMU support but before this grouping stuff,
> I'll happily apply them.

I wouldn't box the solution into a very limited scenario just because of
matching by MAC. The benefit of having a generic group ID in the first
place is that we save the effort of maintaining legacy MAC-based
pairing, which just adds complexity anyway. The fact that the VF's MAC
address currently cannot be changed, either by the PF or by the guest
user, is a severe limitation due to this. The other use case is that a
PT device other than a VF would generally have a different MAC than the
standby virtio. We shouldn't limit ourselves to the VF-specific scenario
from the very beginning.

It's just a small piece of QEMU code to add, why not?

>
>> I have made
>> the Linux side changes and will get it posted once the QEMU discussion for
>> grouping is finalized.
>
> Go ahead but at this point I think we need to see actual QEMU support at
> least for the basic functionality before we merge more code Linux-side.

The VFIO visibility work is being worked on. We'll post patches when it's ready.

Thanks,
-Siwei


>
> --
> MST
>

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 9 months ago
On Tue, 10 Jul 2018 17:07:37 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Mon, Jul 9, 2018 at 6:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:  
> >> The plan is to enable group ID based matching in the first place rather than
> >> match by MAC, the latter of which is fragile and problematic.  
> >
> > It isn't all that fragile - hyperv used same for a while, so if someone
> > posts working patches with QEMU support but before this grouping stuff,
> > I'll happily apply them.  
> 
> I wouldn't box the solution to very limited scenario just because of
> matching by MAC, the benefit of having generic group ID in the first
> place is that we save the effort of maintaining legacy MAC based
> pairing that just adds complexity anyway. Currently the VF's MAC
> address cannot be changed by either PF or by the guest user is a
> severe limitation due to this. The other use case is that PT device
> than VF would generally have different MAC than the standby virtio. We
> shouldn't limit itself to VF specific scenario from the very
> beginning.

So, this brings me to a different concern: the semantics of
VIRTIO_NET_F_STANDBY.

* The currently sole user seems to be the virtio-net Linux driver.
* The commit messages, code comments and Documentation/ all talk about
  matching by MAC.
* I could not find any proposed update to the virtio spec. (If there
  had been an older proposal with a different feature name, it is not
  discoverable.)

VIRTIO_NET_F_STANDBY is a host <-> guest interface. As there's no
official spec, you can only go by the Linux implementation, and by that
its semantics seem to be 'match by MAC', not 'match by other criteria'.

How is this supposed to work in the long run?

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 9 months ago
On Wed, Jul 11, 2018 at 2:53 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> On Tue, 10 Jul 2018 17:07:37 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
>
>> On Mon, Jul 9, 2018 at 6:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:
>> >> The plan is to enable group ID based matching in the first place rather than
>> >> match by MAC, the latter of which is fragile and problematic.
>> >
>> > It isn't all that fragile - hyperv used same for a while, so if someone
>> > posts working patches with QEMU support but before this grouping stuff,
>> > I'll happily apply them.
>>
>> I wouldn't box the solution to very limited scenario just because of
>> matching by MAC, the benefit of having generic group ID in the first
>> place is that we save the effort of maintaining legacy MAC based
>> pairing that just adds complexity anyway. Currently the VF's MAC
>> address cannot be changed by either PF or by the guest user is a
>> severe limitation due to this. The other use case is that PT device
>> than VF would generally have different MAC than the standby virtio. We
>> shouldn't limit itself to VF specific scenario from the very
>> beginning.
>
> So, this brings me to a different concern: the semantics of
> VIRTIO_NET_F_STANDBY.
>
> * The currently sole user seems to be the virtio-net Linux driver.
> * The commit messages, code comments and Documentation/ all talk about
>   matching by MAC.
> * I could not find any proposed update to the virtio spec. (If there
>   had been an older proposal with a different feature name, it is not
>   discoverable.)

No, there was no such spec patch at all when the Linux patch was
submitted; hence match by MAC is the only means to pair devices, due to
the lack of QEMU support.

Q: Does it work?
A: Well, it works for some.
Q: Does it work well enough to support all scenarios?
A: No, not as it claims to.
Q: Can it do a better job of supporting all scenarios?
A: Yes, by pairing with the failover group ID instead.
Q: Does pairing still need to be MAC based if using the failover group ID?
A: It depends; it's up to the implementation to verify the MAC address
depending on the need (e.g. VF failover versus PT device replacement),
though MAC matching is no longer positioned as a requirement for
pairing or grouping.

There's no such stickiness to matching by MAC defined anywhere. The
semantics of the VIRTIO_NET_F_STANDBY feature are mostly a failover
concept: the standby device should be used when the primary is not
present. We have now added the group ID on the QEMU side. I don't see a
problem with getting rid of the limitation: it's never been exposed. No
existing users. No API/ABI defined at all.
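
As a sketch of what group-ID pairing means in practice (illustrative only; the structure and function names are invented, not this series' actual code), the standby matches its primary by an identifier the host assigns to both devices, with no reference to the MAC at all:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: matching a primary (PT) device to its standby by
 * a failover group ID instead of by MAC address. */
struct net_dev {
    uint64_t group_id;  /* identifier the host exposes on both devices */
};

static struct net_dev *find_primary(struct net_dev *candidates, size_t n,
                                    uint64_t standby_group_id)
{
    for (size_t i = 0; i < n; i++) {
        if (candidates[i].group_id == standby_group_id)
            return &candidates[i];
    }
    return NULL;  /* primary not (yet) plugged in */
}
```

The MAC address never enters the lookup, which is why the VF's MAC can later be changed, or the primary replaced by a PT device with a different MAC, without breaking the pairing.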

>
> VIRTIO_NET_F_STANDBY is a host <-> guest interface. As there's no
> official spec, you can only go by the Linux implementation, and by that
> its semantics seem to be 'match by MAC', not 'match by other criteria'.
>
> How is this supposed to work in the long run?

That group ID thing should work for all OSes, not just Linux.

I will change all the references to MAC matching in my upcoming patch.
Thank you for the note.

-Siwei

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 9 months ago
On Thu, 12 Jul 2018 02:37:03 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Wed, Jul 11, 2018 at 2:53 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> > On Tue, 10 Jul 2018 17:07:37 -0700
> > Siwei Liu <loseweigh@gmail.com> wrote:
> >  
> >> On Mon, Jul 9, 2018 at 6:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:  
> >> > On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:  
> >> >> The plan is to enable group ID based matching in the first place rather than
> >> >> match by MAC, the latter of which is fragile and problematic.  
> >> >
> >> > It isn't all that fragile - hyperv used same for a while, so if someone
> >> > posts working patches with QEMU support but before this grouping stuff,
> >> > I'll happily apply them.  
> >>
> >> I wouldn't box the solution to very limited scenario just because of
> >> matching by MAC, the benefit of having generic group ID in the first
> >> place is that we save the effort of maintaining legacy MAC based
> >> pairing that just adds complexity anyway. Currently the VF's MAC
> >> address cannot be changed by either PF or by the guest user is a
> >> severe limitation due to this. The other use case is that PT device
> >> than VF would generally have different MAC than the standby virtio. We
> >> shouldn't limit itself to VF specific scenario from the very
> >> beginning.  
> >
> > So, this brings me to a different concern: the semantics of
> > VIRTIO_NET_F_STANDBY.
> >
> > * The currently sole user seems to be the virtio-net Linux driver.
> > * The commit messages, code comments and Documentation/ all talk about
> >   matching by MAC.
> > * I could not find any proposed update to the virtio spec. (If there
> >   had been an older proposal with a different feature name, it is not
> >   discoverable.)  
> 
> No, there was no such spec patch at all when the Linux patch was
> submitted, hence match by MAC is the only means to pair device due to
> lack of QEMU support.

We need to know what the device offers if it offers the feature bit,
and what it is supposed to do when the driver negotiates it. Currently,
we can only go by what the Linux driver does and what it expects. IOW,
we need a spec update proposal. Obviously, this should be discussed in
conjunction with the rest.

> 
> Q: Does it work?
> A: Well, it works for some.
> Q: Does it work well to support all scenarios?
> A: No, not as it claims to.
> Q: Can it do better job to support all scenarios?
> A: Yes, do pairing with the failover group ID instead.
> Q: Does pairing still need to be MAC based if using failover group ID?
> A: It depends, it's up to the implementation to verify MAC address
> depending on the need (e.g. VF failover versus PT device replacement),
> though MAC matching is no longer positioned as a requirement for
> pairing or grouping.

Whether matching by MAC is good or sufficient is a different
discussion. It is, however, what the code currently *does*, and in
absence of a spec update, it is the only reference for this feature.

> 
> There's no such stickiness for matching by MAC defined anywhere. The
> semantics of VIRTIO_NET_F_STANDBY feature are mostly a failover
> concept that the standby device should be used when the primary is not
> present. We now have added the group ID on QEMU. I don't see why
> bothering to get rid of the limitation: it's never been exposed. No
> existing users. No API/ABI defined at all.

This is scheduled to be released with the next Linux version, which is
right now in the -rc phase. It *is* API (a guest <-> host API).

No corresponding code is present in QEMU 3.0, which is in freeze right
now. Anything that goes into QEMU 3.1 or later needs to accommodate
Linux 4.18 as a guest.

> 
> >
> > VIRTIO_NET_F_STANDBY is a host <-> guest interface. As there's no
> > official spec, you can only go by the Linux implementation, and by that
> > its semantics seem to be 'match by MAC', not 'match by other criteria'.
> >
> > How is this supposed to work in the long run?  
> 
> That group ID thing should work for all OS. Not just Linux.

That's exactly my point: We need to care about not-Linux. And about
not-QEMU as well. A virtio feature bit should not be defined by what
Linux and QEMU do, but needs a real spec.

So, currently we have a Linux driver implementation that matches by
MAC. If a Linux version with this is released, every device that offers
VIRTIO_NET_F_STANDBY needs to support matching by MAC so that this
Linux driver will not break. Adding further matching methods should be
fine, but might need additional features (needs to be discussed).

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 9 months ago

On Thu, Jul 12, 2018 at 4:31 AM, Cornelia Huck <cohuck@redhat.com> wrote:
> On Thu, 12 Jul 2018 02:37:03 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
>
>> On Wed, Jul 11, 2018 at 2:53 AM, Cornelia Huck <cohuck@redhat.com> wrote:
>> > On Tue, 10 Jul 2018 17:07:37 -0700
>> > Siwei Liu <loseweigh@gmail.com> wrote:
>> >
>> >> On Mon, Jul 9, 2018 at 6:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:
>> >> >> The plan is to enable group ID based matching in the first place rather than
>> >> >> match by MAC, the latter of which is fragile and problematic.
>> >> >
>> >> > It isn't all that fragile - hyperv used same for a while, so if someone
>> >> > posts working patches with QEMU support but before this grouping stuff,
>> >> > I'll happily apply them.
>> >>
>> >> I wouldn't box the solution to very limited scenario just because of
>> >> matching by MAC, the benefit of having generic group ID in the first
>> >> place is that we save the effort of maintaining legacy MAC based
>> >> pairing that just adds complexity anyway. Currently the VF's MAC
>> >> address cannot be changed by either PF or by the guest user is a
>> >> severe limitation due to this. The other use case is that PT device
>> >> than VF would generally have different MAC than the standby virtio. We
>> >> shouldn't limit itself to VF specific scenario from the very
>> >> beginning.
>> >
>> > So, this brings me to a different concern: the semantics of
>> > VIRTIO_NET_F_STANDBY.
>> >
>> > * The currently sole user seems to be the virtio-net Linux driver.
>> > * The commit messages, code comments and Documentation/ all talk about
>> >   matching by MAC.
>> > * I could not find any proposed update to the virtio spec. (If there
>> >   had been an older proposal with a different feature name, it is not
>> >   discoverable.)
>>
>> No, there was no such spec patch at all when the Linux patch was
>> submitted, hence match by MAC is the only means to pair device due to
>> lack of QEMU support.
>
> We need to know what the device offers if it offers the feature bit,
> and what it is supposed to do when the driver negotiates it. Currently,
> we can only go by what the Linux driver does and what it expects. IOW,
> we need a spec update proposal. Obviously, this should be discussed in
> conjunction with the rest.

The definition is incomplete due to the lack of a spec. There's no "host"
part defined yet in the host-guest interface. If match by MAC is an
interface, the same must be done on the host (device) side as well,
which has been agreed is not the way to go. However, judging from his
QEMU patch, I don't think that's what the author intends to do - it
misses the other parts as well, such as feature negotiation and how it
interacts with the paired device.

What I said is that match by MAC is just a guest implementation that
one can change at any time. We now have the group ID in QEMU, so why
still stick to matching by MAC? It shouldn't have been a host-guest
interface in the first place anyway.

>
>>
>> Q: Does it work?
>> A: Well, it works for some.
>> Q: Does it work well to support all scenarios?
>> A: No, not as it claims to.
>> Q: Can it do better job to support all scenarios?
>> A: Yes, do pairing with the failover group ID instead.
>> Q: Does pairing still need to be MAC based if using failover group ID?
>> A: It depends, it's up to the implementation to verify MAC address
>> depending on the need (e.g. VF failover versus PT device replacement),
>> though MAC matching is no longer positioned as a requirement for
>> pairing or grouping.
>
> Whether matching by MAC is good or sufficient is a different
> discussion. It is, however, what the code currently *does*, and in
> absence of a spec update, it is the only reference for this feature.

Not really, it's the same discussion. Before the group ID discussion I
explicitly asked for a spec for the semantics of VIRTIO_NET_F_STANDBY.
Since no one came up with a spec, I assumed it was still being worked on
and that it was all right for the initial Linux implementation to go in.
I assumed that is the normal process when working on a complicated
feature: the spec can come later, once everything becomes clear along
with the discussions.

https://www.spinics.net/lists/netdev/msg499011.html

I made the same claim at the time: we shouldn't go with what's
been implemented in Linux, but instead should look at the entire
picture to decide on the right semantics for VIRTIO_NET_F_STANDBY. I
agree with you that we need a spec. I can work on one, but I'd like to
clarify that nothing was defined yet for VIRTIO_NET_F_STANDBY, so it
shouldn't be called a spec update. It's still a work-in-progress
feature, which IMHO is different from the situation where a pre-spec
virtio implementation was already working on both the host and guest
side. The current VIRTIO_NET_F_STANDBY implementation in Linux is
pretty much dead code without a spec followed by a host/device side
implementation.

-Siwei


>
>>
>> There's no such stickiness for matching by MAC defined anywhere. The
>> semantics of VIRTIO_NET_F_STANDBY feature are mostly a failover
>> concept that the standby device should be used when the primary is not
>> present. We now have added the group ID on QEMU. I don't see why
>> bothering to get rid of the limitation: it's never been exposed. No
>> existing users. No API/ABI defined at all.
>
> This is scheduled to be released with the next Linux version, which is
> right now in the -rc phase. It *is* API (a guest <-> host API).
>
> No corresponding code is present in QEMU 3.0, which is in freeze right
> now. Anything that goes into QEMU 3.1 or later needs to accommodate
> Linux 4.18 as a guest.
>
>>
>> >
>> > VIRTIO_NET_F_STANDBY is a host <-> guest interface. As there's no
>> > official spec, you can only go by the Linux implementation, and by that
>> > its semantics seem to be 'match by MAC', not 'match by other criteria'.
>> >
>> > How is this supposed to work in the long run?
>>
>> That group ID thing should work for all OS. Not just Linux.
>
> That's exactly my point: We need to care about not-Linux. And about
> not-QEMU as well. A virtio feature bit should not be defined by what
> Linux and QEMU do, but needs a real spec.
>
> So, currently we have a Linux driver implementation that matches by
> MAC. If a Linux version with this is released, every device that offers
> VIRTIO_NET_F_STANDBY needs to support matching by MAC so that this
> Linux driver will not break. Adding further matching methods should be
> fine, but might need additional features (needs to be discussed).

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Thu, Jul 12, 2018 at 01:52:53PM -0700, Siwei Liu wrote:
> The definition is incomplete due to lack of spec. There's no "host"
> part defined yet in the host-guest interface. If match by MAC is an
> interface, the same must be done on the host(device) side as well,
> which has been agreed not the way to go. However, I don't think that's
> what the author intends to do by interpreting his QEMU patch - it
> missed the other parts as well, such as the feature negotiation and
> how it interacts with the paired device.
> 
> What I said is that match by MAC is just a guest implementation that
> one can change at any time. We now have the group ID on QEMU, why
> still sticking to matching by MAC? It shouldn't be a host-guest
> interface in the first place anyway.

I think that match by MAC is a simple, portable way to match devices.
E.g. it will work seamlessly with niche things like zPCI. However,
there are other niche use-cases that aren't addressed by match by MAC,
such as PF pass-through as a primary, and the pci bridge trick addresses
that at the cost of some portability.

So I see no issues supporting both mechanisms, but others on the TC
might feel differently.

-- 
MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 9 months ago
On Thu, Jul 12, 2018 at 2:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jul 12, 2018 at 01:52:53PM -0700, Siwei Liu wrote:
>> The definition is incomplete due to lack of spec. There's no "host"
>> part defined yet in the host-guest interface. If match by MAC is an
>> interface, the same must be done on the host(device) side as well,
>> which has been agreed not the way to go. However, I don't think that's
>> what the author intends to do by interpreting his QEMU patch - it
>> missed the other parts as well, such as the feature negotiation and
>> how it interacts with the paired device.
>>
>> What I said is that match by MAC is just a guest implementation that
>> one can change at any time. We now have the group ID on QEMU, why
>> still sticking to matching by MAC? It shouldn't be a host-guest
>> interface in the first place anyway.
>
> I think that match by MAC is a simple portable way to match devices.
> E.g. it will work seamlessly with niche things like zPCI. However

That's a good point. I'm not sure it's a valid assumption that zPCI
should always use the same MAC address as the virtio device; someone
more familiar with the use case may decide and work on that. It
means the VFIO device has to take in the MAC address as an identifier
via the "-device vfio-pci,.." QEMU option. I think there's no point in
matching devices by group ID in QEMU while using MAC in the guest.
Based on that, I'd go with making VIRTIO_NET_F_STANDBY match devices
based on the group ID, while someone may come up with another feature
bit later, say VIRTIO_NET_F_STANDBY_BY_MAC, when its QEMU support is
available. Would that make sense?

-Siwei

> there are other niche use-cases that aren't addressed by match by MAC
> such as PF pass-through as a primary, and the pci bridge trick addresses
> that at cost of some portability.
>
> So I see no issues supporting both mechanisms, but others on the TC
> might feel differently.
>
> --
> MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Samudrala, Sridhar 5 years, 9 months ago
On 7/12/2018 6:19 PM, Siwei Liu wrote:
> On Thu, Jul 12, 2018 at 2:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Thu, Jul 12, 2018 at 01:52:53PM -0700, Siwei Liu wrote:
>>> The definition is incomplete due to lack of spec. There's no "host"
>>> part defined yet in the host-guest interface. If match by MAC is an
>>> interface, the same must be done on the host(device) side as well,
>>> which has been agreed not the way to go. However, I don't think that's
>>> what the author intends to do by interpreting his QEMU patch - it
>>> missed the other parts as well, such as the feature negotiation and
>>> how it interacts with the paired device.
>>>
>>> What I said is that match by MAC is just a guest implementation that
>>> one can change at any time. We now have the group ID on QEMU, why
>>> still stick to matching by MAC? It shouldn't be a host-guest
>>> interface in the first place anyway.
>> I think that match by MAC is a simple portable way to match devices.
>> E.g. it will work seamlessly with niche things like zPCI. However
> That's a good point. I'm not sure if it's a valid assumption that zPCI
> should always use the same MAC address as that of virtio. Someone
> who's more familiar with the use case may decide and work on that. It
> means the VFIO device has to take in the MAC address as an identifier in
> the "-device vfio-pci,.." QEMU option. I think there's no point in
> matching devices using group ID in QEMU while using MAC in the guest.
> Based on that assumption, I'd go with making VIRTIO_NET_F_STANDBY
> match devices based on group ID, while someone may come up with another
> feature bit later, say VIRTIO_NET_F_STANDBY_BY_MAC when its QEMU
> support is available. Would it make sense?

VIRTIO_NET_F_STANDBY as defined in the guest virtio_net driver supports matching
by MAC address. I think we should add support for this feature bit in QEMU.
If submitting a patch to update the spec is a prerequisite to adding this
feature bit to QEMU, I can do that.

As far as I understand, the group ID patches to QEMU are still under review.
Matching by group ID can be another feature bit that could support matching
by group ID as well as MAC.


> -Siwei
>
>> there are other niche use-cases that aren't addressed by match by MAC
>> such as PF pass-through as a primary, and the pci bridge trick addresses
>> that at cost of some portability.
>>
>> So I see no issues supporting both mechanisms, but others on the TC
>> might feel differently.
>>
>> --
>> MST


Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Thu, Jul 12, 2018 at 09:20:41PM -0400, Samudrala, Sridhar wrote:
> On 7/12/2018 6:19 PM, Siwei Liu wrote:
> > On Thu, Jul 12, 2018 at 2:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Thu, Jul 12, 2018 at 01:52:53PM -0700, Siwei Liu wrote:
> > > > The definition is incomplete due to lack of spec. There's no "host"
> > > > part defined yet in the host-guest interface. If match by MAC is an
> > > > interface, the same must be done on the host(device) side as well,
> > > > which has been agreed not the way to go. However, I don't think that's
> > > > what the author intends to do by interpreting his QEMU patch - it
> > > > missed the other parts as well, such as the feature negotiation and
> > > > how it interacts with the paired device.
> > > > 
> > > > What I said is that match by MAC is just a guest implementation that
> > > > one can change at any time. We now have the group ID on QEMU, why
> > > > still stick to matching by MAC? It shouldn't be a host-guest
> > > > interface in the first place anyway.
> > > I think that match by MAC is a simple portable way to match devices.
> > > E.g. it will work seamlessly with niche things like zPCI. However
> > That's a good point. I'm not sure if it's a valid assumption that zPCI
> > should always use the same MAC address as that of virtio. Someone
> > who's more familiar with the use case may decide and work on that. It
> > means the VFIO device has to take in the MAC address as an identifier in
> > the "-device vfio-pci,.." QEMU option. I think there's no point in
> > matching devices using group ID in QEMU while using MAC in the guest.
> > Based on that assumption, I'd go with making VIRTIO_NET_F_STANDBY
> > match devices based on group ID, while someone may come up with another
> > feature bit later, say VIRTIO_NET_F_STANDBY_BY_MAC when its QEMU
> > support is available. Would it make sense?
> 
> VIRTIO_NET_F_STANDBY as defined in the guest virtio_net driver supports matching
> by MAC address. I think we should add support for this feature bit in QEMU.
> If submitting a patch to update the spec is a prerequisite to adding this
> feature bit to QEMU, I can do that.

It's not strictly a prerequisite, but we need it in the spec all the
same, so please do that.

> As far as I understand, the group ID patches to QEMU are still under review.
> Matching by group ID can be another feature bit that could support matching
> by group ID as well as MAC.
> 
> 
> > -Siwei
> > 
> > > there are other niche use-cases that aren't addressed by match by MAC
> > > such as PF pass-through as a primary, and the pci bridge trick addresses
> > > that at cost of some portability.
> > > 
> > > So I see no issues supporting both mechanisms, but others on the TC
> > > might feel differently.
> > > 
> > > --
> > > MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Cornelia Huck 5 years, 9 months ago
On Thu, 12 Jul 2018 21:20:41 -0400
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:

> On 7/12/2018 6:19 PM, Siwei Liu wrote:
> > On Thu, Jul 12, 2018 at 2:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:  
> >> On Thu, Jul 12, 2018 at 01:52:53PM -0700, Siwei Liu wrote:  
> >>> The definition is incomplete due to lack of spec. There's no "host"
> >>> part defined yet in the host-guest interface. If match by MAC is an
> >>> interface, the same must be done on the host(device) side as well,
> >>> which has been agreed not the way to go. However, I don't think that's
> >>> what the author intends to do by interpreting his QEMU patch - it
> >>> missed the other parts as well, such as the feature negotiation and
> >>> how it interacts with the paired device.
> >>>
> >>> What I said is that match by MAC is just a guest implementation that
> >>> one can change at any time. We now have the group ID on QEMU, why
> >>> still stick to matching by MAC? It shouldn't be a host-guest
> >>> interface in the first place anyway.  
> >> I think that match by MAC is a simple portable way to match devices.
> >> E.g. it will work seamlessly with niche things like zPCI. However  
> > That's a good point. I'm not sure if it's a valid assumption that zPCI
> > should always use the same MAC address as that of virtio. Someone

I think we can mostly disregard the weirdness that is zPCI right now.
There should be no fundamental reasons that matching by MAC would not
work, though.

> > who's more familiar with the use case may decide and work on that. It
> > means the VFIO device has to take in the MAC address as an identifier in
> > the "-device vfio-pci,.." QEMU option. I think there's no point in
> > matching devices using group ID in QEMU while using MAC in the guest.
> > Based on that assumption, I'd go with making VIRTIO_NET_F_STANDBY
> > match devices based on group ID, while someone may come up with another
> > feature bit later, say VIRTIO_NET_F_STANDBY_BY_MAC when its QEMU
> > support is available. Would it make sense?  
> 
> VIRTIO_NET_F_STANDBY as defined in the guest virtio_net driver supports matching
> by MAC address. I think we should add support for this feature bit in QEMU.
> If submitting a patch to update the spec is a prerequisite to adding this
> feature bit to QEMU, I can do that.

Doing a spec patch and implementing matching by MAC in QEMU sounds like
a good plan to me.

> 
> As far as I understand, the group ID patches to QEMU are still under review.
> Matching by group ID can be another feature bit that could support matching
> by group ID as well as MAC.

That plan sounds good to me as well.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Wed, Jul 11, 2018 at 11:53:44AM +0200, Cornelia Huck wrote:
> On Tue, 10 Jul 2018 17:07:37 -0700
> Siwei Liu <loseweigh@gmail.com> wrote:
> 
> > On Mon, Jul 9, 2018 at 6:54 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:  
> > >> The plan is to enable group ID based matching in the first place rather than
> > >> match by MAC, the latter of which is fragile and problematic.  
> > >
> > > It isn't all that fragile - hyperv used same for a while, so if someone
> > > posts working patches with QEMU support but before this grouping stuff,
> > > I'll happily apply them.  
> > 
> > I wouldn't box the solution into a very limited scenario just because of
> > matching by MAC; the benefit of having a generic group ID in the first
> > place is that we save the effort of maintaining legacy MAC-based
> > pairing that just adds complexity anyway. Currently, the fact that the
> > VF's MAC address cannot be changed by either the PF or the guest user
> > is a severe limitation due to this. The other use case is that a PT
> > device other than a VF would generally have a different MAC than the
> > standby virtio. We shouldn't limit ourselves to the VF-specific
> > scenario from the very beginning.
> 
> So, this brings me to a different concern: the semantics of
> VIRTIO_NET_F_STANDBY.
> 
> * The currently sole user seems to be the virtio-net Linux driver.
> * The commit messages, code comments and Documentation/ all talk about
>   matching by MAC.
> * I could not find any proposed update to the virtio spec. (If there
>   had been an older proposal with a different feature name, it is not
>   discoverable.)
> 
> VIRTIO_NET_F_STANDBY is a host <-> guest interface. As there's no
> official spec, you can only go by the Linux implementation, and by that
> its semantics seem to be 'match by MAC', not 'match by other criteria'.
> 
> How is this supposed to work in the long run?

We definitely need a spec patch for VIRTIO_NET_F_STANDBY documenting existing
semantics.  Sridhar, do you plan to take a look?

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:
> What do we buy
> for using a random address during initial discovery and requiring VF to
> complete the handshake?

I don't see advantages to using a random address that is then
changed either: changing a MAC causes network downtime for most users.

> Less network downtime during datapath switching?
> Sorry but that's not a key factor at all for our main goal - live migration.

Isn't avoiding downtime what makes the migration "live"?
If you don't care about it at all just remove the device
and migrate without all these tricks.


-- 
MST

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Siwei Liu 5 years, 9 months ago
On Mon, Jul 9, 2018 at 6:58 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Jul 09, 2018 at 06:11:53PM -0700, si-wei liu wrote:
>> What do we buy
>> for using a random address during initial discovery and requiring VF to
>> complete the handshake?
>
> I don't see advantages to using a random address that is then
> changed either: changing a MAC causes network downtime for most users.
>
Definitely.

I see the Linux host stack as fundamentally different from Windows; it
makes no sense to duplicate what Hyper-V is doing, especially if there's
no extra benefit.

>> Less network downtime during datapath switching?
>> Sorry but that's not a key factor at all for our main goal - live migration.
>
> Isn't avoiding downtime what makes the migration "live"?
> If you don't care about it at all just remove the device
> and migrate without all these tricks.

Apparently this downtime is not avoidable on the Linux host stack even
if the guest initiates the switch-over, unless the NIC supports adding
duplicate MAC filters with one having higher priority than the other
when both are present. I feel there's very little, perhaps zero,
improvement in downtime from moving to a guest-initiated datapath
switching model.

However, since this downtime is intermittent and generally unnoticeable,
with only a few packet drops, the network should be resilient in
recovering from the drops. My point is that unless we can move to a
datapath switching model with theoretically zero downtime, this kind of
minor optimization offers very little help in general.

Regards,
-Siwei



>
>
> --
> MST
>

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Mon, Jul 09, 2018 at 04:00:36PM +0300, Roman Kagan wrote:
> There are a few things that need to be specific for PT and/or PV
> transport, the matching identifier among them, but I guess a lot can
> still be in common.

Maybe, but frankly, unless someone actually plans to bother implementing
all this in QEMU's limited emulation of Hyper-V on top of KVM, I don't
see why we would care.

What I saw gives me the impression Hyper-V wants to just be
left alone as much as possible and maybe we should respect that.

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 10 months ago
On Tue, Jul 03, 2018 at 12:58:25PM +0300, Roman Kagan wrote:
> > >    How is this supposed to work with legacy guests that don't support it?
> > Only PV device will be exposed on legacy guest.
> 
> So how is this coordination going to work?

Here's a description:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg538867.html

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> The current patch set includes all the feedback received for proposals [3]
> and [4]. For the sake of completeness, patch for the virtio specification
> is also included here. Following is the updated proposal.
> 
> 1. Extend the virtio specification to include a new virtio PCI capability
>    "VIRTIO_PCI_CAP_GROUP_ID_CFG".

....

>  default-configs/arm-softmmu.mak             |   1 +
>  default-configs/i386-softmmu.mak            |   1 +
>  default-configs/x86_64-softmmu.mak          |   1 +
>  hw/pci-bridge/Makefile.objs                 |   1 +
>  hw/pci-bridge/pci_bridge_dev.c              |  10 +
>  hw/pci-bridge/pcie_downstream.c             | 220 ++++++++++++++++++++
>  hw/pci-bridge/pcie_downstream.h             |  10 +
>  hw/pci/pci_bridge.c                         |  34 +++
>  hw/virtio/virtio-pci.c                      |  15 ++
>  hw/virtio/virtio-pci.h                      |   3 +-
>  include/hw/pci/pci.h                        |  37 ++--
>  include/hw/pci/pci_bridge.h                 |   2 +
>  include/hw/pci/pcie.h                       |   1 +
>  include/standard-headers/linux/virtio_pci.h |   8 +
>  14 files changed, 326 insertions(+), 18 deletions(-)
>  create mode 100644 hw/pci-bridge/pcie_downstream.c
>  create mode 100644 hw/pci-bridge/pcie_downstream.h

I don't see the spec patch here.

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Venu Busireddy 5 years, 9 months ago
On 2018-07-10 05:11:18 +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> > The current patch set includes all the feedback received for proposals [3]
> > and [4]. For the sake of completeness, patch for the virtio specification
> > is also included here. Following is the updated proposal.
> > 
> > 1. Extend the virtio specification to include a new virtio PCI capability
> >    "VIRTIO_PCI_CAP_GROUP_ID_CFG".
> 
> ....
> 
> >  default-configs/arm-softmmu.mak             |   1 +
> >  default-configs/i386-softmmu.mak            |   1 +
> >  default-configs/x86_64-softmmu.mak          |   1 +
> >  hw/pci-bridge/Makefile.objs                 |   1 +
> >  hw/pci-bridge/pci_bridge_dev.c              |  10 +
> >  hw/pci-bridge/pcie_downstream.c             | 220 ++++++++++++++++++++
> >  hw/pci-bridge/pcie_downstream.h             |  10 +
> >  hw/pci/pci_bridge.c                         |  34 +++
> >  hw/virtio/virtio-pci.c                      |  15 ++
> >  hw/virtio/virtio-pci.h                      |   3 +-
> >  include/hw/pci/pci.h                        |  37 ++--
> >  include/hw/pci/pci_bridge.h                 |   2 +
> >  include/hw/pci/pcie.h                       |   1 +
> >  include/standard-headers/linux/virtio_pci.h |   8 +
> >  14 files changed, 326 insertions(+), 18 deletions(-)
> >  create mode 100644 hw/pci-bridge/pcie_downstream.c
> >  create mode 100644 hw/pci-bridge/pcie_downstream.h
> 
> I don't see the spec patch here.

Since the spec patch is in a different repo, I could not get 'git
format-patch' to combine them. As a result, I added the spec patch
itself to the series, but it is not reflected in the cover letter.

I can combine them manually and repost. Do you want me to do that?

Thanks,

Venu


Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...
Posted by Michael S. Tsirkin 5 years, 9 months ago
On Tue, Jul 10, 2018 at 09:28:57AM -0500, Venu Busireddy wrote:
> On 2018-07-10 05:11:18 +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> > > The current patch set includes all the feedback received for proposals [3]
> > > and [4]. For the sake of completeness, patch for the virtio specification
> > > is also included here. Following is the updated proposal.
> > > 
> > > 1. Extend the virtio specification to include a new virtio PCI capability
> > >    "VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > 
> > ....
> > 
> > >  default-configs/arm-softmmu.mak             |   1 +
> > >  default-configs/i386-softmmu.mak            |   1 +
> > >  default-configs/x86_64-softmmu.mak          |   1 +
> > >  hw/pci-bridge/Makefile.objs                 |   1 +
> > >  hw/pci-bridge/pci_bridge_dev.c              |  10 +
> > >  hw/pci-bridge/pcie_downstream.c             | 220 ++++++++++++++++++++
> > >  hw/pci-bridge/pcie_downstream.h             |  10 +
> > >  hw/pci/pci_bridge.c                         |  34 +++
> > >  hw/virtio/virtio-pci.c                      |  15 ++
> > >  hw/virtio/virtio-pci.h                      |   3 +-
> > >  include/hw/pci/pci.h                        |  37 ++--
> > >  include/hw/pci/pci_bridge.h                 |   2 +
> > >  include/hw/pci/pcie.h                       |   1 +
> > >  include/standard-headers/linux/virtio_pci.h |   8 +
> > >  14 files changed, 326 insertions(+), 18 deletions(-)
> > >  create mode 100644 hw/pci-bridge/pcie_downstream.c
> > >  create mode 100644 hw/pci-bridge/pcie_downstream.h
> > 
> > I don't see the spec patch here.
> 
> Since the spec patch is in a different repo, I could not get 'git
> format-patch' to combine them. As a result, I added the spec patch
> itself to the series, but it is not reflected in the cover letter.
> 
> I can combine them manually and repost. Do you want me to do that?
> 
> Thanks,
> 
> Venu

It's ok, I found it now.

-- 
MST