[libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug

Shivaprasad G Bhat posted 28 patches 6 years, 8 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/152104711186.10112.1077788328340024644.stgit@localhost.localdomain
Test syntax-check failed
src/conf/device_conf.h                             |    7
src/conf/domain_addr.c                             |  127 ++++++-
src/conf/domain_addr.h                             |   41 +-
src/conf/domain_conf.c                             |  194 +++++++++-
src/conf/domain_conf.h                             |   39 ++
src/libvirt_private.syms                           |   14 +
src/node_device/node_device_udev.c                 |    2
src/qemu/qemu_capabilities.c                       |    5
src/qemu/qemu_domain.c                             |   72 ++++
src/qemu/qemu_domain.h                             |   19 +
src/qemu/qemu_domain_address.c                     |  375 ++++++++++++++++++-
src/qemu/qemu_domain_address.h                     |   15 +
src/qemu/qemu_driver.c                             |  197 +++++++---
src/qemu/qemu_hostdev.c                            |   70 ----
src/qemu/qemu_hostdev.h                            |    3
src/qemu/qemu_hotplug.c                            |  389 ++++++++++++++++----
src/qemu/qemu_hotplug.h                            |   14 +
src/util/virhostdev.c                              |   96 +++++
src/util/virhostdev.h                              |   11 +
src/util/virpci.c                                  |   22 +
src/util/virpci.h                                  |    8
src/util/virprocess.h                              |    2
tests/Makefile.am                                  |    7
tests/qemuargv2xmldata/hostdev-pci-address.args    |    2
tests/qemuargv2xmldata/hostdev-pci-address.xml     |    2
tests/qemuargv2xmltest.c                           |   18 +
tests/qemuhotplugtest.c                            |   98 ++++-
.../qemuhotplug-hostdev-pci.xml                    |    6
.../qemuhotplug-multifunction-hostdev-pci-2.xml    |   14 +
.../qemuhotplug-multifunction-hostdev-pci.xml      |   20 +
.../qemuhotplug-base-live+hostdev-pci.xml          |   60 +++
...hotplug-base-live+multifunction-hostdev-pci.xml |   76 ++++
.../qemuhotplug-pseries-base-live+hostdev-pci.xml  |   53 +++
...eries-base-live+multifunction-hostdev-pci-2.xml |   61 +++
...pseries-base-live+multifunction-hostdev-pci.xml |   69 ++++
.../qemuhotplug-pseries-base-live.xml              |   45 ++
.../hostdev-pci-address-device.args                |    2
.../hostdev-pci-address-device.xml                 |    2
tests/qemuxml2argvdata/hostdev-pci-address.args    |    2
tests/qemuxml2argvdata/hostdev-pci-address.xml     |    2
.../hostdev-pci-multifunction.args                 |   31 ++
.../qemuxml2argvdata/hostdev-pci-multifunction.xml |   59 +++
.../hostdev-pci-no-primary-function.xml            |   23 +
tests/qemuxml2argvdata/hostdev-pci-validate.args   |   25 +
tests/qemuxml2argvdata/hostdev-pci-validate.xml    |   29 +
.../qemuxml2argvdata/hostdev-vfio-multidomain.args |    2
.../qemuxml2argvdata/hostdev-vfio-multidomain.xml  |    2
tests/qemuxml2argvdata/hostdev-vfio.args           |    2
tests/qemuxml2argvdata/hostdev-vfio.xml            |    2
tests/qemuxml2argvdata/net-hostdev-fail.xml        |    2
.../qemuxml2argvdata/net-hostdev-multidomain.args  |    2
tests/qemuxml2argvdata/net-hostdev-multidomain.xml |    2
tests/qemuxml2argvdata/net-hostdev-vfio.args       |    2
tests/qemuxml2argvdata/net-hostdev-vfio.xml        |    2
tests/qemuxml2argvdata/net-hostdev.args            |    2
tests/qemuxml2argvdata/net-hostdev.xml             |    2
tests/qemuxml2argvdata/pci-rom.args                |    4
tests/qemuxml2argvdata/pci-rom.xml                 |    4
tests/qemuxml2argvdata/pseries-hostdevs-1.args     |    5
tests/qemuxml2argvdata/pseries-hostdevs-3.args     |    5
tests/qemuxml2argvtest.c                           |   17 +
tests/qemuxml2xmloutdata/hostdev-pci-address.xml   |    2
.../hostdev-pci-multifunction.xml                  |   79 ++++
tests/qemuxml2xmloutdata/hostdev-vfio.xml          |    2
tests/qemuxml2xmloutdata/net-hostdev-vfio.xml      |    2
tests/qemuxml2xmloutdata/net-hostdev.xml           |    2
tests/qemuxml2xmloutdata/pci-rom.xml               |    4
tests/qemuxml2xmloutdata/pseries-hostdevs-1.xml    |    4
tests/qemuxml2xmloutdata/pseries-hostdevs-3.xml    |    4
tests/qemuxml2xmltest.c                            |    1
tests/virhostdevtest.c                             |   39 --
tests/virpcimock.c                                 |  199 +++++++++-
tests/virpcitest.c                                 |   12 -
tests/virpcitestdata/0000-06-12.0.config           |  Bin
tests/virpcitestdata/0000-06-12.1.config           |  Bin
tests/virpcitestdata/0000-06-12.2.config           |  Bin
tests/virpcitestdata/0005-90-01.1.config           |  Bin
tests/virpcitestdata/0005-90-01.2.config           |  Bin
tests/virpcitestdata/0005-90-01.3.config           |  Bin
tests/virprocessmock.c                             |   28 +
80 files changed, 2463 insertions(+), 400 deletions(-)
create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-hostdev-pci.xml
create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci-2.xml
create mode 100644 tests/qemuhotplugtestdevices/qemuhotplug-multifunction-hostdev-pci.xml
create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-base-live+hostdev-pci.xml
create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-base-live+multifunction-hostdev-pci.xml
create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+hostdev-pci.xml
create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci-2.xml
create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live+multifunction-hostdev-pci.xml
create mode 100644 tests/qemuhotplugtestdomains/qemuhotplug-pseries-base-live.xml
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction.xml
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-no-primary-function.xml
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-validate.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-validate.xml
create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-multifunction.xml
create mode 100644 tests/virpcitestdata/0000-06-12.0.config
create mode 100644 tests/virpcitestdata/0000-06-12.1.config
create mode 100644 tests/virpcitestdata/0000-06-12.2.config
create mode 100644 tests/virpcitestdata/0005-90-01.3.config
create mode 100644 tests/virprocessmock.c
[libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Shivaprasad G Bhat 6 years, 8 months ago
Hi All,

I have revisited/rewritten my previously posted patches. Here is
the RFC. Since this patchset is a complete rewrite, I am starting
with v1 here.

The semantics are as discussed before:
https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html

As I went on to refactor the code to support multifunction virtio devices,
I realised the abort/cleanup path would be a nightmare in case of
failures, so I dropped that attempt. The current RFC is limited to the
practical use case of multifunction PCI hostdevices. New test code and
test cases for multifunction PCI hostdevices are added to prove the
functionality.

So, to summarise:
=============
Patch 1     - A bug fix.
Patch 2-6   - Add PCI/VFIO/multifunction/multiple-devices-per-IOMMU-group
              support to our mock test environment.
The patches up to this point are basic and independent, but necessary for
the remaining patches.
=============
Patch 7-14  - Detect and auto-address PCI multifunction devices.
=============
Patch 15-25 - Refactor/prepare for hotplug/unplug.
Patch 26-28 - Finally implement hotplug/unplug.
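
For readers unfamiliar with how a multifunction device can be recognised, here
is a minimal sketch in Python. It is not the code from this series. It assumes
the check is done the standard PCI way, by testing bit 7 of the Header Type
register (offset 0x0E) in configuration space, and that the config space is
read from the sysfs "config" file. The function names is_multifunction() and
read_config() are hypothetical.

```python
# Sketch only: illustrates the standard PCI multifunction bit, not the
# implementation used by the patches in this series.

HEADER_TYPE_OFFSET = 0x0E   # PCI configuration space: Header Type register
MULTIFUNCTION_BIT = 0x80    # bit 7 set means the device has several functions


def is_multifunction(config: bytes) -> bool:
    """Return True if the config-space bytes advertise a multifunction device."""
    if len(config) <= HEADER_TYPE_OFFSET:
        raise ValueError("config space too short to hold the Header Type register")
    return bool(config[HEADER_TYPE_OFFSET] & MULTIFUNCTION_BIT)


def read_config(address: str, sysfs_root: str = "/sys/bus/pci/devices") -> bytes:
    """Read the first 64 bytes of config space for an address like '0005:01:00.0'."""
    with open(f"{sysfs_root}/{address}/config", "rb") as handle:
        return handle.read(64)
```

On a real host one would call is_multifunction(read_config("0005:01:00.0")).
In a test environment the same check can run against canned config files,
which may be why the series adds binary .config fixtures under
tests/virpcitestdata/.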

Thanks,
Shivaprasad

---

Shivaprasad G Bhat (28):
      Fix the iommu group path in mock pci
      util: move the hostdev passthrough support functions to utility
      tests: pci: Mock the iommu groups and vfio
      virpcitest: Change the stub driver to vfio from pci-stub
      virpcimock: Mock the SRIOV Virtual functions
      tests: qemu: Add test case for pci-hostdev hotplug
      tests: Add a baseline test for multifunction pci device use case
      util: virpci: detect if the device is a multifunction device from sysfs
      tests: qemu: mock pci environment for qemuargv2xmltests
      virhostdev: Introduce virHostdevPCIDevicesBelongToSameSlot
      qemu: address: Separate the slots into multiple aggregates
      qemu: address: Enable auto addressing multifunction cards
      util: make virHostdevIsVirtualFunction() public
      conf: qemu: validate multifunction hostdevice domain configs
      conf: Add helper to get active functions of a slot of domain
      qemu: hostdev: Move the hostdev preparation to a separate function
      qemu: hotplug: Move the detach of PCI device to the beginnging of live hotplug
      qemu: hotplug: move assignment outside qemuDomainAttachHostPCIDevice
      Introduce virDomainDeviceDefParseXMLMany
      Introduce qemuDomainDeviceParseXMLMany
      qemu: refactor qemuDomain[Attach|Detach]DeviceConfig
      qemu: refactor qemuDomain[Attach|Detach]DeviceLive
      qemu: hotplug: Queue and wait for multiple devices
      domain: addr: Introduce virDomainPCIAddressEnsureMultifunctionAddress
      qemu: hotplug: Implement multifunction device hotplug
      qemu: hotplug : Prevent updates to mulitfunction device
      qemu: hotplug: Move out the Single function check
      qemu: hotplug: Implement multifunction device unplug




--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Daniel P. Berrangé 6 years, 8 months ago
On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
> Hi All,
> 
> I have revisited/rewritten my previously posted patches. Here is
> the RFC. Since this patchset is a complete rewrite, I am starting
> with v1 here.
> 
> The semantics is as discussed before
> https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
> 
> As I went on to refactor the code to support multifunction virtio devices,
> I realised the abort/cleanup path would be a nightmare there, in case of
> failures. So, dropped that attempt. The current RFC limits to the real
> practical use cases of Multifunction PCI hostdevices. All new test code
> to support multifunction PCI hostdevices and test cases are added to
> prove the functionality.

I guess I'm not really understanding the use case here.  With SRIOV
devices, you can already choose between assigning the physical
function (which gives the guest access to all virtual functions) or
assigning an arbitrary set of individual functions to various guests.
Why do we need to be able to list many <hostdev> at the same time
when hotplugging to assign multiple functions?

Basically, can you provide a full description of the problem you are
trying to solve and why existing functionality isn't sufficient?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Shivaprasad G Bhat 6 years, 8 months ago

On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
> On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
>> Hi All,
>>
>> I have revisited/rewritten my previously posted patches. Here is
>> the RFC. Since this patchset is a complete rewrite, I am starting
>> with v1 here.
>>
>> The semantics is as discussed before
>> https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
>>
>> As I went on to refactor the code to support multifunction virtio devices,
>> I realised the abort/cleanup path would be a nightmare there, in case of
>> failures. So, dropped that attempt. The current RFC limits to the real
>> practical use cases of Multifunction PCI hostdevices. All new test code
>> to support multifunction PCI hostdevices and test cases are added to
>> prove the functionality.
> I guess I'm not really understanding the use case here.  With SRIOV
> devices, you can already choose between assigning either the physical
> function (which gives the guest access to all virtual functions), or
> to assign an arbitrary set of individiual functions to various guests.
> Why do we need to be able to list many <hostdev> at the same time
> when hotplugging to assign multiple functions.
>
> Basically can you provide a full description of the problem you are
> trying to solve and why existing functionality isn't sufficient.

Hi Daniel,

This is for cards which may not necessarily be networking cards, or which
may be a mix of networking and storage.

Suppose the user has the card below:
0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer) (rev 10)
0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)
0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator (Lancer) (rev 10)

If the user wants to hotplug this card to a guest, they have to detach all
the functions from the host driver, then hotplug 0005:01:00.0, 0005:01:00.1
and so on individually. But today, with each hotplug of a function, each
<hostdev> goes to a different guest slot, whereas PCI requires all of them
to be on the same slot. This is not supported in libvirt today.

Multifunction cards can't be hotplugged to a guest today with individual
<hostdev>s, as the operation is queued by qemu until function zero of the
guest slot is hotplugged. On function zero hotplug, qemu sends the event to
the guest for device probing, at which point all the previously hotplugged
functions from the same slot are discovered. So grouping the <hostdev>s
within <devices> becomes necessary to make the whole thing a single
operation.

The patches try to fix this aspect of the use case.
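
The ordering constraint described above can be sketched roughly as follows.
This is only an illustration in Python, not the code from the patches: it
assumes that all functions must share one host slot, that function zero must
be part of the group, and that function zero is plugged last so the guest is
notified once for the whole slot. The names PciAddress, parse_address() and
hotplug_order() are hypothetical.

```python
import re
from typing import List, NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    slot: int
    function: int


# Matches lspci-style addresses such as "0005:01:00.0".
_ADDR_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_address(text: str) -> PciAddress:
    """Parse 'dddd:bb:ss.f' into its numeric components."""
    match = _ADDR_RE.match(text)
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, slot, function = (int(part, 16) for part in match.groups())
    return PciAddress(domain, bus, slot, function)


def hotplug_order(addresses: List[str]) -> List[PciAddress]:
    """Return the functions of one slot, non-zero functions first, zero last."""
    parsed = [parse_address(text) for text in addresses]
    if not parsed:
        raise ValueError("no devices given")
    slots = {(a.domain, a.bus, a.slot) for a in parsed}
    if len(slots) != 1:
        raise ValueError("all functions must belong to the same slot")
    if not any(a.function == 0 for a in parsed):
        # Assumption: without function zero the guest is never notified.
        raise ValueError("function 0 is required")
    # Sort key: non-zero functions (False) before function zero (True).
    return sorted(parsed, key=lambda a: (a.function == 0, a.function))
```

For the card above, hotplug_order() would place 0005:01:00.1 through
0005:01:00.5 first and 0005:01:00.0 at the end.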

Thanks,
Shivaprasad

> Regards,
> Daniel

Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Daniel P. Berrangé 6 years, 8 months ago
On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
> 
> 
> On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
> > On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
> > > Hi All,
> > > 
> > > I have revisited/rewritten my previously posted patches. Here is
> > > the RFC. Since this patchset is a complete rewrite, I am starting
> > > with v1 here.
> > > 
> > > The semantics is as discussed before
> > > https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
> > > 
> > > As I went on to refactor the code to support multifunction virtio devices,
> > > I realised the abort/cleanup path would be a nightmare there, in case of
> > > failures. So, dropped that attempt. The current RFC limits to the real
> > > practical use cases of Multifunction PCI hostdevices. All new test code
> > > to support multifunction PCI hostdevices and test cases are added to
> > > prove the functionality.
> > I guess I'm not really understanding the use case here.  With SRIOV
> > devices, you can already choose between assigning either the physical
> > function (which gives the guest access to all virtual functions), or
> > to assign an arbitrary set of individiual functions to various guests.
> > Why do we need to be able to list many <hostdev> at the same time
> > when hotplugging to assign multiple functions.
> > 
> > Basically can you provide a full description of the problem you are
> > trying to solve and why existing functionality isn't sufficient.
> 
> Hi Daniel,
> 
> This is for cards which may not necessarily be networking cards. Or may be a
> mix of
> networking and storage.
> 
> Suppose, user has below card
> 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> (rev 10)
> 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> (rev 10)
> 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> (rev 10)
> 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> (rev 10)
> 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> (Lancer) (rev 10)
> 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> (Lancer) (rev 10)

Ok, so this is a device with many functions, but which isn't SRIOV
based, and the goal is to assign the physical device to the guest,
such that the guest has all functions available.

> If user wants to hotplug this card to guest, He has to detach all the
> functions from host driver,
> then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with
> each hotplug
> of the function, each <hostdev> goes to different guest slot. Whereas, PCI
> requires all of
> them to be on the same slot. This is not supported on libvirt today.
> 
> The multifunction cards cant be hotplugged to guest today with the
> individual
> <hostdev>, as the operation is queued by qemu till the function zero of
> guest slot is
> hotplugged. On function zero hotplug, the qemu sends out the event to guest
> for device probing where all the previously hotplugged functions from the
> same slot are discovered. So, grouping the <hostdev>s within the <devices>
> would become necessary to make the whole thing a single operation.

So IIUC, from the patches, if the user wants to assign the physical
device to the guest, they would need to provide XML that looked like
this to the virDomainAttachDevice() method:

    <devices>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
        </source>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/>
        </source>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/>
        </source>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/>
        </source>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/>
        </source>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/>
        </source>
      </hostdev>
    </devices>


Whereas if the device were SRIOV based, they would only have to
provide

    <device>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
        </source>
      </hostdev>
    </device>

for the guest to get access to all functions.

I find this difference in behaviour and approach really unpleasant.

I think that the user should only need to provide the address
of the physical device, in both cases. At most perhaps we need a
new attribute multifunction="on" on the source address to tell
libvirt that it should attach all the functions, not just the
first:

    <device>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' multifunction="on"/>
        </source>
      </hostdev>
    </device>
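
To make the idea concrete, here is a rough Python sketch of how a single
function-zero address could be expanded into one <hostdev> per function. It is
purely illustrative: the multifunction attribute is only a proposal in this
mail, and the helpers hostdev_element() and expand_multifunction() are
hypothetical names. The list of functions present on the host is taken as an
input and would have to come from somewhere else, for example sysfs.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def hostdev_element(domain: int, bus: int, slot: int, function: int) -> ET.Element:
    """Build one managed VFIO <hostdev> element for a host PCI function."""
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "pci", "managed": "yes"}
    )
    ET.SubElement(hostdev, "driver", {"name": "vfio"})
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        {
            "domain": f"0x{domain:04x}",
            "bus": f"0x{bus:02x}",
            "slot": f"0x{slot:02x}",
            "function": f"0x{function:x}",
        },
    )
    return hostdev


def expand_multifunction(
    domain: int, bus: int, slot: int, functions: Iterable[int]
) -> str:
    """Return a <devices> fragment with one <hostdev> per listed function."""
    devices = ET.Element("devices")
    for function in sorted(set(functions)):
        if not 0 <= function <= 7:
            raise ValueError(f"PCI function out of range: {function}")
        devices.append(hostdev_element(domain, bus, slot, function))
    return ET.tostring(devices, encoding="unicode")
```

Calling expand_multifunction(0x0005, 0x01, 0x00, range(6)) would produce six
<hostdev> elements, one for each function of the card listed earlier in the
thread.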


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Alex Williamson 6 years, 8 months ago
On Thu, 15 Mar 2018 14:33:55 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
> > 
> > 
> > On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:  
> > > On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:  
> > > > Hi All,
> > > > 
> > > > I have revisited/rewritten my previously posted patches. Here is
> > > > the RFC. Since this patchset is a complete rewrite, I am starting
> > > > with v1 here.
> > > > 
> > > > The semantics is as discussed before
> > > > https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
> > > > 
> > > > As I went on to refactor the code to support multifunction virtio devices,
> > > > I realised the abort/cleanup path would be a nightmare there, in case of
> > > > failures. So, dropped that attempt. The current RFC limits to the real
> > > > practical use cases of Multifunction PCI hostdevices. All new test code
> > > > to support multifunction PCI hostdevices and test cases are added to
> > > > prove the functionality.  
> > > I guess I'm not really understanding the use case here.  With SRIOV
> > > devices, you can already choose between assigning either the physical
> > > function (which gives the guest access to all virtual functions), or

Say what?  If a guest is assigned a PF, they get the PF, they don't get
to enable SR-IOV to also get the VFs.  But SR-IOV and multifunction are
far from synonymous nor is SR-IOV ubiquitous to all use cases, so I
don't know why we're bringing SR-IOV into this discussion.

> > > to assign an arbitrary set of individiual functions to various guests.
> > > Why do we need to be able to list many <hostdev> at the same time
> > > when hotplugging to assign multiple functions.
> > > 
> > > Basically can you provide a full description of the problem you are
> > > trying to solve and why existing functionality isn't sufficient.  
> > 
> > Hi Daniel,
> > 
> > This is for cards which may not necessarily be networking cards. Or may be a
> > mix of
> > networking and storage.
> > 
> > Suppose, user has below card
> > 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > (rev 10)
> > 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > (rev 10)
> > 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > (rev 10)
> > 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > (rev 10)
> > 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> > (Lancer) (rev 10)
> > 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> > (Lancer) (rev 10)  
> 
> Ok, so this is a device with many functions, but which isn't SRIOV
> based, and the goal is to assign the physical device to the guest,
> such that guest has all functions available.
> 
> > If user wants to hotplug this card to guest, He has to detach all the
> > functions from host driver,
> > then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with
> > each hotplug
> > of the function, each <hostdev> goes to different guest slot. Whereas, PCI
> > requires all of
> > them to be on the same slot. This is not supported on libvirt today.
> > 
> > The multifunction cards cant be hotplugged to guest today with the
> > individual
> > <hostdev>, as the operation is queued by qemu till the function zero of
> > guest slot is
> > hotplugged. On function zero hotplug, the qemu sends out the event to guest
> > for device probing where all the previously hotplugged functions from the
> > same slot are discovered. So, grouping the <hostdev>s within the <devices>
> > would become necessary to make the whole thing a single operation.  
> 
> So IIUC, from the patches, if the user wants to assign the physical
> device to the guest, they would need to provide XML that looked like
> this to the virDomainAttachDevice() method:
> 
>     <devices>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
>         </source>
>       </hostdev>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/>
>         </source>
>       </hostdev>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/>
>         </source>
>       </hostdev>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/>
>         </source>
>       </hostdev>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/>
>         </source>
>       </hostdev>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/>
>         </source>
>       </hostdev>
>     </devices>
> 
> 
> Where as if the device were SRIOV based, they would only have to
> provide
> 
>     <device>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
>         </source>
>       </hostdev>
>     </device>
> 
> for the guest to get access to all functions.

Since when has this been the case? (nit, the example is domain=0x5,
bus=0x1,...)
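
To make the nit concrete: lspci prints addresses as domain:bus:slot.function, so 0005:01:00.0 is domain 0x0005, bus 0x01, slot 0x00, function 0. A minimal Python sketch of that mapping follows. The helper name is made up for illustration and is not libvirt code; the output only mirrors the <address> attribute layout used in the XML quoted above.

```python
import re

# lspci -D style address: dddd:bb:ss.f, every field is hexadecimal.
_BDF_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def lspci_to_address_xml(bdf):
    """Turn an lspci address string into a hostdev <address> element.

    Hypothetical helper, for illustration only.
    """
    match = _BDF_RE.match(bdf)
    if match is None:
        raise ValueError("not a domain:bus:slot.function address: %r" % bdf)
    domain, bus, slot, function = (int(g, 16) for g in match.groups())
    if slot > 0x1F:
        # A PCI bus has at most 32 slots (device numbers 0x00-0x1f).
        raise ValueError("slot out of range: 0x%02x" % slot)
    return (
        "<address domain='0x%04x' bus='0x%02x' slot='0x%02x' function='0x%x'/>"
        % (domain, bus, slot, function)
    )


if __name__ == "__main__":
    print(lspci_to_address_xml("0005:01:00.0"))
```

With the card from this thread, the first function comes out with domain 0x0005 and bus 0x01, rather than the domain 0x0000 and bus 0x05 shown in the quoted example.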
 
> I find this difference in behaviour and approach really unpleasant.
> 
> I think that they user should only need to provide the the address
> of the physical device, in both cases. At most perhaps we need a
> new attribute  multifunction="on" on the source address to tell
> libvirt that it should attach all the functions, not just the
> first
> 
>     <device>
>       <hostdev mode='subsystem' type='pci' managed='yes'>
>         <driver name='vfio'/>
>         <source>
>           <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/>
>         </source>
>       </hostdev>
>     </device>

Neither really bothers me, but I'm confused by the claimed existing
handling of SR-IOV.  Either you're assigning a PF and SR-IOV is
irrelevant and unavailable to the guest or you're assigning a VF and,
well, SR-IOV is still mostly irrelevant to libvirt unless someone
decides to assign the PF hosting the VF or libvirt needs to do VF
configuration via the PF.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Daniel P. Berrangé 6 years, 8 months ago
On Thu, Mar 15, 2018 at 08:59:41AM -0600, Alex Williamson wrote:
> On Thu, 15 Mar 2018 14:33:55 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
> > > 
> > > 
> > > On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:  
> > > > On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:  
> > > > > Hi All,
> > > > > 
> > > > > I have revisited/rewritten my previously posted patches. Here is
> > > > > the RFC. Since this patchset is a complete rewrite, I am starting
> > > > > with v1 here.
> > > > > 
> > > > > The semantics is as discussed before
> > > > > https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
> > > > > 
> > > > > As I went on to refactor the code to support multifunction virtio devices,
> > > > > I realised the abort/cleanup path would be a nightmare there, in case of
> > > > > failures. So, dropped that attempt. The current RFC limits to the real
> > > > > practical use cases of Multifunction PCI hostdevices. All new test code
> > > > > to support multifunction PCI hostdevices and test cases are added to
> > > > > prove the functionality.  
> > > > I guess I'm not really understanding the use case here.  With SRIOV
> > > > devices, you can already choose between assigning either the physical
> > > > function (which gives the guest access to all virtual functions), or
> 
> Say what?  If a guest is assigned a PF, they get the PF, they don't get
> to enable SR-IOV to also get the VFs.  But SR-IOV and multifunction are
> far from synonymous nor is SR-IOV ubiquitous to all use cases, so I
> don't know why we're bringing SR-IOV into this discussion.
> 
> > > > to assign an arbitrary set of individiual functions to various guests.
> > > > Why do we need to be able to list many <hostdev> at the same time
> > > > when hotplugging to assign multiple functions.
> > > > 
> > > > Basically can you provide a full description of the problem you are
> > > > trying to solve and why existing functionality isn't sufficient.  
> > > 
> > > Hi Daniel,
> > > 
> > > This is for cards which may not necessarily be networking cards. Or may be a
> > > mix of
> > > networking and storage.
> > > 
> > > Suppose, user has below card
> > > 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > > (rev 10)
> > > 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > > (rev 10)
> > > 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > > (rev 10)
> > > 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> > > (rev 10)
> > > 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> > > (Lancer) (rev 10)
> > > 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> > > (Lancer) (rev 10)  
> > 
> > Ok, so this is a device with many functions, but which isn't SRIOV
> > based, and the goal is to assign the physical device to the guest,
> > such that guest has all functions available.
> > 
> > > If user wants to hotplug this card to guest, He has to detach all the
> > > functions from host driver,
> > > then hotplug 0005:01:00.0, 0005:01:00.1, so on individually. But, today with
> > > each hotplug
> > > of the function, each <hostdev> goes to different guest slot. Whereas, PCI
> > > requires all of
> > > them to be on the same slot. This is not supported on libvirt today.
> > > 
> > > The multifunction cards cant be hotplugged to guest today with the
> > > individual
> > > <hostdev>, as the operation is queued by qemu till the function zero of
> > > guest slot is
> > > hotplugged. On function zero hotplug, the qemu sends out the event to guest
> > > for device probing where all the previously hotplugged functions from the
> > > same slot are discovered. So, grouping the <hostdev>s within the <devices>
> > > would become necessary to make the whole thing a single operation.  
> > 
> > So IIUC, from the patches, if the user wants to assign the physical
> > device to the guest, they would need to provide XML that looked like
> > this to the virDomainAttachDevice() method:
> > 
> >     <devices>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
> >         </source>
> >       </hostdev>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/>
> >         </source>
> >       </hostdev>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/>
> >         </source>
> >       </hostdev>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/>
> >         </source>
> >       </hostdev>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/>
> >         </source>
> >       </hostdev>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/>
> >         </source>
> >       </hostdev>
> >     </devices>
> > 
> > 
> > Where as if the device were SRIOV based, they would only have to
> > provide
> > 
> >     <device>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
> >         </source>
> >       </hostdev>
> >     </device>
> > 
> > for the guest to get access to all functions.
> 
> Since when has this been the case? (nit, the example is domain=0x5,
> bus=0x1,...)
>  
> > I find this difference in behaviour and approach really unpleasant.
> > 
> > I think that they user should only need to provide the the address
> > of the physical device, in both cases. At most perhaps we need a
> > new attribute  multifunction="on" on the source address to tell
> > libvirt that it should attach all the functions, not just the
> > first
> > 
> >     <device>
> >       <hostdev mode='subsystem' type='pci' managed='yes'>
> >         <driver name='vfio'/>
> >         <source>
> >           <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/>
> >         </source>
> >       </hostdev>
> >     </device>
> 
> Neither really bothers me, but I'm confused by the claimed existing
> handling of SR-IOV.  Either you're assigning a PF and SR-IOV is
> irrelevant and unavailable to the guest or you're assigning a VF and,
> well, SR-IOV is still mostly irrelevant to libvirt unless someone
> decides to assign the PF hosting the VF or libvirt needs to do VF
> configuration via the PF.  Thanks,

Hmm, could be a misunderstanding then. I was under the impression that
when you assign the PF of an SR-IOV device to the guest, all the
VFs disappear from the host because the driver is unloaded.
The guest now has the PF, and would have the option to enable VFs
too if it so desired, just as the host had that option when it owned
the PF.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Alex Williamson 6 years, 8 months ago
On Thu, 15 Mar 2018 15:06:38 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Thu, Mar 15, 2018 at 08:59:41AM -0600, Alex Williamson wrote:
> > 
> > Neither really bothers me, but I'm confused by the claimed existing
> > handling of SR-IOV.  Either you're assigning a PF and SR-IOV is
> > irrelevant and unavailable to the guest or you're assigning a VF and,
> > well, SR-IOV is still mostly irrelevant to libvirt unless someone
> > decides to assign the PF hosting the VF or libvirt needs to do VF
> > configuration via the PF.  Thanks,  
> 
> Hmm, could be a mis-understanding then. I was under the belief that
> when you assign the PF or a SRIOV device to the guest, all the
> VFs obviously disappear from the host due to driver being unloaded.
> The guest now has the PF, and would have the option to enable VFs
> too if it so desired, just as the host had option for when it owned
> the PF.

Yeah, that's not how it currently works.  Some people would like it if
this were the case, but we've not gotten past the security issues.  If
the user is allowed to enable SR-IOV, those VFs don't just appear for
the VM, they appear on the host.  The host needs to probe for them,
assign resources, and attach drivers.  What should the host do with VFs
that are managed by an untrusted userspace driver?  The isolation
between VFs and PFs depends on the vendor's SR-IOV implementation.
Minimally, the PF driver manages the PCI bus master config bit and can
trivially introduce a denial of service for the VFs.  Allowing a VM to
enable SR-IOV only for the purpose of assigning those VFs back to the VM
owning the PF doesn't seem to be a particularly compelling feature on
its own.  Thanks,

Alex
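
As background for the point that VFs appear on the host: on Linux the host's view is visible in sysfs. A VF's device directory carries a `physfn` link back to its parent PF, and an SR-IOV capable PF exposes `sriov_totalvfs` and `sriov_numvfs` attributes. The following is a rough sketch only, not something from this series; the helper name is hypothetical, and the sysfs root is a parameter so the logic can be tried without hardware.

```python
import os


def sriov_role(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Classify a host PCI function as 'vf', 'pf' or 'plain'.

    Hypothetical helper, for illustration only.
    """
    dev_dir = os.path.join(sysfs_root, bdf)
    if not os.path.isdir(dev_dir):
        raise FileNotFoundError(dev_dir)
    # A virtual function links back to its physical function.
    if os.path.lexists(os.path.join(dev_dir, "physfn")):
        return "vf"
    # An SR-IOV capable physical function advertises how many VFs it supports.
    if os.path.exists(os.path.join(dev_dir, "sriov_totalvfs")):
        return "pf"
    # Anything else, e.g. one function of an ordinary multifunction card.
    return "plain"
```

A non-SR-IOV multifunction card such as the one discussed in this thread would be expected to report "plain" for every function.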

Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Daniel P. Berrangé 6 years, 8 months ago
On Thu, Mar 15, 2018 at 09:54:46AM -0600, Alex Williamson wrote:
> On Thu, 15 Mar 2018 15:06:38 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Thu, Mar 15, 2018 at 08:59:41AM -0600, Alex Williamson wrote:
> > > 
> > > Neither really bothers me, but I'm confused by the claimed existing
> > > handling of SR-IOV.  Either you're assigning a PF and SR-IOV is
> > > irrelevant and unavailable to the guest or you're assigning a VF and,
> > > well, SR-IOV is still mostly irrelevant to libvirt unless someone
> > > decides to assign the PF hosting the VF or libvirt needs to do VF
> > > configuration via the PF.  Thanks,  
> > 
> > Hmm, could be a mis-understanding then. I was under the belief that
> > when you assign the PF or a SRIOV device to the guest, all the
> > VFs obviously disappear from the host due to driver being unloaded.
> > The guest now has the PF, and would have the option to enable VFs
> > too if it so desired, just as the host had option for when it owned
> > the PF.
> 
> Yeah, that's not how it currently works.  Some people would like it if
> this were the case, but we've not gotten past the security issues.  If
> the user is allowed to enable SR-IOV, those VFs don't just appear for
> the VM, they appear on the host.  The host needs to probe for them,
> assign resources, and attach drivers.  What should the host do with VFs
> that are managed by an untrusted userspace driver?  The isolation
> between VFs and PFs depends on the vendor's SR-IOV implementation.
> Minimally, the PF driver manages the PCI bus master config bit and can
> trivially introduce a denial of service for the VFs.  Allowing a VM to
> enable SR-IOV only for the purpose of assigning those VFs back to the VM
> owning the PF doesn't seem to be a particularly compelling feature on
> its own.  Thanks,

So it sounds like if, in the future, we did want to allow the guest to
have the PF *and* all the VFs at the same time, we would probably need
to arrange all of that from the host side, vaguely like what is being
proposed in this series for non-SR-IOV devices, and not let the guest
control VF creation and deletion itself.

Regards,
Daniel
Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Shivaprasad G Bhat 6 years, 8 months ago

On 03/15/2018 08:03 PM, Daniel P. Berrangé wrote:
> I find this difference in behaviour and approach really unpleasant.
>
> I think that they user should only need to provide the the address
> of the physical device, in both cases. At most perhaps we need a
> new attribute  multifunction="on" on the source address to tell
> libvirt that it should attach all the functions, not just the
> first
>
>      <device>
>        <hostdev mode='subsystem' type='pci' managed='yes'>
>          <driver name='vfio'/>
>          <source>
>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' mutlifunction="on"/>
>          </source>
>        </hostdev>
>      </device>

But with this approach the user cannot keep a few functions from being
assigned to the guest if he wants to. It will be all or none. PCI
requires only function zero to be present, so partial assignment is
expected to work.

So shouldn't the user have that control?
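
To illustrate what partial assignment with explicit <hostdev> entries would involve, here is a minimal Python sketch. It checks a user-chosen subset of functions and renders the grouped <devices> XML in the shape discussed in this thread. The helper names are hypothetical. The checks (same host slot, no duplicates, function zero included) are assumptions drawn from the description above, not the validation the patches actually perform, and the sketch assumes host function numbers map one-to-one onto guest function numbers.

```python
def parse_bdf(bdf):
    """Split 'dddd:bb:ss.f' into integer (domain, bus, slot, function)."""
    slot_part, func = bdf.rsplit(".", 1)
    domain, bus, slot = slot_part.split(":")
    return int(domain, 16), int(bus, 16), int(slot, 16), int(func, 16)


def check_function_group(bdfs):
    """Validate a subset of functions from one card; return them sorted."""
    parsed = [parse_bdf(b) for b in bdfs]
    if not parsed:
        raise ValueError("no functions given")
    if len({p[:3] for p in parsed}) != 1:
        raise ValueError("functions span more than one host slot")
    funcs = [p[3] for p in parsed]
    if len(set(funcs)) != len(funcs):
        raise ValueError("duplicate function")
    if 0 not in funcs:
        # Only function zero is mandatory; any other subset is allowed.
        raise ValueError("function zero is required")
    return sorted(parsed)


def hostdev_group_xml(bdfs):
    """Render one <hostdev> per selected function inside <devices>."""
    lines = ["<devices>"]
    for domain, bus, slot, func in check_function_group(bdfs):
        lines += [
            "  <hostdev mode='subsystem' type='pci' managed='yes'>",
            "    <driver name='vfio'/>",
            "    <source>",
            "      <address domain='0x%04x' bus='0x%02x' slot='0x%02x'"
            " function='0x%x'/>" % (domain, bus, slot, func),
            "    </source>",
            "  </hostdev>",
        ]
    lines.append("</devices>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assign only the first NIC port and the first FCoE function.
    print(hostdev_group_xml(["0005:01:00.0", "0005:01:00.4"]))
```

Leaving out 0005:01:00.0 makes the check fail, while any other subset that includes it is accepted.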

>
> Regards,
> Daniel

Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Daniel P. Berrangé 6 years, 8 months ago
On Thu, Mar 15, 2018 at 08:47:32PM +0530, Shivaprasad G Bhat wrote:
> 
> 
> But with this approach the user can not prevent few functions from being
> assigned
> to guest if he wants to. It will be all or none. The PCI requires only
> function zero to be
> present and so, partial assignment is expected to work.

IIUC, once you've assigned the device with function 0 to a guest,
neither the host nor any other guest can safely use the other
functions of the device, right? So I just assumed that if you're going
to give some functions to the guest you might as well give them all,
as nothing else can use them.

> So user should have that control?

Is there a use case for only giving some of them?


Regards,
Daniel
Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Laine Stump 6 years, 8 months ago
On 03/15/2018 11:35 AM, Daniel P. Berrangé wrote:
> On Thu, Mar 15, 2018 at 08:47:32PM +0530, Shivaprasad G Bhat wrote:
>>
>> On 03/15/2018 08:03 PM, Daniel P. Berrangé wrote:
>>> On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:
>>>> On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:
>>>>> On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I have revisited/rewritten my previously posted patches. Here is
>>>>>> the RFC. Since this patchset is a complete rewrite, I am starting
>>>>>> with v1 here.
>>>>>>
>>>>>> The semantics is as discussed before
>>>>>> https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
>>>>>>
>>>>>> As I went on to refactor the code to support multifunction virtio devices,
>>>>>> I realised the abort/cleanup path would be a nightmare there, in case of
>>>>>> failures. So, dropped that attempt. The current RFC limits to the real
>>>>>> practical use cases of Multifunction PCI hostdevices. All new test code
>>>>>> to support multifunction PCI hostdevices and test cases are added to
>>>>>> prove the functionality.
>>>>> I guess I'm not really understanding the use case here.  With SRIOV
>>>>> devices, you can already choose between assigning either the physical
>>>>> function (which gives the guest access to all virtual functions), or
>>>>> to assign an arbitrary set of individiual functions to various guests.
>>>>> Why do we need to be able to list many <hostdev> at the same time
>>>>> when hotplugging to assign multiple functions.
>>>>>
>>>>> Basically can you provide a full description of the problem you are
>>>>> trying to solve and why existing functionality isn't sufficient.
>>>> Hi Daniel,
>>>>
>>>> This is for cards which may not necessarily be networking cards. Or may be a
>>>> mix of
>>>> networking and storage.
>>>>
>>>> Suppose, user has below card
>>>> 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
>>>> (rev 10)
>>>> 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
>>>> (rev 10)
>>>> 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
>>>> (rev 10)
>>>> 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
>>>> (rev 10)
>>>> 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
>>>> (Lancer) (rev 10)
>>>> 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
>>>> (Lancer) (rev 10)
>>> Ok, so this is a device with many functions, but which isn't SRIOV
>>> based, and the goal is to assign the physical device to the guest,
>>> such that guest has all functions available.
>>>
>>>> If the user wants to hotplug this card to a guest, they have to detach
>>>> all the functions from the host driver, then hotplug 0005:01:00.0,
>>>> 0005:01:00.1, and so on individually. But today, with each hotplug of
>>>> a function, each <hostdev> goes to a different guest slot, whereas PCI
>>>> requires all of them to be on the same slot. This is not supported in
>>>> libvirt today.
>>>>
>>>> Multifunction cards can't be hotplugged to a guest today with
>>>> individual <hostdev>s, as the operation is queued by qemu until
>>>> function zero of the guest slot is hotplugged. On function zero
>>>> hotplug, qemu sends the event to the guest for device probing, at
>>>> which point all the previously hotplugged functions from the same
>>>> slot are discovered. So grouping the <hostdev>s within <devices>
>>>> becomes necessary to make the whole thing a single operation.
>>> So IIUC, from the patches, if the user wants to assign the physical
>>> device to the guest, they would need to provide XML that looked like
>>> this to the virDomainAttachDevice() method:
>>>
>>>      <devices>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
>>>          </source>
>>>        </hostdev>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/>
>>>          </source>
>>>        </hostdev>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/>
>>>          </source>
>>>        </hostdev>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/>
>>>          </source>
>>>        </hostdev>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/>
>>>          </source>
>>>        </hostdev>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/>
>>>          </source>
>>>        </hostdev>
>>>      </devices>
>>>
>>>
>>> Whereas if the device were SRIOV based, they would only have to
>>> provide
>>>
>>>      <device>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
>>>          </source>
>>>        </hostdev>
>>>      </device>
>>>
>>> for the guest to get access to all functions.
>>>
>>> I find this difference in behaviour and approach really unpleasant.
>>>
>>> I think that the user should only need to provide the address
>>> of the physical device, in both cases. At most perhaps we need a
>>> new attribute  multifunction="on" on the source address to tell
>>> libvirt that it should attach all the functions, not just the
>>> first
>>>
>>>      <device>
>>>        <hostdev mode='subsystem' type='pci' managed='yes'>
>>>          <driver name='vfio'/>
>>>          <source>
>>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' multifunction="on"/>
>>>          </source>
>>>        </hostdev>
>>>      </device>
>> But with this approach the user cannot prevent a few functions from
>> being assigned to the guest if they want to. It will be all or none.
>> PCI requires only function zero to be present, so partial assignment
>> is expected to work.
> IIUC, once you've assigned the device with function 0 to a guest,
> no other guest, nor the host, can safely use the other functions
> of the device, right ?  So I just assumed that if you're going to
> give some functions to the guest you might as well give them all,
> as nothing else can use them.

You bring up a good point that the unassigned functions probably can't
be safely used for anything else (since they're almost surely in the
same IOMMU group), but the host may not want to give all of them to the
guest. This points out that if we intend for managed='yes' to operate
properly, those other functions will need to be bound to vfio-pci (or
nothing). *But* since in the past we've said that we don't want to
implicitly re-bind devices to vfio-pci during assignment if they're not
explicitly called out in the config, we're either going to need some
method of specifying "manage (i.e. auto-bind to vfio-pci) this other
device in the iommu group, but don't assign it", or just inform people
that it's going to fail if they don't use managed='no' and bind to
vfio-pci themselves.


>
>> So user should have that control?
> Is there a use case for only giving some of them ?

Maybe one of the devices is dangerous to hand over to a guest?

Aside from that, there are a few other scenarios where explicitly
spelling out the devices to be assigned (and where to assign them) makes
sense:

1) possibly (e.g. for testing or compatibility purposes) you want the
addressing of the devices on the guest to be different from what they
are on the host. You may want to put them at different functions, or you
may even want/need devices that are on different functions of the same
slot on the host to be on different slots in the guest.

2) Someone might want to assign multiple *emulated* devices to multiple
functions on the same slot in the guest (either to mimic the device
layout of real hardware, or to overcome slot limitations). Since the
emulated devices have no "physical" topology to mimic on the guest, it
must necessarily be spelled out in the config.

3) There may be a piece of hardware that changes what functions are
populated with devices based on config (an admittedly un-useful example
is the multiple VFs of an SRIOV network card). If you just use
"multifunction='on'" to mean "assign *ALL THE DEVICES*!!!" (insert
spear-wielding meme here), then the hardware given to the guest will
change based on how the hardware has been configured prior to assignment.

Also, in the future there may be various knobs that need to be adjusted (in
the config) for the individual devices, and if the assignment of the
device to the guest is just implied by "multifunction='on'" then there
will be no place for that config to live.

I think it's safest to explicitly spell out which devices will be
assigned, and where they will be assigned.
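
As an illustration of spelling things out, here is a small Python sketch
that builds the kind of <devices> XML discussed in this thread for a
chosen subset of host functions, each with an explicit guest address.
The helper names are hypothetical. The guest-side <address type='pci'/>
element and multifunction='on' on guest function 0 follow libvirt's
existing domain XML, but whether the RFC's hotplug path accepts exactly
this form is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parse_pci_address(address):
    """Split "dddd:bb:ss.f" into libvirt-style hex attribute values."""
    domain, bus, rest = address.split(":")
    slot, function = rest.split(".")
    return {
        "domain": "0x%04x" % int(domain, 16),
        "bus": "0x%02x" % int(bus, 16),
        "slot": "0x%02x" % int(slot, 16),
        "function": "0x%x" % int(function, 16),
    }


def build_devices_xml(mapping):
    """Build <devices> XML from {host_address: guest_address} pairs.

    Hypothetical helper: only the functions listed in `mapping` are
    assigned, and each one gets the guest address given for it.
    """
    guest = {host: parse_pci_address(addr) for host, addr in mapping.items()}
    # Count how many devices land in each guest slot.
    per_slot = Counter((g["domain"], g["bus"], g["slot"]) for g in guest.values())
    devices = ET.Element("devices")
    for host in sorted(mapping):
        hostdev = ET.SubElement(
            devices, "hostdev", mode="subsystem", type="pci", managed="yes"
        )
        ET.SubElement(hostdev, "driver", name="vfio")
        source = ET.SubElement(hostdev, "source")
        ET.SubElement(source, "address", parse_pci_address(host))
        attrs = dict(guest[host], type="pci")
        shared = per_slot[(attrs["domain"], attrs["bus"], attrs["slot"])] > 1
        # Mark guest function 0 as multifunction only when the slot is shared.
        if attrs["function"] == "0x0" and shared:
            attrs["multifunction"] = "on"
        ET.SubElement(hostdev, "address", attrs)
    return ET.tostring(devices, encoding="unicode")
```

For example, build_devices_xml({"0005:01:00.0": "0000:00:07.0",
"0005:01:00.4": "0000:00:07.1"}) assigns only two of the six functions
of the card above and places host function 4 at guest function 1.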

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 00/28] Enable multifunction pci hotplug
Posted by Alex Williamson 6 years, 8 months ago
On Thu, 15 Mar 2018 17:53:02 -0400
Laine Stump <laine@laine.org> wrote:

> On 03/15/2018 11:35 AM, Daniel P. Berrangé wrote:
> > On Thu, Mar 15, 2018 at 08:47:32PM +0530, Shivaprasad G Bhat wrote:  
> >>
> >> On 03/15/2018 08:03 PM, Daniel P. Berrangé wrote:  
> >>> On Thu, Mar 15, 2018 at 07:54:47PM +0530, Shivaprasad G Bhat wrote:  
> >>>> On 03/15/2018 03:31 PM, Daniel P. Berrangé wrote:  
> >>>>> On Wed, Mar 14, 2018 at 10:44:30PM +0530, Shivaprasad G Bhat wrote:  
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I have revisited/rewritten my previously posted patches. Here is
> >>>>>> the RFC. Since this patchset is a complete rewrite, I am starting
> >>>>>> with v1 here.
> >>>>>>
> >>>>>> The semantics is as discussed before
> >>>>>> https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
> >>>>>>
> >>>>>> As I went on to refactor the code to support multifunction virtio devices,
> >>>>>> I realised the abort/cleanup path would be a nightmare there, in case of
> >>>>>> failures. So, dropped that attempt. The current RFC limits to the real
> >>>>>> practical use cases of Multifunction PCI hostdevices. All new test code
> >>>>>> to support multifunction PCI hostdevices and test cases are added to
> >>>>>> prove the functionality.  
> >>>>> I guess I'm not really understanding the use case here.  With SRIOV
> >>>>> devices, you can already choose between assigning either the physical
> >>>>> function (which gives the guest access to all virtual functions), or
> >>>>> to assign an arbitrary set of individual functions to various guests.
> >>>>> Why do we need to be able to list many <hostdev> at the same time
> >>>>> when hotplugging to assign multiple functions?
> >>>>>
> >>>>> Basically can you provide a full description of the problem you are
> >>>>> trying to solve and why existing functionality isn't sufficient.  
> >>>> Hi Daniel,
> >>>>
> >>>> This is for cards which may not necessarily be networking cards, or
> >>>> which may be a mix of networking and storage.
> >>>>
> >>>> Suppose the user has the card below:
> >>>> 0005:01:00.0 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> >>>> (rev 10)
> >>>> 0005:01:00.1 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> >>>> (rev 10)
> >>>> 0005:01:00.2 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> >>>> (rev 10)
> >>>> 0005:01:00.3 Ethernet controller: Emulex Corporation OneConnect NIC (Lancer)
> >>>> (rev 10)
> >>>> 0005:01:00.4 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> >>>> (Lancer) (rev 10)
> >>>> 0005:01:00.5 Fibre Channel: Emulex Corporation OneConnect FCoE Initiator
> >>>> (Lancer) (rev 10)  
> >>> Ok, so this is a device with many functions, but which isn't SRIOV
> >>> based, and the goal is to assign the physical device to the guest,
> >>> such that guest has all functions available.
> >>>  
> >>>> If the user wants to hotplug this card to a guest, they have to detach
> >>>> all the functions from the host driver, then hotplug 0005:01:00.0,
> >>>> 0005:01:00.1, and so on individually. But today, with each hotplug of
> >>>> a function, each <hostdev> goes to a different guest slot, whereas PCI
> >>>> requires all of them to be on the same slot. This is not supported in
> >>>> libvirt today.
> >>>>
> >>>> Multifunction cards can't be hotplugged to a guest today with
> >>>> individual <hostdev>s, as the operation is queued by qemu until
> >>>> function zero of the guest slot is hotplugged. On function zero
> >>>> hotplug, qemu sends the event to the guest for device probing, at
> >>>> which point all the previously hotplugged functions from the same
> >>>> slot are discovered. So grouping the <hostdev>s within <devices>
> >>>> becomes necessary to make the whole thing a single operation.
> >>> So IIUC, from the patches, if the user wants to assign the physical
> >>> device to the guest, they would need to provide XML that looked like
> >>> this to the virDomainAttachDevice() method:
> >>>
> >>>      <devices>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
> >>>          </source>
> >>>        </hostdev>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x1'/>
> >>>          </source>
> >>>        </hostdev>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x2'/>
> >>>          </source>
> >>>        </hostdev>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x3'/>
> >>>          </source>
> >>>        </hostdev>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x4'/>
> >>>          </source>
> >>>        </hostdev>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x5'/>
> >>>          </source>
> >>>        </hostdev>
> >>>      </devices>
> >>>
> >>>
> >>> Whereas if the device were SRIOV based, they would only have to
> >>> provide
> >>>
> >>>      <device>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x0'/>
> >>>          </source>
> >>>        </hostdev>
> >>>      </device>
> >>>
> >>> for the guest to get access to all functions.
> >>>
> >>> I find this difference in behaviour and approach really unpleasant.
> >>>
> >>> I think that the user should only need to provide the address
> >>> of the physical device, in both cases. At most perhaps we need a
> >>> new attribute  multifunction="on" on the source address to tell
> >>> libvirt that it should attach all the functions, not just the
> >>> first
> >>>
> >>>      <device>
> >>>        <hostdev mode='subsystem' type='pci' managed='yes'>
> >>>          <driver name='vfio'/>
> >>>          <source>
> >>>            <address domain='0x0000' bus='0x05' slot='0x1' function='0x0' multifunction="on"/>
> >>>          </source>
> >>>        </hostdev>
> >>>      </device>  
> >> But with this approach the user cannot prevent a few functions from
> >> being assigned to the guest if they want to. It will be all or none.
> >> PCI requires only function zero to be present, so partial assignment
> >> is expected to work.
> > IIUC, once you've assigned the device with function 0 to a guest,
> > no other guest, nor the host, can safely use the other functions
> > of the device, right ?  So I just assumed that if you're going to
> > give some functions to the guest you might as well give them all,
> > as nothing else can use them.  

No!  We have IOMMU groups to define what devices are not isolated,
please do not start making further assumptions based simply on
functions being in the same slot.  Multifunction devices can support ACS
and define that the functions are DMA isolated and we also have quite a
few quirks in the kernel for devices where the vendor has vouched for
the functions being isolated.  Even if the functions are grouped
together, there's no guarantee that the user actually wants all the
functions attached.  If it's a special mode where setting multifunction
on the source identity-maps all the functions and allows hotplug, fine,
but we have many existing users where functions are assigned to
separate VMs.
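
To see which functions actually share a group, rather than inferring it
from the slot, the kernel's sysfs layout can be read directly. Below is a
minimal Python sketch, assuming the standard Linux layout in which
/sys/bus/pci/devices/<address>/iommu_group is a symlink to a group
directory that contains a devices/ subdirectory. The function name and
the devices_dir parameter are illustrative and not part of libvirt.

```python
import os


def iommu_group_members(address, devices_dir="/sys/bus/pci/devices"):
    """Return (group_id, members) for the PCI device at `address`.

    `address` is a full PCI address such as "0005:01:00.0". When the
    device has no IOMMU group (for example, the IOMMU is disabled),
    (None, []) is returned.
    """
    group_link = os.path.join(devices_dir, address, "iommu_group")
    if not os.path.islink(group_link):
        return None, []
    # The symlink resolves to a directory named after the group number.
    group_dir = os.path.realpath(group_link)
    group_id = os.path.basename(group_dir)
    # Every device in the group appears as an entry under devices/.
    members = sorted(os.listdir(os.path.join(group_dir, "devices")))
    return group_id, members
```

Two functions of the same slot are only tied together for assignment
purposes if this returns the same group for both.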
 
> You bring up a good point that the unassigned functions probably can't
> be safely used for anything else (since they're almost surely in the
> same IOMMU group),

No, they're not almost surely in the same group.

> but the host may not want to give all of them to the
> guest.

If they are in the same group, then the host must give them all to the
user, but that doesn't imply the user wants them all assigned.  For
instance, RHEL doesn't by default support assignment of a Quadro card's
audio function, even though it is in the same group as the GPU (interrupts are
broken).
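
Even a group member that is not assigned has to be released by its host
driver before the group can be used. Here is a rough Python sketch of
that check, assuming the usual sysfs layout in which a bound device has a
"driver" symlink. It treats only vfio-pci or no driver as acceptable,
following the "bound to vfio-pci (or nothing)" wording used in this
thread, which is a simplification. The function names are hypothetical.

```python
import os


def bound_driver(address, devices_dir="/sys/bus/pci/devices"):
    """Return the name of the driver bound to `address`, or None if unbound."""
    link = os.path.join(devices_dir, address, "driver")
    # sysfs exposes the bound driver as a symlink; no link means unbound.
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def group_blockers(members, devices_dir="/sys/bus/pci/devices"):
    """Map each group member still held by a host driver to that driver.

    An empty result means every member is bound to vfio-pci or unbound.
    """
    blockers = {}
    for address in members:
        driver = bound_driver(address, devices_dir)
        if driver not in (None, "vfio-pci"):
            blockers[address] = driver
    return blockers
```

In the Quadro case above, the audio function would show up as a blocker
while it is still bound to its host audio driver, even though it is
never assigned to the guest.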

> This points out that if we intend for managed='yes' to operate
> properly, those other functions will need to be bound to vfio-pci (or
> nothing). *But* since in the past we've said that we don't want to
> implicitly re-bind devices to vfio-pci during assignment if they're not
> explicitly called out in the config, we're either going to need some
> method of specifying "manage (i.e. auto-bind to vfio-pci) this other
> device in the iommu group, but don't assign it", or just inform people
> that it's going to fail if they don't use managed='no' and bind to
> vfio-pci themselves.
> 
> 
> >  
> >> So user should have that control?  
> > Is there a use case for only giving some of them ?

Yes!

> Maybe one of the devices is dangerous to hand over to a guest?
> 
> Aside from that, there are a few other scenarios where explicitly
> spelling out the devices to be assigned (and where to assign them) makes
> sense:
> 
> 1) possibly (e.g. for testing or compatibility purposes) you want the
> addressing of the devices on the guest to be different from what they
> are on the host. You may want to put them at different functions, or you
> may even want/need devices that are on different functions of the same
> slot on the host to be on different slots in the guest.
> 
> 2) Someone might want to assign multiple *emulated* devices to multiple
> functions on the same slot in the guest (either to mimic the device
> layout of real hardware, or to overcome slot limitations). Since the
> emulated devices have no "physical" topology to mimic on the guest, it
> must necessarily be spelled out in the config.
> 
> 3) There may be a piece of hardware that changes what functions are
> populated with devices based on config (an admittedly un-useful example
> is the multiple VFs of an SRIOV network card). If you just use
> "multifunction='on'" to mean "assign *ALL THE DEVICES*!!!" (insert
> spear-wielding meme here), then the hardware given to the guest will
> change based on how the hardware has been configured prior to assignment.
> 
> Also, in the future there may be various knobs that need to be adjusted (in
> the config) for the individual devices, and if the assignment of the
> device to the guest is just implied by "multifunction='on'" then there
> will be no place for that config to live.
> 
> I think it's safest to explicitly spell out which devices will be
> assigned, and where they will be assigned.

+1  Making huge generalizations about how functions can be split and
used is definitely the wrong path.  Thanks,

Alex
