[libvirt] [PATCH v2 0/4] PCI hostdev partial assignment support

Daniel Henrique Barboza posted 4 patches 4 years, 4 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20191121221917.15969-1-danielhb413@gmail.com
There is a newer version of this series
docs/formatdomain.html.in                     | 15 +++++
docs/schemas/domaincommon.rng                 |  5 ++
src/conf/domain_conf.c                        | 56 ++++++++++++++--
src/conf/domain_conf.h                        |  3 +
src/qemu/qemu_alias.c                         |  6 ++
src/qemu/qemu_command.c                       |  4 ++
src/qemu/qemu_domain_address.c                |  6 ++
src/util/virhostdev.c                         | 64 +++++++++++++++++--
.../hostdev-pci-address-none.args             | 31 +++++++++
.../hostdev-pci-address-none.xml              | 42 ++++++++++++
...ostdev-pci-multifunction-partial-fail.args | 31 +++++++++
...hostdev-pci-multifunction-partial-fail.xml | 35 ++++++++++
tests/qemuxml2argvtest.c                      |  8 +++
.../hostdev-pci-address-none.xml              | 58 +++++++++++++++++
tests/qemuxml2xmltest.c                       |  1 +
15 files changed, 352 insertions(+), 13 deletions(-)
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-none.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-none.xml
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.xml
create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-address-none.xml
[libvirt] [PATCH v2 0/4] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago
Changes from previous version [1], all of them result of
feedback from Alex Williamson and Abdulla Bubshait:
- use <address type='none'/> instead of creating a new subsys
attribute;
- expand the change to all PCI hostdevs. Former patch 01 was
discarded because we don't need the PCI Multifunction helpers
for now; 
- series changed name to reflect what it's being done
- new patch 04: add documentation to formatdomain.html.in

To avoid a huge wall of text please refer to [1] for context
about the road up to here. Commit msgs of the first 3 patches
tells the story as well.

[1] https://www.redhat.com/archives/libvir-list/2019-October/msg00298.html  

What I want to discuss here instead is a caveat that I've found
while testing this work, since its first version. This test was
done in a Power 8 system with a Broadcom BCM5719 PCIe Multifunction
card, with 4 virtual functions. This series enables Libvirt to
declare PCI hostdevs that will not be visible to the guest using
address type='none'. During the tests I faced a scenario that I
expected to fail, but it didn't. This is the relevant XML except:


    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0001' bus='0x09' slot='0x00' function='0x0'/>
      </source>
      <address type='none'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0001' bus='0x09' slot='0x00' function='0x1'/>
      </source>
      <address type='none'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0001' bus='0x09' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0001' bus='0x09' slot='0x00' function='0x3'/>
      </source>
      <address type='none'/>
    </hostdev>


I'm declaring all the BCM5719 functions in the XML, but I am making
functions 0, 1 and 3 unassignable by the guest using address type='none'.
This test was meant to fail, but it didn't. To my surprise the guest
booted and the device is functional:

$ lspci
0000:00:01.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
0000:00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
0001:00:01.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
$ 

I've talked with Michael Roth (QEMU PPC64 developer that worked with
the PCI multifunction hotplug/unplug support in the PPC64 machine)
about this. He mentioned that this is intended. I'll quote here what he
had to say about it:


"The current logic is that we only emit the hotplug event when function
0 is attached, but if some other function is attached at boot-time the
guest will still see it on the bus, and whether that works or not I
think is up to the device/driver"


This explains why this test didn't fail as I expected. At least for
the PPC64 machine, depending on the device support, this setup is
allowed. PPC64 machine uses function 0 hotplug as a signal of
'plug all the queue functions and function 0', but function 0
isn't required at boot time.  I would like to hear other opinions
in this because I can't say whether this is allowed in x86.


I am mentioning all this now because this had a direct impact on the
design of this work since the previous version, and I failed
to bring it up back then. I am *not* checking for the assignment of
function 0 at guest boot time in Libvirt, leaving the user free to
decide what to do. I am aware that this will be inconsistent to the
logic of the PCI multifunction hotplug/unplug support, where
function 0 is required. This also puts a lot of faith in the user,
relying that the user is fully aware of the capabilities of the
hardware.

My question is: should Libvirt force function 0 to be present in
boot time as well, regardless of whether the PPC64 guest or some
cards are able to boot without it?


Thanks,


DHB


Daniel Henrique Barboza (4):
  domain_conf: allow address type='none' to unassign PCI hostdevs
  qemu: handle unassigned PCI hostdevs in command line and alias
  virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
  formatdomain.html.in: document <address type='none'/>

 docs/formatdomain.html.in                     | 15 +++++
 docs/schemas/domaincommon.rng                 |  5 ++
 src/conf/domain_conf.c                        | 56 ++++++++++++++--
 src/conf/domain_conf.h                        |  3 +
 src/qemu/qemu_alias.c                         |  6 ++
 src/qemu/qemu_command.c                       |  4 ++
 src/qemu/qemu_domain_address.c                |  6 ++
 src/util/virhostdev.c                         | 64 +++++++++++++++++--
 .../hostdev-pci-address-none.args             | 31 +++++++++
 .../hostdev-pci-address-none.xml              | 42 ++++++++++++
 ...ostdev-pci-multifunction-partial-fail.args | 31 +++++++++
 ...hostdev-pci-multifunction-partial-fail.xml | 35 ++++++++++
 tests/qemuxml2argvtest.c                      |  8 +++
 .../hostdev-pci-address-none.xml              | 58 +++++++++++++++++
 tests/qemuxml2xmltest.c                       |  1 +
 15 files changed, 352 insertions(+), 13 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-none.args
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-none.xml
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.args
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.xml
 create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-address-none.xml

-- 
2.21.0


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 0/4] PCI hostdev partial assignment support
Posted by Alex Williamson 4 years, 4 months ago
On Thu, 21 Nov 2019 19:19:13 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> Changes from previous version [1], all of them result of
> feedback from Alex Williamson and Abdulla Bubshait:
> - use <address type='none'/> instead of creating a new subsys
> attribute;
> - expand the change to all PCI hostdevs. Former patch 01 was
> discarded because we don't need the PCI Multifunction helpers
> for now; 
> - series changed name to reflect what it's being done
> - new patch 04: add documentation to formatdomain.html.in
> 
> To avoid a huge wall of text please refer to [1] for context
> about the road up to here. Commit msgs of the first 3 patches
> tells the story as well.
> 
> [1] https://www.redhat.com/archives/libvir-list/2019-October/msg00298.html  
> 
> What I want to discuss here instead is a caveat that I've found
> while testing this work, since its first version. This test was
> done in a Power 8 system with a Broadcom BCM5719 PCIe Multifunction
> card, with 4 virtual functions. This series enables Libvirt to
> declare PCI hostdevs that will not be visible to the guest using
> address type='none'. During the tests I faced a scenario that I
> expected to fail, but it didn't. This is the relevant XML except:
> 
> 
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x0'/>
>       </source>
>       <address type='none'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x1'/>
>       </source>
>       <address type='none'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x2'/>
>       </source>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x2'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0001' bus='0x09' slot='0x00' function='0x3'/>
>       </source>
>       <address type='none'/>
>     </hostdev>
> 
> 
> I'm declaring all the BCM5719 functions in the XML, but I am making
> functions 0, 1 and 3 unassignable by the guest using address type='none'.
> This test was meant to fail, but it didn't. To my surprise the guest
> booted and the device is functional:
> 
> $ lspci
> 0000:00:01.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
> 0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
> 0000:00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
> 0001:00:01.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> $ 
> 
> I've talked with Michael Roth (QEMU PPC64 developer that worked with
> the PCI multifunction hotplug/unplug support in the PPC64 machine)
> about this. He mentioned that this is intended. I'll quote here what he
> had to say about it:
> 
> 
> "The current logic is that we only emit the hotplug event when function
> 0 is attached, but if some other function is attached at boot-time the
> guest will still see it on the bus, and whether that works or not I
> think is up to the device/driver"
> 
> 
> This explains why this test didn't fail as I expected. At least for
> the PPC64 machine, depending on the device support, this setup is
> allowed. PPC64 machine uses function 0 hotplug as a signal of
> 'plug all the queue functions and function 0', but function 0
> isn't required at boot time.  I would like to hear other opinions
> in this because I can't say whether this is allowed in x86.

I would expect that on x86 a Linux guest would need to be booted with
the pci=pcie_scan_all kernel option to find this device.  The PCI-core
will only scan for devfn 00.0 behind a downstream port and then only
scan non-zero functions if that device indicate multifunction support.
I'd expect more archs to behave this way than what you see on PPC where
it "just works".

> I am mentioning all this now because this had a direct impact on the
> design of this work since the previous version, and I failed
> to bring it up back then. I am *not* checking for the assignment of
> function 0 at guest boot time in Libvirt, leaving the user free to
> decide what to do. I am aware that this will be inconsistent to the
> logic of the PCI multifunction hotplug/unplug support, where
> function 0 is required. This also puts a lot of faith in the user,
> relying that the user is fully aware of the capabilities of the
> hardware.
> 
> My question is: should Libvirt force function 0 to be present in
> boot time as well, regardless of whether the PPC64 guest or some
> cards are able to boot without it?

In my reading of the PCI 3.0 spec, 3.2.2.3.4 indicates that
multifunction devices are required to implement function 0 and that
configuration software is under no obligation to scan for higher number
functions if function 0 is not present and does not have the
multifunction bit set in the header type register.  With PCIe, where
link information, payload size, ARI, VC, etc are exposed in config
space, many of these are only valid or have specific means only for
function 0.

What you're seeing on PPC is, I think, more typical of paravirtualized
PCI enumeration, where you're only seeing a view of the bus as allowed
by a hypervisor.  I can't say whether the pcie_scan_all boot option was
added to allow discovery of devices as in your configuration or simply
to correct cases where function 0 forgot to implement the multifunction
bit.  Whether libvirt wants to prevent this is more of a support
question, it seems spec compliant hardware should never do this, but
not all hardware is spec compliant.  Libvirt should never generate such
a configuration on it's own, but I wouldn't necessarily object if it
allows a user to shoot themselves in the foot.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 0/4] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 11/21/19 8:08 PM, Alex Williamson wrote:
> On Thu, 21 Nov 2019 19:19:13 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
[...]
>>
>> My question is: should Libvirt force function 0 to be present in
>> boot time as well, regardless of whether the PPC64 guest or some
>> cards are able to boot without it?
> 
> In my reading of the PCI 3.0 spec, 3.2.2.3.4 indicates that
> multifunction devices are required to implement function 0 and that
> configuration software is under no obligation to scan for higher number
> functions if function 0 is not present and does not have the
> multifunction bit set in the header type register.  With PCIe, where
> link information, payload size, ARI, VC, etc are exposed in config
> space, many of these are only valid or have specific means only for
> function 0.
> 
> What you're seeing on PPC is, I think, more typical of paravirtualized
> PCI enumeration, where you're only seeing a view of the bus as allowed
> by a hypervisor.  I can't say whether the pcie_scan_all boot option was
> added to allow discovery of devices as in your configuration or simply
> to correct cases where function 0 forgot to implement the multifunction


Just checked. Neither the Power 8 host nor the guest is booting with the
pcie_scan_all option.



> bit.Whether libvirt wants to prevent this is more of a support
> question, it seems spec compliant hardware should never do this, but
> not all hardware is spec compliant.  Libvirt should never generate such
> a configuration on it's own, but I wouldn't necessarily object if it
> allows a user to shoot themselves in the foot.  Thanks,

I am not so thrilled about it after what you said. It seems that the PPC64
guest is behaving differently from all other archs in this case, and I'd rather
make PPC64 equal to everyone else rather than allowing the PPC64 user to expect
that non-compliant behavior is allowed.


Thanks,


DHB


> 
> Alex
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list