[libvirt] [PATCH 0/4] PCI multifunction partial assignment support

Daniel Henrique Barboza posted 4 patches 5 years, 1 month ago
Test syntax-check passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20191007211136.16730-1-danielhb413@gmail.com
docs/schemas/domaincommon.rng                 |   5 +
src/conf/domain_conf.c                        |  36 ++++++-
src/conf/domain_conf.h                        |   1 +
src/libvirt_private.syms                      |   2 +
src/qemu/qemu_alias.c                         |   8 ++
src/qemu/qemu_command.c                       |   5 +
src/qemu/qemu_domain_address.c                |   7 ++
src/util/virhostdev.c                         |  88 +++++++++++++++++-
src/util/virhostdev.h                         |   3 +
src/util/virpci.c                             |  15 +++
src/util/virpci.h                             |   2 +
...ostdev-pci-multifunction-partial-fail.args |  31 ++++++
...hostdev-pci-multifunction-partial-fail.xml |  35 +++++++
.../hostdev-pci-multifunction-partial.args    |  31 ++++++
.../hostdev-pci-multifunction-partial.xml     |  41 ++++++++
.../hostdev-pci-multifunction.args            |   7 +-
.../hostdev-pci-multifunction.xml             |   6 ++
tests/qemuxml2argvtest.c                      |   8 ++
.../hostdev-pci-multifunction-partial.xml     |  57 ++++++++++++
.../hostdev-pci-multifunction.xml             |  23 +++--
.../qemuxml2xmloutdata/pseries-hostdevs-1.xml |   4 +-
.../qemuxml2xmloutdata/pseries-hostdevs-2.xml |   4 +-
.../qemuxml2xmloutdata/pseries-hostdevs-3.xml |   4 +-
tests/qemuxml2xmltest.c                       |   1 +
tests/virpcitestdata/0005-90-01.1.config      | Bin 256 -> 256 bytes
tests/virpcitestdata/0005-90-01.2.config      | Bin 256 -> 256 bytes
tests/virpcitestdata/0005-90-01.3.config      | Bin 0 -> 256 bytes
tools/virsh-domain.c                          |   5 +
28 files changed, 408 insertions(+), 21 deletions(-)
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.xml
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial.xml
create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-multifunction-partial.xml
create mode 100644 tests/virpcitestdata/0005-90-01.3.config
[libvirt] [PATCH 0/4] PCI multifunction partial assignment support
Posted by Daniel Henrique Barboza 5 years, 1 month ago
(--- long post warning ---)

This is a work that derived from the discussions I had with
Laine Stump and Alex Williamson in [1]. I'll provide a quick
gist below.

----------

Today, Libvirt does not have proper support for partial
assignment of functions of passed-through PCI multifunction
devices (hostdev with VFIO-PCI).  By partial assignment I mean
the guest being able to use just some, not all, virtual functions
of the device. Even if the functions itself became useless in
the host, the some functions might not be safe to be used
by the guest, thus the user should be able to limit it.

I mentioned 'proper' because today it is possible to get this
done in Libvirt if we use 'managed=no' in the hostdevs. If the
user makes the proper setup (i.e. detaching all IOMMU devices),
and use managed='no', Libvirt will launch the guest just with the
functions declared in the XML. The technical reason for this is
simple: in virHostdevPreparePCIDevices() we do not take into account
that multifunction PCI devices requires the whole IOMMU to be
detached, not just the devices being declared in def->hostdevs.
In this case, managed='yes' will not work in this scenario, causing
errors in QEMU launch.

The discussion I've started in [1] was motivated by my attempt
of automatically detaching the IOMMU inside the prepare function
with managed='yes' devices. Laine discarded this idea, arguing
that the concept of partial assignment will cause user confusion
if Libvirt starts to handle things without the user being fully
aware. In [1] it was discussed the possibility of declaring the
functions that won't be assigned to the guest in the XML, forcing
the user to be aware that these functions will be lost in the host,
as a possible approach for a solution.

-----------

These series tries to solve the partial assignment of multifunction
hostdev PCI devices by introducing a new hostdev attribute called
'assigned'. This is how it works:

- it is a boolean value that will be efffective just for
multifunction hostdev PCI devices, since there's no other
occurrence for this kind of use in Libvirt. Trying to
declare assign='yes|no' in any other PCI hostdev device
will cause parse errors;

- default value if the attribute is not present is
'assigned=yes';

- <address> element will be forbidden if the hostdev is declared
with assigned='no'. This is to make more evident to the user
that this is a function that the guest will NOT be using, with
a bonus that we will not need to calculate an address that
won't be used;

- with this new mechanic, all the functions of the PCI
multifunction device must be declared in the XML, even
if they're not going to be assigned to the guest. This is
being enforced inside the Prepare() function to avoid XML
parsing errors and existing domains being erased. For consistency,
this is being enforced regardless of the 'managed' flag - even
the managed='no' case is going to adhere with this restriction.

This is a XML snippet of this new attribute, applied to a hostdev
passthrough of a multifunction Broadcom BCM5719 PCIe card with 4
functions:


<hostdev mode='subsystem' type='pci' managed='yes' assigned='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0001' bus='0x09' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes' assigned='no'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0001' bus='0x09' slot='0x00' function='0x1'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes' assigned='no'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0001' bus='0x09' slot='0x00' function='0x2'/>
  </source>
</hostdev> 
<hostdev mode='subsystem' type='pci' managed='yes' assigned='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0001' bus='0x09' slot='0x00' function='0x3'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x3'/>
</hostdev>


lspci inside the guest:

$ lspci
0000:00:01.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
0000:00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
0001:00:01.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0001:00:01.3 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
$

The intent of this design is to forbid a hostdev with 'assigned=no'
to be attached to QEMU, to have an alias and to have a guest
address. For all other purposes, including the processing done in
virHostdevPreparePCIDevices(), the device is like any other
hostdev - it will be added to activePCIHostdevs and, most important,
it will be detached from the host IOMMU during guest start-up and
will be re-attach back to the host with guest shutdown.

In this very first interaction of this idea I most definitely
missed/forgot to consider something, so any feedback is appreciated.
This has direct implications into the multifunction pci
hotplug/hot-unplug work (in fact, it was the hotplug work that
revealed this gap in Libvirt support - I first noticed it
seeing failed partial hotplug scenarios) and I would like to see
it figured out, with this or any other approach, before continuing
with the hotplug work.


Thanks,


DHB

[1] https://www.redhat.com/archives/libvir-list/2019-July/msg01175.html

Daniel Henrique Barboza (4):
  utils: introducing PCI multifunction detection helpers
  Introducing assigned='yes|no' hostdev attribute
  qemu_command: ignore non-assigned PCI multifunction hostdevs
  virhostdev: all functions of a PCI multifunction dev must be added

 docs/schemas/domaincommon.rng                 |   5 +
 src/conf/domain_conf.c                        |  36 ++++++-
 src/conf/domain_conf.h                        |   1 +
 src/libvirt_private.syms                      |   2 +
 src/qemu/qemu_alias.c                         |   8 ++
 src/qemu/qemu_command.c                       |   5 +
 src/qemu/qemu_domain_address.c                |   7 ++
 src/util/virhostdev.c                         |  88 +++++++++++++++++-
 src/util/virhostdev.h                         |   3 +
 src/util/virpci.c                             |  15 +++
 src/util/virpci.h                             |   2 +
 ...ostdev-pci-multifunction-partial-fail.args |  31 ++++++
 ...hostdev-pci-multifunction-partial-fail.xml |  35 +++++++
 .../hostdev-pci-multifunction-partial.args    |  31 ++++++
 .../hostdev-pci-multifunction-partial.xml     |  41 ++++++++
 .../hostdev-pci-multifunction.args            |   7 +-
 .../hostdev-pci-multifunction.xml             |   6 ++
 tests/qemuxml2argvtest.c                      |   8 ++
 .../hostdev-pci-multifunction-partial.xml     |  57 ++++++++++++
 .../hostdev-pci-multifunction.xml             |  23 +++--
 .../qemuxml2xmloutdata/pseries-hostdevs-1.xml |   4 +-
 .../qemuxml2xmloutdata/pseries-hostdevs-2.xml |   4 +-
 .../qemuxml2xmloutdata/pseries-hostdevs-3.xml |   4 +-
 tests/qemuxml2xmltest.c                       |   1 +
 tests/virpcitestdata/0005-90-01.1.config      | Bin 256 -> 256 bytes
 tests/virpcitestdata/0005-90-01.2.config      | Bin 256 -> 256 bytes
 tests/virpcitestdata/0005-90-01.3.config      | Bin 0 -> 256 bytes
 tools/virsh-domain.c                          |   5 +
 28 files changed, 408 insertions(+), 21 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.args
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial-fail.xml
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial.args
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-multifunction-partial.xml
 create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-multifunction-partial.xml
 create mode 100644 tests/virpcitestdata/0005-90-01.3.config

-- 
2.21.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 0/4] PCI multifunction partial assignment support
Posted by Alex Williamson 5 years, 1 month ago
On Mon,  7 Oct 2019 18:11:32 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> (--- long post warning ---)
> 
> This is a work that derived from the discussions I had with
> Laine Stump and Alex Williamson in [1]. I'll provide a quick
> gist below.
> 
> ----------
> 
> Today, Libvirt does not have proper support for partial
> assignment of functions of passed-through PCI multifunction
> devices (hostdev with VFIO-PCI).  By partial assignment I mean
> the guest being able to use just some, not all, virtual functions
> of the device. Even if the functions itself became useless in
> the host, the some functions might not be safe to be used
> by the guest, thus the user should be able to limit it.

Not safe in what way?  Patch 2/4 says some devices might be "security
sensitive", but the fact that this patch is necessary implies that the
host kernel already considers the devices non-isolated.  They must be
in the same iommu group to have this issue.  Is there a concrete
example of a device where a user would want this configuration?  The
case I can think of is not a security issue, but a functional one
where GPU and audio functions are grouped together and maybe the audio
function doesn't work well when assigned, or maybe we just want the
guest to default to another audio device and it's easier if we just
don't expose this on-card audio.
 
> I mentioned 'proper' because today it is possible to get this
> done in Libvirt if we use 'managed=no' in the hostdevs. If the
> user makes the proper setup (i.e. detaching all IOMMU devices),
> and use managed='no', Libvirt will launch the guest just with the
> functions declared in the XML. The technical reason for this is
> simple: in virHostdevPreparePCIDevices() we do not take into account
> that multifunction PCI devices requires the whole IOMMU to be
> detached, not just the devices being declared in def->hostdevs.
> In this case, managed='yes' will not work in this scenario, causing
> errors in QEMU launch.
> 
> The discussion I've started in [1] was motivated by my attempt
> of automatically detaching the IOMMU inside the prepare function
> with managed='yes' devices. Laine discarded this idea, arguing
> that the concept of partial assignment will cause user confusion
> if Libvirt starts to handle things without the user being fully
> aware. In [1] it was discussed the possibility of declaring the
> functions that won't be assigned to the guest in the XML, forcing
> the user to be aware that these functions will be lost in the host,
> as a possible approach for a solution.
> 
> -----------
> 
> These series tries to solve the partial assignment of multifunction
> hostdev PCI devices by introducing a new hostdev attribute called
> 'assigned'. This is how it works:
> 
> - it is a boolean value that will be efffective just for
> multifunction hostdev PCI devices, since there's no other
> occurrence for this kind of use in Libvirt. Trying to
> declare assign='yes|no' in any other PCI hostdev device
> will cause parse errors;
> 
> - default value if the attribute is not present is
> 'assigned=yes';
> 
> - <address> element will be forbidden if the hostdev is declared
> with assigned='no'. This is to make more evident to the user
> that this is a function that the guest will NOT be using, with
> a bonus that we will not need to calculate an address that
> won't be used;

It seems more intuitive to me to use the guest <address> element to
expose this.  libvirt often makes use of 'none' to declare empty
devices, so maybe <address type='none'/> would be more in line with
precedent.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 0/4] PCI multifunction partial assignment support
Posted by Daniel Henrique Barboza 5 years, 1 month ago

On 10/7/19 7:41 PM, Alex Williamson wrote:
> On Mon,  7 Oct 2019 18:11:32 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
>
>> (--- long post warning ---)
>>
>> This is a work that derived from the discussions I had with
>> Laine Stump and Alex Williamson in [1]. I'll provide a quick
>> gist below.
>>
>> ----------
>>
>> Today, Libvirt does not have proper support for partial
>> assignment of functions of passed-through PCI multifunction
>> devices (hostdev with VFIO-PCI).  By partial assignment I mean
>> the guest being able to use just some, not all, virtual functions
>> of the device. Even if the functions itself became useless in
>> the host, the some functions might not be safe to be used
>> by the guest, thus the user should be able to limit it.
> Not safe in what way?  Patch 2/4 says some devices might be "security
> sensitive", but the fact that this patch is necessary implies that the
> host kernel already considers the devices non-isolated.  They must be
> in the same iommu group to have this issue.  Is there a concrete
> example of a device where a user would want this configuration?  The
> case I can think of is not a security issue, but a functional one
> where GPU and audio functions are grouped together and maybe the audio
> function doesn't work well when assigned, or maybe we just want the
> guest to default to another audio device and it's easier if we just
> don't expose this on-card audio.

The audio card example is one was thinking of (and I believe it
was brought up in the [1] thread as well) when writing about the
need for this work.

But in the end, I believe my use of 'security issue' wording is
wrong - I am implying that there might be a case in which isolated
devices, in the same IOMMU, can present security risks for each
other or something like that when assigned to the guest. I can't make
such strong claim. These patches are mote about enhancing a functional
use, like you said.


Thanks for pointing this out. I'll rearrange the discourse in the
next spins.


>   
>> I mentioned 'proper' because today it is possible to get this
>> done in Libvirt if we use 'managed=no' in the hostdevs. If the
>> user makes the proper setup (i.e. detaching all IOMMU devices),
>> and use managed='no', Libvirt will launch the guest just with the
>> functions declared in the XML. The technical reason for this is
>> simple: in virHostdevPreparePCIDevices() we do not take into account
>> that multifunction PCI devices requires the whole IOMMU to be
>> detached, not just the devices being declared in def->hostdevs.
>> In this case, managed='yes' will not work in this scenario, causing
>> errors in QEMU launch.
>>
>> The discussion I've started in [1] was motivated by my attempt
>> of automatically detaching the IOMMU inside the prepare function
>> with managed='yes' devices. Laine discarded this idea, arguing
>> that the concept of partial assignment will cause user confusion
>> if Libvirt starts to handle things without the user being fully
>> aware. In [1] it was discussed the possibility of declaring the
>> functions that won't be assigned to the guest in the XML, forcing
>> the user to be aware that these functions will be lost in the host,
>> as a possible approach for a solution.
>>
>> -----------
>>
>> These series tries to solve the partial assignment of multifunction
>> hostdev PCI devices by introducing a new hostdev attribute called
>> 'assigned'. This is how it works:
>>
>> - it is a boolean value that will be efffective just for
>> multifunction hostdev PCI devices, since there's no other
>> occurrence for this kind of use in Libvirt. Trying to
>> declare assign='yes|no' in any other PCI hostdev device
>> will cause parse errors;
>>
>> - default value if the attribute is not present is
>> 'assigned=yes';
>>
>> - <address> element will be forbidden if the hostdev is declared
>> with assigned='no'. This is to make more evident to the user
>> that this is a function that the guest will NOT be using, with
>> a bonus that we will not need to calculate an address that
>> won't be used;
> It seems more intuitive to me to use the guest <address> element to
> expose this.  libvirt often makes use of 'none' to declare empty
> devices, so maybe <address type='none'/> would be more in line with
> precedent.  Thanks,

If <address type='none'> is not being used by anything else (it doesn't
appear to be, at least in a quick look at libvirt.org docs), this is a
good idea indeed. It also spare us from adding more documentation
for a new attribute.

I'll wait to see if more people wants to comment in this work and, unless
someone presents a good reason not to, I'll see if I can make this
<address type='none'> happen.




Thanks,


DHB


>
> Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list