[libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support

Daniel Henrique Barboza posted 5 patches 4 years, 4 months ago
Test syntax-check failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20191216133607.3055202-1-danielhb413@gmail.com
There is a newer version of this series
docs/formatdomain.html.in                     | 14 ++++
docs/news.xml                                 | 19 ++++++
docs/schemas/domaincommon.rng                 |  5 ++
src/conf/device_conf.c                        |  2 +
src/conf/device_conf.h                        |  1 +
src/conf/domain_conf.c                        |  7 +-
src/qemu/qemu_command.c                       |  5 ++
src/qemu/qemu_domain.c                        |  1 +
src/qemu/qemu_domain_address.c                |  5 ++
src/util/virhostdev.c                         | 64 +++++++++++++++++--
.../hostdev-pci-address-unassigned.args       | 31 +++++++++
.../hostdev-pci-address-unassigned.xml        | 42 ++++++++++++
tests/qemuxml2argvtest.c                      |  4 ++
.../hostdev-pci-address-unassigned.xml        | 58 +++++++++++++++++
tests/qemuxml2xmltest.c                       |  1 +
15 files changed, 251 insertions(+), 8 deletions(-)
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.args
create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.xml
create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-address-unassigned.xml
[libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago
changes from version 3 [1]:
- removed last 2 patches that made function 0 of 
PCI multifunction devices mandatory
- new patch: news.xml update
- changed 'since' version to 6.0.0 in patch 4
- unassigned hostdevs are now getting qemu aliases

[1] https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html

Daniel Henrique Barboza (5):
  Introducing new address type='unassigned' for PCI hostdevs
  qemu: handle unassigned PCI hostdevs in command line
  virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
  formatdomain.html.in: document <address type='unassigned'/>
  news.xml: add address type='unassigned' entry

 docs/formatdomain.html.in                     | 14 ++++
 docs/news.xml                                 | 19 ++++++
 docs/schemas/domaincommon.rng                 |  5 ++
 src/conf/device_conf.c                        |  2 +
 src/conf/device_conf.h                        |  1 +
 src/conf/domain_conf.c                        |  7 +-
 src/qemu/qemu_command.c                       |  5 ++
 src/qemu/qemu_domain.c                        |  1 +
 src/qemu/qemu_domain_address.c                |  5 ++
 src/util/virhostdev.c                         | 64 +++++++++++++++++--
 .../hostdev-pci-address-unassigned.args       | 31 +++++++++
 .../hostdev-pci-address-unassigned.xml        | 42 ++++++++++++
 tests/qemuxml2argvtest.c                      |  4 ++
 .../hostdev-pci-address-unassigned.xml        | 58 +++++++++++++++++
 tests/qemuxml2xmltest.c                       |  1 +
 15 files changed, 251 insertions(+), 8 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.args
 create mode 100644 tests/qemuxml2argvdata/hostdev-pci-address-unassigned.xml
 create mode 100644 tests/qemuxml2xmloutdata/hostdev-pci-address-unassigned.xml

-- 
2.23.0


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Cole Robinson 4 years, 4 months ago
On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:
> changes from version 3 [1]:
> - removed last 2 patches that made function 0 of 
> PCI multifunction devices mandatory
> - new patch: news.xml update
> - changed 'since' version to 6.0.0 in patch 4
> - unassigned hostdevs are now getting qemu aliases
> 
> [1] https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html
> 
> Daniel Henrique Barboza (5):
>   Introducing new address type='unassigned' for PCI hostdevs
>   qemu: handle unassigned PCI hostdevs in command line
>   virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
>   formatdomain.html.in: document <address type='unassigned'/>
>   news.xml: add address type='unassigned' entry
> 

Codewise it looks fine now. But I'm looking more closely at patch #3 and
realizing that it can explicitly reject a previously accepted VM config.
And indeed, now that I give it a test with my GPU passthrough setup, it
is rejecting my previosly working config.

error: Requested operation is not valid: All devices of the same IOMMU
group 1 of the PCI device 0000:01:00.0 must belong to domain win10

I've attached the nodedev XML for the three devices with iommuGroup 1.
Only the two nvidia devices are assigned to my VM, but not the PCIe
controller device.

Is the libvirt heuristic missing something? Or is this acting as expected?

I didn't quite gather that this is a change to reject previously
accepted configurations, so I will defer to Laine and Alex as to whether
this should be committed.

- Cole
<device>
  <name>pci_0000_00_01_0</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0</path>
  <parent>computer</parent>
  <driver>
    <name>pcieport</name>
  </driver>
  <capability type='pci'>
    <class>0x060400</class>
    <domain>0</domain>
    <bus>0</bus>
    <slot>1</slot>
    <function>0</function>
    <product id='0x1901'>Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16)</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='pci-bridge'/>
    <iommuGroup number='1'>
      <address domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='2' speed='8' width='16'/>
      <link validity='sta' speed='8' width='16'/>
    </pci-express>
  </capability>
</device>

<device>
  <name>pci_0000_01_00_0</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <class>0x030000</class>
    <domain>0</domain>
    <bus>1</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x1c82'>GP107 [GeForce GTX 1050 Ti]</product>
    <vendor id='0x10de'>NVIDIA Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='16'/>
      <link validity='sta' speed='8' width='16'/>
    </pci-express>
  </capability>
</device>

<device>
  <name>pci_0000_01_00_1</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
  <capability type='pci'>
    <class>0x040300</class>
    <domain>0</domain>
    <bus>1</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x0fb9'>GP107GL High Definition Audio Controller</product>
    <vendor id='0x10de'>NVIDIA Corporation</vendor>
    <iommuGroup number='1'>
      <address domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
    </iommuGroup>
    <pci-express>
      <link validity='cap' port='0' speed='8' width='16'/>
      <link validity='sta' speed='8' width='16'/>
    </pci-express>
  </capability>
</device>
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 12/16/19 7:28 PM, Cole Robinson wrote:
> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:
>> changes from version 3 [1]:
>> - removed last 2 patches that made function 0 of
>> PCI multifunction devices mandatory
>> - new patch: news.xml update
>> - changed 'since' version to 6.0.0 in patch 4
>> - unassigned hostdevs are now getting qemu aliases
>>
>> [1] https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html
>>
>> Daniel Henrique Barboza (5):
>>    Introducing new address type='unassigned' for PCI hostdevs
>>    qemu: handle unassigned PCI hostdevs in command line
>>    virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
>>    formatdomain.html.in: document <address type='unassigned'/>
>>    news.xml: add address type='unassigned' entry
>>
> 
> Codewise it looks fine now. But I'm looking more closely at patch #3 and
> realizing that it can explicitly reject a previously accepted VM config.
> And indeed, now that I give it a test with my GPU passthrough setup, it
> is rejecting my previosly working config.
> 
> error: Requested operation is not valid: All devices of the same IOMMU
> group 1 of the PCI device 0000:01:00.0 must belong to domain win10
> 
> I've attached the nodedev XML for the three devices with iommuGroup 1.
> Only the two nvidia devices are assigned to my VM, but not the PCIe
> controller device.
> 
> Is the libvirt heuristic missing something? Or is this acting as expected?

You mentioned that you declared 3 devices of IOMMU group 1. Unless the code in
patch 3 has a bug, there are more PCI hostdevs in IOMMU group 1 that were left
out of the domain XML.


> 
> I didn't quite gather that this is a change to reject previously
> accepted configurations, so I will defer to Laine and Alex as to whether
> this should be committed.


I mentioned in the commit msg of patch 03 that this would break working
configurations that didn't comply with the new 'all devices of the IOMMU
group must be included in the domain XML' directive. Perhaps this is worth
mentioning in the 'news' page to warn users about it.

About breaking existing configurations, there is the possibility of not
going forward with patch 03, which is enforcing this rule of declaring all the
IOMMU group. Existing domains will keep working as usual, the option to
unassign devices will still be present, but the user will have to deal with
the potential QEMU errors if not all PCI devices were detached from the host.

In this case, the 'unassigned' type will become more of a ON/OFF switch to
add/remove the PCI hostdev from the guest without removing it from the
domain XML. It is still useful, but we lose the idea of all the IOMMU
devices being described in the domain XML, which is something Laine
mentioned it would be desirable in one of the RFCs.


Thanks,


DHB


> 
> - Cole
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Laine Stump 4 years, 4 months ago
On 12/16/19 6:03 PM, Daniel Henrique Barboza wrote:
>
>
> On 12/16/19 7:28 PM, Cole Robinson wrote:
>> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:
>>> changes from version 3 [1]:
>>> - removed last 2 patches that made function 0 of
>>> PCI multifunction devices mandatory
>>> - new patch: news.xml update
>>> - changed 'since' version to 6.0.0 in patch 4
>>> - unassigned hostdevs are now getting qemu aliases
>>>
>>> [1] 
>>> https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html
>>>
>>> Daniel Henrique Barboza (5):
>>>    Introducing new address type='unassigned' for PCI hostdevs
>>>    qemu: handle unassigned PCI hostdevs in command line
>>>    virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
>>>    formatdomain.html.in: document <address type='unassigned'/>
>>>    news.xml: add address type='unassigned' entry
>>>
>>
>> Codewise it looks fine now. But I'm looking more closely at patch #3 and
>> realizing that it can explicitly reject a previously accepted VM config.
>> And indeed, now that I give it a test with my GPU passthrough setup, it
>> is rejecting my previosly working config.
>>
>> error: Requested operation is not valid: All devices of the same IOMMU
>> group 1 of the PCI device 0000:01:00.0 must belong to domain win10
>>
>> I've attached the nodedev XML for the three devices with iommuGroup 1.
>> Only the two nvidia devices are assigned to my VM, but not the PCIe
>> controller device.
>>
>> Is the libvirt heuristic missing something? Or is this acting as 
>> expected?
>
> You mentioned that you declared 3 devices of IOMMU group 1. Unless the 
> code in
> patch 3 has a bug, there are more PCI hostdevs in IOMMU group 1 that 
> were left
> out of the domain XML.
>
>
>>
>> I didn't quite gather that this is a change to reject previously
>> accepted configurations, so I will defer to Laine and Alex as to whether
>> this should be committed.
>
>
> I mentioned in the commit msg of patch 03 that this would break working
> configurations that didn't comply with the new 'all devices of the IOMMU
> group must be included in the domain XML' directive. Perhaps this is 
> worth
> mentioning in the 'news' page to warn users about it.


No, this shouldn't be a requirement at all. In my mind the purpose of 
these patches is to make something work (in a safe manner) that failed 
before, *not* to add new restrictions that break things that already 
work. (Sorry I wasn't paying more attention to the patches earlier).


>
> About breaking existing configurations, there is the possibility of not
> going forward with patch 03, which is enforcing this rule of declaring 
> all the
> IOMMU group. Existing domains will keep working as usual, the option to
> unassign devices will still be present, but the user will have to deal 
> with
> the potential QEMU errors if not all PCI devices were detached from 
> the host.
>
> In this case, the 'unassigned' type will become more of a ON/OFF 
> switch to
> add/remove the PCI hostdev from the guest without removing it from the
> domain XML. It is still useful, but we lose the idea of all the IOMMU
> devices being described in the domain XML, which is something Laine
> mentioned it would be desirable in one of the RFCs.


I don't actually recall saying that :-). I haven't looked in the list 
archives, but what I *can* imagine myself saying is that only devices 
mentioned in the XML should be manipulated in any way by libvirt. So, 
for example, you shouldn't unbind device X from its host driver if there 
is nothing in the XML telling you to do that. But if a device isn't 
mentioned in the XML, and is already bound to some driver that is 
acceptable to the VFIO subsystem (e.g. vfio-pci, pci-stub or no driver 
at all (? is that right Alex?)) then that should not create any problem.

Doing otherwise would break too many existing configs. (For example, my 
own assigned-GPU config, which assumes that all the devices are already 
bound to the proper driver, and uses "managed='no'")



--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Alex Williamson 4 years, 4 months ago
On Tue, 17 Dec 2019 11:25:38 -0500
Laine Stump <laine@laine.org> wrote:

> On 12/16/19 6:03 PM, Daniel Henrique Barboza wrote:
> >
> >
> > On 12/16/19 7:28 PM, Cole Robinson wrote:  
> >> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:  
> >>> changes from version 3 [1]:
> >>> - removed last 2 patches that made function 0 of
> >>> PCI multifunction devices mandatory
> >>> - new patch: news.xml update
> >>> - changed 'since' version to 6.0.0 in patch 4
> >>> - unassigned hostdevs are now getting qemu aliases
> >>>
> >>> [1] 
> >>> https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html
> >>>
> >>> Daniel Henrique Barboza (5):
> >>>    Introducing new address type='unassigned' for PCI hostdevs
> >>>    qemu: handle unassigned PCI hostdevs in command line
> >>>    virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
> >>>    formatdomain.html.in: document <address type='unassigned'/>
> >>>    news.xml: add address type='unassigned' entry
> >>>  
> >>
> >> Codewise it looks fine now. But I'm looking more closely at patch #3 and
> >> realizing that it can explicitly reject a previously accepted VM config.
> >> And indeed, now that I give it a test with my GPU passthrough setup, it
> >> is rejecting my previosly working config.
> >>
> >> error: Requested operation is not valid: All devices of the same IOMMU
> >> group 1 of the PCI device 0000:01:00.0 must belong to domain win10
> >>
> >> I've attached the nodedev XML for the three devices with iommuGroup 1.
> >> Only the two nvidia devices are assigned to my VM, but not the PCIe
> >> controller device.
> >>
> >> Is the libvirt heuristic missing something? Or is this acting as 
> >> expected?  
> >
> > You mentioned that you declared 3 devices of IOMMU group 1. Unless the 
> > code in
> > patch 3 has a bug, there are more PCI hostdevs in IOMMU group 1 that 
> > were left
> > out of the domain XML.
> >
> >  
> >>
> >> I didn't quite gather that this is a change to reject previously
> >> accepted configurations, so I will defer to Laine and Alex as to whether
> >> this should be committed.  
> >
> >
> > I mentioned in the commit msg of patch 03 that this would break working
> > configurations that didn't comply with the new 'all devices of the IOMMU
> > group must be included in the domain XML' directive. Perhaps this is 
> > worth
> > mentioning in the 'news' page to warn users about it.  
> 
> 
> No, this shouldn't be a requirement at all. In my mind the purpose of 
> these patches is to make something work (in a safe manner) that failed 
> before, *not* to add new restrictions that break things that already 
> work. (Sorry I wasn't paying more attention to the patches earlier).

+1

> > About breaking existing configurations, there is the possibility of not
> > going forward with patch 03, which is enforcing this rule of declaring 
> > all the
> > IOMMU group. Existing domains will keep working as usual, the option to
> > unassign devices will still be present, but the user will have to deal 
> > with
> > the potential QEMU errors if not all PCI devices were detached from 
> > the host.
> >
> > In this case, the 'unassigned' type will become more of a ON/OFF 
> > switch to
> > add/remove the PCI hostdev from the guest without removing it from the
> > domain XML. It is still useful, but we lose the idea of all the IOMMU
> > devices being described in the domain XML, which is something Laine
> > mentioned it would be desirable in one of the RFCs.  
> 
> 
> I don't actually recall saying that :-). I haven't looked in the list 
> archives, but what I *can* imagine myself saying is that only devices 
> mentioned in the XML should be manipulated in any way by libvirt. So,

+1
 
> for example, you shouldn't unbind device X from its host driver if there 
> is nothing in the XML telling you to do that. But if a device isn't 
> mentioned in the XML, and is already bound to some driver that is 
> acceptable to the VFIO subsystem (e.g. vfio-pci, pci-stub or no driver 
> at all (? is that right Alex?)) then that should not create any problem.

Yes, that's right.

> Doing otherwise would break too many existing configs. (For example, my 
> own assigned-GPU config, which assumes that all the devices are already 
> bound to the proper driver, and uses "managed='no'")

Effectively anyone assigning PFs with a PCIe root port that does not
support ACS would be broken by this series.  That's a significant
portion of vfio users.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 12/17/19 1:32 PM, Alex Williamson wrote:
> On Tue, 17 Dec 2019 11:25:38 -0500
> Laine Stump <laine@laine.org> wrote:
> 
>> On 12/16/19 6:03 PM, Daniel Henrique Barboza wrote:
>>> About breaking existing configurations, there is the possibility of not
>>> going forward with patch 03, which is enforcing this rule of declaring
>>> all the
>>> IOMMU group. Existing domains will keep working as usual, the option to
>>> unassign devices will still be present, but the user will have to deal
>>> with
>>> the potential QEMU errors if not all PCI devices were detached from
>>> the host.
>>>
>>> In this case, the 'unassigned' type will become more of a ON/OFF
>>> switch to
>>> add/remove the PCI hostdev from the guest without removing it from the
>>> domain XML. It is still useful, but we lose the idea of all the IOMMU
>>> devices being described in the domain XML, which is something Laine
>>> mentioned it would be desirable in one of the RFCs.
>>
>>
>> I don't actually recall saying that :-). I haven't looked in the list
>> archives, but what I *can* imagine myself saying is that only devices
>> mentioned in the XML should be manipulated in any way by libvirt. So,
> 
> +1
>   
>> for example, you shouldn't unbind device X from its host driver if there
>> is nothing in the XML telling you to do that. But if a device isn't
>> mentioned in the XML, and is already bound to some driver that is
>> acceptable to the VFIO subsystem (e.g. vfio-pci, pci-stub or no driver
>> at all (? is that right Alex?)) then that should not create any problem.
> 
> Yes, that's right.
> 
>> Doing otherwise would break too many existing configs. (For example, my
>> own assigned-GPU config, which assumes that all the devices are already
>> bound to the proper driver, and uses "managed='no'")


This is the new approach of the series I implemented today and plan to
to send for review today/tomorrow. I realized, after all the discussions
yesterday with Alex, that the patch series would be best just sticking with
what we want fixed (managed=yes and parcial assignment) and leaving
unmanaged (managed=no) configurations alone. If the user has an existing,
working unmanaged setup, this means that the user chose to manage device
detach/re-attach manually and shouldn't be bothered with a change that's
aimed to managed devices.


Thanks,


DHB

> 
> Effectively anyone assigning PFs with a PCIe root port that does not
> support ACS would be broken by this series.  That's a significant
> portion of vfio users.  Thanks,
> 
> Alex
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Alex Williamson 4 years, 4 months ago
On Tue, 17 Dec 2019 13:43:14 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> On 12/17/19 1:32 PM, Alex Williamson wrote:
> > On Tue, 17 Dec 2019 11:25:38 -0500
> > Laine Stump <laine@laine.org> wrote:
> >   
> >> On 12/16/19 6:03 PM, Daniel Henrique Barboza wrote:  
> >>> About breaking existing configurations, there is the possibility of not
> >>> going forward with patch 03, which is enforcing this rule of declaring
> >>> all the
> >>> IOMMU group. Existing domains will keep working as usual, the option to
> >>> unassign devices will still be present, but the user will have to deal
> >>> with
> >>> the potential QEMU errors if not all PCI devices were detached from
> >>> the host.
> >>>
> >>> In this case, the 'unassigned' type will become more of a ON/OFF
> >>> switch to
> >>> add/remove the PCI hostdev from the guest without removing it from the
> >>> domain XML. It is still useful, but we lose the idea of all the IOMMU
> >>> devices being described in the domain XML, which is something Laine
> >>> mentioned it would be desirable in one of the RFCs.  
> >>
> >>
> >> I don't actually recall saying that :-). I haven't looked in the list
> >> archives, but what I *can* imagine myself saying is that only devices
> >> mentioned in the XML should be manipulated in any way by libvirt. So,  
> > 
> > +1
> >     
> >> for example, you shouldn't unbind device X from its host driver if there
> >> is nothing in the XML telling you to do that. But if a device isn't
> >> mentioned in the XML, and is already bound to some driver that is
> >> acceptable to the VFIO subsystem (e.g. vfio-pci, pci-stub or no driver
> >> at all (? is that right Alex?)) then that should not create any problem.  
> > 
> > Yes, that's right.
> >   
> >> Doing otherwise would break too many existing configs. (For example, my
> >> own assigned-GPU config, which assumes that all the devices are already
> >> bound to the proper driver, and uses "managed='no'")  
> 
> 
> This is the new approach of the series I implemented today and plan to
> to send for review today/tomorrow. I realized, after all the discussions
> yesterday with Alex, that the patch series would be best just sticking with
> what we want fixed (managed=yes and parcial assignment) and leaving
> unmanaged (managed=no) configurations alone. If the user has an existing,
> working unmanaged setup, this means that the user chose to manage device
> detach/re-attach manually and shouldn't be bothered with a change that's
> aimed to managed devices.

There are of course existing, working managed=yes configurations where
the set of assigned devices is only a subset of the IOMMU group, and
the user has configured other means to make the group viable relative
to vfio.  The statement above doesn't convince me that the next
iteration isn't simply going to restrict its manipulation of other
devices.  As Laine says above and I said in my previous reply, libvirt
should not manipulate the driver binding of any devices not explicitly
listed in the domain XMl.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 12/17/19 1:43 PM, Daniel Henrique Barboza wrote:
>>> I don't actually recall saying that :-). I haven't looked in the list
>>> archives, but what I *can* imagine myself saying is that only devices
>>> mentioned in the XML should be manipulated in any way by libvirt. So,
>>
>> +1
>>> for example, you shouldn't unbind device X from its host driver if there
>>> is nothing in the XML telling you to do that. But if a device isn't
>>> mentioned in the XML, and is already bound to some driver that is
>>> acceptable to the VFIO subsystem (e.g. vfio-pci, pci-stub or no driver
>>> at all (? is that right Alex?)) then that should not create any problem.
>>
>> Yes, that's right.
>>
>>> Doing otherwise would break too many existing configs. (For example, my
>>> own assigned-GPU config, which assumes that all the devices are already
>>> bound to the proper driver, and uses "managed='no'")


I am re-reading this info and reassessing what I intended to do. Suppose a
scenario in which host IOMMU has PCI devices A,B and C. Let's also suppose
that all of them are already bind to vfio-pci.

If I create a guest with A and B as PCI hostdevs, with managed=yes, using
vfio-pci, the setup would work as it is. If I put the IOMMU validation I was
planning to, it will stop working because "C" isn't described in the domain
XML.

If we stick with the directive "Libvirt shouldn't tamper with devices that
are not described in the domain XML" that Laine mentioned up there, and this
directive alone, then I can't make such assumptions about whether the user
will be OK with assigning everything to vfio-pci, given that there are other
acceptable alternatives for the VFIO subsystem that aren't restricted to
that.

I'm convinced that this validation I was pursuing here is a bad idea. It
has a great potential of being annoying, making assumptions about real life
configurations that aren't true, and offering not much in return. I'll
remove it from the series. The result is that the user will have a
new option to let Libvirt control all the IOMMU devices, making some
of the unassignable to the guest. But it will be a new option, not a
new rule that all existing domains will need to adhere to.


Thanks,


DHB


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Alex Williamson 4 years, 4 months ago
On Mon, 16 Dec 2019 17:28:28 -0500
Cole Robinson <crobinso@redhat.com> wrote:

> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:
> > changes from version 3 [1]:
> > - removed last 2 patches that made function 0 of 
> > PCI multifunction devices mandatory
> > - new patch: news.xml update
> > - changed 'since' version to 6.0.0 in patch 4
> > - unassigned hostdevs are now getting qemu aliases
> > 
> > [1] https://www.redhat.com/archives/libvir-list/2019-November/msg01263.html
> > 
> > Daniel Henrique Barboza (5):
> >   Introducing new address type='unassigned' for PCI hostdevs
> >   qemu: handle unassigned PCI hostdevs in command line
> >   virhostdev.c: check all IOMMU devs in virHostdevPreparePCIDevices
> >   formatdomain.html.in: document <address type='unassigned'/>
> >   news.xml: add address type='unassigned' entry
> >   
> 
> Codewise it looks fine now. But I'm looking more closely at patch #3 and
> realizing that it can explicitly reject a previously accepted VM config.
> And indeed, now that I give it a test with my GPU passthrough setup, it
> is rejecting my previosly working config.
> 
> error: Requested operation is not valid: All devices of the same IOMMU
> group 1 of the PCI device 0000:01:00.0 must belong to domain win10
> 
> I've attached the nodedev XML for the three devices with iommuGroup 1.
> Only the two nvidia devices are assigned to my VM, but not the PCIe
> controller device.
> 
> Is the libvirt heuristic missing something? Or is this acting as expected?
> 
> I didn't quite gather that this is a change to reject previously
> accepted configurations, so I will defer to Laine and Alex as to whether
> this should be committed.

Thanks for catching this!  PCIe root ports or bridges being part of an
IOMMU group is part of the nature of the grouping.  However, only
endpoint devices can be bound to vfio-pci and thus participate in this
"partial assignment".  If the code is trying to force all other devices
in the IOMMU group that aren't assigned into this partial assignment
mode, that's just fundamentally broken.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 12/16/19 8:06 PM, Alex Williamson wrote:
> On Mon, 16 Dec 2019 17:28:28 -0500
> Cole Robinson <crobinso@redhat.com> wrote:
> 
>> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:
>>> changes from version 3 [1]:
> 
> Thanks for catching this!  PCIe root ports or bridges being part of an
> IOMMU group is part of the nature of the grouping.  However, only
> endpoint devices can be bound to vfio-pci and thus participate in this
> "partial assignment".  If the code is trying to force all other devices
> in the IOMMU group that aren't assigned into this partial assignment
> mode, that's just fundamentally broken.  Thanks,

The code isn't forcing a device to be assigned to the guest. It is forcing
all the IOMMU devices to be declared in the domain XML to be detached from
the host.

What I did was to extend a verification Libvirt already does, to check for
PCI devices of the same IOMMU X being used by other domains, to check the
the host as well. Guest start fails if there is any device left in IOMMU X
that's not present in the domain.

In short, the code is implying that all IOMMU devices must be detached from
the host, regardless of whether they're going to be used in the guest,
regardless of whether they are PCI root ports or bridges. Is this assumption
correct, considering kernel/QEMU?


Thanks,


DHB



> 
> Alex
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Alex Williamson 4 years, 4 months ago
On Mon, 16 Dec 2019 20:24:56 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> On 12/16/19 8:06 PM, Alex Williamson wrote:
> > On Mon, 16 Dec 2019 17:28:28 -0500
> > Cole Robinson <crobinso@redhat.com> wrote:
> >   
> >> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:  
> >>> changes from version 3 [1]:  
> > 
> > Thanks for catching this!  PCIe root ports or bridges being part of an
> > IOMMU group is part of the nature of the grouping.  However, only
> > endpoint devices can be bound to vfio-pci and thus participate in this
> > "partial assignment".  If the code is trying to force all other devices
> > in the IOMMU group that aren't assigned into this partial assignment
> > mode, that's just fundamentally broken.  Thanks,  
> 
> The code isn't forcing a device to be assigned to the guest. It is forcing
> all the IOMMU devices to be declared in the domain XML to be detached from
> the host.

Detached from the host by unbinding from host drivers and binding to
vfio-pci and "partially" assigned to the guest?  That's wrong, we can't
do that.  Not only will vfio-pci not bind to anything but endpoints,
you'll break the host binding bridges that might be part of the group,
and there are valid use cases for sequestering a device with pci-stub
rather than vfio-pci to add another barrier to the user getting access
to the device.
 
> What I did was to extend a verification Libvirt already does, to check for
> PCI devices of the same IOMMU X being used by other domains, to check the
> the host as well. Guest start fails if there is any device left in IOMMU X
> that's not present in the domain.

Yep, can't do that.

> In short, the code is implying that all IOMMU devices must be detached from
> the host, regardless of whether they're going to be used in the guest,
> regardless of whether they are PCI root ports or bridges. Is this assumption
> correct, considering kernel/QEMU?

Nope, please don't do this.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 12/16/19 8:43 PM, Alex Williamson wrote:
> On Mon, 16 Dec 2019 20:24:56 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>>
>> The code isn't forcing a device to be assigned to the guest. It is forcing
>> all the IOMMU devices to be declared in the domain XML to be detached from
>> the host.
> 
> Detached from the host by unbinding from host drivers and binding to
> vfio-pci and "partially" assigned to the guest?  That's wrong, we can't
> do that.  Not only will vfio-pci not bind to anything but endpoints,
> you'll break the host binding bridges that might be part of the group,
> and there are valid use cases for sequestering a device with pci-stub
> rather than vfio-pci to add another barrier to the user getting access
> to the device.
>   
>> What I did was to extend a verification Libvirt already does, to check for
>> PCI devices of the same IOMMU X being used by other domains, to check the
>> the host as well. Guest start fails if there is any device left in IOMMU X
>> that's not present in the domain.
> 
> Yep, can't do that.


Thanks for the info.

To keep the discussion focused, this is the error I'm trying to dodge:

error: internal error: qemu unexpectedly closed the monitor: 2019-10-04T12:39:41.091312Z qemu-system-ppc64: -device vfio-pci,host=0001:09:00.3,id=hostdev0,bus=pci.2.0,addr=0x1.0x3:
vfio 0001:09:00.3: group 1 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

This happens when not all PCI devices from IOMMU group 1 are bind to vfio_pci, regardless
of whether QEMU is going to use all of them in the guest. Binding all the IOMMU
devices to vfio-pci makes QEMU satisfied, in this particular case.

What is the minimal condition to avoid this error? What Libvirt is doing ATM is not enough
(it will fail to launch with this QEMU error above), and what I'm proposing is wrong.
Can we say that all PCI endpoints of the same IOMMU must be assigned to vfio-pci?


Thanks,

DHB

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Alex Williamson 4 years, 4 months ago
On Mon, 16 Dec 2019 21:09:20 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> On 12/16/19 8:43 PM, Alex Williamson wrote:
> > On Mon, 16 Dec 2019 20:24:56 -0300
> > Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> >   
> >>
> >> The code isn't forcing a device to be assigned to the guest. It is forcing
> >> all the IOMMU devices to be declared in the domain XML to be detached from
> >> the host.  
> > 
> > Detached from the host by unbinding from host drivers and binding to
> > vfio-pci and "partially" assigned to the guest?  That's wrong, we can't
> > do that.  Not only will vfio-pci not bind to anything but endpoints,
> > you'll break the host binding bridges that might be part of the group,
> > and there are valid use cases for sequestering a device with pci-stub
> > rather than vfio-pci to add another barrier to the user getting access
> > to the device.
> >     
> >> What I did was to extend a verification Libvirt already does, to check for
> >> PCI devices of the same IOMMU X being used by other domains, to check the
> >> the host as well. Guest start fails if there is any device left in IOMMU X
> >> that's not present in the domain.  
> > 
> > Yep, can't do that.  
> 
> 
> Thanks for the info.
> 
> To keep the discussion focused, this is the error I'm trying to dodge:
> 
> error: internal error: qemu unexpectedly closed the monitor: 2019-10-04T12:39:41.091312Z qemu-system-ppc64: -device vfio-pci,host=0001:09:00.3,id=hostdev0,bus=pci.2.0,addr=0x1.0x3:
> vfio 0001:09:00.3: group 1 is not viable
> Please ensure all devices within the iommu_group are bound to their vfio bus driver.
> 
> This happens when not all PCI devices from IOMMU group 1 are bind to vfio_pci, regardless
> of whether QEMU is going to use all of them in the guest. Binding all the IOMMU
> devices to vfio-pci makes QEMU satisfied, in this particular case.
> 
> What is the minimal condition to avoid this error? What Libvirt is doing ATM is not enough
> (it will fail to launch with this QEMU error above), and what I'm proposing is wrong.
> Can we say that all PCI endpoints of the same IOMMU must be assigned to vfio-pci?

Yes, but libvirt should not assume that it can manipulate the bindings
of adjacent devices without being explicitly directed to do so.  The
error may be a hindrance to you, but it might also prevent, for
example, the only other NIC in the system being detached from the host
driver.  Is it worth making the VM run without explicitly listing all
devices to assign at the cost of disrupting host services or subverting
the additional isolation a user might be attempting to configure with
having unused devices bound to vfio-pci.  This seems like a bad idea,
the VM should be configured to explicitly list every device it needs to
have assigned or partially assigned.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 12/16/19 9:48 PM, Alex Williamson wrote:
> On Mon, 16 Dec 2019 21:09:20 -0300
> 
> Yes, but libvirt should not assume that it can manipulate the bindings
> of adjacent devices without being explicitly directed to do so.  The
> error may be a hindrance to you, but it might also prevent, for
> example, the only other NIC in the system being detached from the host
> driver.  Is it worth making the VM run without explicitly listing all
> devices to assign at the cost of disrupting host services or subverting
> the additional isolation a user might be attempting to configure with
> having unused devices bound to vfio-pci.  This seems like a bad idea,
> the VM should be configured to explicitly list every device it needs to
> have assigned or partially assigned.  Thanks,
> 

Thanks Alex. I know what's going wrong with this patch series after your messages and
a close inspection of what Libvirt is already doing. Libvirt does a sanity
check for PCI endpoint devices before assigning them to vfio-pci, but the
new detection code I am adding isn't. The result is that Cole can't run his
VM because this new detection code is demanding that a PCI Bridge, that belongs
to the same IOMMU of the devices Cole wants to passthrough, be assigned
to vfio-pci. Which is wrong.


I'll re-send this series fixing that.


Thanks,


DHB


> Alex
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v4 0/5] PCI hostdev partial assignment support
Posted by Daniel Henrique Barboza 4 years, 4 months ago

On 12/16/19 7:28 PM, Cole Robinson wrote:
> On 12/16/19 8:36 AM, Daniel Henrique Barboza wrote:

> I've attached the nodedev XML for the three devices with iommuGroup 1.
> Only the two nvidia devices are assigned to my VM, but not the PCIe
> controller device.

Just noticed the attached file. This might indicate that there's a bug
in patch 3 that is wrongly assuming that there are more IOMMU devices
in group 1 than the ones you declared.


I'll look into it. The error message could also be more helpful, declaring
which PCI hostdev it thinks it should be declared in the XML.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list