[Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus

David Gibson posted 1 patch 7 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170208061602.17666-1-david@gibson.dropbear.id.au
Test checkpatch passed
Test docker passed
Test s390x passed
hw/virtio/virtio-pci.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
[Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
Marcel,

Your original patch adding PCIe support to virtio-pci.c has the
limitation noted below that PCIe won't be enabled if the device is on
the root bus (rather than under a root or downstream port).  As
reasoned below, I think removing the check is correct, even for x86
(though it would rarely be useful there).  But I could well have
missed something.  Let me know if so...



Virtio devices can appear as either vanilla PCI or PCI-Express devices
depending on the bus they're connected to.  At the moment it will only
appear as vanilla PCI if connected to the root bus of a PCIe host bridge.

Presumably this is to reflect the fact that PCIe devices usually need to
be connected to a root (or further downstream) port rather than directly
on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
machine type, it's normal for PCIe devices to appear on the root bus
without root ports.

Further, even on x86, there's no inherent reason we couldn't present a
virtio device as an "integrated device" (typically used for things built
into the PCI chipset), and those devices *do* typically appear on the root
bus.

pcie_endpoint_cap_init() already automatically adjusts to advertise as
an integrated device rather than a "normal" PCIe endpoint when attached to
a root bus.  So we can remove the check for root bus within virtio and
allow (at the user's discretion) a PCIe virtio bus to be attached to a
root bus.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/virtio/virtio-pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5ce42af..caea44c 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1737,8 +1737,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
 {
     VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
     VirtioPCIClass *k = VIRTIO_PCI_GET_CLASS(pci_dev);
-    bool pcie_port = pci_bus_is_express(pci_dev->bus) &&
-                     !pci_bus_is_root(pci_dev->bus);
+    bool pcie_port = pci_bus_is_express(pci_dev->bus);
 
     if (!kvm_has_many_ioeventfds()) {
         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
-- 
2.9.3


Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Laszlo Ersek 7 years, 1 month ago
On 02/08/17 07:16, David Gibson wrote:
> Marcel,
> 
> Your original patch adding PCIe support to virtio-pci.c has the
> limitation noted below that PCIe won't be enabled if the device is on
> the root bus (rather than under a root or downstream port).  As
> reasoned below, I think removing the check is correct, even for x86
> (though it would rarely be useful there).  But I could well have
> missed something.  Let me know if so...
> 
> 
> 
> Virtio devices can appear as either vanilla PCI or PCI-Express devices
> depending on the bus they're connected to.  At the moment it will only
> appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
> 
> Presumably this is to reflect the fact that PCIe devices usually need to
> be connected to a root (or further downstream) port rather than directly
> on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
> machine type, it's normal for PCIe devices to appear on the root bus
> without root ports.
> 
> Further, even on x86, there's no inherent reason we couldn't present a
> virtio device as an "integrated device" (typically used for things built
> into the PCI chipset), and those devices *do* typically appear on the root
> bus.

I'm not personally making a counter-argument, just qouting some of the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):

> Place only the following kinds of devices directly on the Root Complex:
>     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
>         not controllers. Place only legacy PCI devices on
>         the Root Complex. These will be considered Integrated Endpoints.
>         Note: Integrated Endpoints are not hot-pluggable.
>
>         Although the PCI Express spec does not forbid PCI Express devices as
>         Integrated Endpoints, existing hardware mostly integrates legacy PCI
>         devices with the Root Complex. Guest OSes are suspected to behave
>         strangely when PCI Express devices are integrated
>         with the Root Complex.
>
>     [...]
>
> 2.2 PCI Express only hierarchy
> ==============================
> Always use PCI Express Root Ports to start PCI Express hierarchies.

Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".

Let me turn the question around: is it a *problem* for "pseries" if we require root ports? If so, why exactly?

On 02/08/17 07:16, David Gibson wrote:
> 
> pcie_endpoint_cap_init() already automatically adjusts to advertise as
> an integrated device rather than a "normal" PCIe endpoint when attached to
> a root bus.  So we can remove the check for root bus within virtio and
> allow (at the user's discretion) a PCIe virtio bus to be attached to a
> root bus.

If Marcel thinks this is a good change, then I think we should go through "docs/pcie.txt" with a fine-toothed comb, and update all relevant spots. (If Marcel agrees, perhaps you can include such hunks in your patch at once.)

It also may have consequences for libvirt (but I see you addressed Andrea at once, which is great).

Thanks,
Laszlo

> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/virtio/virtio-pci.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 5ce42af..caea44c 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1737,8 +1737,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
>  {
>      VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
>      VirtioPCIClass *k = VIRTIO_PCI_GET_CLASS(pci_dev);
> -    bool pcie_port = pci_bus_is_express(pci_dev->bus) &&
> -                     !pci_bus_is_root(pci_dev->bus);
> +    bool pcie_port = pci_bus_is_express(pci_dev->bus);
>  
>      if (!kvm_has_many_ioeventfds()) {
>          proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
> 


Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
> On 02/08/17 07:16, David Gibson wrote:
> > Marcel,
> > 
> > Your original patch adding PCIe support to virtio-pci.c has the
> > limitation noted below that PCIe won't be enabled if the device is on
> > the root bus (rather than under a root or downstream port).  As
> > reasoned below, I think removing the check is correct, even for x86
> > (though it would rarely be useful there).  But I could well have
> > missed something.  Let me know if so...
> > 
> > 
> > 
> > Virtio devices can appear as either vanilla PCI or PCI-Express devices
> > depending on the bus they're connected to.  At the moment it will only
> > appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
> > 
> > Presumably this is to reflect the fact that PCIe devices usually need to
> > be connected to a root (or further downstream) port rather than directly
> > on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
> > machine type, it's normal for PCIe devices to appear on the root bus
> > without root ports.
> > 
> > Further, even on x86, there's no inherent reason we couldn't present a
> > virtio device as an "integrated device" (typically used for things built
> > into the PCI chipset), and those devices *do* typically appear on the root
> > bus.
> 
> I'm not personally making a counter-argument, just qouting some of
> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):

So, an earlier discussion more or less concluded that the PCIe
guidelines don't really work with PAPR guests.  That comes because
PAPR was designed with PowerVM in mind which allows PCI passthrough
but doesn't do any emulated PCI devices.  So they wanted to present
passed through devices (virtual or phyical) to the guest without
inserting virtual root ports.

Now, you can argue that this was a silly decision in PAPR, and you
could well be right, but there it is.

> > Place only the following kinds of devices directly on the Root Complex:
> >     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
> >         not controllers. Place only legacy PCI devices on
> >         the Root Complex. These will be considered Integrated Endpoints.
> >         Note: Integrated Endpoints are not hot-pluggable.
> >
> >         Although the PCI Express spec does not forbid PCI Express devices as
> >         Integrated Endpoints, existing hardware mostly integrates legacy PCI
> >         devices with the Root Complex.

"Mostly".. on my laptop at least the GPU shows up as an integrated PCI
Express endpoint, so it's certainly not the case that *all* root bus
devices are legacy.

> Guest OSes are suspected to behave
> >         strangely when PCI Express devices are integrated
> >         with the Root Complex.

Clearly not that strangely, that often, since my laptop works just fine.

> >
> >     [...]
> >
> > 2.2 PCI Express only hierarchy
> > ==============================
> > Always use PCI Express Root Ports to start PCI Express hierarchies.
> 
> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".

Well "normal" perhaps wasn't the right word.  Let's say precedented,
if uncommon.

> Let me turn the question around: is it a *problem* for "pseries" if
> we require root ports? If so, why exactly?

That's.. a complex question.  At least Linux guests (and we don't
support any others yet) might cope with the addition of root ports.
Maybe.  I have discussed this option with BenH and others.

However it is gratuitously different from how PCIe devices will
typically appear for the same guest running under PowerVM.  Although I
suspect Linux would cope with the "normal standard" rather than "PAPR
standard" presentation, I'm not as confident about it as I would like.

Another consideration here is that other PCIe capable qemu emulated
devices, such as XHCI, will present fine as PCIe integrated endpoints
when attached to the root bus.  Libvirt won't do that usually, of
course, and it may not be the recommended way of doing things (on PC)
but it's possible.  I don't see any particular reason that virtio-pci
should enforce the root port requirement more so than any other
device.

> On 02/08/17 07:16, David Gibson wrote:
> > 
> > pcie_endpoint_cap_init() already automatically adjusts to advertise as
> > an integrated device rather than a "normal" PCIe endpoint when attached to
> > a root bus.  So we can remove the check for root bus within virtio and
> > allow (at the user's discretion) a PCIe virtio bus to be attached to a
> > root bus.
> 
> If Marcel thinks this is a good change, then I think we should go
> through "docs/pcie.txt" with a fine-toothed comb, and update all
> relevant spots. (If Marcel agrees, perhaps you can include such
> hunks in your patch at once.)

Actually, I think that would be a neverending process.  Maybe better
to put in a whole different spapr-pcie.txt with the assorted ways that
PAPR violates PCIe conventions.

> It also may have consequences for libvirt (but I see you addressed
> Andrea at once, which is great).

Right, I've been discussed this with Andrea all along.  We're working
on a proposed PAPR specific way of allocating PCI and PCIe addresses
(different from the PCIe normal way, but the same as each other).
That will simplify adding PCIe support to PAPR, and also has some
other advantages for PAPR guests (related to the platform specific
isolation, hotplug  and error recovery mechanisms - also different
from the normal PCIe ones).

> 
> Thanks,
> Laszlo
> 
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/virtio/virtio-pci.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index 5ce42af..caea44c 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -1737,8 +1737,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
> >  {
> >      VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
> >      VirtioPCIClass *k = VIRTIO_PCI_GET_CLASS(pci_dev);
> > -    bool pcie_port = pci_bus_is_express(pci_dev->bus) &&
> > -                     !pci_bus_is_root(pci_dev->bus);
> > +    bool pcie_port = pci_bus_is_express(pci_dev->bus);
> >  
> >      if (!kvm_has_many_ioeventfds()) {
> >          proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Laszlo Ersek 7 years, 1 month ago
On 02/09/17 05:16, David Gibson wrote:
> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
>> On 02/08/17 07:16, David Gibson wrote:
>>> Marcel,
>>>
>>> Your original patch adding PCIe support to virtio-pci.c has the
>>> limitation noted below that PCIe won't be enabled if the device is on
>>> the root bus (rather than under a root or downstream port).  As
>>> reasoned below, I think removing the check is correct, even for x86
>>> (though it would rarely be useful there).  But I could well have
>>> missed something.  Let me know if so...
>>>
>>>
>>>
>>> Virtio devices can appear as either vanilla PCI or PCI-Express devices
>>> depending on the bus they're connected to.  At the moment it will only
>>> appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
>>>
>>> Presumably this is to reflect the fact that PCIe devices usually need to
>>> be connected to a root (or further downstream) port rather than directly
>>> on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
>>> machine type, it's normal for PCIe devices to appear on the root bus
>>> without root ports.
>>>
>>> Further, even on x86, there's no inherent reason we couldn't present a
>>> virtio device as an "integrated device" (typically used for things built
>>> into the PCI chipset), and those devices *do* typically appear on the root
>>> bus.
>>
>> I'm not personally making a counter-argument, just qouting some of
>> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
> 
> So, an earlier discussion more or less concluded that the PCIe
> guidelines don't really work with PAPR guests.  That comes because
> PAPR was designed with PowerVM in mind which allows PCI passthrough
> but doesn't do any emulated PCI devices.  So they wanted to present
> passed through devices (virtual or phyical) to the guest without
> inserting virtual root ports.
> 
> Now, you can argue that this was a silly decision in PAPR, and you
> could well be right, but there it is.

I can totally accept this, but then we should state it as a fact near
the top of "docs/pcie.txt".

> 
>>> Place only the following kinds of devices directly on the Root Complex:
>>>     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
>>>         not controllers. Place only legacy PCI devices on
>>>         the Root Complex. These will be considered Integrated Endpoints.
>>>         Note: Integrated Endpoints are not hot-pluggable.
>>>
>>>         Although the PCI Express spec does not forbid PCI Express devices as
>>>         Integrated Endpoints, existing hardware mostly integrates legacy PCI
>>>         devices with the Root Complex.
> 
> "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
> Express endpoint, so it's certainly not the case that *all* root bus
> devices are legacy.
> 
>> Guest OSes are suspected to behave
>>>         strangely when PCI Express devices are integrated
>>>         with the Root Complex.
> 
> Clearly not that strangely, that often, since my laptop works just fine.
> 
>>>
>>>     [...]
>>>
>>> 2.2 PCI Express only hierarchy
>>> ==============================
>>> Always use PCI Express Root Ports to start PCI Express hierarchies.
>>
>> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
> 
> Well "normal" perhaps wasn't the right word.  Let's say precedented,
> if uncommon.
> 
>> Let me turn the question around: is it a *problem* for "pseries" if
>> we require root ports? If so, why exactly?
> 
> That's.. a complex question.  At least Linux guests (and we don't
> support any others yet) might cope with the addition of root ports.
> Maybe.  I have discussed this option with BenH and others.
> 
> However it is gratuitously different from how PCIe devices will
> typically appear for the same guest running under PowerVM.  Although I
> suspect Linux would cope with the "normal standard" rather than "PAPR
> standard" presentation, I'm not as confident about it as I would like.
> 
> Another consideration here is that other PCIe capable qemu emulated
> devices, such as XHCI, will present fine as PCIe integrated endpoints
> when attached to the root bus.  Libvirt won't do that usually, of
> course, and it may not be the recommended way of doing things (on PC)
> but it's possible.  I don't see any particular reason that virtio-pci
> should enforce the root port requirement more so than any other
> device.
> 
>> On 02/08/17 07:16, David Gibson wrote:
>>>
>>> pcie_endpoint_cap_init() already automatically adjusts to advertise as
>>> an integrated device rather than a "normal" PCIe endpoint when attached to
>>> a root bus.  So we can remove the check for root bus within virtio and
>>> allow (at the user's discretion) a PCIe virtio bus to be attached to a
>>> root bus.
>>
>> If Marcel thinks this is a good change, then I think we should go
>> through "docs/pcie.txt" with a fine-toothed comb, and update all
>> relevant spots. (If Marcel agrees, perhaps you can include such
>> hunks in your patch at once.)
> 
> Actually, I think that would be a neverending process.  Maybe better
> to put in a whole different spapr-pcie.txt with the assorted ways that
> PAPR violates PCIe conventions.

That works for me too, but I think it would be a lot more work for you
and others.

I plan on consulting "docs/pcie.txt" frequently; among other things, for
deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
welcome IMO.

> 
>> It also may have consequences for libvirt (but I see you addressed
>> Andrea at once, which is great).
> 
> Right, I've been discussed this with Andrea all along.  We're working
> on a proposed PAPR specific way of allocating PCI and PCIe addresses
> (different from the PCIe normal way, but the same as each other).
> That will simplify adding PCIe support to PAPR, and also has some
> other advantages for PAPR guests (related to the platform specific
> isolation, hotplug  and error recovery mechanisms - also different
> from the normal PCIe ones).

Great, if Andrea is aware, that's a relief.

Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
removes PAPR from the scope? A short list of actual machine types would
be appreciated too, if that makes sense. (By default we aim at
multi-arch / multi-target with this document; we may not have stated it
explicitly, but AFAIR we intend to cover aarch64 / "virt" too.)

Thanks!
Laszlo

> 
>>
>> Thanks,
>> Laszlo
>>
>>>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>>  hw/virtio/virtio-pci.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>>> index 5ce42af..caea44c 100644
>>> --- a/hw/virtio/virtio-pci.c
>>> +++ b/hw/virtio/virtio-pci.c
>>> @@ -1737,8 +1737,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
>>>  {
>>>      VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
>>>      VirtioPCIClass *k = VIRTIO_PCI_GET_CLASS(pci_dev);
>>> -    bool pcie_port = pci_bus_is_express(pci_dev->bus) &&
>>> -                     !pci_bus_is_root(pci_dev->bus);
>>> +    bool pcie_port = pci_bus_is_express(pci_dev->bus);
>>>  
>>>      if (!kvm_has_many_ioeventfds()) {
>>>          proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
>>>
>>
> 


Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
> On 02/09/17 05:16, David Gibson wrote:
> > On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
> >> On 02/08/17 07:16, David Gibson wrote:
> >>> Marcel,
> >>>
> >>> Your original patch adding PCIe support to virtio-pci.c has the
> >>> limitation noted below that PCIe won't be enabled if the device is on
> >>> the root bus (rather than under a root or downstream port).  As
> >>> reasoned below, I think removing the check is correct, even for x86
> >>> (though it would rarely be useful there).  But I could well have
> >>> missed something.  Let me know if so...
> >>>
> >>>
> >>>
> >>> Virtio devices can appear as either vanilla PCI or PCI-Express devices
> >>> depending on the bus they're connected to.  At the moment it will only
> >>> appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
> >>>
> >>> Presumably this is to reflect the fact that PCIe devices usually need to
> >>> be connected to a root (or further downstream) port rather than directly
> >>> on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
> >>> machine type, it's normal for PCIe devices to appear on the root bus
> >>> without root ports.
> >>>
> >>> Further, even on x86, there's no inherent reason we couldn't present a
> >>> virtio device as an "integrated device" (typically used for things built
> >>> into the PCI chipset), and those devices *do* typically appear on the root
> >>> bus.
> >>
> >> I'm not personally making a counter-argument, just qouting some of
> >> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
> > 
> > So, an earlier discussion more or less concluded that the PCIe
> > guidelines don't really work with PAPR guests.  That comes because
> > PAPR was designed with PowerVM in mind which allows PCI passthrough
> > but doesn't do any emulated PCI devices.  So they wanted to present
> > passed through devices (virtual or phyical) to the guest without
> > inserting virtual root ports.
> > 
> > Now, you can argue that this was a silly decision in PAPR, and you
> > could well be right, but there it is.
> 
> I can totally accept this, but then we should state it as a fact near
> the top of "docs/pcie.txt".
> 
> > 
> >>> Place only the following kinds of devices directly on the Root Complex:
> >>>     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
> >>>         not controllers. Place only legacy PCI devices on
> >>>         the Root Complex. These will be considered Integrated Endpoints.
> >>>         Note: Integrated Endpoints are not hot-pluggable.
> >>>
> >>>         Although the PCI Express spec does not forbid PCI Express devices as
> >>>         Integrated Endpoints, existing hardware mostly integrates legacy PCI
> >>>         devices with the Root Complex.
> > 
> > "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
> > Express endpoint, so it's certainly not the case that *all* root bus
> > devices are legacy.
> > 
> >> Guest OSes are suspected to behave
> >>>         strangely when PCI Express devices are integrated
> >>>         with the Root Complex.
> > 
> > Clearly not that strangely, that often, since my laptop works just fine.
> > 
> >>>
> >>>     [...]
> >>>
> >>> 2.2 PCI Express only hierarchy
> >>> ==============================
> >>> Always use PCI Express Root Ports to start PCI Express hierarchies.
> >>
> >> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
> > 
> > Well "normal" perhaps wasn't the right word.  Let's say precedented,
> > if uncommon.
> > 
> >> Let me turn the question around: is it a *problem* for "pseries" if
> >> we require root ports? If so, why exactly?
> > 
> > That's.. a complex question.  At least Linux guests (and we don't
> > support any others yet) might cope with the addition of root ports.
> > Maybe.  I have discussed this option with BenH and others.
> > 
> > However it is gratuitously different from how PCIe devices will
> > typically appear for the same guest running under PowerVM.  Although I
> > suspect Linux would cope with the "normal standard" rather than "PAPR
> > standard" presentation, I'm not as confident about it as I would like.
> > 
> > Another consideration here is that other PCIe capable qemu emulated
> > devices, such as XHCI, will present fine as PCIe integrated endpoints
> > when attached to the root bus.  Libvirt won't do that usually, of
> > course, and it may not be the recommended way of doing things (on PC)
> > but it's possible.  I don't see any particular reason that virtio-pci
> > should enforce the root port requirement more so than any other
> > device.
> > 
> >> On 02/08/17 07:16, David Gibson wrote:
> >>>
> >>> pcie_endpoint_cap_init() already automatically adjusts to advertise as
> >>> an integrated device rather than a "normal" PCIe endpoint when attached to
> >>> a root bus.  So we can remove the check for root bus within virtio and
> >>> allow (at the user's discretion) a PCIe virtio bus to be attached to a
> >>> root bus.
> >>
> >> If Marcel thinks this is a good change, then I think we should go
> >> through "docs/pcie.txt" with a fine-toothed comb, and update all
> >> relevant spots. (If Marcel agrees, perhaps you can include such
> >> hunks in your patch at once.)
> > 
> > Actually, I think that would be a neverending process.  Maybe better
> > to put in a whole different spapr-pcie.txt with the assorted ways that
> > PAPR violates PCIe conventions.
> 
> That works for me too, but I think it would be a lot more work for you
> and others.
> 
> I plan on consulting "docs/pcie.txt" frequently; among other things, for
> deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
> welcome IMO.
> 
> > 
> >> It also may have consequences for libvirt (but I see you addressed
> >> Andrea at once, which is great).
> > 
> > Right, I've been discussed this with Andrea all along.  We're working
> > on a proposed PAPR specific way of allocating PCI and PCIe addresses
> > (different from the PCIe normal way, but the same as each other).
> > That will simplify adding PCIe support to PAPR, and also has some
> > other advantages for PAPR guests (related to the platform specific
> > isolation, hotplug  and error recovery mechanisms - also different
> > from the normal PCIe ones).
> 
> Great, if Andrea is aware, that's a relief.
> 
> Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
> removes PAPR from the scope?

Well, first I'd like to see if Marcel knows of some reason I didn't
think of why this test is important for virtio particularly here.  But
assuming the basic idea is acceptable, then yes, I'll update pcie.txt.

> A short list of actual machine types would
> be appreciated too, if that makes sense. (By default we aim at
> multi-arch / multi-target with this document; we may not have stated it
> explicitly, but AFAIR we intend to cover aarch64 / "virt" too.)

Right, that was my understanding as well.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Marcel Apfelbaum 7 years, 1 month ago
On 02/10/2017 02:37 AM, David Gibson wrote:
> On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
>> On 02/09/17 05:16, David Gibson wrote:
>>> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>> Marcel,
>>>>>
>>>>> Your original patch adding PCIe support to virtio-pci.c has the
>>>>> limitation noted below that PCIe won't be enabled if the device is on
>>>>> the root bus (rather than under a root or downstream port).  As
>>>>> reasoned below, I think removing the check is correct, even for x86
>>>>> (though it would rarely be useful there).  But I could well have
>>>>> missed something.  Let me know if so...
>>>>>
>>>>>
>>>>>
>>>>> Virtio devices can appear as either vanilla PCI or PCI-Express devices
>>>>> depending on the bus they're connected to.  At the moment it will only
>>>>> appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
>>>>>
>>>>> Presumably this is to reflect the fact that PCIe devices usually need to
>>>>> be connected to a root (or further downstream) port rather than directly
>>>>> on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
>>>>> machine type, it's normal for PCIe devices to appear on the root bus
>>>>> without root ports.
>>>>>
>>>>> Further, even on x86, there's no inherent reason we couldn't present a
>>>>> virtio device as an "integrated device" (typically used for things built
>>>>> into the PCI chipset), and those devices *do* typically appear on the root
>>>>> bus.
>>>>
>>>> I'm not personally making a counter-argument, just qouting some of
>>>> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
>>>
>>> So, an earlier discussion more or less concluded that the PCIe
>>> guidelines don't really work with PAPR guests.  That comes because
>>> PAPR was designed with PowerVM in mind which allows PCI passthrough
>>> but doesn't do any emulated PCI devices.  So they wanted to present
>>> passed through devices (virtual or phyical) to the guest without
>>> inserting virtual root ports.
>>>
>>> Now, you can argue that this was a silly decision in PAPR, and you
>>> could well be right, but there it is.
>>
>> I can totally accept this, but then we should state it as a fact near
>> the top of "docs/pcie.txt".
>>
>>>
>>>>> Place only the following kinds of devices directly on the Root Complex:
>>>>>     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
>>>>>         not controllers. Place only legacy PCI devices on
>>>>>         the Root Complex. These will be considered Integrated Endpoints.
>>>>>         Note: Integrated Endpoints are not hot-pluggable.
>>>>>
>>>>>         Although the PCI Express spec does not forbid PCI Express devices as
>>>>>         Integrated Endpoints, existing hardware mostly integrates legacy PCI
>>>>>         devices with the Root Complex.
>>>
>>> "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
>>> Express endpoint, so it's certainly not the case that *all* root bus
>>> devices are legacy.
>>>
>>>> Guest OSes are suspected to behave
>>>>>         strangely when PCI Express devices are integrated
>>>>>         with the Root Complex.
>>>
>>> Clearly not that strangely, that often, since my laptop works just fine.
>>>
>>>>>
>>>>>     [...]
>>>>>
>>>>> 2.2 PCI Express only hierarchy
>>>>> ==============================
>>>>> Always use PCI Express Root Ports to start PCI Express hierarchies.
>>>>
>>>> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
>>>
>>> Well "normal" perhaps wasn't the right word.  Let's say precedented,
>>> if uncommon.
>>>
>>>> Let me turn the question around: is it a *problem* for "pseries" if
>>>> we require root ports? If so, why exactly?
>>>
>>> That's.. a complex question.  At least Linux guests (and we don't
>>> support any others yet) might cope with the addition of root ports.
>>> Maybe.  I have discussed this option with BenH and others.
>>>
>>> However it is gratuitously different from how PCIe devices will
>>> typically appear for the same guest running under PowerVM.  Although I
>>> suspect Linux would cope with the "normal standard" rather than "PAPR
>>> standard" presentation, I'm not as confident about it as I would like.
>>>
>>> Another consideration here is that other PCIe capable qemu emulated
>>> devices, such as XHCI, will present fine as PCIe integrated endpoints
>>> when attached to the root bus.  Libvirt won't do that usually, of
>>> course, and it may not be the recommended way of doing things (on PC)
>>> but it's possible.  I don't see any particular reason that virtio-pci
>>> should enforce the root port requirement more so than any other
>>> device.
>>>
>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>>
>>>>> pcie_endpoint_cap_init() already automatically adjusts to advertise as
>>>>> an integrated device rather than a "normal" PCIe endpoint when attached to
>>>>> a root bus.  So we can remove the check for root bus within virtio and
>>>>> allow (at the user's discretion) a PCIe virtio bus to be attached to a
>>>>> root bus.
>>>>
>>>> If Marcel thinks this is a good change, then I think we should go
>>>> through "docs/pcie.txt" with a fine-toothed comb, and update all
>>>> relevant spots. (If Marcel agrees, perhaps you can include such
>>>> hunks in your patch at once.)
>>>
>>> Actually, I think that would be a neverending process.  Maybe better
>>> to put in a whole different spapr-pcie.txt with the assorted ways that
>>> PAPR violates PCIe conventions.
>>
>> That works for me too, but I think it would be a lot more work for you
>> and others.
>>
>> I plan on consulting "docs/pcie.txt" frequently; among other things, for
>> deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
>> welcome IMO.
>>
>>>
>>>> It also may have consequences for libvirt (but I see you addressed
>>>> Andrea at once, which is great).
>>>
>>> Right, I've been discussed this with Andrea all along.  We're working
>>> on a proposed PAPR specific way of allocating PCI and PCIe addresses
>>> (different from the PCIe normal way, but the same as each other).
>>> That will simplify adding PCIe support to PAPR, and also has some
>>> other advantages for PAPR guests (related to the platform specific
>>> isolation, hotplug  and error recovery mechanisms - also different
>>> from the normal PCIe ones).
>>
>> Great, if Andrea is aware, that's a relief.
>>
>> Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
>> removes PAPR from the scope?
>

Hi David,
Sorry for the delay, I just came back from PTO.

> Well, first I'd like to see if Marcel knows of some reason I didn't
> think of why this test is important for virtio particularly here.  But
> assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
>

There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
    of guests and we don't want to check and fix everything if *we don't have to*.
    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
    at all, but I might be wrong).
2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
    types of virtio devices. We want to keep it simple: virtio legacy -> PCI, virtio modern -> PCIe.


(*) A while ago Alex Williamson found such of issue, I think is this one:
0282ab (vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs)

>> be appreciated too, if that makes sense. (By default we aim at
>> multi-arch / multi-target with this document; we may not have stated it
>> explicitly, but AFAIR we intend to cover aarch64 / "virt" too.)
>
> Right, that was my understanding as well.
>

Indeed, we want the document to support them all. If PAPR is different, we should mention it.

Thanks,
Marcel

Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
> On 02/10/2017 02:37 AM, David Gibson wrote:
> > On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
> > > On 02/09/17 05:16, David Gibson wrote:
> > > > On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
> > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > Marcel,
> > > > > > 
> > > > > > Your original patch adding PCIe support to virtio-pci.c has the
> > > > > > limitation noted below that PCIe won't be enabled if the device is on
> > > > > > the root bus (rather than under a root or downstream port).  As
> > > > > > reasoned below, I think removing the check is correct, even for x86
> > > > > > (though it would rarely be useful there).  But I could well have
> > > > > > missed something.  Let me know if so...
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > Virtio devices can appear as either vanilla PCI or PCI-Express devices
> > > > > > depending on the bus they're connected to.  At the moment it will only
> > > > > > appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
> > > > > > 
> > > > > > Presumably this is to reflect the fact that PCIe devices usually need to
> > > > > > be connected to a root (or further downstream) port rather than directly
> > > > > > on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
> > > > > > machine type, it's normal for PCIe devices to appear on the root bus
> > > > > > without root ports.
> > > > > > 
> > > > > > Further, even on x86, there's no inherent reason we couldn't present a
> > > > > > virtio device as an "integrated device" (typically used for things built
> > > > > > into the PCI chipset), and those devices *do* typically appear on the root
> > > > > > bus.
> > > > > 
> > > > > I'm not personally making a counter-argument, just qouting some of
> > > > > the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
> > > > 
> > > > So, an earlier discussion more or less concluded that the PCIe
> > > > guidelines don't really work with PAPR guests.  That comes because
> > > > PAPR was designed with PowerVM in mind which allows PCI passthrough
> > > > but doesn't do any emulated PCI devices.  So they wanted to present
> > > > passed through devices (virtual or phyical) to the guest without
> > > > inserting virtual root ports.
> > > > 
> > > > Now, you can argue that this was a silly decision in PAPR, and you
> > > > could well be right, but there it is.
> > > 
> > > I can totally accept this, but then we should state it as a fact near
> > > the top of "docs/pcie.txt".
> > > 
> > > > 
> > > > > > Place only the following kinds of devices directly on the Root Complex:
> > > > > >     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
> > > > > >         not controllers. Place only legacy PCI devices on
> > > > > >         the Root Complex. These will be considered Integrated Endpoints.
> > > > > >         Note: Integrated Endpoints are not hot-pluggable.
> > > > > > 
> > > > > >         Although the PCI Express spec does not forbid PCI Express devices as
> > > > > >         Integrated Endpoints, existing hardware mostly integrates legacy PCI
> > > > > >         devices with the Root Complex.
> > > > 
> > > > "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
> > > > Express endpoint, so it's certainly not the case that *all* root bus
> > > > devices are legacy.
> > > > 
> > > > > Guest OSes are suspected to behave
> > > > > >         strangely when PCI Express devices are integrated
> > > > > >         with the Root Complex.
> > > > 
> > > > Clearly not that strangely, that often, since my laptop works just fine.
> > > > 
> > > > > > 
> > > > > >     [...]
> > > > > > 
> > > > > > 2.2 PCI Express only hierarchy
> > > > > > ==============================
> > > > > > Always use PCI Express Root Ports to start PCI Express hierarchies.
> > > > > 
> > > > > Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
> > > > 
> > > > Well "normal" perhaps wasn't the right word.  Let's say precedented,
> > > > if uncommon.
> > > > 
> > > > > Let me turn the question around: is it a *problem* for "pseries" if
> > > > > we require root ports? If so, why exactly?
> > > > 
> > > > That's.. a complex question.  At least Linux guests (and we don't
> > > > support any others yet) might cope with the addition of root ports.
> > > > Maybe.  I have discussed this option with BenH and others.
> > > > 
> > > > However it is gratuitously different from how PCIe devices will
> > > > typically appear for the same guest running under PowerVM.  Although I
> > > > suspect Linux would cope with the "normal standard" rather than "PAPR
> > > > standard" presentation, I'm not as confident about it as I would like.
> > > > 
> > > > Another consideration here is that other PCIe capable qemu emulated
> > > > devices, such as XHCI, will present fine as PCIe integrated endpoints
> > > > when attached to the root bus.  Libvirt won't do that usually, of
> > > > course, and it may not be the recommended way of doing things (on PC)
> > > > but it's possible.  I don't see any particular reason that virtio-pci
> > > > should enforce the root port requirement more so than any other
> > > > device.
> > > > 
> > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > 
> > > > > > pcie_endpoint_cap_init() already automatically adjusts to advertise as
> > > > > > an integrated device rather than a "normal" PCIe endpoint when attached to
> > > > > > a root bus.  So we can remove the check for root bus within virtio and
> > > > > > allow (at the user's discretion) a PCIe virtio bus to be attached to a
> > > > > > root bus.
> > > > > 
> > > > > If Marcel thinks this is a good change, then I think we should go
> > > > > through "docs/pcie.txt" with a fine-toothed comb, and update all
> > > > > relevant spots. (If Marcel agrees, perhaps you can include such
> > > > > hunks in your patch at once.)
> > > > 
> > > > Actually, I think that would be a neverending process.  Maybe better
> > > > to put in a whole different spapr-pcie.txt with the assorted ways that
> > > > PAPR violates PCIe conventions.
> > > 
> > > That works for me too, but I think it would be a lot more work for you
> > > and others.
> > > 
> > > I plan on consulting "docs/pcie.txt" frequently; among other things, for
> > > deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
> > > welcome IMO.
> > > 
> > > > 
> > > > > It also may have consequences for libvirt (but I see you addressed
> > > > > Andrea at once, which is great).
> > > > 
> > > > Right, I've been discussed this with Andrea all along.  We're working
> > > > on a proposed PAPR specific way of allocating PCI and PCIe addresses
> > > > (different from the PCIe normal way, but the same as each other).
> > > > That will simplify adding PCIe support to PAPR, and also has some
> > > > other advantages for PAPR guests (related to the platform specific
> > > > isolation, hotplug  and error recovery mechanisms - also different
> > > > from the normal PCIe ones).
> > > 
> > > Great, if Andrea is aware, that's a relief.
> > > 
> > > Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
> > > removes PAPR from the scope?
> > 
> 
> Hi David,
> Sorry for the delay, I just came back from PTO.
> 
> > Well, first I'd like to see if Marcel knows of some reason I didn't
> > think of why this test is important for virtio particularly here.  But
> > assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
> > 
> 
> There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
> 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
>    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
>    of guests and we don't want to check and fix everything if *we don't have to*.
>    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
>    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
>    at all, but I might be wrong).

Hm, ok.  Could we implement that restriction in the pci/pcie core
rather than in the virtio device?  That would then protect things like
XHCI as well.  And for my purposes it would also make it easier to
implement aa machine type specific hook to re-allow that configuration
on pseries.

At the moment XHCI and virtio-pci behave differently, which seems less
than ideal.

> 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
>    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
>    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
>    types of virtio devices. We want to keep it simple: virtio legacy
>    -> PCI, virtio modern -> PCIe.

Ok.. it's not obvious to me why integrated endpoint vs. under a root
port is relevant to this.  Can't we enable/disable PCIe mode based
directly on the legacy/modern settings?

> (*) A while ago Alex Williamson found such of issue, I think is this one:
> 0282ab (vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs)

It's also not clear to me why this fix is relevant to the question.
That change disables the PCIe capability on a bus which is
not-express, but is under an express root bridge (and is therefore
clearly *not* on the root bus).  For the case I'm talking about the
*is* on an express bus and it *is* the root bus.  AFAICT that patch
would be relevant only for devices under a PCIe-to-PCI bridge on a
PCIe system.

> > > be appreciated too, if that makes sense. (By default we aim at
> > > multi-arch / multi-target with this document; we may not have stated it
> > > explicitly, but AFAIR we intend to cover aarch64 / "virt" too.)
> > 
> > Right, that was my understanding as well.
> > 
> 
> Indeed, we want the document to support them all. If PAPR is different, we should mention it.

Sure.  I'm trying to work out what we can/should do for pseries first;
I'll write something up for the docs when I have something I think is
ready to merge.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Marcel Apfelbaum 7 years, 1 month ago
On 02/13/2017 06:33 AM, David Gibson wrote:
> On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
>> On 02/10/2017 02:37 AM, David Gibson wrote:
>>> On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
>>>> On 02/09/17 05:16, David Gibson wrote:
>>>>> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
>>>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>>>> Marcel,
>>>>>>>
>>>>>>> Your original patch adding PCIe support to virtio-pci.c has the
>>>>>>> limitation noted below that PCIe won't be enabled if the device is on
>>>>>>> the root bus (rather than under a root or downstream port).  As
>>>>>>> reasoned below, I think removing the check is correct, even for x86
>>>>>>> (though it would rarely be useful there).  But I could well have
>>>>>>> missed something.  Let me know if so...
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Virtio devices can appear as either vanilla PCI or PCI-Express devices
>>>>>>> depending on the bus they're connected to.  At the moment it will only
>>>>>>> appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
>>>>>>>
>>>>>>> Presumably this is to reflect the fact that PCIe devices usually need to
>>>>>>> be connected to a root (or further downstream) port rather than directly
>>>>>>> on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
>>>>>>> machine type, it's normal for PCIe devices to appear on the root bus
>>>>>>> without root ports.
>>>>>>>
>>>>>>> Further, even on x86, there's no inherent reason we couldn't present a
>>>>>>> virtio device as an "integrated device" (typically used for things built
>>>>>>> into the PCI chipset), and those devices *do* typically appear on the root
>>>>>>> bus.
>>>>>>
>>>>>> I'm not personally making a counter-argument, just qouting some of
>>>>>> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
>>>>>
>>>>> So, an earlier discussion more or less concluded that the PCIe
>>>>> guidelines don't really work with PAPR guests.  That comes because
>>>>> PAPR was designed with PowerVM in mind which allows PCI passthrough
>>>>> but doesn't do any emulated PCI devices.  So they wanted to present
>>>>> passed through devices (virtual or phyical) to the guest without
>>>>> inserting virtual root ports.
>>>>>
>>>>> Now, you can argue that this was a silly decision in PAPR, and you
>>>>> could well be right, but there it is.
>>>>
>>>> I can totally accept this, but then we should state it as a fact near
>>>> the top of "docs/pcie.txt".
>>>>
>>>>>
>>>>>>> Place only the following kinds of devices directly on the Root Complex:
>>>>>>>     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
>>>>>>>         not controllers. Place only legacy PCI devices on
>>>>>>>         the Root Complex. These will be considered Integrated Endpoints.
>>>>>>>         Note: Integrated Endpoints are not hot-pluggable.
>>>>>>>
>>>>>>>         Although the PCI Express spec does not forbid PCI Express devices as
>>>>>>>         Integrated Endpoints, existing hardware mostly integrates legacy PCI
>>>>>>>         devices with the Root Complex.
>>>>>
>>>>> "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
>>>>> Express endpoint, so it's certainly not the case that *all* root bus
>>>>> devices are legacy.
>>>>>
>>>>>> Guest OSes are suspected to behave
>>>>>>>         strangely when PCI Express devices are integrated
>>>>>>>         with the Root Complex.
>>>>>
>>>>> Clearly not that strangely, that often, since my laptop works just fine.
>>>>>
>>>>>>>
>>>>>>>     [...]
>>>>>>>
>>>>>>> 2.2 PCI Express only hierarchy
>>>>>>> ==============================
>>>>>>> Always use PCI Express Root Ports to start PCI Express hierarchies.
>>>>>>
>>>>>> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
>>>>>
>>>>> Well "normal" perhaps wasn't the right word.  Let's say precedented,
>>>>> if uncommon.
>>>>>
>>>>>> Let me turn the question around: is it a *problem* for "pseries" if
>>>>>> we require root ports? If so, why exactly?
>>>>>
>>>>> That's.. a complex question.  At least Linux guests (and we don't
>>>>> support any others yet) might cope with the addition of root ports.
>>>>> Maybe.  I have discussed this option with BenH and others.
>>>>>
>>>>> However it is gratuitously different from how PCIe devices will
>>>>> typically appear for the same guest running under PowerVM.  Although I
>>>>> suspect Linux would cope with the "normal standard" rather than "PAPR
>>>>> standard" presentation, I'm not as confident about it as I would like.
>>>>>
>>>>> Another consideration here is that other PCIe capable qemu emulated
>>>>> devices, such as XHCI, will present fine as PCIe integrated endpoints
>>>>> when attached to the root bus.  Libvirt won't do that usually, of
>>>>> course, and it may not be the recommended way of doing things (on PC)
>>>>> but it's possible.  I don't see any particular reason that virtio-pci
>>>>> should enforce the root port requirement more so than any other
>>>>> device.
>>>>>
>>>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>>>>
>>>>>>> pcie_endpoint_cap_init() already automatically adjusts to advertise as
>>>>>>> an integrated device rather than a "normal" PCIe endpoint when attached to
>>>>>>> a root bus.  So we can remove the check for root bus within virtio and
>>>>>>> allow (at the user's discretion) a PCIe virtio bus to be attached to a
>>>>>>> root bus.
>>>>>>
>>>>>> If Marcel thinks this is a good change, then I think we should go
>>>>>> through "docs/pcie.txt" with a fine-toothed comb, and update all
>>>>>> relevant spots. (If Marcel agrees, perhaps you can include such
>>>>>> hunks in your patch at once.)
>>>>>
>>>>> Actually, I think that would be a neverending process.  Maybe better
>>>>> to put in a whole different spapr-pcie.txt with the assorted ways that
>>>>> PAPR violates PCIe conventions.
>>>>
>>>> That works for me too, but I think it would be a lot more work for you
>>>> and others.
>>>>
>>>> I plan on consulting "docs/pcie.txt" frequently; among other things, for
>>>> deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
>>>> welcome IMO.
>>>>
>>>>>
>>>>>> It also may have consequences for libvirt (but I see you addressed
>>>>>> Andrea at once, which is great).
>>>>>
>>>>> Right, I've been discussed this with Andrea all along.  We're working
>>>>> on a proposed PAPR specific way of allocating PCI and PCIe addresses
>>>>> (different from the PCIe normal way, but the same as each other).
>>>>> That will simplify adding PCIe support to PAPR, and also has some
>>>>> other advantages for PAPR guests (related to the platform specific
>>>>> isolation, hotplug  and error recovery mechanisms - also different
>>>>> from the normal PCIe ones).
>>>>
>>>> Great, if Andrea is aware, that's a relief.
>>>>
>>>> Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
>>>> removes PAPR from the scope?
>>>
>>
>> Hi David,
>> Sorry for the delay, I just came back from PTO.
>>
>>> Well, first I'd like to see if Marcel knows of some reason I didn't
>>> think of why this test is important for virtio particularly here.  But
>>> assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
>>>
>>
>> There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
>> 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
>>    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
>>    of guests and we don't want to check and fix everything if *we don't have to*.
>>    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
>>    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
>>    at all, but I might be wrong).
>
> Hm, ok.  Could we implement that restriction in the pci/pcie core
> rather than in the virtio device?

I am not sure if we should do that. Most of the devices are PCI or PCIe.
Only some devices are "hybrid", the virtio devices, XHCI and I am not sure we have more.

> That would then protect things like
> XHCI as well.

I don't see a reason to have XHCI as Integrated Endpoint, I think it should be always
plugged into a root port. (for x86. arm and power)

   And for my purposes it would also make it easier to
> implement aa machine type specific hook to re-allow that configuration
> on pseries.

I agree we need a solution for PAPR.

What about a pcie_papr() function and then:

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5ce42af..2c646ae 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
          return;
      }

-    if (pcie_port && pci_is_express(pci_dev)) {
+    if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) {
          int pos;

          pos = pcie_endpoint_cap_init(pci_dev, 0);


>
> At the moment XHCI and virtio-pci behave differently, which seems less
> than ideal.
>
>> 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
>>    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
>>    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
>>    types of virtio devices. We want to keep it simple: virtio legacy
>>    -> PCI, virtio modern -> PCIe.
>
> Ok.. it's not obvious to me why integrated endpoint vs. under a root
> port is relevant to this.  Can't we enable/disable PCIe mode based
> directly on the legacy/modern settings?
>

Yes, we can, but we don't want to do that. Previous setups will stop working
and we will need libvirt to mange the disable-* properties.
As a matter of fact the code today is after some discussions with libvirt guys.


>> (*) A while ago Alex Williamson found such of issue, I think is this one:
>> 0282ab (vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs)
>
> It's also not clear to me why this fix is relevant to the question.
> That change disables the PCIe capability on a bus which is
> not-express, but is under an express root bridge (and is therefore
> clearly *not* on the root bus).  For the case I'm talking about the
> *is* on an express bus and it *is* the root bus.  AFAICT that patch
> would be relevant only for devices under a PCIe-to-PCI bridge on a
> PCIe system.
>

I might have selected the wrong patch.

Thanks,
Marcel


>>>> be appreciated too, if that makes sense. (By default we aim at
>>>> multi-arch / multi-target with this document; we may not have stated it
>>>> explicitly, but AFAIR we intend to cover aarch64 / "virt" too.)
>>>
>>> Right, that was my understanding as well.
>>>
>>
>> Indeed, we want the document to support them all. If PAPR is different, we should mention it.
>
> Sure.  I'm trying to work out what we can/should do for pseries first;
> I'll write something up for the docs when I have something I think is
> ready to merge.
>


Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
> On 02/13/2017 06:33 AM, David Gibson wrote:
> > On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
> > > On 02/10/2017 02:37 AM, David Gibson wrote:
> > > > On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
> > > > > On 02/09/17 05:16, David Gibson wrote:
> > > > > > On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
> > > > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > > > Marcel,
> > > > > > > > 
> > > > > > > > Your original patch adding PCIe support to virtio-pci.c has the
> > > > > > > > limitation noted below that PCIe won't be enabled if the device is on
> > > > > > > > the root bus (rather than under a root or downstream port).  As
> > > > > > > > reasoned below, I think removing the check is correct, even for x86
> > > > > > > > (though it would rarely be useful there).  But I could well have
> > > > > > > > missed something.  Let me know if so...
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Virtio devices can appear as either vanilla PCI or PCI-Express devices
> > > > > > > > depending on the bus they're connected to.  At the moment it will only
> > > > > > > > appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
> > > > > > > > 
> > > > > > > > Presumably this is to reflect the fact that PCIe devices usually need to
> > > > > > > > be connected to a root (or further downstream) port rather than directly
> > > > > > > > on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
> > > > > > > > machine type, it's normal for PCIe devices to appear on the root bus
> > > > > > > > without root ports.
> > > > > > > > 
> > > > > > > > Further, even on x86, there's no inherent reason we couldn't present a
> > > > > > > > virtio device as an "integrated device" (typically used for things built
> > > > > > > > into the PCI chipset), and those devices *do* typically appear on the root
> > > > > > > > bus.
> > > > > > > 
> > > > > > > I'm not personally making a counter-argument, just qouting some of
> > > > > > > the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
> > > > > > 
> > > > > > So, an earlier discussion more or less concluded that the PCIe
> > > > > > guidelines don't really work with PAPR guests.  That comes because
> > > > > > PAPR was designed with PowerVM in mind which allows PCI passthrough
> > > > > > but doesn't do any emulated PCI devices.  So they wanted to present
> > > > > > passed through devices (virtual or phyical) to the guest without
> > > > > > inserting virtual root ports.
> > > > > > 
> > > > > > Now, you can argue that this was a silly decision in PAPR, and you
> > > > > > could well be right, but there it is.
> > > > > 
> > > > > I can totally accept this, but then we should state it as a fact near
> > > > > the top of "docs/pcie.txt".
> > > > > 
> > > > > > 
> > > > > > > > Place only the following kinds of devices directly on the Root Complex:
> > > > > > > >     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
> > > > > > > >         not controllers. Place only legacy PCI devices on
> > > > > > > >         the Root Complex. These will be considered Integrated Endpoints.
> > > > > > > >         Note: Integrated Endpoints are not hot-pluggable.
> > > > > > > > 
> > > > > > > >         Although the PCI Express spec does not forbid PCI Express devices as
> > > > > > > >         Integrated Endpoints, existing hardware mostly integrates legacy PCI
> > > > > > > >         devices with the Root Complex.
> > > > > > 
> > > > > > "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
> > > > > > Express endpoint, so it's certainly not the case that *all* root bus
> > > > > > devices are legacy.
> > > > > > 
> > > > > > > Guest OSes are suspected to behave
> > > > > > > >         strangely when PCI Express devices are integrated
> > > > > > > >         with the Root Complex.
> > > > > > 
> > > > > > Clearly not that strangely, that often, since my laptop works just fine.
> > > > > > 
> > > > > > > > 
> > > > > > > >     [...]
> > > > > > > > 
> > > > > > > > 2.2 PCI Express only hierarchy
> > > > > > > > ==============================
> > > > > > > > Always use PCI Express Root Ports to start PCI Express hierarchies.
> > > > > > > 
> > > > > > > Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
> > > > > > 
> > > > > > Well "normal" perhaps wasn't the right word.  Let's say precedented,
> > > > > > if uncommon.
> > > > > > 
> > > > > > > Let me turn the question around: is it a *problem* for "pseries" if
> > > > > > > we require root ports? If so, why exactly?
> > > > > > 
> > > > > > That's.. a complex question.  At least Linux guests (and we don't
> > > > > > support any others yet) might cope with the addition of root ports.
> > > > > > Maybe.  I have discussed this option with BenH and others.
> > > > > > 
> > > > > > However it is gratuitously different from how PCIe devices will
> > > > > > typically appear for the same guest running under PowerVM.  Although I
> > > > > > suspect Linux would cope with the "normal standard" rather than "PAPR
> > > > > > standard" presentation, I'm not as confident about it as I would like.
> > > > > > 
> > > > > > Another consideration here is that other PCIe capable qemu emulated
> > > > > > devices, such as XHCI, will present fine as PCIe integrated endpoints
> > > > > > when attached to the root bus.  Libvirt won't do that usually, of
> > > > > > course, and it may not be the recommended way of doing things (on PC)
> > > > > > but it's possible.  I don't see any particular reason that virtio-pci
> > > > > > should enforce the root port requirement more so than any other
> > > > > > device.
> > > > > > 
> > > > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > > > 
> > > > > > > > pcie_endpoint_cap_init() already automatically adjusts to advertise as
> > > > > > > > an integrated device rather than a "normal" PCIe endpoint when attached to
> > > > > > > > a root bus.  So we can remove the check for root bus within virtio and
> > > > > > > > allow (at the user's discretion) a PCIe virtio bus to be attached to a
> > > > > > > > root bus.
> > > > > > > 
> > > > > > > If Marcel thinks this is a good change, then I think we should go
> > > > > > > through "docs/pcie.txt" with a fine-toothed comb, and update all
> > > > > > > relevant spots. (If Marcel agrees, perhaps you can include such
> > > > > > > hunks in your patch at once.)
> > > > > > 
> > > > > > Actually, I think that would be a neverending process.  Maybe better
> > > > > > to put in a whole different spapr-pcie.txt with the assorted ways that
> > > > > > PAPR violates PCIe conventions.
> > > > > 
> > > > > That works for me too, but I think it would be a lot more work for you
> > > > > and others.
> > > > > 
> > > > > I plan on consulting "docs/pcie.txt" frequently; among other things, for
> > > > > deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
> > > > > welcome IMO.
> > > > > 
> > > > > > 
> > > > > > > It also may have consequences for libvirt (but I see you addressed
> > > > > > > Andrea at once, which is great).
> > > > > > 
> > > > > > Right, I've been discussed this with Andrea all along.  We're working
> > > > > > on a proposed PAPR specific way of allocating PCI and PCIe addresses
> > > > > > (different from the PCIe normal way, but the same as each other).
> > > > > > That will simplify adding PCIe support to PAPR, and also has some
> > > > > > other advantages for PAPR guests (related to the platform specific
> > > > > > isolation, hotplug  and error recovery mechanisms - also different
> > > > > > from the normal PCIe ones).
> > > > > 
> > > > > Great, if Andrea is aware, that's a relief.
> > > > > 
> > > > > Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
> > > > > removes PAPR from the scope?
> > > > 
> > > 
> > > Hi David,
> > > Sorry for the delay, I just came back from PTO.
> > > 
> > > > Well, first I'd like to see if Marcel knows of some reason I didn't
> > > > think of why this test is important for virtio particularly here.  But
> > > > assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
> > > > 
> > > 
> > > There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
> > > 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
> > >    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
> > >    of guests and we don't want to check and fix everything if *we don't have to*.
> > >    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
> > >    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
> > >    at all, but I might be wrong).
> > 
> > Hm, ok.  Could we implement that restriction in the pci/pcie core
> > rather than in the virtio device?
> 
> I am not sure if we should do that. Most of the devices are PCI or PCIe.
> Only some devices are "hybrid", the virtio devices, XHCI and I am not sure we have more.

Ok, I see your point, the pcie core might not be right.  But it still
seems really weird to have it explicitly in each hybrid device, even
if it's just the two.  As the code stands right now, XHCI and virtio
have different behaviour, without a clear reason for it.

Maybe a hybrid_setup() helper function?

> > That would then protect things like
> > XHCI as well.
> 
> I don't see a reason to have XHCI as Integrated Endpoint, I think it should be always
> plugged into a root port. (for x86. arm and power)

No, not for Power.  Well, ok, yes for most ppc machine types, but not
for the paravirtualized 'pseries' machine (which is the one we most
care about).

Well.. I guess it doesn't need to be an Integrated Endpoint per se,
but we should be able to have it appear on the guest root bus.

The thing to realize is that the paravirtualized PCI interfaces in
PAPR mean there's very little guest visible difference between PCI and
PCIe.  In fact in most ways you could say that the paravirtualized bus
operates like a vanilla PCI bus.. except that it does provide a way to
access PCIe extended config space.  Which means that you can use it to
drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
and PCIe standard hotplug won't work, but PAPR has its own mechanisms
for those (common between PCI and PCIe).

I did float the idea of having the pseries PCI bus remain plain PCI
but with a special flag to allow PCIe devices to be attached to it
anyway.  It wasn't greeted with much enthusiasm..

>   And for my purposes it would also make it easier to
> > implement aa machine type specific hook to re-allow that configuration
> > on pseries.
> 
> I agree we need a solution for PAPR.
> 
> What about a pcie_papr() function and then:
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 5ce42af..2c646ae 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
>          return;
>      }
> 
> -    if (pcie_port && pci_is_express(pci_dev)) {
> +    if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) {
>          int pos;
> 
>          pos = pcie_endpoint_cap_init(pci_dev, 0);

That would be sufficient, yes, so I'll take it if we don't come to any
other solutions.  It still seems weird to me to have this logic within
a specific device implementation though.

> > 
> > At the moment XHCI and virtio-pci behave differently, which seems less
> > than ideal.
> > 
> > > 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
> > >    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
> > >    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
> > >    types of virtio devices. We want to keep it simple: virtio legacy
> > >    -> PCI, virtio modern -> PCIe.
> > 
> > Ok.. it's not obvious to me why integrated endpoint vs. under a root
> > port is relevant to this.  Can't we enable/disable PCIe mode based
> > directly on the legacy/modern settings?
> > 
> 
> Yes, we can, but we don't want to do that. Previous setups will stop working
> and we will need libvirt to mange the disable-* properties.
> As a matter of fact the code today is after some discussions with libvirt guys.

Uh.. I think I'm missing something.. but it's still not clear to me
why this would break existing setups or impose more work on libvirt.

> > > (*) A while ago Alex Williamson found such of issue, I think is this one:
> > > 0282ab (vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs)
> > 
> > It's also not clear to me why this fix is relevant to the question.
> > That change disables the PCIe capability on a bus which is
> > not-express, but is under an express root bridge (and is therefore
> > clearly *not* on the root bus).  For the case I'm talking about the
> > *is* on an express bus and it *is* the root bus.  AFAICT that patch
> > would be relevant only for devices under a PCIe-to-PCI bridge on a
> > PCIe system.
> > 
> 
> I might have selected the wrong patch.

Ok.

> 
> Thanks,
> Marcel
> 
> 
> > > > > be appreciated too, if that makes sense. (By default we aim at
> > > > > multi-arch / multi-target with this document; we may not have stated it
> > > > > explicitly, but AFAIR we intend to cover aarch64 / "virt" too.)
> > > > 
> > > > Right, that was my understanding as well.
> > > > 
> > > 
> > > Indeed, we want the document to support them all. If PAPR is different, we should mention it.
> > 
> > Sure.  I'm trying to work out what we can/should do for pseries first;
> > I'll write something up for the docs when I have something I think is
> > ready to merge.
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Marcel Apfelbaum 7 years, 1 month ago
On 02/14/2017 06:15 AM, David Gibson wrote:
> On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
>> On 02/13/2017 06:33 AM, David Gibson wrote:
>>> On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
>>>> On 02/10/2017 02:37 AM, David Gibson wrote:
>>>>> On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
>>>>>> On 02/09/17 05:16, David Gibson wrote:
>>>>>>> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
>>>>>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>>>>>> Marcel,
>>>>>>>>>
>>>>>>>>> Your original patch adding PCIe support to virtio-pci.c has the
>>>>>>>>> limitation noted below that PCIe won't be enabled if the device is on
>>>>>>>>> the root bus (rather than under a root or downstream port).  As
>>>>>>>>> reasoned below, I think removing the check is correct, even for x86
>>>>>>>>> (though it would rarely be useful there).  But I could well have
>>>>>>>>> missed something.  Let me know if so...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Virtio devices can appear as either vanilla PCI or PCI-Express devices
>>>>>>>>> depending on the bus they're connected to.  At the moment it will only
>>>>>>>>> appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
>>>>>>>>>
>>>>>>>>> Presumably this is to reflect the fact that PCIe devices usually need to
>>>>>>>>> be connected to a root (or further downstream) port rather than directly
>>>>>>>>> on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
>>>>>>>>> machine type, it's normal for PCIe devices to appear on the root bus
>>>>>>>>> without root ports.
>>>>>>>>>
>>>>>>>>> Further, even on x86, there's no inherent reason we couldn't present a
>>>>>>>>> virtio device as an "integrated device" (typically used for things built
>>>>>>>>> into the PCI chipset), and those devices *do* typically appear on the root
>>>>>>>>> bus.
>>>>>>>>
>>>>>>>> I'm not personally making a counter-argument, just qouting some of
>>>>>>>> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
>>>>>>>
>>>>>>> So, an earlier discussion more or less concluded that the PCIe
>>>>>>> guidelines don't really work with PAPR guests.  That comes because
>>>>>>> PAPR was designed with PowerVM in mind which allows PCI passthrough
>>>>>>> but doesn't do any emulated PCI devices.  So they wanted to present
>>>>>>> passed through devices (virtual or phyical) to the guest without
>>>>>>> inserting virtual root ports.
>>>>>>>
>>>>>>> Now, you can argue that this was a silly decision in PAPR, and you
>>>>>>> could well be right, but there it is.
>>>>>>
>>>>>> I can totally accept this, but then we should state it as a fact near
>>>>>> the top of "docs/pcie.txt".
>>>>>>
>>>>>>>
>>>>>>>>> Place only the following kinds of devices directly on the Root Complex:
>>>>>>>>>     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
>>>>>>>>>         not controllers. Place only legacy PCI devices on
>>>>>>>>>         the Root Complex. These will be considered Integrated Endpoints.
>>>>>>>>>         Note: Integrated Endpoints are not hot-pluggable.
>>>>>>>>>
>>>>>>>>>         Although the PCI Express spec does not forbid PCI Express devices as
>>>>>>>>>         Integrated Endpoints, existing hardware mostly integrates legacy PCI
>>>>>>>>>         devices with the Root Complex.
>>>>>>>
>>>>>>> "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
>>>>>>> Express endpoint, so it's certainly not the case that *all* root bus
>>>>>>> devices are legacy.
>>>>>>>
>>>>>>>> Guest OSes are suspected to behave
>>>>>>>>>         strangely when PCI Express devices are integrated
>>>>>>>>>         with the Root Complex.
>>>>>>>
>>>>>>> Clearly not that strangely, that often, since my laptop works just fine.
>>>>>>>
>>>>>>>>>
>>>>>>>>>     [...]
>>>>>>>>>
>>>>>>>>> 2.2 PCI Express only hierarchy
>>>>>>>>> ==============================
>>>>>>>>> Always use PCI Express Root Ports to start PCI Express hierarchies.
>>>>>>>>
>>>>>>>> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
>>>>>>>
>>>>>>> Well "normal" perhaps wasn't the right word.  Let's say precedented,
>>>>>>> if uncommon.
>>>>>>>
>>>>>>>> Let me turn the question around: is it a *problem* for "pseries" if
>>>>>>>> we require root ports? If so, why exactly?
>>>>>>>
>>>>>>> That's.. a complex question.  At least Linux guests (and we don't
>>>>>>> support any others yet) might cope with the addition of root ports.
>>>>>>> Maybe.  I have discussed this option with BenH and others.
>>>>>>>
>>>>>>> However it is gratuitously different from how PCIe devices will
>>>>>>> typically appear for the same guest running under PowerVM.  Although I
>>>>>>> suspect Linux would cope with the "normal standard" rather than "PAPR
>>>>>>> standard" presentation, I'm not as confident about it as I would like.
>>>>>>>
>>>>>>> Another consideration here is that other PCIe capable qemu emulated
>>>>>>> devices, such as XHCI, will present fine as PCIe integrated endpoints
>>>>>>> when attached to the root bus.  Libvirt won't do that usually, of
>>>>>>> course, and it may not be the recommended way of doing things (on PC)
>>>>>>> but it's possible.  I don't see any particular reason that virtio-pci
>>>>>>> should enforce the root port requirement more so than any other
>>>>>>> device.
>>>>>>>
>>>>>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>>>>>>
>>>>>>>>> pcie_endpoint_cap_init() already automatically adjusts to advertise as
>>>>>>>>> an integrated device rather than a "normal" PCIe endpoint when attached to
>>>>>>>>> a root bus.  So we can remove the check for root bus within virtio and
>>>>>>>>> allow (at the user's discretion) a PCIe virtio bus to be attached to a
>>>>>>>>> root bus.
>>>>>>>>
>>>>>>>> If Marcel thinks this is a good change, then I think we should go
>>>>>>>> through "docs/pcie.txt" with a fine-toothed comb, and update all
>>>>>>>> relevant spots. (If Marcel agrees, perhaps you can include such
>>>>>>>> hunks in your patch at once.)
>>>>>>>
>>>>>>> Actually, I think that would be a neverending process.  Maybe better
>>>>>>> to put in a whole different spapr-pcie.txt with the assorted ways that
>>>>>>> PAPR violates PCIe conventions.
>>>>>>
>>>>>> That works for me too, but I think it would be a lot more work for you
>>>>>> and others.
>>>>>>
>>>>>> I plan on consulting "docs/pcie.txt" frequently; among other things, for
>>>>>> deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
>>>>>> welcome IMO.
>>>>>>
>>>>>>>
>>>>>>>> It also may have consequences for libvirt (but I see you addressed
>>>>>>>> Andrea at once, which is great).
>>>>>>>
>>>>>>> Right, I've been discussed this with Andrea all along.  We're working
>>>>>>> on a proposed PAPR specific way of allocating PCI and PCIe addresses
>>>>>>> (different from the PCIe normal way, but the same as each other).
>>>>>>> That will simplify adding PCIe support to PAPR, and also has some
>>>>>>> other advantages for PAPR guests (related to the platform specific
>>>>>>> isolation, hotplug  and error recovery mechanisms - also different
>>>>>>> from the normal PCIe ones).
>>>>>>
>>>>>> Great, if Andrea is aware, that's a relief.
>>>>>>
>>>>>> Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
>>>>>> removes PAPR from the scope?
>>>>>
>>>>
>>>> Hi David,
>>>> Sorry for the delay, I just came back from PTO.
>>>>
>>>>> Well, first I'd like to see if Marcel knows of some reason I didn't
>>>>> think of why this test is important for virtio particularly here.  But
>>>>> assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
>>>>>
>>>>
>>>> There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
>>>> 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
>>>>    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
>>>>    of guests and we don't want to check and fix everything if *we don't have to*.
>>>>    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
>>>>    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
>>>>    at all, but I might be wrong).
>>>
>>> Hm, ok.  Could we implement that restriction in the pci/pcie core
>>> rather than in the virtio device?
>>
>> I am not sure if we should do that. Most of the devices are PCI or PCIe.
>> Only some devices are "hybrid", the virtio devices, XHCI and I am not sure we have more.
>
> Ok, I see your point, the pcie core might not be right.  But it still
> seems really weird to have it explicitly in each hybrid device, even
> if it's just the two.  As the code stands right now, XHCI and virtio
> have different behaviour, without a clear reason for it.
>

I suppose XHCI can behave the same as virtio if Gerd has nothing against it
(remain PCI if plugged into the Root Complex), but I don't know how it will help your case.

> Maybe a hybrid_setup() helper function?
>

How would it help PAPR scenario? Anyway, Eduardo is working on supplying
new query interfaces for libvirt and he touches this subject, I think
he does plan to mark the hybrid devices in some way.


>>> That would then protect things like
>>> XHCI as well.
>>
>> I don't see a reason to have XHCI as Integrated Endpoint, I think it should be always
>> plugged into a root port. (for x86. arm and power)
>
> No, not for Power.  Well, ok, yes for most ppc machine types, but not
> for the paravirtualized 'pseries' machine (which is the one we most
> care about).
>
> Well.. I guess it doesn't need to be an Integrated Endpoint per se,
> but we should be able to have it appear on the guest root bus.
>
> The thing to realize is that the paravirtualized PCI interfaces in
> PAPR mean there's very little guest visible difference between PCI and
> PCIe.  In fact in most ways you could say that the paravirtualized bus
> operates like a vanilla PCI bus.. except that it does provide a way to
> access PCIe extended config space.

Maybe we can have a new bus type deriving from PCIBus and sibling of the PCIe Bus.
Then we can have the same rules as the PCI bus and add tweaks when necessary.

This does not solve the virtio problem since the code is actively looking for
a PCIe Root Port and we don't have it for PAPR.

   Which means that you can use it to
> drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
> and PCIe standard hotplug won't work, but PAPR has its own mechanisms
> for those (common between PCI and PCIe).
>
> I did float the idea of having the pseries PCI bus remain plain PCI
> but with a special flag to allow PCIe devices to be attached to it
> anyway.  It wasn't greeted with much enthusiasm..
>

Can you point me to the discussion please? It seems similar to what I proposed above.

As you properly described it, is much closer to PCI then PCIe, even the only characteristic
that makes it "a little" PCIe, the Extended Configuration Space support,
is done with an alternative interface.

I agree the PAPR bus is not PCIe.

>>   And for my purposes it would also make it easier to
>>> implement aa machine type specific hook to re-allow that configuration
>>> on pseries.
>>
>> I agree we need a solution for PAPR.
>>
>> What about a pcie_papr() function and then:
>>
>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>> index 5ce42af..2c646ae 100644
>> --- a/hw/virtio/virtio-pci.c
>> +++ b/hw/virtio/virtio-pci.c
>> @@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
>>          return;
>>      }
>>
>> -    if (pcie_port && pci_is_express(pci_dev)) {
>> +    if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) {
>>          int pos;
>>
>>          pos = pcie_endpoint_cap_init(pci_dev, 0);
>
> That would be sufficient, yes, so I'll take it if we don't come to any
> other solutions.

OK

  It still seems weird to me to have this logic within
> a specific device implementation though.
>

I suppose you have a point, let's wait for Gerd and maybe Michael to comment
on that, but anyway it will not help your case. (it will only make XHCI PCI if plugged into Root Complex)

>>>
>>> At the moment XHCI and virtio-pci behave differently, which seems less
>>> than ideal.
>>>
>>>> 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
>>>>    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
>>>>    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
>>>>    types of virtio devices. We want to keep it simple: virtio legacy
>>>>    -> PCI, virtio modern -> PCIe.
>>>
>>> Ok.. it's not obvious to me why integrated endpoint vs. under a root
>>> port is relevant to this.  Can't we enable/disable PCIe mode based
>>> directly on the legacy/modern settings?
>>>
>>
>> Yes, we can, but we don't want to do that. Previous setups will stop working
>> and we will need libvirt to mange the disable-* properties.
>> As a matter of fact the code today is after some discussions with libvirt guys.
>
> Uh.. I think I'm missing something.. but it's still not clear to me
> why this would break existing setups or impose more work on libvirt.
>

Long story short, libvirt guys don't want to manually set the disable-* properties
of virtio devices to make them comply to our goal (legacy -> PCI, modern PCIe).
They also don't want to look at QEMU version/machine type to make a decision on the above properties.

I agree the solution is not perfect, but at least it makes our testing matrix smaller
and makes our PCIe guidelines a little easier to understand (e.g. all supported devices are PCIe,
but if want legacy PCI devices put them on the Root Complex)

XHCI being PCIe on Root Complex is an unintended exception, but we want it connected to a
Root Port anyway, we don't have anything to gain from having it as Integrated Endpoint.
We only loose a slot that can be used by 8 Root Ports assembled into one multi-function device.

PAPR bus should not be considered PCIe and should have a different set of rules allowing PCIe
devices to be plugged into Root Complex.

Thanks,
Marcel

>>>> (*) A while ago Alex Williamson found such of issue, I think is this one:
>>>> 0282ab (vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs)
>>>
>>> It's also not clear to me why this fix is relevant to the question.
>>> That change disables the PCIe capability on a bus which is
>>> not-express, but is under an express root bridge (and is therefore
>>> clearly *not* on the root bus).  For the case I'm talking about the
>>> *is* on an express bus and it *is* the root bus.  AFAICT that patch
>>> would be relevant only for devices under a PCIe-to-PCI bridge on a
>>> PCIe system.
>>>
>>
>> I might have selected the wrong patch.
>
> Ok.
>
>>
>> Thanks,
>> Marcel
>>
>>
>>>>>> be appreciated too, if that makes sense. (By default we aim at
>>>>>> multi-arch / multi-target with this document; we may not have stated it
>>>>>> explicitly, but AFAIR we intend to cover aarch64 / "virt" too.)
>>>>>
>>>>> Right, that was my understanding as well.
>>>>>
>>>>
>>>> Indeed, we want the document to support them all. If PAPR is different, we should mention it.
>>>
>>> Sure.  I'm trying to work out what we can/should do for pseries first;
>>> I'll write something up for the docs when I have something I think is
>>> ready to merge.
>>>
>>
>


Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
On Tue, Feb 14, 2017 at 02:53:08PM +0200, Marcel Apfelbaum wrote:
> On 02/14/2017 06:15 AM, David Gibson wrote:
> > On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
> > > On 02/13/2017 06:33 AM, David Gibson wrote:
> > > > On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
> > > > > On 02/10/2017 02:37 AM, David Gibson wrote:
> > > > > > On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
> > > > > > > On 02/09/17 05:16, David Gibson wrote:
> > > > > > > > On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
> > > > > > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > > > > > Marcel,
> > > > > > > > > > 
> > > > > > > > > > Your original patch adding PCIe support to virtio-pci.c has the
> > > > > > > > > > limitation noted below that PCIe won't be enabled if the device is on
> > > > > > > > > > the root bus (rather than under a root or downstream port).  As
> > > > > > > > > > reasoned below, I think removing the check is correct, even for x86
> > > > > > > > > > (though it would rarely be useful there).  But I could well have
> > > > > > > > > > missed something.  Let me know if so...
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Virtio devices can appear as either vanilla PCI or PCI-Express devices
> > > > > > > > > > depending on the bus they're connected to.  At the moment it will only
> > > > > > > > > > appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
> > > > > > > > > > 
> > > > > > > > > > Presumably this is to reflect the fact that PCIe devices usually need to
> > > > > > > > > > be connected to a root (or further downstream) port rather than directly
> > > > > > > > > > on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
> > > > > > > > > > machine type, it's normal for PCIe devices to appear on the root bus
> > > > > > > > > > without root ports.
> > > > > > > > > > 
> > > > > > > > > > Further, even on x86, there's no inherent reason we couldn't present a
> > > > > > > > > > virtio device as an "integrated device" (typically used for things built
> > > > > > > > > > into the PCI chipset), and those devices *do* typically appear on the root
> > > > > > > > > > bus.
> > > > > > > > > 
> > > > > > > > > I'm not personally making a counter-argument, just qouting some of
> > > > > > > > > the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
> > > > > > > > 
> > > > > > > > So, an earlier discussion more or less concluded that the PCIe
> > > > > > > > guidelines don't really work with PAPR guests.  That comes because
> > > > > > > > PAPR was designed with PowerVM in mind which allows PCI passthrough
> > > > > > > > but doesn't do any emulated PCI devices.  So they wanted to present
> > > > > > > > passed through devices (virtual or phyical) to the guest without
> > > > > > > > inserting virtual root ports.
> > > > > > > > 
> > > > > > > > Now, you can argue that this was a silly decision in PAPR, and you
> > > > > > > > could well be right, but there it is.
> > > > > > > 
> > > > > > > I can totally accept this, but then we should state it as a fact near
> > > > > > > the top of "docs/pcie.txt".
> > > > > > > 
> > > > > > > > 
> > > > > > > > > > Place only the following kinds of devices directly on the Root Complex:
> > > > > > > > > >     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
> > > > > > > > > >         not controllers. Place only legacy PCI devices on
> > > > > > > > > >         the Root Complex. These will be considered Integrated Endpoints.
> > > > > > > > > >         Note: Integrated Endpoints are not hot-pluggable.
> > > > > > > > > > 
> > > > > > > > > >         Although the PCI Express spec does not forbid PCI Express devices as
> > > > > > > > > >         Integrated Endpoints, existing hardware mostly integrates legacy PCI
> > > > > > > > > >         devices with the Root Complex.
> > > > > > > > 
> > > > > > > > "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
> > > > > > > > Express endpoint, so it's certainly not the case that *all* root bus
> > > > > > > > devices are legacy.
> > > > > > > > 
> > > > > > > > > Guest OSes are suspected to behave
> > > > > > > > > >         strangely when PCI Express devices are integrated
> > > > > > > > > >         with the Root Complex.
> > > > > > > > 
> > > > > > > > Clearly not that strangely, that often, since my laptop works just fine.
> > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > >     [...]
> > > > > > > > > > 
> > > > > > > > > > 2.2 PCI Express only hierarchy
> > > > > > > > > > ==============================
> > > > > > > > > > Always use PCI Express Root Ports to start PCI Express hierarchies.
> > > > > > > > > 
> > > > > > > > > Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
> > > > > > > > 
> > > > > > > > Well "normal" perhaps wasn't the right word.  Let's say precedented,
> > > > > > > > if uncommon.
> > > > > > > > 
> > > > > > > > > Let me turn the question around: is it a *problem* for "pseries" if
> > > > > > > > > we require root ports? If so, why exactly?
> > > > > > > > 
> > > > > > > > That's.. a complex question.  At least Linux guests (and we don't
> > > > > > > > support any others yet) might cope with the addition of root ports.
> > > > > > > > Maybe.  I have discussed this option with BenH and others.
> > > > > > > > 
> > > > > > > > However it is gratuitously different from how PCIe devices will
> > > > > > > > typically appear for the same guest running under PowerVM.  Although I
> > > > > > > > suspect Linux would cope with the "normal standard" rather than "PAPR
> > > > > > > > standard" presentation, I'm not as confident about it as I would like.
> > > > > > > > 
> > > > > > > > Another consideration here is that other PCIe capable qemu emulated
> > > > > > > > devices, such as XHCI, will present fine as PCIe integrated endpoints
> > > > > > > > when attached to the root bus.  Libvirt won't do that usually, of
> > > > > > > > course, and it may not be the recommended way of doing things (on PC)
> > > > > > > > but it's possible.  I don't see any particular reason that virtio-pci
> > > > > > > > should enforce the root port requirement more so than any other
> > > > > > > > device.
> > > > > > > > 
> > > > > > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > > > > > 
> > > > > > > > > > pcie_endpoint_cap_init() already automatically adjusts to advertise as
> > > > > > > > > > an integrated device rather than a "normal" PCIe endpoint when attached to
> > > > > > > > > > a root bus.  So we can remove the check for root bus within virtio and
> > > > > > > > > > allow (at the user's discretion) a PCIe virtio bus to be attached to a
> > > > > > > > > > root bus.
> > > > > > > > > 
> > > > > > > > > If Marcel thinks this is a good change, then I think we should go
> > > > > > > > > through "docs/pcie.txt" with a fine-toothed comb, and update all
> > > > > > > > > relevant spots. (If Marcel agrees, perhaps you can include such
> > > > > > > > > hunks in your patch at once.)
> > > > > > > > 
> > > > > > > > Actually, I think that would be a neverending process.  Maybe better
> > > > > > > > to put in a whole different spapr-pcie.txt with the assorted ways that
> > > > > > > > PAPR violates PCIe conventions.
> > > > > > > 
> > > > > > > That works for me too, but I think it would be a lot more work for you
> > > > > > > and others.
> > > > > > > 
> > > > > > > I plan on consulting "docs/pcie.txt" frequently; among other things, for
> > > > > > > deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
> > > > > > > welcome IMO.
> > > > > > > 
> > > > > > > > 
> > > > > > > > > It also may have consequences for libvirt (but I see you addressed
> > > > > > > > > Andrea at once, which is great).
> > > > > > > > 
> > > > > > > > Right, I've been discussed this with Andrea all along.  We're working
> > > > > > > > on a proposed PAPR specific way of allocating PCI and PCIe addresses
> > > > > > > > (different from the PCIe normal way, but the same as each other).
> > > > > > > > That will simplify adding PCIe support to PAPR, and also has some
> > > > > > > > other advantages for PAPR guests (related to the platform specific
> > > > > > > > isolation, hotplug  and error recovery mechanisms - also different
> > > > > > > > from the normal PCIe ones).
> > > > > > > 
> > > > > > > Great, if Andrea is aware, that's a relief.
> > > > > > > 
> > > > > > > Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
> > > > > > > removes PAPR from the scope?
> > > > > > 
> > > > > 
> > > > > Hi David,
> > > > > Sorry for the delay, I just came back from PTO.
> > > > > 
> > > > > > Well, first I'd like to see if Marcel knows of some reason I didn't
> > > > > > think of why this test is important for virtio particularly here.  But
> > > > > > assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
> > > > > > 
> > > > > 
> > > > > There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
> > > > > 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
> > > > >    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
> > > > >    of guests and we don't want to check and fix everything if *we don't have to*.
> > > > >    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
> > > > >    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
> > > > >    at all, but I might be wrong).
> > > > 
> > > > Hm, ok.  Could we implement that restriction in the pci/pcie core
> > > > rather than in the virtio device?
> > > 
> > > I am not sure if we should do that. Most of the devices are PCI or PCIe.
> > > Only some devices are "hybrid", the virtio devices, XHCI and I am not sure we have more.
> > 
> > Ok, I see your point, the pcie core might not be right.  But it still
> > seems really weird to have it explicitly in each hybrid device, even
> > if it's just the two.  As the code stands right now, XHCI and virtio
> > have different behaviour, without a clear reason for it.
> > 
> 
> I suppose XHCI can behave the same as virtio if Gerd has nothing against it
> (remain PCI if plugged into the Root Complex), but I don't know how it will help your case.

> > Maybe a hybrid_setup() helper function?
> > 
> 
> How would it help PAPR scenario?

It doesn't, directly, but it means there would only be a single place
to fix with a PAPR hack, instead of both virtio and XHCI and any
future hybrid devices.

> Anyway, Eduardo is working on supplying
> new query interfaces for libvirt and he touches this subject, I think
> he does plan to mark the hybrid devices in some way.
> 
> 
> > > > That would then protect things like
> > > > XHCI as well.
> > > 
> > > I don't see a reason to have XHCI as Integrated Endpoint, I think it should be always
> > > plugged into a root port. (for x86. arm and power)
> > 
> > No, not for Power.  Well, ok, yes for most ppc machine types, but not
> > for the paravirtualized 'pseries' machine (which is the one we most
> > care about).
> > 
> > Well.. I guess it doesn't need to be an Integrated Endpoint per se,
> > but we should be able to have it appear on the guest root bus.
> > 
> > The thing to realize is that the paravirtualized PCI interfaces in
> > PAPR mean there's very little guest visible difference between PCI and
> > PCIe.  In fact in most ways you could say that the paravirtualized bus
> > operates like a vanilla PCI bus.. except that it does provide a way to
> > access PCIe extended config space.
> 
> Maybe we can have a new bus type deriving from PCIBus and sibling of the PCIe Bus.
> Then we can have the same rules as the PCI bus and add tweaks when
> necessary.

Sounds possible.  I guess it would need to return true for
pci_bus_is_express(), so that PCIe devices will attach to it.  In
which case I'm not entirely sure how it would differ from a PCIe bus
from the qemu side.  AFAICT with the exception of the virtio check and
maybe a handful of others, the PCIe placement rules are handled by
libvirt, rather than by qemu.

> This does not solve the virtio problem since the code is actively looking for
> a PCIe Root Port and we don't have it for PAPR.

Technically, it's not even looking for a root port, just excluding
being on the root bus itself.  I guess any kind of bridge that has
something PCI-ish on the upstream side and PCIe on the downstream side
more or less has to be either a root port or a PCIe switch downstream
port, though.

>   Which means that you can use it to
> > drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
> > and PCIe standard hotplug won't work, but PAPR has its own mechanisms
> > for those (common between PCI and PCIe).
> > 
> > I did float the idea of having the pseries PCI bus remain plain PCI
> > but with a special flag to allow PCIe devices to be attached to it
> > anyway.  It wasn't greeted with much enthusiasm..
> > 
> 
> Can you point me to the discussion please? It seems similar to what I proposed above.

Sorry, I was misleading.  I think I just raised that idea with Andrea
and a few other people internally, not on one of the lists at large.

> As you properly described it, is much closer to PCI then PCIe, even the only characteristic
> that makes it "a little" PCIe, the Extended Configuration Space support,
> is done with an alternative interface.
> 
> I agree the PAPR bus is not PCIe.

Ok, so if we take that direction, the question becomes how do we let
PCIe devices plug into this mostly-not-PCIe bus.  Maybe introduce a
"pci_bus_accepts_express()" function that will replace many, but not
all current uses of "pci_bus_is_express()"?

Such a helper could maybe simplify the logic in virtio-pci (and XHCI?)
by returning false on an x86 root bus.

> 
> > >   And for my purposes it would also make it easier to
> > > > implement aa machine type specific hook to re-allow that configuration
> > > > on pseries.
> > > 
> > > I agree we need a solution for PAPR.
> > > 
> > > What about a pcie_papr() function and then:
> > > 
> > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > index 5ce42af..2c646ae 100644
> > > --- a/hw/virtio/virtio-pci.c
> > > +++ b/hw/virtio/virtio-pci.c
> > > @@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
> > >          return;
> > >      }
> > > 
> > > -    if (pcie_port && pci_is_express(pci_dev)) {
> > > +    if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) {
> > >          int pos;
> > > 
> > >          pos = pcie_endpoint_cap_init(pci_dev, 0);
> > 
> > That would be sufficient, yes, so I'll take it if we don't come to any
> > other solutions.
> 
> OK
> 
>  It still seems weird to me to have this logic within
> > a specific device implementation though.
> > 
> 
> I suppose you have a point, let's wait for Gerd and maybe Michael to comment
> on that, but anyway it will not help your case. (it will only make XHCI PCI if plugged into Root Complex)

It won't help directly, but it means we could put that pcie_papr()
test in just the common code, rather than having to look at every
hybrid and PCIe device.

> > > > 
> > > > At the moment XHCI and virtio-pci behave differently, which seems less
> > > > than ideal.
> > > > 
> > > > > 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
> > > > >    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
> > > > >    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
> > > > >    types of virtio devices. We want to keep it simple: virtio legacy
> > > > >    -> PCI, virtio modern -> PCIe.
> > > > 
> > > > Ok.. it's not obvious to me why integrated endpoint vs. under a root
> > > > port is relevant to this.  Can't we enable/disable PCIe mode based
> > > > directly on the legacy/modern settings?
> > > > 
> > > 
> > > Yes, we can, but we don't want to do that. Previous setups will stop working
> > > and we will need libvirt to mange the disable-* properties.
> > > As a matter of fact the code today is after some discussions with libvirt guys.
> > 
> > Uh.. I think I'm missing something.. but it's still not clear to me
> > why this would break existing setups or impose more work on libvirt.
> > 
> 
> Long story short, libvirt guys don't want to manually set the disable-* properties
> of virtio devices to make them comply to our goal (legacy -> PCI, modern PCIe).
> They also don't want to look at QEMU version/machine type to make a decision on the above properties.
> 
> I agree the solution is not perfect, but at least it makes our testing matrix smaller
> and makes our PCIe guidelines a little easier to understand (e.g. all supported devices are PCIe,
> but if want legacy PCI devices put them on the Root Complex)

Ok.. I'm still missing something.  Are you saying that instead of the
legacy/modern status determining PCIe support, that PCIe status is
determining legacy/modern support by default?  That would actually
seem to simplify things: whatever method we end up allowing PCIe
virtio on PAPR, that should automatically enable modern mode, which is
fine.

Although.. I do wonder about legacy/modern for PAPR in general.
Current kernels will support virtio 1.0 fine, but we have older
released versions which only support legacy.  Unlike PC we won't (I
hope) have a whole machine type switch to handle this.  At this stage
I think we want virtio devices (whatever their bus type) on PAPR to
default to allowing both legacy and modern for the forseeable future.
How does that affect the matrix?

> XHCI being PCIe on Root Complex is an unintended exception, but we want it connected to a
> Root Port anyway, we don't have anything to gain from having it as Integrated Endpoint.
> We only loose a slot that can be used by 8 Root Ports assembled into one multi-function device.
> 
> PAPR bus should not be considered PCIe and should have a different set of rules allowing PCIe
> devices to be plugged into Root Complex.

Alright, I can work with that.  Michael, Andrea, how does this idea
seem to you?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Marcel Apfelbaum 7 years, 1 month ago
On 02/15/2017 03:45 AM, David Gibson wrote:
> On Tue, Feb 14, 2017 at 02:53:08PM +0200, Marcel Apfelbaum wrote:
>> On 02/14/2017 06:15 AM, David Gibson wrote:
>>> On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
>>>> On 02/13/2017 06:33 AM, David Gibson wrote:
>>>>> On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
>>>>>> On 02/10/2017 02:37 AM, David Gibson wrote:
>>>>>>> On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
>>>>>>>> On 02/09/17 05:16, David Gibson wrote:
>>>>>>>>> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
>>>>>>>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>>>>>>>> Marcel,
>>>>>>>>>>>
>>>>>>>>>>> Your original patch adding PCIe support to virtio-pci.c has the
>>>>>>>>>>> limitation noted below that PCIe won't be enabled if the device is on
>>>>>>>>>>> the root bus (rather than under a root or downstream port).  As
>>>>>>>>>>> reasoned below, I think removing the check is correct, even for x86
>>>>>>>>>>> (though it would rarely be useful there).  But I could well have
>>>>>>>>>>> missed something.  Let me know if so...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Virtio devices can appear as either vanilla PCI or PCI-Express devices
>>>>>>>>>>> depending on the bus they're connected to.  At the moment it will only
>>>>>>>>>>> appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
>>>>>>>>>>>
>>>>>>>>>>> Presumably this is to reflect the fact that PCIe devices usually need to
>>>>>>>>>>> be connected to a root (or further downstream) port rather than directly
>>>>>>>>>>> on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
>>>>>>>>>>> machine type, it's normal for PCIe devices to appear on the root bus
>>>>>>>>>>> without root ports.
>>>>>>>>>>>
>>>>>>>>>>> Further, even on x86, there's no inherent reason we couldn't present a
>>>>>>>>>>> virtio device as an "integrated device" (typically used for things built
>>>>>>>>>>> into the PCI chipset), and those devices *do* typically appear on the root
>>>>>>>>>>> bus.
>>>>>>>>>>
>>>>>>>>>> I'm not personally making a counter-argument, just qouting some of
>>>>>>>>>> the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
>>>>>>>>>
>>>>>>>>> So, an earlier discussion more or less concluded that the PCIe
>>>>>>>>> guidelines don't really work with PAPR guests.  That comes because
>>>>>>>>> PAPR was designed with PowerVM in mind which allows PCI passthrough
>>>>>>>>> but doesn't do any emulated PCI devices.  So they wanted to present
>>>>>>>>> passed through devices (virtual or phyical) to the guest without
>>>>>>>>> inserting virtual root ports.
>>>>>>>>>
>>>>>>>>> Now, you can argue that this was a silly decision in PAPR, and you
>>>>>>>>> could well be right, but there it is.
>>>>>>>>
>>>>>>>> I can totally accept this, but then we should state it as a fact near
>>>>>>>> the top of "docs/pcie.txt".
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Place only the following kinds of devices directly on the Root Complex:
>>>>>>>>>>>     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
>>>>>>>>>>>         not controllers. Place only legacy PCI devices on
>>>>>>>>>>>         the Root Complex. These will be considered Integrated Endpoints.
>>>>>>>>>>>         Note: Integrated Endpoints are not hot-pluggable.
>>>>>>>>>>>
>>>>>>>>>>>         Although the PCI Express spec does not forbid PCI Express devices as
>>>>>>>>>>>         Integrated Endpoints, existing hardware mostly integrates legacy PCI
>>>>>>>>>>>         devices with the Root Complex.
>>>>>>>>>
>>>>>>>>> "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
>>>>>>>>> Express endpoint, so it's certainly not the case that *all* root bus
>>>>>>>>> devices are legacy.
>>>>>>>>>
>>>>>>>>>> Guest OSes are suspected to behave
>>>>>>>>>>>         strangely when PCI Express devices are integrated
>>>>>>>>>>>         with the Root Complex.
>>>>>>>>>
>>>>>>>>> Clearly not that strangely, that often, since my laptop works just fine.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>     [...]
>>>>>>>>>>>
>>>>>>>>>>> 2.2 PCI Express only hierarchy
>>>>>>>>>>> ==============================
>>>>>>>>>>> Always use PCI Express Root Ports to start PCI Express hierarchies.
>>>>>>>>>>
>>>>>>>>>> Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
>>>>>>>>>
>>>>>>>>> Well "normal" perhaps wasn't the right word.  Let's say precedented,
>>>>>>>>> if uncommon.
>>>>>>>>>
>>>>>>>>>> Let me turn the question around: is it a *problem* for "pseries" if
>>>>>>>>>> we require root ports? If so, why exactly?
>>>>>>>>>
>>>>>>>>> That's.. a complex question.  At least Linux guests (and we don't
>>>>>>>>> support any others yet) might cope with the addition of root ports.
>>>>>>>>> Maybe.  I have discussed this option with BenH and others.
>>>>>>>>>
>>>>>>>>> However it is gratuitously different from how PCIe devices will
>>>>>>>>> typically appear for the same guest running under PowerVM.  Although I
>>>>>>>>> suspect Linux would cope with the "normal standard" rather than "PAPR
>>>>>>>>> standard" presentation, I'm not as confident about it as I would like.
>>>>>>>>>
>>>>>>>>> Another consideration here is that other PCIe capable qemu emulated
>>>>>>>>> devices, such as XHCI, will present fine as PCIe integrated endpoints
>>>>>>>>> when attached to the root bus.  Libvirt won't do that usually, of
>>>>>>>>> course, and it may not be the recommended way of doing things (on PC)
>>>>>>>>> but it's possible.  I don't see any particular reason that virtio-pci
>>>>>>>>> should enforce the root port requirement more so than any other
>>>>>>>>> device.
>>>>>>>>>
>>>>>>>>>> On 02/08/17 07:16, David Gibson wrote:
>>>>>>>>>>>
>>>>>>>>>>> pcie_endpoint_cap_init() already automatically adjusts to advertise as
>>>>>>>>>>> an integrated device rather than a "normal" PCIe endpoint when attached to
>>>>>>>>>>> a root bus.  So we can remove the check for root bus within virtio and
>>>>>>>>>>> allow (at the user's discretion) a PCIe virtio bus to be attached to a
>>>>>>>>>>> root bus.
>>>>>>>>>>
>>>>>>>>>> If Marcel thinks this is a good change, then I think we should go
>>>>>>>>>> through "docs/pcie.txt" with a fine-toothed comb, and update all
>>>>>>>>>> relevant spots. (If Marcel agrees, perhaps you can include such
>>>>>>>>>> hunks in your patch at once.)
>>>>>>>>>
>>>>>>>>> Actually, I think that would be a neverending process.  Maybe better
>>>>>>>>> to put in a whole different spapr-pcie.txt with the assorted ways that
>>>>>>>>> PAPR violates PCIe conventions.
>>>>>>>>
>>>>>>>> That works for me too, but I think it would be a lot more work for you
>>>>>>>> and others.
>>>>>>>>
>>>>>>>> I plan on consulting "docs/pcie.txt" frequently; among other things, for
>>>>>>>> deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
>>>>>>>> welcome IMO.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> It also may have consequences for libvirt (but I see you addressed
>>>>>>>>>> Andrea at once, which is great).
>>>>>>>>>
>>>>>>>>> Right, I've been discussed this with Andrea all along.  We're working
>>>>>>>>> on a proposed PAPR specific way of allocating PCI and PCIe addresses
>>>>>>>>> (different from the PCIe normal way, but the same as each other).
>>>>>>>>> That will simplify adding PCIe support to PAPR, and also has some
>>>>>>>>> other advantages for PAPR guests (related to the platform specific
>>>>>>>>> isolation, hotplug  and error recovery mechanisms - also different
>>>>>>>>> from the normal PCIe ones).
>>>>>>>>
>>>>>>>> Great, if Andrea is aware, that's a relief.
>>>>>>>>
>>>>>>>> Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
>>>>>>>> removes PAPR from the scope?
>>>>>>>
>>>>>>
>>>>>> Hi David,
>>>>>> Sorry for the delay, I just came back from PTO.
>>>>>>
>>>>>>> Well, first I'd like to see if Marcel knows of some reason I didn't
>>>>>>> think of why this test is important for virtio particularly here.  But
>>>>>>> assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
>>>>>>>
>>>>>>
>>>>>> There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
>>>>>> 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
>>>>>>    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
>>>>>>    of guests and we don't want to check and fix everything if *we don't have to*.
>>>>>>    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
>>>>>>    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
>>>>>>    at all, but I might be wrong).
>>>>>
>>>>> Hm, ok.  Could we implement that restriction in the pci/pcie core
>>>>> rather than in the virtio device?
>>>>
>>>> I am not sure if we should do that. Most of the devices are PCI or PCIe.
>>>> Only some devices are "hybrid", the virtio devices, XHCI and I am not sure we have more.
>>>
>>> Ok, I see your point, the pcie core might not be right.  But it still
>>> seems really weird to have it explicitly in each hybrid device, even
>>> if it's just the two.  As the code stands right now, XHCI and virtio
>>> have different behaviour, without a clear reason for it.
>>>
>>
>> I suppose XHCI can behave the same as virtio if Gerd has nothing against it
>> (remain PCI if plugged into the Root Complex), but I don't know how it will help your case.
>
>>> Maybe a hybrid_setup() helper function?
>>>
>>
>> How would it help PAPR scenario?
>
> It doesn't, directly, but it means there would only be a single place
> to fix with a PAPR hack, instead of both virtio and XHCI and any
> future hybrid devices.
>
>> Anyway, Eduardo is working on supplying
>> new query interfaces for libvirt and he touches this subject, I think
>> he does plan to mark the hybrid devices in some way.
>>
>>
>>>>> That would then protect things like
>>>>> XHCI as well.
>>>>
>>>> I don't see a reason to have XHCI as Integrated Endpoint, I think it should be always
>>>> plugged into a root port. (for x86. arm and power)
>>>
>>> No, not for Power.  Well, ok, yes for most ppc machine types, but not
>>> for the paravirtualized 'pseries' machine (which is the one we most
>>> care about).
>>>
>>> Well.. I guess it doesn't need to be an Integrated Endpoint per se,
>>> but we should be able to have it appear on the guest root bus.
>>>
>>> The thing to realize is that the paravirtualized PCI interfaces in
>>> PAPR mean there's very little guest visible difference between PCI and
>>> PCIe.  In fact in most ways you could say that the paravirtualized bus
>>> operates like a vanilla PCI bus.. except that it does provide a way to
>>> access PCIe extended config space.
>>
>> Maybe we can have a new bus type deriving from PCIBus and sibling of the PCIe Bus.
>> Then we can have the same rules as the PCI bus and add tweaks when
>> necessary.
>
> Sounds possible.  I guess it would need to return true for
> pci_bus_is_express(), so that PCIe devices will attach to it.  In
> which case I'm not entirely sure how it would differ from a PCIe bus
> from the qemu side.  AFAICT with the exception of the virtio check and
> maybe a handful of others, the PCIe placement rules are handled by
> libvirt, rather than by qemu.
>
>> This does not solve the virtio problem since the code is actively looking for
>> a PCIe Root Port and we don't have it for PAPR.
>
> Technically, it's not even looking for a root port, just excluding
> being on the root bus itself.  I guess any kind of bridge that has
> something PCI-ish on the upstream side and PCIe on the downstream side
> more or less has to be either a root port or a PCIe switch downstream
> port, though.
>
>>   Which means that you can use it to
>>> drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
>>> and PCIe standard hotplug won't work, but PAPR has its own mechanisms
>>> for those (common between PCI and PCIe).
>>>
>>> I did float the idea of having the pseries PCI bus remain plain PCI
>>> but with a special flag to allow PCIe devices to be attached to it
>>> anyway.  It wasn't greeted with much enthusiasm..
>>>
>>
>> Can you point me to the discussion please? It seems similar to what I proposed above.
>
> Sorry, I was misleading.  I think I just raised that idea with Andrea
> and a few other people internally, not on one of the lists at large.
>
>> As you properly described it, is much closer to PCI then PCIe, even the only characteristic
>> that makes it "a little" PCIe, the Extended Configuration Space support,
>> is done with an alternative interface.
>>
>> I agree the PAPR bus is not PCIe.
>
> Ok, so if we take that direction, the question becomes how do we let
> PCIe devices plug into this mostly-not-PCIe bus.  Maybe introduce a
> "pci_bus_accepts_express()" function that will replace many, but not
> all current uses of "pci_bus_is_express()"?
>

Sounds good and I think Eduardo is already working on exactly this
idea, however he is on PTO now. It is better to synchronize with him.

> Such a helper could maybe simplify the logic in virtio-pci (and XHCI?)
> by returning false on an x86 root bus.
>

The rule would me more complicated. We don't want to completely remove the
possibility to have PCIe devices as part of Root Complex. it seems
like I am contradicting myself, but no).
This is why we have guidelines and  not hard-coded policies.
Also ,the QEMU way is to be more permissive. We provide guidelines and sane
defaults, but we let the user to chose.

Getting back to our problem, the rule would be:
hybrid devices should be PCI or PCIe for a bus?
PAPR bus should return 'PCIe' for hybrid devices.
X86 bus should return 'PCIe' if not root.


>>
>>>>   And for my purposes it would also make it easier to
>>>>> implement aa machine type specific hook to re-allow that configuration
>>>>> on pseries.
>>>>
>>>> I agree we need a solution for PAPR.
>>>>
>>>> What about a pcie_papr() function and then:
>>>>
>>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>>>> index 5ce42af..2c646ae 100644
>>>> --- a/hw/virtio/virtio-pci.c
>>>> +++ b/hw/virtio/virtio-pci.c
>>>> @@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
>>>>          return;
>>>>      }
>>>>
>>>> -    if (pcie_port && pci_is_express(pci_dev)) {
>>>> +    if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) {
>>>>          int pos;
>>>>
>>>>          pos = pcie_endpoint_cap_init(pci_dev, 0);
>>>
>>> That would be sufficient, yes, so I'll take it if we don't come to any
>>> other solutions.
>>
>> OK
>>
>>  It still seems weird to me to have this logic within
>>> a specific device implementation though.
>>>
>>
>> I suppose you have a point, let's wait for Gerd and maybe Michael to comment
>> on that, but anyway it will not help your case. (it will only make XHCI PCI if plugged into Root Complex)
>
> It won't help directly, but it means we could put that pcie_papr()
> test in just the common code, rather than having to look at every
> hybrid and PCIe device.
>
>>>>>
>>>>> At the moment XHCI and virtio-pci behave differently, which seems less
>>>>> than ideal.
>>>>>
>>>>>> 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
>>>>>>    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
>>>>>>    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
>>>>>>    types of virtio devices. We want to keep it simple: virtio legacy
>>>>>>    -> PCI, virtio modern -> PCIe.
>>>>>
>>>>> Ok.. it's not obvious to me why integrated endpoint vs. under a root
>>>>> port is relevant to this.  Can't we enable/disable PCIe mode based
>>>>> directly on the legacy/modern settings?
>>>>>
>>>>
>>>> Yes, we can, but we don't want to do that. Previous setups will stop working
>>>> and we will need libvirt to mange the disable-* properties.
>>>> As a matter of fact the code today is after some discussions with libvirt guys.
>>>
>>> Uh.. I think I'm missing something.. but it's still not clear to me
>>> why this would break existing setups or impose more work on libvirt.
>>>
>>
>> Long story short, libvirt guys don't want to manually set the disable-* properties
>> of virtio devices to make them comply to our goal (legacy -> PCI, modern PCIe).
>> They also don't want to look at QEMU version/machine type to make a decision on the above properties.
>>
>> I agree the solution is not perfect, but at least it makes our testing matrix smaller
>> and makes our PCIe guidelines a little easier to understand (e.g. all supported devices are PCIe,
>> but if want legacy PCI devices put them on the Root Complex)
>
> Ok.. I'm still missing something.  Are you saying that instead of the
> legacy/modern status determining PCIe support, that PCIe status is
> determining legacy/modern support by default?

Is more a guideline. We want QEMU to behave like this *by default*.
The modern spec does not require virtio 1.0 devices to be PCIe.

   That would actually
> seem to simplify things: whatever method we end up allowing PCIe
> virtio on PAPR, that should automatically enable modern mode, which is
> fine.
>

Agreed.

> Although.. I do wonder about legacy/modern for PAPR in general.
> Current kernels will support virtio 1.0 fine, but we have older
> released versions which only support legacy.

So, do you want legacy virtio devices to be PCIe ?

   Unlike PC we won't (I
> hope) have a whole machine type switch to handle this.

Good for you!

   At this stage
> I think we want virtio devices (whatever their bus type) on PAPR to
> default to allowing both legacy and modern for the forseeable future.
> How does that affect the matrix?
>

It adds a legacy virtio PCIe device to the matrix. For X86 we don't need
this combination because PC machines are PCI and Q35 machines are to be used
only on modern setups.

Since PARP does not have separate machines for PCi and PCIe, we need to live with it.


>> XHCI being PCIe on Root Complex is an unintended exception, but we want it connected to a
>> Root Port anyway, we don't have anything to gain from having it as Integrated Endpoint.
>> We only loose a slot that can be used by 8 Root Ports assembled into one multi-function device.
>>
>> PAPR bus should not be considered PCIe and should have a different set of rules allowing PCIe
>> devices to be plugged into Root Complex.
>
> Alright, I can work with that.  Michael, Andrea, how does this idea
> seem to you?
>

We should definitely get more opinions.
Thanks,
Marcel


Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
On Wed, Feb 15, 2017 at 04:59:33PM +0200, Marcel Apfelbaum wrote:
> On 02/15/2017 03:45 AM, David Gibson wrote:
> > On Tue, Feb 14, 2017 at 02:53:08PM +0200, Marcel Apfelbaum wrote:
> > > On 02/14/2017 06:15 AM, David Gibson wrote:
> > > > On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
> > > > > On 02/13/2017 06:33 AM, David Gibson wrote:
> > > > > > On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
> > > > > > > On 02/10/2017 02:37 AM, David Gibson wrote:
> > > > > > > > On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
> > > > > > > > > On 02/09/17 05:16, David Gibson wrote:
> > > > > > > > > > On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
> > > > > > > > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > > > > > > > Marcel,
> > > > > > > > > > > > 
> > > > > > > > > > > > Your original patch adding PCIe support to virtio-pci.c has the
> > > > > > > > > > > > limitation noted below that PCIe won't be enabled if the device is on
> > > > > > > > > > > > the root bus (rather than under a root or downstream port).  As
> > > > > > > > > > > > reasoned below, I think removing the check is correct, even for x86
> > > > > > > > > > > > (though it would rarely be useful there).  But I could well have
> > > > > > > > > > > > missed something.  Let me know if so...
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Virtio devices can appear as either vanilla PCI or PCI-Express devices
> > > > > > > > > > > > depending on the bus they're connected to.  At the moment it will only
> > > > > > > > > > > > appear as vanilla PCI if connected to the root bus of a PCIe host bridge.
> > > > > > > > > > > > 
> > > > > > > > > > > > Presumably this is to reflect the fact that PCIe devices usually need to
> > > > > > > > > > > > be connected to a root (or further downstream) port rather than directly
> > > > > > > > > > > > on the root bus.  However, due to the odd requirements of the PAPR spec on the 'pseries'
> > > > > > > > > > > > machine type, it's normal for PCIe devices to appear on the root bus
> > > > > > > > > > > > without root ports.
> > > > > > > > > > > > 
> > > > > > > > > > > > Further, even on x86, there's no inherent reason we couldn't present a
> > > > > > > > > > > > virtio device as an "integrated device" (typically used for things built
> > > > > > > > > > > > into the PCI chipset), and those devices *do* typically appear on the root
> > > > > > > > > > > > bus.
> > > > > > > > > > > 
> > > > > > > > > > > I'm not personally making a counter-argument, just qouting some of
> > > > > > > > > > > the relevant parts of "docs/pcie.txt" ("PCI EXPRESS GUIDELINES"):
> > > > > > > > > > 
> > > > > > > > > > So, an earlier discussion more or less concluded that the PCIe
> > > > > > > > > > guidelines don't really work with PAPR guests.  That comes because
> > > > > > > > > > PAPR was designed with PowerVM in mind which allows PCI passthrough
> > > > > > > > > > but doesn't do any emulated PCI devices.  So they wanted to present
> > > > > > > > > > passed through devices (virtual or phyical) to the guest without
> > > > > > > > > > inserting virtual root ports.
> > > > > > > > > > 
> > > > > > > > > > Now, you can argue that this was a silly decision in PAPR, and you
> > > > > > > > > > could well be right, but there it is.
> > > > > > > > > 
> > > > > > > > > I can totally accept this, but then we should state it as a fact near
> > > > > > > > > the top of "docs/pcie.txt".
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > > Place only the following kinds of devices directly on the Root Complex:
> > > > > > > > > > > >     (1) PCI Devices (e.g. network card, graphics card, IDE controller),
> > > > > > > > > > > >         not controllers. Place only legacy PCI devices on
> > > > > > > > > > > >         the Root Complex. These will be considered Integrated Endpoints.
> > > > > > > > > > > >         Note: Integrated Endpoints are not hot-pluggable.
> > > > > > > > > > > > 
> > > > > > > > > > > >         Although the PCI Express spec does not forbid PCI Express devices as
> > > > > > > > > > > >         Integrated Endpoints, existing hardware mostly integrates legacy PCI
> > > > > > > > > > > >         devices with the Root Complex.
> > > > > > > > > > 
> > > > > > > > > > "Mostly".. on my laptop at least the GPU shows up as an integrated PCI
> > > > > > > > > > Express endpoint, so it's certainly not the case that *all* root bus
> > > > > > > > > > devices are legacy.
> > > > > > > > > > 
> > > > > > > > > > > Guest OSes are suspected to behave
> > > > > > > > > > > >         strangely when PCI Express devices are integrated
> > > > > > > > > > > >         with the Root Complex.
> > > > > > > > > > 
> > > > > > > > > > Clearly not that strangely, that often, since my laptop works just fine.
> > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > >     [...]
> > > > > > > > > > > > 
> > > > > > > > > > > > 2.2 PCI Express only hierarchy
> > > > > > > > > > > > ==============================
> > > > > > > > > > > > Always use PCI Express Root Ports to start PCI Express hierarchies.
> > > > > > > > > > > 
> > > > > > > > > > > Above you mention "it's normal for PCIe devices to appear on the root bus without root ports".
> > > > > > > > > > 
> > > > > > > > > > Well "normal" perhaps wasn't the right word.  Let's say precedented,
> > > > > > > > > > if uncommon.
> > > > > > > > > > 
> > > > > > > > > > > Let me turn the question around: is it a *problem* for "pseries" if
> > > > > > > > > > > we require root ports? If so, why exactly?
> > > > > > > > > > 
> > > > > > > > > > That's.. a complex question.  At least Linux guests (and we don't
> > > > > > > > > > support any others yet) might cope with the addition of root ports.
> > > > > > > > > > Maybe.  I have discussed this option with BenH and others.
> > > > > > > > > > 
> > > > > > > > > > However it is gratuitously different from how PCIe devices will
> > > > > > > > > > typically appear for the same guest running under PowerVM.  Although I
> > > > > > > > > > suspect Linux would cope with the "normal standard" rather than "PAPR
> > > > > > > > > > standard" presentation, I'm not as confident about it as I would like.
> > > > > > > > > > 
> > > > > > > > > > Another consideration here is that other PCIe capable qemu emulated
> > > > > > > > > > devices, such as XHCI, will present fine as PCIe integrated endpoints
> > > > > > > > > > when attached to the root bus.  Libvirt won't do that usually, of
> > > > > > > > > > course, and it may not be the recommended way of doing things (on PC)
> > > > > > > > > > but it's possible.  I don't see any particular reason that virtio-pci
> > > > > > > > > > should enforce the root port requirement more so than any other
> > > > > > > > > > device.
> > > > > > > > > > 
> > > > > > > > > > > On 02/08/17 07:16, David Gibson wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > pcie_endpoint_cap_init() already automatically adjusts to advertise as
> > > > > > > > > > > > an integrated device rather than a "normal" PCIe endpoint when attached to
> > > > > > > > > > > > a root bus.  So we can remove the check for root bus within virtio and
> > > > > > > > > > > > allow (at the user's discretion) a PCIe virtio bus to be attached to a
> > > > > > > > > > > > root bus.
> > > > > > > > > > > 
> > > > > > > > > > > If Marcel thinks this is a good change, then I think we should go
> > > > > > > > > > > through "docs/pcie.txt" with a fine-toothed comb, and update all
> > > > > > > > > > > relevant spots. (If Marcel agrees, perhaps you can include such
> > > > > > > > > > > hunks in your patch at once.)
> > > > > > > > > > 
> > > > > > > > > > Actually, I think that would be a neverending process.  Maybe better
> > > > > > > > > > to put in a whole different spapr-pcie.txt with the assorted ways that
> > > > > > > > > > PAPR violates PCIe conventions.
> > > > > > > > > 
> > > > > > > > > That works for me too, but I think it would be a lot more work for you
> > > > > > > > > and others.
> > > > > > > > > 
> > > > > > > > > I plan on consulting "docs/pcie.txt" frequently; among other things, for
> > > > > > > > > deciding debates. Thus, improving the scope of "docs/pcie.txt" is very
> > > > > > > > > welcome IMO.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > It also may have consequences for libvirt (but I see you addressed
> > > > > > > > > > > Andrea at once, which is great).
> > > > > > > > > > 
> > > > > > > > > > Right, I've been discussed this with Andrea all along.  We're working
> > > > > > > > > > on a proposed PAPR specific way of allocating PCI and PCIe addresses
> > > > > > > > > > (different from the PCIe normal way, but the same as each other).
> > > > > > > > > > That will simplify adding PCIe support to PAPR, and also has some
> > > > > > > > > > other advantages for PAPR guests (related to the platform specific
> > > > > > > > > > isolation, hotplug  and error recovery mechanisms - also different
> > > > > > > > > > from the normal PCIe ones).
> > > > > > > > > 
> > > > > > > > > Great, if Andrea is aware, that's a relief.
> > > > > > > > > 
> > > > > > > > > Can you resubmit this patch with a small hunk for "docs/pcie.txt" that
> > > > > > > > > removes PAPR from the scope?
> > > > > > > > 
> > > > > > > 
> > > > > > > Hi David,
> > > > > > > Sorry for the delay, I just came back from PTO.
> > > > > > > 
> > > > > > > > Well, first I'd like to see if Marcel knows of some reason I didn't
> > > > > > > > think of why this test is important for virtio particularly here.  But
> > > > > > > > assuming the basic idea is acceptable, then yes, I'll update pcie.txt.
> > > > > > > > 
> > > > > > > 
> > > > > > > There are two reasons for keeping virtio Integrated Endpoints as PCI devices.
> > > > > > > 1. The first point is generic; even if having PCIe devices as Integrated Endpoints should be OK,
> > > > > > >    is not recommended because some guests may miss-behave (*). X86 arch supports a large number
> > > > > > >    of guests and we don't want to check and fix everything if *we don't have to*.
> > > > > > >    Even if is not written anywhere and there are actually some PCI Express Integrated Endpoints,
> > > > > > >    most of them are legacy PCI devices (I actually think this is why we have Integrated Endpoints
> > > > > > >    at all, but I might be wrong).
> > > > > > 
> > > > > > Hm, ok.  Could we implement that restriction in the pci/pcie core
> > > > > > rather than in the virtio device?
> > > > > 
> > > > > I am not sure if we should do that. Most of the devices are PCI or PCIe.
> > > > > Only some devices are "hybrid", the virtio devices, XHCI and I am not sure we have more.
> > > > 
> > > > Ok, I see your point, the pcie core might not be right.  But it still
> > > > seems really weird to have it explicitly in each hybrid device, even
> > > > if it's just the two.  As the code stands right now, XHCI and virtio
> > > > have different behaviour, without a clear reason for it.
> > > > 
> > > 
> > > I suppose XHCI can behave the same as virtio if Gerd has nothing against it
> > > (remain PCI if plugged into the Root Complex), but I don't know how it will help your case.
> > 
> > > > Maybe a hybrid_setup() helper function?
> > > > 
> > > 
> > > How would it help PAPR scenario?
> > 
> > It doesn't, directly, but it means there would only be a single place
> > to fix with a PAPR hack, instead of both virtio and XHCI and any
> > future hybrid devices.
> > 
> > > Anyway, Eduardo is working on supplying
> > > new query interfaces for libvirt and he touches this subject, I think
> > > he does plan to mark the hybrid devices in some way.
> > > 
> > > 
> > > > > > That would then protect things like
> > > > > > XHCI as well.
> > > > > 
> > > > > I don't see a reason to have XHCI as Integrated Endpoint, I think it should be always
> > > > > plugged into a root port. (for x86. arm and power)
> > > > 
> > > > No, not for Power.  Well, ok, yes for most ppc machine types, but not
> > > > for the paravirtualized 'pseries' machine (which is the one we most
> > > > care about).
> > > > 
> > > > Well.. I guess it doesn't need to be an Integrated Endpoint per se,
> > > > but we should be able to have it appear on the guest root bus.
> > > > 
> > > > The thing to realize is that the paravirtualized PCI interfaces in
> > > > PAPR mean there's very little guest visible difference between PCI and
> > > > PCIe.  In fact in most ways you could say that the paravirtualized bus
> > > > operates like a vanilla PCI bus.. except that it does provide a way to
> > > > access PCIe extended config space.
> > > 
> > > Maybe we can have a new bus type deriving from PCIBus and sibling of the PCIe Bus.
> > > Then we can have the same rules as the PCI bus and add tweaks when
> > > necessary.
> > 
> > Sounds possible.  I guess it would need to return true for
> > pci_bus_is_express(), so that PCIe devices will attach to it.  In
> > which case I'm not entirely sure how it would differ from a PCIe bus
> > from the qemu side.  AFAICT with the exception of the virtio check and
> > maybe a handful of others, the PCIe placement rules are handled by
> > libvirt, rather than by qemu.
> > 
> > > This does not solve the virtio problem since the code is actively looking for
> > > a PCIe Root Port and we don't have it for PAPR.
> > 
> > Technically, it's not even looking for a root port, just excluding
> > being on the root bus itself.  I guess any kind of bridge that has
> > something PCI-ish on the upstream side and PCIe on the downstream side
> > more or less has to be either a root port or a PCIe switch downstream
> > port, though.
> > 
> > >   Which means that you can use it to
> > > > drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
> > > > and PCIe standard hotplug won't work, but PAPR has its own mechanisms
> > > > for those (common between PCI and PCIe).
> > > > 
> > > > I did float the idea of having the pseries PCI bus remain plain PCI
> > > > but with a special flag to allow PCIe devices to be attached to it
> > > > anyway.  It wasn't greeted with much enthusiasm..
> > > > 
> > > 
> > > Can you point me to the discussion please? It seems similar to what I proposed above.
> > 
> > Sorry, I was misleading.  I think I just raised that idea with Andrea
> > and a few other people internally, not on one of the lists at large.
> > 
> > > As you properly described it, is much closer to PCI then PCIe, even the only characteristic
> > > that makes it "a little" PCIe, the Extended Configuration Space support,
> > > is done with an alternative interface.
> > > 
> > > I agree the PAPR bus is not PCIe.
> > 
> > Ok, so if we take that direction, the question becomes how do we let
> > PCIe devices plug into this mostly-not-PCIe bus.  Maybe introduce a
> > "pci_bus_accepts_express()" function that will replace many, but not
> > all current uses of "pci_bus_is_express()"?
> > 
> 
> Sounds good and I think Eduardo is already working on exactly this
> idea, however he is on PTO now. It is better to synchronize with him.

Ah, right.  Do you know when he'll be back?  This is semi-urgent for
Power.


> > Such a helper could maybe simplify the logic in virtio-pci (and XHCI?)
> > by returning false on an x86 root bus.
> > 
> 
> The rule would me more complicated. We don't want to completely remove the
> possibility to have PCIe devices as part of Root Complex. it seems
> like I am contradicting myself, but no).
> This is why we have guidelines and  not hard-coded policies.
> Also ,the QEMU way is to be more permissive. We provide guidelines and sane
> defaults, but we let the user to chose.
> 
> Getting back to our problem, the rule would be:
> hybrid devices should be PCI or PCIe for a bus?
> PAPR bus should return 'PCIe' for hybrid devices.
> X86 bus should return 'PCIe' if not root.

Ok.

> > > > >   And for my purposes it would also make it easier to
> > > > > > implement aa machine type specific hook to re-allow that configuration
> > > > > > on pseries.
> > > > > 
> > > > > I agree we need a solution for PAPR.
> > > > > 
> > > > > What about a pcie_papr() function and then:
> > > > > 
> > > > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > > > index 5ce42af..2c646ae 100644
> > > > > --- a/hw/virtio/virtio-pci.c
> > > > > +++ b/hw/virtio/virtio-pci.c
> > > > > @@ -1804,7 +1804,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
> > > > >          return;
> > > > >      }
> > > > > 
> > > > > -    if (pcie_port && pci_is_express(pci_dev)) {
> > > > > +    if ((pcie_port || pcie_parp()) && pci_is_express(pci_dev)) {
> > > > >          int pos;
> > > > > 
> > > > >          pos = pcie_endpoint_cap_init(pci_dev, 0);
> > > > 
> > > > That would be sufficient, yes, so I'll take it if we don't come to any
> > > > other solutions.
> > > 
> > > OK
> > > 
> > >  It still seems weird to me to have this logic within
> > > > a specific device implementation though.
> > > > 
> > > 
> > > I suppose you have a point, let's wait for Gerd and maybe Michael to comment
> > > on that, but anyway it will not help your case. (it will only make XHCI PCI if plugged into Root Complex)
> > 
> > It won't help directly, but it means we could put that pcie_papr()
> > test in just the common code, rather than having to look at every
> > hybrid and PCIe device.
> > 
> > > > > > 
> > > > > > At the moment XHCI and virtio-pci behave differently, which seems less
> > > > > > than ideal.
> > > > > > 
> > > > > > > 2. The second point is virtio specific. Not all the guests have virtio 1.0 support (e.g RHEL 6) and we allow them
> > > > > > >    to use legacy virtio devices as Integrated Endpoints (following the thought that this is why we have Integrated Endpoints)
> > > > > > >    Making the virtio devices PCI Express, but not virtio 1.0 is also problematic since now we will have too much
> > > > > > >    types of virtio devices. We want to keep it simple: virtio legacy
> > > > > > >    -> PCI, virtio modern -> PCIe.
> > > > > > 
> > > > > > Ok.. it's not obvious to me why integrated endpoint vs. under a root
> > > > > > port is relevant to this.  Can't we enable/disable PCIe mode based
> > > > > > directly on the legacy/modern settings?
> > > > > > 
> > > > > 
> > > > > Yes, we can, but we don't want to do that. Previous setups will stop working
> > > > > and we will need libvirt to mange the disable-* properties.
> > > > > As a matter of fact the code today is after some discussions with libvirt guys.
> > > > 
> > > > Uh.. I think I'm missing something.. but it's still not clear to me
> > > > why this would break existing setups or impose more work on libvirt.
> > > > 
> > > 
> > > Long story short, libvirt guys don't want to manually set the disable-* properties
> > > of virtio devices to make them comply to our goal (legacy -> PCI, modern PCIe).
> > > They also don't want to look at QEMU version/machine type to make a decision on the above properties.
> > > 
> > > I agree the solution is not perfect, but at least it makes our testing matrix smaller
> > > and makes our PCIe guidelines a little easier to understand (e.g. all supported devices are PCIe,
> > > but if want legacy PCI devices put them on the Root Complex)
> > 
> > Ok.. I'm still missing something.  Are you saying that instead of the
> > legacy/modern status determining PCIe support, that PCIe status is
> > determining legacy/modern support by default?
> 
> Is more a guideline. We want QEMU to behave like this *by default*.
> The modern spec does not require virtio 1.0 devices to be PCIe.

Ok.  But I'm trying to get a handle on how - even by default - the
linkage between PCI/PCIe and modern/legacy is created:  it's not
obvious to me.

>   That would actually
> > seem to simplify things: whatever method we end up allowing PCIe
> > virtio on PAPR, that should automatically enable modern mode, which is
> > fine.
> > 
> 
> Agreed.
> 
> > Although.. I do wonder about legacy/modern for PAPR in general.
> > Current kernels will support virtio 1.0 fine, but we have older
> > released versions which only support legacy.
> 
> So, do you want legacy virtio devices to be PCIe ?

TBH, we probably don't care one way or another.

I'm not clear here if you're making a distinction between legacy-only
and legacy+modern virtio instances.

I think we need to keep legacy support on for Power machines, because
we don't have the machine type switchover to let us handle it.  But we
certainly would like to enable modern support so that newer guests can
exploit it.

>   Unlike PC we won't (I
> > hope) have a whole machine type switch to handle this.
> 
> Good for you!
> 
>   At this stage
> > I think we want virtio devices (whatever their bus type) on PAPR to
> > default to allowing both legacy and modern for the forseeable future.
> > How does that affect the matrix?
> > 
> 
> It adds a legacy virtio PCIe device to the matrix. For X86 we don't need
> this combination because PC machines are PCI and Q35 machines are to be used
> only on modern setups.
> 
> Since PARP does not have separate machines for PCi and PCIe, we need to live with it.
> 
> 
> > > XHCI being PCIe on Root Complex is an unintended exception, but we want it connected to a
> > > Root Port anyway, we don't have anything to gain from having it as Integrated Endpoint.
> > > We only loose a slot that can be used by 8 Root Ports assembled into one multi-function device.
> > > 
> > > PAPR bus should not be considered PCIe and should have a different set of rules allowing PCIe
> > > devices to be plugged into Root Complex.
> > 
> > Alright, I can work with that.  Michael, Andrea, how does this idea
> > seem to you?
> > 
> 
> We should definitely get more opinions.
> Thanks,
> Marcel
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by David Gibson 7 years, 1 month ago
On Thu, Feb 16, 2017 at 01:48:42PM +1100, David Gibson wrote:
> On Wed, Feb 15, 2017 at 04:59:33PM +0200, Marcel Apfelbaum wrote:
> > On 02/15/2017 03:45 AM, David Gibson wrote:
> > > On Tue, Feb 14, 2017 at 02:53:08PM +0200, Marcel Apfelbaum wrote:
> > > > On 02/14/2017 06:15 AM, David Gibson wrote:
> > > > > On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
> > > > > > On 02/13/2017 06:33 AM, David Gibson wrote:
> > > > > > > On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
> > > > > > > > On 02/10/2017 02:37 AM, David Gibson wrote:
> > > > > > > > > On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
> > > > > > > > > > On 02/09/17 05:16, David Gibson wrote:
> > > > > > > > > > > On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
> > > > > > > > > > > > On 02/08/17 07:16, David Gibson wrote:
[snip]
> > > >   Which means that you can use it to
> > > > > drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
> > > > > and PCIe standard hotplug won't work, but PAPR has its own mechanisms
> > > > > for those (common between PCI and PCIe).
> > > > > 
> > > > > I did float the idea of having the pseries PCI bus remain plain PCI
> > > > > but with a special flag to allow PCIe devices to be attached to it
> > > > > anyway.  It wasn't greeted with much enthusiasm..
> > > > > 
> > > > 
> > > > Can you point me to the discussion please? It seems similar to what I proposed above.
> > > 
> > > Sorry, I was misleading.  I think I just raised that idea with Andrea
> > > and a few other people internally, not on one of the lists at large.
> > > 
> > > > As you properly described it, is much closer to PCI then PCIe, even the only characteristic
> > > > that makes it "a little" PCIe, the Extended Configuration Space support,
> > > > is done with an alternative interface.
> > > > 
> > > > I agree the PAPR bus is not PCIe.
> > > 
> > > Ok, so if we take that direction, the question becomes how do we let
> > > PCIe devices plug into this mostly-not-PCIe bus.  Maybe introduce a
> > > "pci_bus_accepts_express()" function that will replace many, but not
> > > all current uses of "pci_bus_is_express()"?
> > > 
> > 
> > Sounds good and I think Eduardo is already working on exactly this
> > idea, however he is on PTO now. It is better to synchronize with him.
> 
> Ah, right.  Do you know when he'll be back?  This is semi-urgent for
> Power.
> 
> 
> > > Such a helper could maybe simplify the logic in virtio-pci (and XHCI?)
> > > by returning false on an x86 root bus.
> > > 
> > 
> > The rule would me more complicated. We don't want to completely remove the
> > possibility to have PCIe devices as part of Root Complex. it seems
> > like I am contradicting myself, but no).
> > This is why we have guidelines and  not hard-coded policies.
> > Also ,the QEMU way is to be more permissive. We provide guidelines and sane
> > defaults, but we let the user to chose.
> > 
> > Getting back to our problem, the rule would be:
> > hybrid devices should be PCI or PCIe for a bus?
> > PAPR bus should return 'PCIe' for hybrid devices.
> > X86 bus should return 'PCIe' if not root.
> 
> Ok.

Wait, actually.. we have two possible directions to go, both of which
have been mentioned in the thread, but I don't think we've settled on
one:

1) Have pseries create a PCIe bus (as my first cut draft does).

That should allow pure PCIe devices to appear either under a port or
(more usually for PAPR) as "integrated endpoints".  In addition we'd
need as suggested above a "pcie_hybrid_type()" function that would
tell hybrid devices to also appear as PCIe rather than PCI.

2) Have pseries create a vanilla PCI bus (or a special PAPR PCI
   variant)

Appearing as vanilla PCI would in a number of ways more closely match
the way PCI buses are handled on PAPR.  However, we still need to
connect PCIe devices to it.  So we'd need some 'bus_accepts_pcie()'
hook and use that (in place of pci_bus_is_express()) to determine both
whether we can attach pure PCIe devices and that hybrid devices should
appear as PCIe rather than plain PCI.


Based on the immediately preceding discussion, I was leaning towards
(2).  Is that your feeling as well?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Marcel Apfelbaum 7 years, 1 month ago
On 02/16/2017 05:28 AM, David Gibson wrote:
> On Thu, Feb 16, 2017 at 01:48:42PM +1100, David Gibson wrote:
>> On Wed, Feb 15, 2017 at 04:59:33PM +0200, Marcel Apfelbaum wrote:
>>> On 02/15/2017 03:45 AM, David Gibson wrote:
>>>> On Tue, Feb 14, 2017 at 02:53:08PM +0200, Marcel Apfelbaum wrote:
>>>>> On 02/14/2017 06:15 AM, David Gibson wrote:
>>>>>> On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
>>>>>>> On 02/13/2017 06:33 AM, David Gibson wrote:
>>>>>>>> On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
>>>>>>>>> On 02/10/2017 02:37 AM, David Gibson wrote:
>>>>>>>>>> On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
>>>>>>>>>>> On 02/09/17 05:16, David Gibson wrote:
>>>>>>>>>>>> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
>>>>>>>>>>>>> On 02/08/17 07:16, David Gibson wrote:
> [snip]
>>>>>   Which means that you can use it to
>>>>>> drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
>>>>>> and PCIe standard hotplug won't work, but PAPR has its own mechanisms
>>>>>> for those (common between PCI and PCIe).
>>>>>>
>>>>>> I did float the idea of having the pseries PCI bus remain plain PCI
>>>>>> but with a special flag to allow PCIe devices to be attached to it
>>>>>> anyway.  It wasn't greeted with much enthusiasm..
>>>>>>
>>>>>
>>>>> Can you point me to the discussion please? It seems similar to what I proposed above.
>>>>
>>>> Sorry, I was misleading.  I think I just raised that idea with Andrea
>>>> and a few other people internally, not on one of the lists at large.
>>>>
>>>>> As you properly described it, is much closer to PCI then PCIe, even the only characteristic
>>>>> that makes it "a little" PCIe, the Extended Configuration Space support,
>>>>> is done with an alternative interface.
>>>>>
>>>>> I agree the PAPR bus is not PCIe.
>>>>
>>>> Ok, so if we take that direction, the question becomes how do we let
>>>> PCIe devices plug into this mostly-not-PCIe bus.  Maybe introduce a
>>>> "pci_bus_accepts_express()" function that will replace many, but not
>>>> all current uses of "pci_bus_is_express()"?
>>>>
>>>
>>> Sounds good and I think Eduardo is already working on exactly this
>>> idea, however he is on PTO now. It is better to synchronize with him.
>>
>> Ah, right.  Do you know when he'll be back?  This is semi-urgent for
>> Power.
>>
>>
>>>> Such a helper could maybe simplify the logic in virtio-pci (and XHCI?)
>>>> by returning false on an x86 root bus.
>>>>
>>>
>>> The rule would me more complicated. We don't want to completely remove the
>>> possibility to have PCIe devices as part of Root Complex. it seems
>>> like I am contradicting myself, but no).
>>> This is why we have guidelines and  not hard-coded policies.
>>> Also ,the QEMU way is to be more permissive. We provide guidelines and sane
>>> defaults, but we let the user to chose.
>>>
>>> Getting back to our problem, the rule would be:
>>> hybrid devices should be PCI or PCIe for a bus?
>>> PAPR bus should return 'PCIe' for hybrid devices.
>>> X86 bus should return 'PCIe' if not root.
>>
>> Ok.
>
> Wait, actually.. we have two possible directions to go, both of which
> have been mentioned in the thread, but I don't think we've settled on
> one:
>
> 1) Have pseries create a PCIe bus (as my first cut draft does).
>
> That should allow pure PCIe devices to appear either under a port or
> (more usually for PAPR) as "integrated endpoints".  In addition we'd
> need as suggested above a "pcie_hybrid_type()" function that would
> tell hybrid devices to also appear as PCIe rather than PCI.
>
> 2) Have pseries create a vanilla PCI bus (or a special PAPR PCI
>    variant)
>
> Appearing as vanilla PCI would in a number of ways more closely match
> the way PCI buses are handled on PAPR.  However, we still need to
> connect PCIe devices to it.  So we'd need some 'bus_accepts_pcie()'
> hook and use that (in place of pci_bus_is_express()) to determine both
> whether we can attach pure PCIe devices and that hybrid devices should
> appear as PCIe rather than plain PCI.
>
>
> Based on the immediately preceding discussion, I was leaning towards
> (2).  Is that your feeling as well?
>

On 02/16/2017 05:28 AM, David Gibson wrote:
 > On Thu, Feb 16, 2017 at 01:48:42PM +1100, David Gibson wrote:
 >> On Wed, Feb 15, 2017 at 04:59:33PM +0200, Marcel Apfelbaum wrote:
 >>> On 02/15/2017 03:45 AM, David Gibson wrote:
 >>>> On Tue, Feb 14, 2017 at 02:53:08PM +0200, Marcel Apfelbaum wrote:
 >>>>> On 02/14/2017 06:15 AM, David Gibson wrote:
 >>>>>> On Mon, Feb 13, 2017 at 12:14:23PM +0200, Marcel Apfelbaum wrote:
 >>>>>>> On 02/13/2017 06:33 AM, David Gibson wrote:
 >>>>>>>> On Sun, Feb 12, 2017 at 09:05:46PM +0200, Marcel Apfelbaum wrote:
 >>>>>>>>> On 02/10/2017 02:37 AM, David Gibson wrote:
 >>>>>>>>>> On Thu, Feb 09, 2017 at 10:04:47AM +0100, Laszlo Ersek wrote:
 >>>>>>>>>>> On 02/09/17 05:16, David Gibson wrote:
 >>>>>>>>>>>> On Wed, Feb 08, 2017 at 11:40:50AM +0100, Laszlo Ersek wrote:
 >>>>>>>>>>>>> On 02/08/17 07:16, David Gibson wrote:
 > [snip]
 >>>>>   Which means that you can use it to
 >>>>>> drive PCIe devices just fine.  "Bus level" PCIe extensions like AER
 >>>>>> and PCIe standard hotplug won't work, but PAPR has its own mechanisms
 >>>>>> for those (common between PCI and PCIe).
 >>>>>>
 >>>>>> I did float the idea of having the pseries PCI bus remain plain PCI
 >>>>>> but with a special flag to allow PCIe devices to be attached to it
 >>>>>> anyway.  It wasn't greeted with much enthusiasm..
 >>>>>>
 >>>>>
 >>>>> Can you point me to the discussion please? It seems similar to what I proposed above.
 >>>>
 >>>> Sorry, I was misleading.  I think I just raised that idea with Andrea
 >>>> and a few other people internally, not on one of the lists at large.
 >>>>
 >>>>> As you properly described it, is much closer to PCI then PCIe, even the only characteristic
 >>>>> that makes it "a little" PCIe, the Extended Configuration Space support,
 >>>>> is done with an alternative interface.
 >>>>>
 >>>>> I agree the PAPR bus is not PCIe.
 >>>>
 >>>> Ok, so if we take that direction, the question becomes how do we let
 >>>> PCIe devices plug into this mostly-not-PCIe bus.  Maybe introduce a
 >>>> "pci_bus_accepts_express()" function that will replace many, but not
 >>>> all current uses of "pci_bus_is_express()"?
 >>>>
 >>>
 >>> Sounds good and I think Eduardo is already working on exactly this
 >>> idea, however he is on PTO now. It is better to synchronize with him.
 >>
 >> Ah, right.  Do you know when he'll be back?  This is semi-urgent for
 >> Power.
 >>
 >>
 >>>> Such a helper could maybe simplify the logic in virtio-pci (and XHCI?)
 >>>> by returning false on an x86 root bus.
 >>>>
 >>>
 >>> The rule would me more complicated. We don't want to completely remove the
 >>> possibility to have PCIe devices as part of Root Complex. it seems
 >>> like I am contradicting myself, but no).
 >>> This is why we have guidelines and  not hard-coded policies.
 >>> Also ,the QEMU way is to be more permissive. We provide guidelines and sane
 >>> defaults, but we let the user to chose.
 >>>
 >>> Getting back to our problem, the rule would be:
 >>> hybrid devices should be PCI or PCIe for a bus?
 >>> PAPR bus should return 'PCIe' for hybrid devices.
 >>> X86 bus should return 'PCIe' if not root.
 >>
 >> Ok.
 >
 > Wait, actually.. we have two possible directions to go, both of which
 > have been mentioned in the thread, but I don't think we've settled on
 > one:
 >
 > 1) Have pseries create a PCIe bus (as my first cut draft does).
 >
 > That should allow pure PCIe devices to appear either under a port or
 > (more usually for PAPR) as "integrated endpoints".  In addition we'd
 > need as suggested above a "pcie_hybrid_type()" function that would
 > tell hybrid devices to also appear as PCIe rather than PCI.
 >
 > 2) Have pseries create a vanilla PCI bus (or a special PAPR PCI
 >    variant)
 >
 > Appearing as vanilla PCI would in a number of ways more closely match
 > the way PCI buses are handled on PAPR.  However, we still need to
 > connect PCIe devices to it.  So we'd need some 'bus_accepts_pcie()'
 > hook and use that (in place of pci_bus_is_express()) to determine both
 > whether we can attach pure PCIe devices and that hybrid devices should
 > appear as PCIe rather than plain PCI.
 >
 >
 > Based on the immediately preceding discussion, I was leaning towards
 > (2).  Is that your feeling as well?
 >

I also like option (2).

Thanks,
Marcel



Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Andrea Bolognani 7 years, 1 month ago
On Thu, 2017-02-16 at 21:14 +0200, Marcel Apfelbaum wrote:
>  > Wait, actually.. we have two possible directions to go, both of which
>  > have been mentioned in the thread, but I don't think we've settled on
>  > one:
>  >
>  > 1) Have pseries create a PCIe bus (as my first cut draft does).
>  >
>  > That should allow pure PCIe devices to appear either under a port or
>  > (more usually for PAPR) as "integrated endpoints".  In addition we'd
>  > need as suggested above a "pcie_hybrid_type()" function that would
>  > tell hybrid devices to also appear as PCIe rather than PCI.
>  >
>  > 2) Have pseries create a vanilla PCI bus (or a special PAPR PCI
>  >    variant)
>  >
>  > Appearing as vanilla PCI would in a number of ways more closely match
>  > the way PCI buses are handled on PAPR.  However, we still need to
>  > connect PCIe devices to it.  So we'd need some 'bus_accepts_pcie()'
>  > hook and use that (in place of pci_bus_is_express()) to determine both
>  > whether we can attach pure PCIe devices and that hybrid devices should
>  > appear as PCIe rather than plain PCI.
>  >
>  >
>  > Based on the immediately preceding discussion, I was leaning towards
>  > (2).  Is that your feeling as well?
> 
> I also like option (2).

After catching up with the thread, I tend to agree.

I've also been thinking about what libvirt would need to
do to adapt to the changes proposed here, and I believe the
answer to be: not much, really.

The current code already places VirtIO devices on the root
bus, even though it does so because it assumes pSeries
guests are not PCIe capable; same for XHCI, and pretty much
all other devices including those assigned through VFIO.

Basically, as far as I can tell, the changes proposed here
would only affect the guest (eg. ability to access the
extended config space), not how the devices would have to
be placed on the various buses. Or did I miss something?

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Marcel Apfelbaum 7 years, 1 month ago
On 02/16/2017 04:48 AM, David Gibson wrote:
> On Wed, Feb 15, 2017 at 04:59:33PM +0200, Marcel Apfelbaum wrote:

[...]

>>>>>
>>>>> I did float the idea of having the pseries PCI bus remain plain PCI
>>>>> but with a special flag to allow PCIe devices to be attached to it
>>>>> anyway.  It wasn't greeted with much enthusiasm..
>>>>>
>>>>
>>>> Can you point me to the discussion please? It seems similar to what I proposed above.
>>>
>>> Sorry, I was misleading.  I think I just raised that idea with Andrea
>>> and a few other people internally, not on one of the lists at large.
>>>
>>>> As you properly described it, is much closer to PCI then PCIe, even the only characteristic
>>>> that makes it "a little" PCIe, the Extended Configuration Space support,
>>>> is done with an alternative interface.
>>>>
>>>> I agree the PAPR bus is not PCIe.
>>>
>>> Ok, so if we take that direction, the question becomes how do we let
>>> PCIe devices plug into this mostly-not-PCIe bus.  Maybe introduce a
>>> "pci_bus_accepts_express()" function that will replace many, but not
>>> all current uses of "pci_bus_is_express()"?
>>>
>>
>> Sounds good and I think Eduardo is already working on exactly this
>> idea, however he is on PTO now. It is better to synchronize with him.
>
> Ah, right.  Do you know when he'll be back?  This is semi-urgent for
> Power.
>

In a week or so.

>

[...]

>>>
>>> Ok.. I'm still missing something.  Are you saying that instead of the
>>> legacy/modern status determining PCIe support, that PCIe status is
>>> determining legacy/modern support by default?
>>
>> Is more a guideline. We want QEMU to behave like this *by default*.
>> The modern spec does not require virtio 1.0 devices to be PCIe.
>
> Ok.  But I'm trying to get a handle on how - even by default - the
> linkage between PCI/PCIe and modern/legacy is created:  it's not
> obvious to me.
>

There is no direct connection between modern/legacy <-> PCI/PCIe.
However the defaults work like that:
I440FX - (pc): devices are PCI, transitional
    created by: disable-legacy=auto,disable-modern=off
    - in virtio_pci_realize disable-legacy: auto => off
Q35 - on root bus - devices are PCI, transitional
     - other buses - devices are PCIe, modern
    created by: disable-legacy=auto,disable-modern=off
    - in virtio_pci_dc_realize:
         if disable-modern=off -> pcie
    - in virtio_pci_realize:
     if on pcie root port -> disable-legacy: auto => on
     otherwise ->    disable-legacy: auto => off


>>   That would actually
>>> seem to simplify things: whatever method we end up allowing PCIe
>>> virtio on PAPR, that should automatically enable modern mode, which is
>>> fine.
>>>
>>
>> Agreed.
>>
>>> Although.. I do wonder about legacy/modern for PAPR in general.
>>> Current kernels will support virtio 1.0 fine, but we have older
>>> released versions which only support legacy.
>>
>> So, do you want legacy virtio devices to be PCIe ?
>
> TBH, we probably don't care one way or another.
>
> I'm not clear here if you're making a distinction between legacy-only
> and legacy+modern virtio instances.
>

Legacy devices and the modern ones are different devices (product ID).
I think the 'transitional' ones have the ID of the modern virtio.

> I think we need to keep legacy support on for Power machines, because
> we don't have the machine type switchover to let us handle it.  But we
> certainly would like to enable modern support so that newer guests can
> exploit it.
>

"transitional" is the right way to go.

Thanks,
Marcel

>>   Unlike PC we won't (I
>>> hope) have a whole machine type switch to handle this.
>>
>> Good for you!
>>
>>   At this stage
>>> I think we want virtio devices (whatever their bus type) on PAPR to
>>> default to allowing both legacy and modern for the forseeable future.
>>> How does that affect the matrix?
>>>
>>
>> It adds a legacy virtio PCIe device to the matrix. For X86 we don't need
>> this combination because PC machines are PCI and Q35 machines are to be used
>> only on modern setups.
>>
>> Since PARP does not have separate machines for PCi and PCIe, we need to live with it.
>>
>>
>>>> XHCI being PCIe on Root Complex is an unintended exception, but we want it connected to a
>>>> Root Port anyway, we don't have anything to gain from having it as Integrated Endpoint.
>>>> We only loose a slot that can be used by 8 Root Ports assembled into one multi-function device.
>>>>
>>>> PAPR bus should not be considered PCIe and should have a different set of rules allowing PCIe
>>>> devices to be plugged into Root Complex.
>>>
>>> Alright, I can work with that.  Michael, Andrea, how does this idea
>>> seem to you?
>>>
>>
>> We should definitely get more opinions.
>> Thanks,
>> Marcel
>>
>


Re: [Qemu-devel] [RFC] virtio-pci: Allow PCIe virtio devices on root bus
Posted by Gerd Hoffmann 7 years, 1 month ago
On Di, 2017-02-14 at 14:53 +0200, Marcel Apfelbaum wrote:
> I suppose XHCI can behave the same as virtio if Gerd has nothing
> against it

No objections.

cheers,
  Gerd