[RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ

Mark Cave-Ayland posted 18 patches 2 years, 9 months ago
[RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ
Posted by Mark Cave-Ayland 2 years, 9 months ago
Change pci_set_irq() to call qemu_set_irq() on the PCI device IRQ rather than
calling PCI bus IRQ handler function directly. In order to preserve the
existing behaviour update pci_qdev_realize() so that it automatically connects
the PCI device IRQ to the PCI bus IRQ handler.

Finally add a "QEMU interface" description documenting the new PCI device IRQ
gpio next to the declaration of TYPE_PCI_DEVICE.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
---
 hw/pci/pci.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 9471f996a7..3da1481eb5 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1680,8 +1680,7 @@ qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
 
 void pci_set_irq(PCIDevice *pci_dev, int level)
 {
-    int intx = pci_intx(pci_dev);
-    pci_irq_handler(pci_dev, intx, level);
+    qemu_set_irq(pci_dev->irq, level);
 }
 
 /* Special hooks used by device assignment */
@@ -2193,6 +2192,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
     pci_set_power(pci_dev, true);
 
     pci_dev->msi_trigger = pci_msi_trigger;
+
+    /* Connect device IRQ to bus */
+    qdev_connect_gpio_out(DEVICE(pci_dev), 0,
+                          pci_get_bus(pci_dev)->irq_in[pci_dev->devfn]);
 }
 
 static void pci_device_init(Object *obj)
@@ -2850,6 +2853,11 @@ void pci_set_power(PCIDevice *d, bool state)
     }
 }
 
+/*
+ * QEMU interface:
+ * + Unnamed GPIO output: set to 1 if the PCI Device has asserted its irq
+ */
+
 static const TypeInfo pci_device_type_info = {
     .name = TYPE_PCI_DEVICE,
     .parent = TYPE_DEVICE,
-- 
2.30.2
Re: [RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ
Posted by Bernhard Beschow 2 years, 9 months ago

Am 11. Mai 2023 08:57:16 UTC schrieb Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>:
>Change pci_set_irq() to call qemu_set_irq() on the PCI device IRQ rather than
>calling PCI bus IRQ handler function directly. In order to preserve the
>existing behaviour update pci_qdev_realize() so that it automatically connects
>the PCI device IRQ to the PCI bus IRQ handler.
>
>Finally add a "QEMU interface" description documenting the new PCI device IRQ
>gpio next to the declaration of TYPE_PCI_DEVICE.
>
>Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>---
> hw/pci/pci.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
>diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>index 9471f996a7..3da1481eb5 100644
>--- a/hw/pci/pci.c
>+++ b/hw/pci/pci.c
>@@ -1680,8 +1680,7 @@ qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
> 
> void pci_set_irq(PCIDevice *pci_dev, int level)
> {
>-    int intx = pci_intx(pci_dev);
>-    pci_irq_handler(pci_dev, intx, level);
>+    qemu_set_irq(pci_dev->irq, level);
> }
> 
> /* Special hooks used by device assignment */
>@@ -2193,6 +2192,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>     pci_set_power(pci_dev, true);
> 
>     pci_dev->msi_trigger = pci_msi_trigger;
>+
>+    /* Connect device IRQ to bus */
>+    qdev_connect_gpio_out(DEVICE(pci_dev), 0,
>+                          pci_get_bus(pci_dev)->irq_in[pci_dev->devfn]);

I think this is confusing a few things. In my understanding -- unlike ISA -- PCI considers interrupt lanes only for PCI slots but not for buses. So for example each PCI slot could have its own direct connections (up to four, intA..intD) to the interrupt controller. IOW interrupt lanes and PCI buses are unrelated, thus PCIBus shouldn't really have IRQs.

Moreover, in case the interrupt lines are shared between multiple PCI slots, a usual pattern is to swizzle these lines such that the intAs from the slots don't all occupy just one IRQ line. That means that depending on the slot the device is plugged into a different lane is triggered. Above code, however, would always trigger the same line and wouldn't even allow for modeling the swizzeling.

Also, above code would cause out of bounds array accesses if a PCI device had more functions than there are on "the bus":
For example, consider PIIX which has four PIRQs, so ARRAY_SIZE(irq_fn) == 4, right? devfn can be up to 8 according to the PCI spec which would cause an out if bounds array access above.

I think that this commit does actually re-define how PCI buses work in QEMU although the cover letter claims to save this for another day. We should probably not apply the series in its current form.

Best regards,
Bernhard

> }
> 
> static void pci_device_init(Object *obj)
>@@ -2850,6 +2853,11 @@ void pci_set_power(PCIDevice *d, bool state)
>     }
> }
> 
>+/*
>+ * QEMU interface:
>+ * + Unnamed GPIO output: set to 1 if the PCI Device has asserted its irq
>+ */
>+
> static const TypeInfo pci_device_type_info = {
>     .name = TYPE_PCI_DEVICE,
>     .parent = TYPE_DEVICE,
Re: [RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ
Posted by Mark Cave-Ayland 2 years, 9 months ago
On 11/05/2023 22:44, Bernhard Beschow wrote:

> Am 11. Mai 2023 08:57:16 UTC schrieb Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>:
>> Change pci_set_irq() to call qemu_set_irq() on the PCI device IRQ rather than
>> calling PCI bus IRQ handler function directly. In order to preserve the
>> existing behaviour update pci_qdev_realize() so that it automatically connects
>> the PCI device IRQ to the PCI bus IRQ handler.
>>
>> Finally add a "QEMU interface" description documenting the new PCI device IRQ
>> gpio next to the declaration of TYPE_PCI_DEVICE.
>>
>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>> ---
>> hw/pci/pci.c | 12 ++++++++++--
>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 9471f996a7..3da1481eb5 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -1680,8 +1680,7 @@ qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
>>
>> void pci_set_irq(PCIDevice *pci_dev, int level)
>> {
>> -    int intx = pci_intx(pci_dev);
>> -    pci_irq_handler(pci_dev, intx, level);
>> +    qemu_set_irq(pci_dev->irq, level);
>> }
>>
>> /* Special hooks used by device assignment */
>> @@ -2193,6 +2192,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>      pci_set_power(pci_dev, true);
>>
>>      pci_dev->msi_trigger = pci_msi_trigger;
>> +
>> +    /* Connect device IRQ to bus */
>> +    qdev_connect_gpio_out(DEVICE(pci_dev), 0,
>> +                          pci_get_bus(pci_dev)->irq_in[pci_dev->devfn]);
> 
> I think this is confusing a few things. In my understanding -- unlike ISA -- PCI considers interrupt lanes only for PCI slots but not for buses. So for example each PCI slot could have its own direct connections (up to four, intA..intD) to the interrupt controller. IOW interrupt lanes and PCI buses are unrelated, thus PCIBus shouldn't really have IRQs.

That's definitely true: the restriction here is caused by the fact that QEMU's PCI 
IRQ routing is already deeply integrated into the PCIBus object. This is visible by 
the fact that pci_bus_map_irqs() is used to set the IRQ swizzling for each PCIBus so 
I don't see there is a way to untangle without a massive redesign of the PCI buses in 
QEMU, which is certainly outside the scope of this series.

> Moreover, in case the interrupt lines are shared between multiple PCI slots, a usual pattern is to swizzle these lines such that the intAs from the slots don't all occupy just one IRQ line. That means that depending on the slot the device is plugged into a different lane is triggered. Above code, however, would always trigger the same line and wouldn't even allow for modeling the swizzeling.
> 
> Also, above code would cause out of bounds array accesses if a PCI device had more functions than there are on "the bus":
> For example, consider PIIX which has four PIRQs, so ARRAY_SIZE(irq_fn) == 4, right? devfn can be up to 8 according to the PCI spec which would cause an out if bounds array access above.

Another restriction on the current QEMU design of PCI devices is that there is an 
implicit assumption that there is only one IRQ i.e. it's the current INTX rather than 
representing the 4 separate PIRQ lines. Again this is something that people have 
expressed interest in resolving, but as soon as you get here you end up with the same 
problem above in that you need to revisit the PCI IRQ swizzling code above.

In particular there are some interesting use cases such as SPARC64 sabre whereby we 
use virtual interrupt numbers during IRQ swizzling because we lose the PCIDevice of 
the originating device as the IRQ propagates upwards through PCI bridges. So that 
would also need a redesign assuming we move towards a more physical PCI model.

For the moment the PCI input IRQs are stored within PCIBus to ensure the existing 
interrupt code works without having to touch any IRQ swizzling code, and so as the 
current design assumes a single interrupt for each PCIDevice we only need a single 
IRQ for each devfn. That's why there is no overflow of the array, and I can confirm 
the code passed both gitlab CI and a local --enable-sanitizers "make check" without 
introducing any additional failures.

> I think that this commit does actually re-define how PCI buses work in QEMU although the cover letter claims to save this for another day. We should probably not apply the series in its current form.

I hope the above explanation gives a bit more background as to why the series is 
structured in the way it is. In effect it makes no attempt to alter the existing 
PCIBus routing, but starts by considering that PCI devices have their own unique IRQ 
handling using pci_set_irq(). So let's take a baby step which is to convert PCI 
devices to use a standard qdev gpio instead of their own custom PCI IRQ handler, 
which then allows for the possibility of using the standard qdev APIs for PCI IRQ 
routing in future.


ATB,

Mark.
Re: [RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ
Posted by Michael S. Tsirkin 2 years, 9 months ago
On Thu, May 11, 2023 at 09:44:51PM +0000, Bernhard Beschow wrote:
> 
> 
> Am 11. Mai 2023 08:57:16 UTC schrieb Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>:
> >Change pci_set_irq() to call qemu_set_irq() on the PCI device IRQ rather than
> >calling PCI bus IRQ handler function directly. In order to preserve the
> >existing behaviour update pci_qdev_realize() so that it automatically connects
> >the PCI device IRQ to the PCI bus IRQ handler.
> >
> >Finally add a "QEMU interface" description documenting the new PCI device IRQ
> >gpio next to the declaration of TYPE_PCI_DEVICE.
> >
> >Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> >---
> > hw/pci/pci.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> >diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >index 9471f996a7..3da1481eb5 100644
> >--- a/hw/pci/pci.c
> >+++ b/hw/pci/pci.c
> >@@ -1680,8 +1680,7 @@ qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
> > 
> > void pci_set_irq(PCIDevice *pci_dev, int level)
> > {
> >-    int intx = pci_intx(pci_dev);
> >-    pci_irq_handler(pci_dev, intx, level);
> >+    qemu_set_irq(pci_dev->irq, level);
> > }
> > 
> > /* Special hooks used by device assignment */
> >@@ -2193,6 +2192,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >     pci_set_power(pci_dev, true);
> > 
> >     pci_dev->msi_trigger = pci_msi_trigger;
> >+
> >+    /* Connect device IRQ to bus */
> >+    qdev_connect_gpio_out(DEVICE(pci_dev), 0,
> >+                          pci_get_bus(pci_dev)->irq_in[pci_dev->devfn]);
> 
> I think this is confusing a few things. In my understanding -- unlike
> ISA -- PCI considers interrupt lanes only for PCI slots but not for
> buses.
> So for example each PCI slot could have its own direct
> connections (up to four, intA..intD) to the interrupt controller. IOW
> interrupt lanes and PCI buses are unrelated, thus PCIBus shouldn't
> really have IRQs.

True, interrupt lines (not lanes I think - lanes is a PCI express
unrelated to interrupts since interrupts are just messages under PCIe)
bypass the PCI bus. They are in fact even used outside the
normal GNT#/REQ# protocol.

	The system vendor is free to combine the various INTx# signals from the PCI connector(s)
	in any way to connect them to the interrupt controller. They may be wire-ORed or
	electronically switched under program control, or any combination thereof. The system
	designer must insure that each INTx# signal from each connector is connected to an input
	on the interrupt controller. This means the device driver may not make any assumptions
	about interrupt sharing. All PCI device drivers must be able to share an interrupt (chaining)
	with any other logical device including devices in the same multi-function package.

> 
> Moreover, in case the interrupt lines are shared between multiple PCI slots, a usual pattern is to swizzle these lines such that the intAs from the slots don't all occupy just one IRQ line. That means that depending on the slot the device is plugged into a different lane is triggered. Above code, however, would always trigger the same line and wouldn't even allow for modeling the swizzeling.

the swizzeling always applies in case of PCI bridges:

However, since bridges will be used on add-in cards, the BIOS will assume an association
between device location and which INTx# line it uses when requesting an interrupt.
...
The BIOS code will assume the following binding behind the bridge and will
write the IRQ number in each device as described in Table 9-1. The interrupt binding
defined in this table is mandatory for add-in cards utilizing a bridge.





> Also, above code would cause out of bounds array accesses if a PCI device had more functions than there are on "the bus":
> For example, consider PIIX which has four PIRQs, so ARRAY_SIZE(irq_fn) == 4, right? devfn can be up to 8 according to the PCI spec which would cause an out if bounds array access above.
> 
> I think that this commit does actually re-define how PCI buses work in QEMU although the cover letter claims to save this for another day. We should probably not apply the series in its current form.
> 
> Best regards,
> Bernhard
> 
> > }
> > 
> > static void pci_device_init(Object *obj)
> >@@ -2850,6 +2853,11 @@ void pci_set_power(PCIDevice *d, bool state)
> >     }
> > }
> > 
> >+/*
> >+ * QEMU interface:
> >+ * + Unnamed GPIO output: set to 1 if the PCI Device has asserted its irq
> >+ */
> >+
> > static const TypeInfo pci_device_type_info = {
> >     .name = TYPE_PCI_DEVICE,
> >     .parent = TYPE_DEVICE,
Re: [RFC PATCH 03/18] hw/pci: use PCIDevice gpio for device IRQ
Posted by Mark Cave-Ayland 2 years, 9 months ago
On 12/05/2023 06:51, Michael S. Tsirkin wrote:

> On Thu, May 11, 2023 at 09:44:51PM +0000, Bernhard Beschow wrote:
>>
>>
>> Am 11. Mai 2023 08:57:16 UTC schrieb Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>:
>>> Change pci_set_irq() to call qemu_set_irq() on the PCI device IRQ rather than
>>> calling PCI bus IRQ handler function directly. In order to preserve the
>>> existing behaviour update pci_qdev_realize() so that it automatically connects
>>> the PCI device IRQ to the PCI bus IRQ handler.
>>>
>>> Finally add a "QEMU interface" description documenting the new PCI device IRQ
>>> gpio next to the declaration of TYPE_PCI_DEVICE.
>>>
>>> Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
>>> ---
>>> hw/pci/pci.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>> index 9471f996a7..3da1481eb5 100644
>>> --- a/hw/pci/pci.c
>>> +++ b/hw/pci/pci.c
>>> @@ -1680,8 +1680,7 @@ qemu_irq pci_allocate_irq(PCIDevice *pci_dev)
>>>
>>> void pci_set_irq(PCIDevice *pci_dev, int level)
>>> {
>>> -    int intx = pci_intx(pci_dev);
>>> -    pci_irq_handler(pci_dev, intx, level);
>>> +    qemu_set_irq(pci_dev->irq, level);
>>> }
>>>
>>> /* Special hooks used by device assignment */
>>> @@ -2193,6 +2192,10 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>>      pci_set_power(pci_dev, true);
>>>
>>>      pci_dev->msi_trigger = pci_msi_trigger;
>>> +
>>> +    /* Connect device IRQ to bus */
>>> +    qdev_connect_gpio_out(DEVICE(pci_dev), 0,
>>> +                          pci_get_bus(pci_dev)->irq_in[pci_dev->devfn]);
>>
>> I think this is confusing a few things. In my understanding -- unlike
>> ISA -- PCI considers interrupt lanes only for PCI slots but not for
>> buses.
>> So for example each PCI slot could have its own direct
>> connections (up to four, intA..intD) to the interrupt controller. IOW
>> interrupt lanes and PCI buses are unrelated, thus PCIBus shouldn't
>> really have IRQs.
> 
> True, interrupt lines (not lanes I think - lanes is a PCI express
> unrelated to interrupts since interrupts are just messages under PCIe)
> bypass the PCI bus. They are in fact even used outside the
> normal GNT#/REQ# protocol.
> 
> 	The system vendor is free to combine the various INTx# signals from the PCI connector(s)
> 	in any way to connect them to the interrupt controller. They may be wire-ORed or
> 	electronically switched under program control, or any combination thereof. The system
> 	designer must insure that each INTx# signal from each connector is connected to an input
> 	on the interrupt controller. This means the device driver may not make any assumptions
> 	about interrupt sharing. All PCI device drivers must be able to share an interrupt (chaining)
> 	with any other logical device including devices in the same multi-function package.

I hope I covered this in my reply to Bernhard, but I think this supports the idea 
that using a gpio is the right approach here: in the case of PCI IRQ the gpio 
represents the physical signal and can be wired using standard qdev APIs, whereas for 
PCIe the GPIOs can be wired to an internal message handler in exactly the same way.

>> Moreover, in case the interrupt lines are shared between multiple PCI slots, a usual pattern is to swizzle these lines such that the intAs from the slots don't all occupy just one IRQ line. That means that depending on the slot the device is plugged into a different lane is triggered. Above code, however, would always trigger the same line and wouldn't even allow for modeling the swizzeling.
> 
> the swizzeling always applies in case of PCI bridges:
> 
> However, since bridges will be used on add-in cards, the BIOS will assume an association
> between device location and which INTx# line it uses when requesting an interrupt.
> ...
> The BIOS code will assume the following binding behind the bridge and will
> write the IRQ number in each device as described in Table 9-1. The interrupt binding
> defined in this table is mandatory for add-in cards utilizing a bridge.

Also just to clarify in line with my previous reply: this series only changes the PCI 
device IRQ endpoint to use a gpio as a starting point to facilitate using standard 
qdev APIs to enable physical PCI device routing in future, as opposed to using a 
bespoke pci_set_irq() function just for PCI.

If this makes sense, I'd be interested to hear whether you think the current approach 
of using a named gpio "pci-input-irq" (which appears in "info qom-tree") is suitable, 
or whether it makes sense to embed the PCI IRQ directly in the device which is the 
normal qdev approach but will take more work as it involves updating all the relevant 
PCIDevices.


ATB,

Mark.