[PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval

Shameer Kolothum via posted 12 patches 4 months, 1 week ago
Maintainers: Eric Auger <eric.auger@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, Shannon Zhao <shannon.zhaosl@gmail.com>, "Michael S. Tsirkin" <mst@redhat.com>, Igor Mammedov <imammedo@redhat.com>, Ani Sinha <anisinha@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
There is a newer version of this series
[PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Shameer Kolothum via 4 months, 1 week ago
Currently, pci_setup_iommu() registers IOMMU ops for a given PCIBus.
However, when retrieving IOMMU ops for a device using
pci_device_get_iommu_bus_devfn(), the function checks the parent_dev
and fetches IOMMU ops from the parent device, even if the current
bus does not have any associated IOMMU ops.

This behavior works for now because QEMU's IOMMU implementations are
globally scoped, and host bridges rely on the bypass_iommu property
to skip IOMMU translation when needed.

However, this model will break with the soon to be introduced
arm-smmuv3 device, which allows users to associate the IOMMU
with a specific PCIe root complex (e.g., the default pcie.0
or a pxb-pcie root complex).

For example, consider the following setup with multiple root
complexes:

-device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0 \
...
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1 \
-device virtio-net-pci,bus=pcie.port1

In Qemu, pxb-pcie acts as a special root complex whose parent is
effectively the default root complex(pcie.0). Hence, though pcie.1
has no associated SMMUv3 as per above, pci_device_get_iommu_bus_devfn()
will incorrectly return the IOMMU ops from pcie.0 due to the fallback
via parent_dev.

To fix this, introduce a new helper pci_setup_iommu_per_bus() that
explicitly sets the new iommu_per_bus field in the PCIBus structure.
This helper will be used in a subsequent patch that adds support for
the new arm-smmuv3 device.

Update pci_device_get_iommu_bus_devfn() to use iommu_per_bus when
determining the correct IOMMU ops, ensuring accurate behavior for
per-bus IOMMUs.

Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Nathan Chen <nathanc@nvidia.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/pci/pci.c             | 31 +++++++++++++++++++++++++++++++
 include/hw/pci/pci.h     |  2 ++
 include/hw/pci/pci_bus.h |  1 +
 3 files changed, 34 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c70b5ceeba..85d6a497da 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2909,6 +2909,19 @@ static void pci_device_get_iommu_bus_devfn(PCIDevice *dev,
             }
         }
 
+        /*
+         * When multiple PCI Express Root Buses are defined using pxb-pcie,
+         * the IOMMU configuration may be specific to each root bus. However,
+         * pxb-pcie acts as a special root complex whose parent is effectively
+         * the default root complex(pcie.0). Ensure that we retrieve the
+         * correct IOMMU ops(if any) in such cases.
+         */
+        if (pci_bus_is_express(iommu_bus) && pci_bus_is_root(iommu_bus)) {
+            if (!iommu_bus->iommu_per_bus && parent_bus->iommu_per_bus) {
+                break;
+            }
+        }
+
         iommu_bus = parent_bus;
     }
 
@@ -3169,6 +3182,24 @@ void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
     bus->iommu_opaque = opaque;
 }
 
+/*
+ * Similar to pci_setup_iommu(), but sets iommu_per_bus to true,
+ * indicating that the IOMMU is specific to this bus. This is used by
+ * IOMMU implementations that are tied to a specific PCIe root complex.
+ *
+ * In QEMU, pxb-pcie behaves as a special root complex whose parent is
+ * effectively the default root complex (pcie.0). The iommu_per_bus
+ * is checked in pci_device_get_iommu_bus_devfn() to ensure the correct
+ * IOMMU ops are returned, avoiding the use of the parent’s IOMMU when
+ * it's not appropriate.
+ */
+void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops,
+                             void *opaque)
+{
+    pci_setup_iommu(bus, ops, opaque);
+    bus->iommu_per_bus = true;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
     Range *range = opaque;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index df3cc7b875..a3e0870a15 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -764,6 +764,8 @@ int pci_iommu_unregister_iotlb_notifier(PCIDevice *dev, uint32_t pasid,
  */
 void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque);
 
+void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque);
+
 pcibus_t pci_bar_address(PCIDevice *d,
                          int reg, uint8_t type, pcibus_t size);
 
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 2261312546..c738446788 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -35,6 +35,7 @@ struct PCIBus {
     enum PCIBusFlags flags;
     const PCIIOMMUOps *iommu_ops;
     void *iommu_opaque;
+    bool iommu_per_bus;
     uint8_t devfn_min;
     uint32_t slot_reserved_mask;
     pci_set_irq_fn set_irq;
-- 
2.47.0


Re: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Nicolin Chen 4 months, 1 week ago
On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
> Currently, pci_setup_iommu() registers IOMMU ops for a given PCIBus.
> However, when retrieving IOMMU ops for a device using
> pci_device_get_iommu_bus_devfn(), the function checks the parent_dev
> and fetches IOMMU ops from the parent device, even if the current
> bus does not have any associated IOMMU ops.
> 
> This behavior works for now because QEMU's IOMMU implementations are
> globally scoped, and host bridges rely on the bypass_iommu property
> to skip IOMMU translation when needed.
> 
> However, this model will break with the soon to be introduced
> arm-smmuv3 device, which allows users to associate the IOMMU
> with a specific PCIe root complex (e.g., the default pcie.0
> or a pxb-pcie root complex).
> 
> For example, consider the following setup with multiple root
> complexes:
> 
> -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0 \
> ...
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1 \
> -device virtio-net-pci,bus=pcie.port1
> 
> In Qemu, pxb-pcie acts as a special root complex whose parent is
> effectively the default root complex(pcie.0). Hence, though pcie.1
> has no associated SMMUv3 as per above, pci_device_get_iommu_bus_devfn()
> will incorrectly return the IOMMU ops from pcie.0 due to the fallback
> via parent_dev.
> 
> To fix this, introduce a new helper pci_setup_iommu_per_bus() that
> explicitly sets the new iommu_per_bus field in the PCIBus structure.
> This helper will be used in a subsequent patch that adds support for
> the new arm-smmuv3 device.
> 
> Update pci_device_get_iommu_bus_devfn() to use iommu_per_bus when
> determining the correct IOMMU ops, ensuring accurate behavior for
> per-bus IOMMUs.
> 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Nathan Chen <nathanc@nvidia.com>
> Tested-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

With a nit:

> +        /*
> +         * When multiple PCI Express Root Buses are defined using pxb-pcie,
> +         * the IOMMU configuration may be specific to each root bus. However,
> +         * pxb-pcie acts as a special root complex whose parent is effectively
> +         * the default root complex(pcie.0). Ensure that we retrieve the
> +         * correct IOMMU ops(if any) in such cases.
> +         */
> +        if (pci_bus_is_express(iommu_bus) && pci_bus_is_root(iommu_bus)) {
> +            if (!iommu_bus->iommu_per_bus && parent_bus->iommu_per_bus) {
> +                break;
> +            }

I think this should just check "if (parent_bus->iommu_per_bus)",
which means that the parent's iommu bus is private so not shared
with any other PCI buses.
Re: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Nicolin Chen 4 months, 1 week ago
On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
> @@ -2909,6 +2909,19 @@ static void pci_device_get_iommu_bus_devfn(PCIDevice *dev,
>              }
>          }
>  
> +        /*
> +         * When multiple PCI Express Root Buses are defined using pxb-pcie,
> +         * the IOMMU configuration may be specific to each root bus. However,
> +         * pxb-pcie acts as a special root complex whose parent is effectively
> +         * the default root complex(pcie.0). Ensure that we retrieve the
> +         * correct IOMMU ops(if any) in such cases.
> +         */
> +        if (pci_bus_is_express(iommu_bus) && pci_bus_is_root(iommu_bus)) {
> +            if (!iommu_bus->iommu_per_bus && parent_bus->iommu_per_bus) {
> +                break;
 
Mind elaborating why the current bus must unset iommu_per_bus while
its parent sets iommu_per_bus?

My understanding is that for a pxb-pcie we should set iommu_per_bus
but for its parent (the default root complex) we should unset its
iommu_per_bus?

Thanks
Nicolin
RE: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Shameerali Kolothum Thodi via 4 months, 1 week ago

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, July 8, 2025 10:26 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; imammedo@redhat.com;
> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com;
> gustavo.romero@linaro.org; mst@redhat.com;
> marcel.apfelbaum@gmail.com; Linuxarm <linuxarm@huawei.com>;
> Wangzhou (B) <wangzhou1@hisilicon.com>; jiangkunkun
> <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
> 
> On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
> > @@ -2909,6 +2909,19 @@ static void
> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> >              }
> >          }
> >
> > +        /*
> > +         * When multiple PCI Express Root Buses are defined using pxb-
> pcie,
> > +         * the IOMMU configuration may be specific to each root bus.
> However,
> > +         * pxb-pcie acts as a special root complex whose parent is
> effectively
> > +         * the default root complex(pcie.0). Ensure that we retrieve the
> > +         * correct IOMMU ops(if any) in such cases.
> > +         */
> > +        if (pci_bus_is_express(iommu_bus) &&
> pci_bus_is_root(iommu_bus)) {
> > +            if (!iommu_bus->iommu_per_bus && parent_bus-
> >iommu_per_bus) {
> > +                break;
> 
> Mind elaborating why the current bus must unset iommu_per_bus while
> its parent sets iommu_per_bus?
> 
> My understanding is that for a pxb-pcie we should set iommu_per_bus
> but for its parent (the default root complex) we should unset its
> iommu_per_bus?

Well, for new arm-smmuv3 dev you need an associated pcie root
complex. Either the default pcie.0 or a pxb-pcie one. And as I
mentioned in my reply to the other thread(patch #2) and commit log here,
the pxb-pcie is special extra root complex in Qemu which has pcie.0 has
parent bus.

The above pci_device_get_iommu_bus_devfn() at present, iterate over the
parent_dev if it is set and returns the parent_bus IOMMU ops even if the
associated pxb-pcie bus doesn't have any IOMMU. This creates problem
for a case that is described here in the cover letter here,
https://lore.kernel.org/qemu-devel/20250708154055.101012-1-shameerali.kolothum.thodi@huawei.com/

(Please see "Major changes from v4:" section)

To address that issue, this patch introduces an new helper function to specify that
the IOMMU ops are specific to the associated root complex(iommu_per_bus) and
use that to return the correct IOMMU ops.

Hope with that context it is clear now.

Thanks,
Shameer
Re: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Nicolin Chen 4 months, 1 week ago
On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum Thodi wrote:
> > On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
> > > @@ -2909,6 +2909,19 @@ static void
> > pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> > >              }
> > >          }
> > >
> > > +        /*
> > > +         * When multiple PCI Express Root Buses are defined using pxb-
> > pcie,
> > > +         * the IOMMU configuration may be specific to each root bus.
> > However,
> > > +         * pxb-pcie acts as a special root complex whose parent is
> > effectively
> > > +         * the default root complex(pcie.0). Ensure that we retrieve the
> > > +         * correct IOMMU ops(if any) in such cases.
> > > +         */
> > > +        if (pci_bus_is_express(iommu_bus) &&
> > pci_bus_is_root(iommu_bus)) {
> > > +            if (!iommu_bus->iommu_per_bus && parent_bus-
> > >iommu_per_bus) {
> > > +                break;
> > 
> > Mind elaborating why the current bus must unset iommu_per_bus while
> > its parent sets iommu_per_bus?
> > 
> > My understanding is that for a pxb-pcie we should set iommu_per_bus
> > but for its parent (the default root complex) we should unset its
> > iommu_per_bus?
> 
> Well, for new arm-smmuv3 dev you need an associated pcie root
> complex. Either the default pcie.0 or a pxb-pcie one. And as I
> mentioned in my reply to the other thread(patch #2) and commit log here,
> the pxb-pcie is special extra root complex in Qemu which has pcie.0 has
> parent bus.
> 
> The above pci_device_get_iommu_bus_devfn() at present, iterate over the
> parent_dev if it is set and returns the parent_bus IOMMU ops even if the
> associated pxb-pcie bus doesn't have any IOMMU. This creates problem
> for a case that is described here in the cover letter here,
> https://lore.kernel.org/qemu-devel/20250708154055.101012-1-shameerali.kolothum.thodi@huawei.com/
> 
> (Please see "Major changes from v4:" section)
> 
> To address that issue, this patch introduces an new helper function to specify that
> the IOMMU ops are specific to the associated root complex(iommu_per_bus) and
> use that to return the correct IOMMU ops.
> 
> Hope with that context it is clear now.

Hmm, I was not questioning the context, I get what the patch is
supposed to do.

I was asking the logic that is unclear to me why it breaks when:
    !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus

Or in which case pcie.0 would be set to iommu_per_bus=true?

Thanks
Nicolin
RE: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Shameerali Kolothum Thodi via 4 months, 1 week ago

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, July 10, 2025 1:07 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; imammedo@redhat.com;
> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com;
> gustavo.romero@linaro.org; mst@redhat.com;
> marcel.apfelbaum@gmail.com; Linuxarm <linuxarm@huawei.com>;
> Wangzhou (B) <wangzhou1@hisilicon.com>; jiangkunkun
> <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
> 
> On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum Thodi
> wrote:
> > > On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
> > > > @@ -2909,6 +2909,19 @@ static void
> > > pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> > > >              }
> > > >          }
> > > >
> > > > +        /*
> > > > +         * When multiple PCI Express Root Buses are defined using pxb-
> > > pcie,
> > > > +         * the IOMMU configuration may be specific to each root bus.
> > > However,
> > > > +         * pxb-pcie acts as a special root complex whose parent is
> > > effectively
> > > > +         * the default root complex(pcie.0). Ensure that we retrieve the
> > > > +         * correct IOMMU ops(if any) in such cases.
> > > > +         */
> > > > +        if (pci_bus_is_express(iommu_bus) &&
> > > pci_bus_is_root(iommu_bus)) {
> > > > +            if (!iommu_bus->iommu_per_bus && parent_bus-
> > > >iommu_per_bus) {
> > > > +                break;
> > >
> > > Mind elaborating why the current bus must unset iommu_per_bus
> while
> > > its parent sets iommu_per_bus?
> > >
> > > My understanding is that for a pxb-pcie we should set iommu_per_bus
> > > but for its parent (the default root complex) we should unset its
> > > iommu_per_bus?
> >
> > Well, for new arm-smmuv3 dev you need an associated pcie root
> > complex. Either the default pcie.0 or a pxb-pcie one. And as I
> > mentioned in my reply to the other thread(patch #2) and commit log
> here,
> > the pxb-pcie is special extra root complex in Qemu which has pcie.0 has
> > parent bus.
> >
> > The above pci_device_get_iommu_bus_devfn() at present, iterate over
> the
> > parent_dev if it is set and returns the parent_bus IOMMU ops even if the
> > associated pxb-pcie bus doesn't have any IOMMU. This creates problem
> > for a case that is described here in the cover letter here,
> > https://lore.kernel.org/qemu-devel/20250708154055.101012-1-
> shameerali.kolothum.thodi@huawei.com/
> >
> > (Please see "Major changes from v4:" section)
> >
> > To address that issue, this patch introduces an new helper function to
> specify that
> > the IOMMU ops are specific to the associated root
> complex(iommu_per_bus) and
> > use that to return the correct IOMMU ops.
> >
> > Hope with that context it is clear now.
> 
> Hmm, I was not questioning the context, I get what the patch is
> supposed to do.
> 
> I was asking the logic that is unclear to me why it breaks when:
>     !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus
> 
> Or in which case pcie.0 would be set to iommu_per_bus=true?

Yes. Consider the example I gave in cover  letter,

-device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
-device virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \
-device pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \
-device virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1

pcie.0 is behind new SMMUv3 dev(smmuv3.1) and has iommu_per_bus set.
pcie.1 has no SMMv3U and iommu_per_bus is not set for it.

And we don't want pci_device_get_iommu_bus_devfn() to return pcie.0's
IOMMU ops for virtionet.1. Hence the break.

Thanks,
Shameer
Re: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Donald Dutile 4 months, 1 week ago

On 7/10/25 3:37 AM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Thursday, July 10, 2025 1:07 AM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> ddutile@redhat.com; berrange@redhat.com; imammedo@redhat.com;
>> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com;
>> gustavo.romero@linaro.org; mst@redhat.com;
>> marcel.apfelbaum@gmail.com; Linuxarm <linuxarm@huawei.com>;
>> Wangzhou (B) <wangzhou1@hisilicon.com>; jiangkunkun
>> <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
>> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
>>
>> On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum Thodi
>> wrote:
>>>> On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
>>>>> @@ -2909,6 +2909,19 @@ static void
>>>> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
>>>>>               }
>>>>>           }
>>>>>
>>>>> +        /*
>>>>> +         * When multiple PCI Express Root Buses are defined using pxb-
>>>> pcie,
>>>>> +         * the IOMMU configuration may be specific to each root bus.
>>>> However,
>>>>> +         * pxb-pcie acts as a special root complex whose parent is
>>>> effectively
>>>>> +         * the default root complex(pcie.0). Ensure that we retrieve the
>>>>> +         * correct IOMMU ops(if any) in such cases.
>>>>> +         */
>>>>> +        if (pci_bus_is_express(iommu_bus) &&
>>>> pci_bus_is_root(iommu_bus)) {
>>>>> +            if (!iommu_bus->iommu_per_bus && parent_bus-
>>>>> iommu_per_bus) {
>>>>> +                break;
>>>>
>>>> Mind elaborating why the current bus must unset iommu_per_bus
>> while
>>>> its parent sets iommu_per_bus?
>>>>
>>>> My understanding is that for a pxb-pcie we should set iommu_per_bus
>>>> but for its parent (the default root complex) we should unset its
>>>> iommu_per_bus?
>>>
>>> Well, for new arm-smmuv3 dev you need an associated pcie root
>>> complex. Either the default pcie.0 or a pxb-pcie one. And as I
>>> mentioned in my reply to the other thread(patch #2) and commit log
>> here,
>>> the pxb-pcie is special extra root complex in Qemu which has pcie.0 has
>>> parent bus.
>>>
>>> The above pci_device_get_iommu_bus_devfn() at present, iterate over
>> the
>>> parent_dev if it is set and returns the parent_bus IOMMU ops even if the
>>> associated pxb-pcie bus doesn't have any IOMMU. This creates problem
>>> for a case that is described here in the cover letter here,
>>> https://lore.kernel.org/qemu-devel/20250708154055.101012-1-
>> shameerali.kolothum.thodi@huawei.com/
>>>
>>> (Please see "Major changes from v4:" section)
>>>
>>> To address that issue, this patch introduces an new helper function to
>> specify that
>>> the IOMMU ops are specific to the associated root
>> complex(iommu_per_bus) and
>>> use that to return the correct IOMMU ops.
>>>
>>> Hope with that context it is clear now.
>>
>> Hmm, I was not questioning the context, I get what the patch is
>> supposed to do.
>>
>> I was asking the logic that is unclear to me why it breaks when:
>>      !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus
>>
>> Or in which case pcie.0 would be set to iommu_per_bus=true?
> 
> Yes. Consider the example I gave in cover  letter,
> 
> -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
> -device virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \
> -device pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \
> -device virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1
> 
> pcie.0 is behind new SMMUv3 dev(smmuv3.1) and has iommu_per_bus set.
> pcie.1 has no SMMv3U and iommu_per_bus is not set for it.
pcie.1 doesn't?   then what is this line saying/meaning?:
  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \

I read that as an smmuv3 attached to pcie.1, with an id of smmuv3.2;
just as I read this config:
  -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
as an smmuv3 attached to pcie.0 iwth id smmuv3.1

> 
> And we don't want pci_device_get_iommu_bus_devfn() to return pcie.0's
> IOMMU ops for virtionet.1. Hence the break.
> 
> Thanks,
> Shameer
>
RE: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Shameerali Kolothum Thodi via 4 months, 1 week ago

> -----Original Message-----
> From: Donald Dutile <ddutile@redhat.com>
> Sent: Thursday, July 10, 2025 5:00 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen
> <nicolinc@nvidia.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> berrange@redhat.com; imammedo@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; gustavo.romero@linaro.org;
> mst@redhat.com; marcel.apfelbaum@gmail.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
> 
> 
> 
> On 7/10/25 3:37 AM, Shameerali Kolothum Thodi wrote:
> >
> >
> >> -----Original Message-----
> >> From: Nicolin Chen <nicolinc@nvidia.com>
> >> Sent: Thursday, July 10, 2025 1:07 AM
> >> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> >> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> >> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> >> ddutile@redhat.com; berrange@redhat.com; imammedo@redhat.com;
> >> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com;
> >> gustavo.romero@linaro.org; mst@redhat.com;
> >> marcel.apfelbaum@gmail.com; Linuxarm <linuxarm@huawei.com>;
> Wangzhou
> >> (B) <wangzhou1@hisilicon.com>; jiangkunkun
> <jiangkunkun@huawei.com>;
> >> Jonathan Cameron <jonathan.cameron@huawei.com>;
> >> zhangfei.gao@linaro.org
> >> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
> >> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
> >>
> >> On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum Thodi
> >> wrote:
> >>>> On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
> >>>>> @@ -2909,6 +2909,19 @@ static void
> >>>> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> >>>>>               }
> >>>>>           }
> >>>>>
> >>>>> +        /*
> >>>>> +         * When multiple PCI Express Root Buses are defined using
> >>>>> + pxb-
> >>>> pcie,
> >>>>> +         * the IOMMU configuration may be specific to each root bus.
> >>>> However,
> >>>>> +         * pxb-pcie acts as a special root complex whose parent
> >>>>> + is
> >>>> effectively
> >>>>> +         * the default root complex(pcie.0). Ensure that we retrieve the
> >>>>> +         * correct IOMMU ops(if any) in such cases.
> >>>>> +         */
> >>>>> +        if (pci_bus_is_express(iommu_bus) &&
> >>>> pci_bus_is_root(iommu_bus)) {
> >>>>> +            if (!iommu_bus->iommu_per_bus && parent_bus-
> >>>>> iommu_per_bus) {
> >>>>> +                break;
> >>>>
> >>>> Mind elaborating why the current bus must unset iommu_per_bus
> >> while
> >>>> its parent sets iommu_per_bus?
> >>>>
> >>>> My understanding is that for a pxb-pcie we should set
> iommu_per_bus
> >>>> but for its parent (the default root complex) we should unset its
> >>>> iommu_per_bus?
> >>>
> >>> Well, for new arm-smmuv3 dev you need an associated pcie root
> >>> complex. Either the default pcie.0 or a pxb-pcie one. And as I
> >>> mentioned in my reply to the other thread(patch #2) and commit log
> >> here,
> >>> the pxb-pcie is special extra root complex in Qemu which has pcie.0
> >>> has parent bus.
> >>>
> >>> The above pci_device_get_iommu_bus_devfn() at present, iterate over
> >> the
> >>> parent_dev if it is set and returns the parent_bus IOMMU ops even if
> >>> the associated pxb-pcie bus doesn't have any IOMMU. This creates
> >>> problem for a case that is described here in the cover letter here,
> >>> https://lore.kernel.org/qemu-devel/20250708154055.101012-1-
> >> shameerali.kolothum.thodi@huawei.com/
> >>>
> >>> (Please see "Major changes from v4:" section)
> >>>
> >>> To address that issue, this patch introduces an new helper function
> >>> to
> >> specify that
> >>> the IOMMU ops are specific to the associated root
> >> complex(iommu_per_bus) and
> >>> use that to return the correct IOMMU ops.
> >>>
> >>> Hope with that context it is clear now.
> >>
> >> Hmm, I was not questioning the context, I get what the patch is
> >> supposed to do.
> >>
> >> I was asking the logic that is unclear to me why it breaks when:
> >>      !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus
> >>
> >> Or in which case pcie.0 would be set to iommu_per_bus=true?
> >
> > Yes. Consider the example I gave in cover  letter,
> >
> > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ -device
> > virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \ -device
> > pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device
> > arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \ -device
> > pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \ -device
> > virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1
> >
> > pcie.0 is behind new SMMUv3 dev(smmuv3.1) and has iommu_per_bus
> set.
> > pcie.1 has no SMMv3U and iommu_per_bus is not set for it.
> pcie.1 doesn't?   then what is this line saying/meaning?:
>   -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \
> 
> I read that as an smmuv3 attached to pcie.1, with an id of smmuv3.2; just
> as I read this config:
>   -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ as an smmuv3
> attached to pcie.0 iwth id smmuv3.1

Oops..I forgot to delete that from the config:
This is what I meant,

-device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
-device virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \
-device virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1 \

Thanks,
Shameer

> 
> >
> > And we don't want pci_device_get_iommu_bus_devfn() to return pcie.0's
> > IOMMU ops for virtionet.1. Hence the break.
> >
> > Thanks,
> > Shameer
> >
> 

Re: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Donald Dutile 4 months, 1 week ago

On 7/10/25 12:21 PM, Shameerali Kolothum Thodi wrote:
> 
> 
>> -----Original Message-----
>> From: Donald Dutile <ddutile@redhat.com>
>> Sent: Thursday, July 10, 2025 5:00 PM
>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen
>> <nicolinc@nvidia.com>
>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> berrange@redhat.com; imammedo@redhat.com; nathanc@nvidia.com;
>> mochs@nvidia.com; smostafa@google.com; gustavo.romero@linaro.org;
>> mst@redhat.com; marcel.apfelbaum@gmail.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
>> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
>>
>>
>>
>> On 7/10/25 3:37 AM, Shameerali Kolothum Thodi wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>> Sent: Thursday, July 10, 2025 1:07 AM
>>>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>
>>>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>>>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>>>> ddutile@redhat.com; berrange@redhat.com; imammedo@redhat.com;
>>>> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com;
>>>> gustavo.romero@linaro.org; mst@redhat.com;
>>>> marcel.apfelbaum@gmail.com; Linuxarm <linuxarm@huawei.com>;
>> Wangzhou
>>>> (B) <wangzhou1@hisilicon.com>; jiangkunkun
>> <jiangkunkun@huawei.com>;
>>>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>>>> zhangfei.gao@linaro.org
>>>> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
>>>> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
>>>>
>>>> On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum Thodi
>>>> wrote:
>>>>>> On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
>>>>>>> @@ -2909,6 +2909,19 @@ static void
>>>>>> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
>>>>>>>                }
>>>>>>>            }
>>>>>>>
>>>>>>> +        /*
>>>>>>> +         * When multiple PCI Express Root Buses are defined using
>>>>>>> + pxb-
>>>>>> pcie,
>>>>>>> +         * the IOMMU configuration may be specific to each root bus.
>>>>>> However,
>>>>>>> +         * pxb-pcie acts as a special root complex whose parent
>>>>>>> + is
>>>>>> effectively
>>>>>>> +         * the default root complex(pcie.0). Ensure that we retrieve the
>>>>>>> +         * correct IOMMU ops(if any) in such cases.
>>>>>>> +         */
>>>>>>> +        if (pci_bus_is_express(iommu_bus) &&
>>>>>> pci_bus_is_root(iommu_bus)) {
>>>>>>> +            if (!iommu_bus->iommu_per_bus && parent_bus-
>>>>>>> iommu_per_bus) {
>>>>>>> +                break;
>>>>>>
>>>>>> Mind elaborating why the current bus must unset iommu_per_bus
>>>> while
>>>>>> its parent sets iommu_per_bus?
>>>>>>
>>>>>> My understanding is that for a pxb-pcie we should set
>> iommu_per_bus
>>>>>> but for its parent (the default root complex) we should unset its
>>>>>> iommu_per_bus?
>>>>>
>>>>> Well, for new arm-smmuv3 dev you need an associated pcie root
>>>>> complex. Either the default pcie.0 or a pxb-pcie one. And as I
>>>>> mentioned in my reply to the other thread(patch #2) and commit log
>>>> here,
>>>>> the pxb-pcie is special extra root complex in Qemu which has pcie.0
>>>>> has parent bus.
>>>>>
>>>>> The above pci_device_get_iommu_bus_devfn() at present, iterate over
>>>> the
>>>>> parent_dev if it is set and returns the parent_bus IOMMU ops even if
>>>>> the associated pxb-pcie bus doesn't have any IOMMU. This creates
>>>>> problem for a case that is described here in the cover letter here,
>>>>> https://lore.kernel.org/qemu-devel/20250708154055.101012-1-
>>>> shameerali.kolothum.thodi@huawei.com/
>>>>>
>>>>> (Please see "Major changes from v4:" section)
>>>>>
>>>>> To address that issue, this patch introduces an new helper function
>>>>> to
>>>> specify that
>>>>> the IOMMU ops are specific to the associated root
>>>> complex(iommu_per_bus) and
>>>>> use that to return the correct IOMMU ops.
>>>>>
>>>>> Hope with that context it is clear now.
>>>>
>>>> Hmm, I was not questioning the context, I get what the patch is
>>>> supposed to do.
>>>>
>>>> I was asking the logic that is unclear to me why it breaks when:
>>>>       !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus
>>>>
>>>> Or in which case pcie.0 would be set to iommu_per_bus=true?
>>>
>>> Yes. Consider the example I gave in cover  letter,
>>>
>>> -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ -device
>>> virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \ -device
>>> pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device
>>> arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \ -device
>>> pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \ -device
>>> virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1
>>>
>>> pcie.0 is behind new SMMUv3 dev(smmuv3.1) and has iommu_per_bus
>> set.
>>> pcie.1 has no SMMv3U and iommu_per_bus is not set for it.
>> pcie.1 doesn't?   then what is this line saying/meaning?:
>>    -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \
>>
>> I read that as an smmuv3 attached to pcie.1, with an id of smmuv3.2; just
>> as I read this config:
>>    -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ as an smmuv3
>> attached to pcie.0 iwth id smmuv3.1
> 
> Oops..I forgot to delete that from the config:
> This is what I meant,
> 
> -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
> -device virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \
> -device virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1 \
> 
> Thanks,
> Shameer
> 
the dirt is in the details... thanks for the clarification.
- Don

>>
>>>
>>> And we don't want pci_device_get_iommu_bus_devfn() to return pcie.0's
>>> IOMMU ops for virtionet.1. Hence the break.
>>>
>>> Thanks,
>>> Shameer
>>>
>>
>
Re: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Nicolin Chen 4 months, 1 week ago
On Thu, Jul 10, 2025 at 04:21:41PM +0000, Shameerali Kolothum Thodi wrote:
> > >> On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum Thodi
> > >> wrote:
> > >>>> On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum wrote:
> > >>>>> @@ -2909,6 +2909,19 @@ static void
> > >>>> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> > >>>>>               }
> > >>>>>           }
> > >>>>>
> > >>>>> +        /*
> > >>>>> +         * When multiple PCI Express Root Buses are defined using
> > >>>>> + pxb-
> > >>>> pcie,
> > >>>>> +         * the IOMMU configuration may be specific to each root bus.
> > >>>> However,
> > >>>>> +         * pxb-pcie acts as a special root complex whose parent
> > >>>>> + is
> > >>>> effectively
> > >>>>> +         * the default root complex(pcie.0). Ensure that we retrieve the
> > >>>>> +         * correct IOMMU ops(if any) in such cases.
> > >>>>> +         */
> > >>>>> +        if (pci_bus_is_express(iommu_bus) &&
> > >>>> pci_bus_is_root(iommu_bus)) {
> > >>>>> +            if (!iommu_bus->iommu_per_bus && parent_bus-
> > >>>>> iommu_per_bus) {
> > >>>>> +                break;
> > >>>>
> > >>>> Mind elaborating why the current bus must unset iommu_per_bus
> > >> while
> > >>>> its parent sets iommu_per_bus?
> > >>>>
> > >>>> My understanding is that for a pxb-pcie we should set
> > iommu_per_bus
> > >>>> but for its parent (the default root complex) we should unset its
> > >>>> iommu_per_bus?
> > >>>
> > >>> Well, for new arm-smmuv3 dev you need an associated pcie root
> > >>> complex. Either the default pcie.0 or a pxb-pcie one. And as I
> > >>> mentioned in my reply to the other thread(patch #2) and commit log
> > >> here,
> > >>> the pxb-pcie is special extra root complex in Qemu which has pcie.0
> > >>> has parent bus.
> > >>>
> > >>> The above pci_device_get_iommu_bus_devfn() at present, iterate over
> > >> the
> > >>> parent_dev if it is set and returns the parent_bus IOMMU ops even if
> > >>> the associated pxb-pcie bus doesn't have any IOMMU. This creates
> > >>> problem for a case that is described here in the cover letter here,
> > >>> https://lore.kernel.org/qemu-devel/20250708154055.101012-1-
> > >> shameerali.kolothum.thodi@huawei.com/
> > >>>
> > >>> (Please see "Major changes from v4:" section)
> > >>>
> > >>> To address that issue, this patch introduces an new helper function
> > >>> to
> > >> specify that
> > >>> the IOMMU ops are specific to the associated root
> > >> complex(iommu_per_bus) and
> > >>> use that to return the correct IOMMU ops.
> > >>>
> > >>> Hope with that context it is clear now.
> > >>
> > >> Hmm, I was not questioning the context, I get what the patch is
> > >> supposed to do.
> > >>
> > >> I was asking the logic that is unclear to me why it breaks when:
> > >>      !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus
> > >>
> > >> Or in which case pcie.0 would be set to iommu_per_bus=true?
> > >
> > > Yes. Consider the example I gave in cover  letter,
> > >
> > > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ -device
> > > virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \ -device
> > > pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device
> > > arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \ -device
> > > pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \ -device
> > > virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1
> > >
> > > pcie.0 is behind new SMMUv3 dev(smmuv3.1) and has iommu_per_bus
> > set.
> > > pcie.1 has no SMMv3U and iommu_per_bus is not set for it.
> > pcie.1 doesn't?   then what is this line saying/meaning?:
> >   -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \
> > 
> > I read that as an smmuv3 attached to pcie.1, with an id of smmuv3.2; just
> > as I read this config:
> >   -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ as an smmuv3
> > attached to pcie.0 iwth id smmuv3.1
> 
> Oops..I forgot to delete that from the config:
> This is what I meant,
> 
> -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
> -device virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \
> -device virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1 \

So, the logic is trying to avoid:
        "iommu_bus = parent_bus;"
for a case that parent_bus (pcie.0) having its own IOMMU.

But shouldn't it be just "if (parent_bus->iommu_per_bus)"?

Why does the current iommu_bus->iommu_per_bus has to be unset?

I think "iommu_bus = parent_bus" should be avoided too even if
the current iommu_bus has its own IOMMU, i.e. iommu_per_bus is
set?

Thanks
Nicolin
RE: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Shameerali Kolothum Thodi via 4 months, 1 week ago

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, July 10, 2025 5:56 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Donald Dutile <ddutile@redhat.com>; qemu-arm@nongnu.org; qemu-
> devel@nongnu.org; eric.auger@redhat.com; peter.maydell@linaro.org;
> jgg@nvidia.com; berrange@redhat.com; imammedo@redhat.com;
> nathanc@nvidia.com; mochs@nvidia.com; smostafa@google.com;
> gustavo.romero@linaro.org; mst@redhat.com;
> marcel.apfelbaum@gmail.com; Linuxarm <linuxarm@huawei.com>;
> Wangzhou (B) <wangzhou1@hisilicon.com>; jiangkunkun
> <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
> 
> On Thu, Jul 10, 2025 at 04:21:41PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > >> On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum
> Thodi
> > > >> wrote:
> > > >>>> On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum
> wrote:
> > > >>>>> @@ -2909,6 +2909,19 @@ static void
> > > >>>> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> > > >>>>>               }
> > > >>>>>           }
> > > >>>>>
> > > >>>>> +        /*
> > > >>>>> +         * When multiple PCI Express Root Buses are defined using
> > > >>>>> + pxb-
> > > >>>> pcie,
> > > >>>>> +         * the IOMMU configuration may be specific to each root
> bus.
> > > >>>> However,
> > > >>>>> +         * pxb-pcie acts as a special root complex whose parent
> > > >>>>> + is
> > > >>>> effectively
> > > >>>>> +         * the default root complex(pcie.0). Ensure that we retrieve
> the
> > > >>>>> +         * correct IOMMU ops(if any) in such cases.
> > > >>>>> +         */
> > > >>>>> +        if (pci_bus_is_express(iommu_bus) &&
> > > >>>> pci_bus_is_root(iommu_bus)) {
> > > >>>>> +            if (!iommu_bus->iommu_per_bus && parent_bus-
> > > >>>>> iommu_per_bus) {
> > > >>>>> +                break;
> > > >>>>
> > > >>>> Mind elaborating why the current bus must unset iommu_per_bus
> > > >> while
> > > >>>> its parent sets iommu_per_bus?
> > > >>>>
> > > >>>> My understanding is that for a pxb-pcie we should set
> > > iommu_per_bus
> > > >>>> but for its parent (the default root complex) we should unset its
> > > >>>> iommu_per_bus?
> > > >>>
> > > >>> Well, for new arm-smmuv3 dev you need an associated pcie root
> > > >>> complex. Either the default pcie.0 or a pxb-pcie one. And as I
> > > >>> mentioned in my reply to the other thread(patch #2) and commit
> log
> > > >> here,
> > > >>> the pxb-pcie is special extra root complex in Qemu which has pcie.0
> > > >>> has parent bus.
> > > >>>
> > > >>> The above pci_device_get_iommu_bus_devfn() at present, iterate
> over
> > > >> the
> > > >>> parent_dev if it is set and returns the parent_bus IOMMU ops even
> if
> > > >>> the associated pxb-pcie bus doesn't have any IOMMU. This creates
> > > >>> problem for a case that is described here in the cover letter here,
> > > >>> https://lore.kernel.org/qemu-devel/20250708154055.101012-1-
> > > >> shameerali.kolothum.thodi@huawei.com/
> > > >>>
> > > >>> (Please see "Major changes from v4:" section)
> > > >>>
> > > >>> To address that issue, this patch introduces an new helper function
> > > >>> to
> > > >> specify that
> > > >>> the IOMMU ops are specific to the associated root
> > > >> complex(iommu_per_bus) and
> > > >>> use that to return the correct IOMMU ops.
> > > >>>
> > > >>> Hope with that context it is clear now.
> > > >>
> > > >> Hmm, I was not questioning the context, I get what the patch is
> > > >> supposed to do.
> > > >>
> > > >> I was asking the logic that is unclear to me why it breaks when:
> > > >>      !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus
> > > >>
> > > >> Or in which case pcie.0 would be set to iommu_per_bus=true?
> > > >
> > > > Yes. Consider the example I gave in cover  letter,
> > > >
> > > > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ -device
> > > > virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \ -device
> > > > pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device
> > > > arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \ -device
> > > > pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \ -device
> > > > virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1
> > > >
> > > > pcie.0 is behind new SMMUv3 dev(smmuv3.1) and has
> iommu_per_bus
> > > set.
> > > > pcie.1 has no SMMv3U and iommu_per_bus is not set for it.
> > > pcie.1 doesn't?   then what is this line saying/meaning?:
> > >   -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \
> > >
> > > I read that as an smmuv3 attached to pcie.1, with an id of smmuv3.2;
> just
> > > as I read this config:
> > >   -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ as an smmuv3
> > > attached to pcie.0 iwth id smmuv3.1
> >
> > Oops..I forgot to delete that from the config:
> > This is what I meant,
> >
> > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
> > -device virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \
> > -device virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1 \
> 
> So, the logic is trying to avoid:
>         "iommu_bus = parent_bus;"
> for a case that parent_bus (pcie.0) having its own IOMMU.
> 
> But shouldn't it be just "if (parent_bus->iommu_per_bus)"?
> 
> Why does the current iommu_bus->iommu_per_bus has to be unset?

I think that !iommu_bus->iommu_per_bus check will be always true as 
it enters the while loop only if !iommu_bus->iommu_ops case,

while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {

}

So yes, I think that can be removed. I will double check though as I cant
recollect why I added that now.

> 
> I think "iommu_bus = parent_bus" should be avoided too even if
> the current iommu_bus has its own IOMMU, i.e. iommu_per_bus is
> set?

Why? Not clear to me. It only enters the loop if the current iommu_bus
doesn't have Iommu_ops set which in turn means iommu_per_bus is not set. 
Isn't it? . Do you have a particular configuration in mind where it will fail
otherwise?

Thanks,
Shameer
Re: [PATCH v7 07/12] hw/pci: Introduce pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
Posted by Nicolin Chen 4 months, 1 week ago
On Thu, Jul 10, 2025 at 09:40:28PM +0000, Shameerali Kolothum Thodi wrote:
> > So, the logic is trying to avoid:
> >         "iommu_bus = parent_bus;"
> > for a case that parent_bus (pcie.0) having its own IOMMU.
> > 
> > But shouldn't it be just "if (parent_bus->iommu_per_bus)"?
> > 
> > Why does the current iommu_bus->iommu_per_bus has to be unset?
> 
> I think that !iommu_bus->iommu_per_bus check will be always true as 
> it enters the while loop only if !iommu_bus->iommu_ops case,
> 
> while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
> 
> }

Yea, that makses sense. "!iommu_bus->iommu_ops" should imply that
"iommu_bus->iommu_per_bus" is unset.

> > I think "iommu_bus = parent_bus" should be avoided too even if
> > the current iommu_bus has its own IOMMU, i.e. iommu_per_bus is
> > set?
> 
> Why? Not clear to me. It only enters the loop if the current iommu_bus
> doesn't have Iommu_ops set which in turn means iommu_per_bus is not set. 
> Isn't it?

Yea. I missed that condition. But what I said still stays correct:
if iommu_bus (pcie.1) has its own IOMMU, it shouldn't take the ops
from the parent_bus. The thing is that this logic is inside a while
condition that has already excluded such a case.

Thanks
Nicolin