From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
On Arm we need to parse DT PCI-IOMMU specifier and provide it to
the driver (for describing the relationship between PCI devices
and IOMMUs) before adding a device to it.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
v1->v2:
* new patch title (was: "pci/arm: Use iommu_add_dt_pci_device() instead of arch hook")
* move iommu_add_dt_pci_device() call (and associated #ifdef) to
pci_add_device()
* use existing call to iommu_add_device()
downstream->v1:
* rebase
* add __maybe_unused attribute to const struct domain_iommu *hd;
* Rename: s/iommu_add_pci_device/iommu_add_dt_pci_device/
* guard iommu_add_dt_pci_device call with CONFIG_HAS_DEVICE_TREE instead of
CONFIG_ARM
(cherry picked from commit 2b9d26badab8b24b5a80d028c4499a5022817213 from
the downstream branch poc/pci-passthrough from
https://gitlab.com/xen-project/people/bmarquis/xen-arm-poc.git)
---
xen/drivers/passthrough/pci.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index b42acb8d7c09..6dbaae682773 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -34,6 +34,11 @@
#include <xen/vpci.h>
#include <xen/msi.h>
#include <xsm/xsm.h>
+
+#ifdef CONFIG_HAS_DEVICE_TREE
+#include <asm/iommu_fwspec.h>
+#endif
+
#include "ats.h"
struct pci_seg {
@@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
pdev->domain = NULL;
goto out;
}
+#ifdef CONFIG_HAS_DEVICE_TREE
+ ret = iommu_add_dt_pci_device(pdev);
+ if ( ret < 0 )
+ {
+ printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
+ goto out;
+ }
+#endif
ret = iommu_add_device(pdev);
if ( ret )
{
+#ifdef CONFIG_HAS_DEVICE_TREE
+ iommu_fwspec_free(pci_to_dev(pdev));
+#endif
vpci_remove_device(pdev);
list_del(&pdev->domain_list);
pdev->domain = NULL;
--
2.40.1
On 11.05.2023 21:16, Stewart Hildebrand wrote:
> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> pdev->domain = NULL;
> goto out;
> }
> +#ifdef CONFIG_HAS_DEVICE_TREE
> + ret = iommu_add_dt_pci_device(pdev);
> + if ( ret < 0 )
> + {
> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
> + goto out;
> + }
> +#endif
> ret = iommu_add_device(pdev);
Hmm, am I misremembering that in the earlier patch you had #else to
invoke the alternative behavior? Now you end up calling both functions;
if that's indeed intended, this may still want doing differently.
Looking at the earlier patch introducing the function, I can't infer
though whether that's intended: iommu_add_dt_pci_device() checks that
the add_device hook is present, but then I didn't find any use of this
hook. The revlog there suggests the check might be stale.
If indeed the function does only preparatory work, I don't see why it
would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
then. Plus in such a case #ifdef-ary here probably wants avoiding by
introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then
...
> if ( ret )
> {
> +#ifdef CONFIG_HAS_DEVICE_TREE
> + iommu_fwspec_free(pci_to_dev(pdev));
> +#endif
... this (which I understand is doing the corresponding cleanup) then
also wants wrapping in a suitably named tiny helper function.
But yet further I'm then no longer convinced this is the right place
for the addition. pci_add_device() is backing physdev hypercalls. It
would seem to me that the function may want invoking yet one layer
further up, or it may even want invoking from a brand new DT-specific
physdev-op. This would then leave at least the x86-only paths (invoking
pci_add_device() from outside of pci_physdev_op()) entirely alone.
Jan
On 5/12/23 03:25, Jan Beulich wrote:
> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> pdev->domain = NULL;
>> goto out;
>> }
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> + ret = iommu_add_dt_pci_device(pdev);
>> + if ( ret < 0 )
>> + {
>> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>> + goto out;
>> + }
>> +#endif
>> ret = iommu_add_device(pdev);
>
> Hmm, am I misremembering that in the earlier patch you had #else to
> invoke the alternative behavior?
You are remembering correctly. v1 had an #else, v2 does not.
> Now you end up calling both functions;
> if that's indeed intended,
Yes, this is intentional.
> this may still want doing differently.
> Looking at the earlier patch introducing the function, I can't infer
> though whether that's intended: iommu_add_dt_pci_device() checks that
> the add_device hook is present, but then I didn't find any use of this
> hook. The revlog there suggests the check might be stale.
Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.
> If indeed the function does only preparatory work, I don't see why it
> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
> then.
The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).
> Plus in such a case #ifdef-ary here probably wants avoiding by
> introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then
> ...
>
>> if ( ret )
>> {
>> +#ifdef CONFIG_HAS_DEVICE_TREE
>> + iommu_fwspec_free(pci_to_dev(pdev));
>> +#endif
>
> ... this (which I understand is doing the corresponding cleanup) then
> also wants wrapping in a suitably named tiny helper function.
Sure, I'm on board with eliminating/reducing the #ifdef-ary where possible. Will do.
> But yet further I'm then no longer convinced this is the right place
> for the addition. pci_add_device() is backing physdev hypercalls. It
> would seem to me that the function may want invoking yet one layer
> further up, or it may even want invoking from a brand new DT-specific
> physdev-op. This would then leave at least the x86-only paths (invoking
> pci_add_device() from outside of pci_physdev_op()) entirely alone.
Let's establish that pci_add_device()/iommu_add_device() are already inherently performing tasks related to setting up a PCI device to work with an IOMMU.
The preparatory work in question needs to happen after:
pci_add_device()
-> alloc_pdev()
since we need to know all the possible RIDs (including those for phantom functions), but before the add_device iommu hook:
pci_add_device()
-> iommu_add_device()
-> iommu_call(hd->platform_ops, add_device, ...)
The preparatory work (i.e. mapping RID/BDF -> AXI stream ID) is inherently associated with setting up a PCI device to work with an ARM SMMU (but not any particular variant of the SMMU). The SMMU distinguishes what PCI device/function DMA traffic is associated with based on the derived AXI stream ID (sideband data), not the RID/BDF directly. See [1].
Moving the preparatory work one layer up would mean duplicating what alloc_pdev() is already doing to set up pdev->phantom_stride (which we need to figure out all RIDs for that particular device). Moving it down into the individual SMMU drivers (smmu_ops/platform_ops) would mean duplicating special phantom function handling in each SMMU driver, and further deviating from the Linux SMMU driver(s) they are based on.
It still feels to me like pci_add_device() (or iommu_add_device()) is the right place to perform the RID/BDF -> AXI stream ID mapping.
Since there's nothing inherently DT specific (or ACPI specific) about deriving sideband data from RID/BDF, let me propose a new name for the function (instead of iommu_add_dt_pci_device):
iommu_derive_pci_device_sideband_IDs()
Now, as far as DT and ACPI co-existing goes, I admit I haven't tested with CONFIG_ACPI=y yet (there seems to be some issues when both CONFIG_ARM_SMMU_V3=y and CONFIG_ACPI=y are enabled, even in staging). But I do recognize that we need a way support both CONFIG_HAS_DEVICE_TREE=y and CONFIG_ACPI=y simultaneously. Let me think on that for a bit...
[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
On 12.05.2023 23:03, Stewart Hildebrand wrote:
> On 5/12/23 03:25, Jan Beulich wrote:
>> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>> pdev->domain = NULL;
>>> goto out;
>>> }
>>> +#ifdef CONFIG_HAS_DEVICE_TREE
>>> + ret = iommu_add_dt_pci_device(pdev);
>>> + if ( ret < 0 )
>>> + {
>>> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>>> + goto out;
>>> + }
>>> +#endif
>>> ret = iommu_add_device(pdev);
>>
>> Hmm, am I misremembering that in the earlier patch you had #else to
>> invoke the alternative behavior?
>
> You are remembering correctly. v1 had an #else, v2 does not.
>
>> Now you end up calling both functions;
>> if that's indeed intended,
>
> Yes, this is intentional.
>
>> this may still want doing differently.
>> Looking at the earlier patch introducing the function, I can't infer
>> though whether that's intended: iommu_add_dt_pci_device() checks that
>> the add_device hook is present, but then I didn't find any use of this
>> hook. The revlog there suggests the check might be stale.
>
> Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.
>
>> If indeed the function does only preparatory work, I don't see why it
>> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
>> then.
>
> The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).
The function being SMMU-related pretty strongly suggests it wants to be
invoked via a hook. If the add_device() one isn't suitable, perhaps we
need a new (optional) prepare_device() one? With pci_add_device() then
calling iommu_prepare_device(), wrapping the hook invocation?
But just to be clear: A new hook would need enough justification as to
the existing one being unsuitable.
Jan
On 5/15/23 03:30, Jan Beulich wrote:
> On 12.05.2023 23:03, Stewart Hildebrand wrote:
>> On 5/12/23 03:25, Jan Beulich wrote:
>>> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>>>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>>> pdev->domain = NULL;
>>>> goto out;
>>>> }
>>>> +#ifdef CONFIG_HAS_DEVICE_TREE
>>>> + ret = iommu_add_dt_pci_device(pdev);
>>>> + if ( ret < 0 )
>>>> + {
>>>> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>>>> + goto out;
>>>> + }
>>>> +#endif
>>>> ret = iommu_add_device(pdev);
>>>
>>> Hmm, am I misremembering that in the earlier patch you had #else to
>>> invoke the alternative behavior?
>>
>> You are remembering correctly. v1 had an #else, v2 does not.
>>
>>> Now you end up calling both functions;
>>> if that's indeed intended,
>>
>> Yes, this is intentional.
>>
>>> this may still want doing differently.
>>> Looking at the earlier patch introducing the function, I can't infer
>>> though whether that's intended: iommu_add_dt_pci_device() checks that
>>> the add_device hook is present, but then I didn't find any use of this
>>> hook. The revlog there suggests the check might be stale.
>>
>> Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.
>>
>>> If indeed the function does only preparatory work, I don't see why it
>>> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
>>> then.
>>
>> The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).
>
> The function being SMMU-related pretty strongly suggests it wants to be
> invoked via a hook. If the add_device() one isn't suitable, perhaps we
> need a new (optional) prepare_device() one? With pci_add_device() then
> calling iommu_prepare_device(), wrapping the hook invocation?
>
> But just to be clear: A new hook would need enough justification as to
> the existing one being unsuitable.
I'll move it to the add_device hook in v3
On 12.05.2023 23:03, Stewart Hildebrand wrote:
> On 5/12/23 03:25, Jan Beulich wrote:
>> On 11.05.2023 21:16, Stewart Hildebrand wrote:
>>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>> pdev->domain = NULL;
>>> goto out;
>>> }
>>> +#ifdef CONFIG_HAS_DEVICE_TREE
>>> + ret = iommu_add_dt_pci_device(pdev);
>>> + if ( ret < 0 )
>>> + {
>>> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret);
>>> + goto out;
>>> + }
>>> +#endif
>>> ret = iommu_add_device(pdev);
>>
>> Hmm, am I misremembering that in the earlier patch you had #else to
>> invoke the alternative behavior?
>
> You are remembering correctly. v1 had an #else, v2 does not.
>
>> Now you end up calling both functions;
>> if that's indeed intended,
>
> Yes, this is intentional.
>
>> this may still want doing differently.
>> Looking at the earlier patch introducing the function, I can't infer
>> though whether that's intended: iommu_add_dt_pci_device() checks that
>> the add_device hook is present, but then I didn't find any use of this
>> hook. The revlog there suggests the check might be stale.
>
> Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there.
>
>> If indeed the function does only preparatory work, I don't see why it
>> would need naming "iommu_..."; I'd rather consider pci_add_dt_device()
>> then.
>
> The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below).
The function being SMMU-related pretty strongly suggests it wants to be
invoked via a hook. If the add_device() one isn't suitable, perhaps we
need a new (optional) prepare_device() one? With pci_add_device() then
calling iommu_prepare_device(), wrapping the hook invocation?
But just to be clear: A new hook would need enough justification as to
the existing one being unsuitable.
Jan
© 2016 - 2026 Red Hat, Inc.