> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Saturday, January 25, 2025 2:44 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: ddutile@redhat.com; eric.auger@redhat.com; jgg@nvidia.com;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> nathanc@nvidia.com; nicolinc@nvidia.com; peter.maydell@linaro.org;
> qemu-arm@nongnu.org; Wangzhou (B) <wangzhou1@hisilicon.com>;
> zhangfei.gao@linaro.org; qemu-devel@nongnu.org
> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> >> > with an error message indicating DMA mapping failed for the
> >> > passthrough devices.
> >>
> >> A correction - the message indicates UEFI failed to find a mapping for
> >> the boot partition ("map: no mapping found"), not that DMA mapping
> >> failed. But earlier EDK debug logs still show PCI host bridge resource
> >> conflicts for the passthrough devices that seem related to the VM boot
> >> failure.
> >
> > I have tried a 2023 version EFI which works. And for more recent tests
> > I am using one built directly from
> > https://github.com/tianocore/edk2.git master
> >
> > Commit: 0f3867fa6ef0 ("UefiPayloadPkg/UefiPayloadEntry: Fix PT
> > protection in 5 level paging")
> >
> > With both, I don't remember seeing any boot failure or the above UEFI
> > related "map: no mapping found" error. But the Guest kernel at times
> > complains about pci bridge window memory assignment failures:
> > ...
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't
> > assign; no space
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed
> > to assign
> > pci 0000:10:00.0: bridge window [io size 0x1000]: can't assign; no space
> > ...
> >
> > But the Guest still boots and has worked fine so far.
>
> Hi Shameer,
>
> Just letting you know I resolved this by increasing the MMIO region size
> in hw/arm/virt.c to support passing through GPUs with large BAR regions
> (VIRT_HIGH_PCIE_MMIO). Thanks for taking a look.

Ok. Thanks for that. Does that mean an optional property to specify the
size for VIRT_HIGH_PCIE_MMIO is worth adding?

And for the PCI bridge window specific errors that I mentioned above,

>> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space

adding "mem-reserve=X" and "io-reserve=X" to pcie-root-port helps.

Thanks,
Shameer
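FWIW, the pcie-root-port reservation properties mentioned above are set on
the QEMU command line like this (a sketch only - the IDs, chassis/slot
numbers, sizes and host BDF below are illustrative, not from this thread):

```shell
# Sketch: reserve extra bridge-window space on a hotplug-capable root
# port so the guest kernel does not run out during BAR assignment.
qemu-system-aarch64 -M virt \
  -device pcie-root-port,id=rp1,bus=pcie.0,chassis=1,slot=1,mem-reserve=512M,io-reserve=4K \
  -device vfio-pci,host=0000:75:00.0,bus=rp1
```

mem-reserve/io-reserve (and the related pref32-reserve/pref64-reserve)
only pad the firmware's window programming; they don't change the overall
VIRT_HIGH_PCIE_MMIO budget.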

> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, February 6, 2025 2:47 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 01:51:15PM +0000, Shameerali Kolothum Thodi
> wrote:
> > Hmm..I don't think just swapping the order will change the association
> > with the Guest SMMU here. Because we have,
> >
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> >
> > During smmuv3-accel realize time, this will result in,
> > pci_setup_iommu(primary_bus, ops, smmu_state);
> >
> > And when the vfio dev realization happens,
> > set_iommu_device()
> >   smmu_dev_set_iommu_device(bus, smmu_state, ,)
> >   --> this is where the guest smmuv3 --> host smmuv3 association is
> >       first established. And any further vfio dev added to this Guest
> >       SMMU only succeeds if it belongs to the same phys SMMU.
> >
> > ie, the Guest SMMU to pci bus association actually makes sure you have
> > the same Guest SMMU for the device.
>
> Ok, so at the time of VFIO device realize, QEMU is telling the kernel
> to associate a physical SMMU, and it is doing this with the virtual
> SMMU attached to the PXB parenting the VFIO device.
>
> > smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
> > 0000:dev2 --> pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)
> >
> > Hence the association of 0000:dev2 to Guest SMMUv2 remains the same.
>
> Yes, I concur the SMMU physical <-> virtual association should
> be fixed, as long as the same VFIO device is always added to
> the same virtual SMMU.
>
> > I hope this is clear. And I am not sure the association will be broken
> > in any other way unless the QEMU CLI assigns the dev to a different PXB.
>
> Although the ordering is at least predictable, I remain uncomfortable
> about the idea of the virtual SMMU association with the physical SMMU
> being a side effect of the VFIO device placement.
>
> There is still the open door for admin mis-configuration that will not
> be diagnosed. eg consider we attached VFIO device 1 from host NUMA
> node 1 to a PXB associated with host NUMA node 0. As long as that's
> the first VFIO device, the kernel will happily associate the physical
> and guest SMMUs.

Yes. A mis-configuration can place it on a wrong one.

> If we set the physical/guest SMMU relationship directly, then at the
> time the VFIO device is plugged, we can diagnose the incorrectly
> placed VFIO device, and better reason about behaviour.

Agree.
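For completeness, a full command line for the topology sketched above
would look roughly like this (bus numbers, chassis/slot and host BDF are
illustrative; arm-smmuv3-accel is the device proposed in this RFC):

```shell
# Sketch: a PXB tied to guest NUMA node 1, the accelerated vSMMU on its
# root bus, and a VFIO device behind a root port on that bus. The guest
# SMMU <-> host SMMU association is currently established as a side
# effect of realizing the vfio-pci device below.
qemu-system-aarch64 -M virt \
  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0,numa_node=1 \
  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2 \
  -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2,slot=0 \
  -device vfio-pci,host=0000:75:00.1,bus=pcie.port2
```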

> I've another question about unplug behaviour..
>
> 1. Plug a VFIO device for host SMMU 1 into a PXB with guest SMMU 1.
>    => Kernel associates host SMMU 1 and guest SMMU 1 together
> 2. Unplug this VFIO device
> 3. Plug a VFIO device for host SMMU 2 into a PXB with guest SMMU 1.
>
> Does the host/guest SMMU 1 <-> 1 association remain set after step 2,
> implying step 3 will fail? Or does it get unset, allowing step 3
> to succeed, and establish a new mapping host SMMU 2 to guest SMMU 1.

At the moment the first association is not persistent. So a new mapping
is possible.

> If step 2 does NOT break the association, do we preserve that
> across a savevm+loadvm sequence of QEMU? If we don't, then step
> 3 would fail before the savevm, but succeed after the loadvm.

Right. I haven't attempted migration tests yet. But I agree that an
explicit association is better for migration compatibility. Also, I am
not sure how we would handle the case where the target has a different
phys SMMUv3 <--> dev mapping.

> Explicitly representing the host SMMU association on the guest SMMU
> config makes this behaviour unambiguous. The host / guest SMMU
> relationship is fixed for the lifetime of the VM and invariant of
> whatever VFIO device is (or was previously) plugged.
>
> So I still go back to my general principle that automatic side effects
> are an undesirable idea in QEMU configuration. We have a long tradition
> of making everything entirely explicit to produce easily predictable
> behaviour.

Ok. Convinced 😊. Thanks for explaining.

Shameer
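The explicit association being argued for could look something like the
fragment below. This is purely hypothetical syntax - no such property
exists in this RFC; the "host-smmu" property name and the sysfs-style
SMMU identifier are invented here for illustration:

```shell
# Hypothetical: bind the guest SMMU to a specific host SMMU up front,
# so a mis-placed VFIO device can be rejected at plug time rather than
# silently establishing the association.
# NOTE: "host-smmu" and the identifier are NOT real QEMU options.
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=smmu3.0x0000000012000000
```

With something like this, plugging a VFIO device behind smmuv2 whose
physical SMMU differs from the configured one could fail immediately
with a clear error, independent of device plug ordering.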