> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Saturday, January 25, 2025 2:44 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: ddutile@redhat.com; eric.auger@redhat.com; jgg@nvidia.com;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> nathanc@nvidia.com; nicolinc@nvidia.com; peter.maydell@linaro.org;
> qemu-arm@nongnu.org; Wangzhou (B) <wangzhou1@hisilicon.com>;
> zhangfei.gao@linaro.org; qemu-devel@nongnu.org
> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> >> > with an error message indicating DMA mapping failed for the
> >> > passthrough devices.
> >>
> >> A correction - the message indicates UEFI failed to find a mapping for
> >> the boot partition ("map: no mapping found"), not that DMA mapping
> >> failed. But earlier EDK debug logs still show PCI host bridge resource
> >> conflicts for the passthrough devices that seem related to the VM boot
> >> failure.
> >
> > I have tried a 2023 version of EFI, which works. And for more recent
> > tests I am using one built directly from
> > https://github.com/tianocore/edk2.git master,
> >
> > Commit: 0f3867fa6ef0 ("UefiPayloadPkg/UefiPayloadEntry: Fix PT
> > protection in 5 level paging")
> >
> > With both, I don't remember seeing any boot failure or the above
> > UEFI-related "map: no mapping found" error. But the Guest kernel at
> > times complains about PCI bridge window memory assignment failures:
> > ...
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed to assign
> > pci 0000:10:00.0: bridge window [io size 0x1000]: can't assign; no space
> > ...
> >
> > But the Guest still boots and has worked fine so far.
>
> Hi Shameer,
>
> Just letting you know I resolved this by increasing the MMIO region size
> (VIRT_HIGH_PCIE_MMIO) in hw/arm/virt.c to support passing through GPUs
> with large BAR regions. Thanks for taking a look.

Ok. Thanks for that. Does that mean an optional property to specify the
size of VIRT_HIGH_PCIE_MMIO is worth adding?
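At the moment that window is a build-time constant in the extended
memory map in hw/arm/virt.c. Roughly (a simplified excerpt; the base
addresses are filled in at runtime when the map is laid out, and the
exact size may differ between QEMU versions):

  static MemMapEntry extended_memmap[] = {
      /* Additional 64 MB redist region (can contain up to 512 redistributors) */
      [VIRT_HIGH_GIC_REDIST2] = { 0x0, 64 * MiB },
      [VIRT_HIGH_PCIE_ECAM]   = { 0x0, 256 * MiB },
      /* Second PCIe window */
      [VIRT_HIGH_PCIE_MMIO]   = { 0x0, 512 * GiB },
  };

A machine property to scale that last entry (a name like
"highmem-mmio-size" comes to mind, just as a suggestion) would avoid
everyone carrying a local patch for large-BAR GPUs.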

And for the PCI bridge window specific errors that I mentioned above,

>> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space

adding "mem-reserve=X" and "io-reserve=X" to pcie-root-port helps.
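For example, something like the below (the IDs, sizes and host BDF are
only placeholders; pick reserve values large enough to cover the BARs
behind each port):

  -device pcie-root-port,id=port1,bus=pcie.0,chassis=1,slot=1,io-reserve=4K,mem-reserve=512M,pref64-reserve=8G
  -device vfio-pci,host=0000:75:00.0,bus=port1

The reserve properties tell the firmware how much window space to leave
on the root port, so the guest kernel doesn't run out of space when it
assigns the bridge windows later.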

Thanks,
Shameer


> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, February 6, 2025 2:47 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 01:51:15PM +0000, Shameerali Kolothum Thodi wrote:
> > Hmm.. I don't think just swapping the order will change the association
> > with the Guest SMMU here. Because we have:
> >
> >   -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> >
> > During smmuv3-accel realize time, this will result in:
> >
> >   pci_setup_iommu(primary_bus, ops, smmu_state);
> >
> > And when the vfio dev realization happens:
> >
> >   set_iommu_device()
> >     smmu_dev_set_iommu_device(bus, smmu_state, ...)
> >
> > --> this is where the guest smmuv3 --> host smmuv3 association is first
> > established. And any further vfio dev added to this Guest SMMU will
> > only succeed if it belongs to the same phys SMMU.
> >
> > i.e., the Guest SMMU to PCI bus association actually makes sure you
> > have the same Guest SMMU for the device.
>
> Ok, so at the time of VFIO device realize, QEMU is telling the kernel
> to associate a physical SMMU, and it's doing this with the virtual
> SMMU attached to the PXB parenting the VFIO device.
>
> > smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
> > 0000:dev2 --> pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)
> >
> > Hence the association of 0000:dev2 to Guest SMMUv2 remains the same.
>
> Yes, I concur the SMMU physical <-> virtual association should
> be fixed, as long as the same VFIO device is always added to
> the same virtual SMMU.
>
> > I hope this is clear. And I am not sure the association will be broken
> > in any other way unless the QEMU CLI specifies the dev on a different
> > PXB.
>
> Although the ordering is at least predictable, I remain uncomfortable
> about the idea of the virtual SMMU association with the physical SMMU
> being a side effect of the VFIO device placement.
>
> There is still the open door for admin mis-configuration that will not
> be diagnosed. e.g. consider we attached VFIO device 1 from host NUMA
> node 1 to a PXB associated with host NUMA node 0. As long as that's
> the first VFIO device, the kernel will happily associate the physical
> and guest SMMUs.

Yes. A mis-configuration can place it on a wrong one.
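For reference, the kind of topology we are discussing looks roughly
like this on the QEMU command line (bus numbers, IDs and the host BDF
below are placeholders from a test setup, not a recommendation):

  -object iommufd,id=iommufd0
  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0,numa_node=1
  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
  -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
  -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0

With the current series it is the first vfio-pci realized behind pcie.2
that pins smmuv2 to whatever physical SMMU sits above that host device,
which is exactly the side effect being questioned here.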

> If we set the physical/guest SMMU relationship directly, then at the
> time the VFIO device is plugged, we can diagnose the incorrectly
> placed VFIO device, and better reason about behaviour.

Agree.

> I've another question about unplug behaviour..
>
> 1. Plug a VFIO device for host SMMU 1 into a PXB with guest SMMU 1.
>    => Kernel associates host SMMU 1 and guest SMMU 1 together
> 2. Unplug this VFIO device
> 3. Plug a VFIO device for host SMMU 2 into a PXB with guest SMMU 1.
>
> Does the host/guest SMMU 1 <-> 1 association remain set after step 2,
> implying step 3 will fail? Or does it get unset, allowing step 3 to
> succeed and establish a new mapping of host SMMU 2 to guest SMMU 1?

At the moment the first association is not persistent, so a new mapping
is possible.
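In rough pseudo-C, the lifecycle today behaves something like this (a
simplified sketch of my understanding, NOT the actual series code; all
names here are made up):

  #include <stdbool.h>

  /* First VFIO device pins the physical SMMU, later ones must match,
   * and unplugging the last one drops the binding (not persistent). */
  typedef struct SMMUv3AccelState {
      void *phys_smmu;      /* host SMMU this vSMMU is bound to, or NULL */
      int   vfio_dev_count; /* VFIO devices currently attached */
  } SMMUv3AccelState;

  static bool smmu_accel_attach_dev(SMMUv3AccelState *s, void *dev_phys_smmu)
  {
      if (s->phys_smmu && s->phys_smmu != dev_phys_smmu) {
          return false;             /* device is under a different host SMMU */
      }
      s->phys_smmu = dev_phys_smmu; /* first device establishes the binding */
      s->vfio_dev_count++;
      return true;
  }

  static void smmu_accel_detach_dev(SMMUv3AccelState *s)
  {
      if (s->vfio_dev_count > 0 && --s->vfio_dev_count == 0) {
          s->phys_smmu = NULL;      /* last unplug clears the binding */
      }
  }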

> If step 2 does NOT break the association, do we preserve that
> across a savevm+loadvm sequence of QEMU? If we don't, then step
> 3 would fail before the savevm, but succeed after the loadvm.

Right. I haven't attempted migration tests yet. But I agree that an
explicit association is better for migration compatibility. Also,
I am not sure how we handle it if the target has a different phys
SMMUv3 <--> dev mapping.

> Explicitly representing the host SMMU association on the guest SMMU
> config makes this behaviour unambiguous. The host / guest SMMU
> relationship is fixed for the lifetime of the VM and invariant of
> whatever VFIO device is (or was previously) plugged.
>
> So I still go back to my general principle that automatic side effects
> are an undesirable idea in QEMU configuration. We have a long tradition
> of making everything entirely explicit to produce easily predictable
> behaviour.

Ok. Convinced 😊. Thanks for explaining.
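For the record, what I would picture for the explicit variant is
something along these lines (the "host-smmu" property name and its
value are purely placeholders, not a syntax proposal):

  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=smmu3.0x0000000012000000

That way a VFIO device plugged behind pcie.2 that sits under a
different physical SMMU on the host can be rejected at realize time
with a clear error, and the association survives unplug and migration
unambiguously.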

Shameer