1 | 1 | ||
---|---|---|---|
2 | > -----Original Message----- | 2 | > -----Original Message----- |
3 | > From: Shameerali Kolothum Thodi | 3 | > From: Peter Maydell <peter.maydell@linaro.org> |
4 | > Sent: Thursday, January 30, 2025 6:09 PM | 4 | > Sent: Friday, December 13, 2024 1:33 PM |
5 | > To: 'Daniel P. Berrangé' <berrange@redhat.com> | 5 | > To: Jason Gunthorpe <jgg@nvidia.com> |
6 | > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; | 6 | > Cc: Daniel P. Berrangé <berrange@redhat.com>; Shameerali Kolothum |
7 | > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com; | 7 | > Thodi <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org; |
8 | > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm | 8 | > qemu-devel@nongnu.org; eric.auger@redhat.com; nicolinc@nvidia.com; |
9 | > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; | 9 | > ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B) |
10 | > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron | 10 | > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; |
11 | > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org | 11 | > Jonathan Cameron <jonathan.cameron@huawei.com>; |
12 | > Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable | 12 | > zhangfei.gao@linaro.org |
13 | > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable | ||
13 | > nested SMMUv3 | 14 | > nested SMMUv3 |
14 | > | 15 | > |
15 | > Hi Daniel, | 16 | > On Fri, 13 Dec 2024 at 12:46, Jason Gunthorpe <jgg@nvidia.com> wrote: |
17 | > > | ||
18 | > > On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote: | ||
19 | > > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: | ||
20 | > > > > Hi, | ||
21 | > > > > | ||
22 | > > > > This series adds initial support for a user-creatable "arm-smmuv3- | ||
23 | > nested" | ||
24 | > > > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per | ||
25 | > machine | ||
26 | > > > > and cannot support multiple SMMUv3s. | ||
27 | > > > > | ||
28 | > > > > In order to support vfio-pci dev assignment with vSMMUv3, the | ||
29 | > physical | ||
30 | > > > > SMMUv3 has to be configured in nested mode. Having a pluggable | ||
31 | > > > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 | ||
32 | > for Guests | ||
33 | > > > > running on a host with multiple physical SMMUv3s. A few benefits of | ||
34 | > doing | ||
35 | > > > > this are, | ||
36 | > > > | ||
37 | > > > I'm not very familiar with arm, but from this description I'm not | ||
38 | > > > really seeing how "nesting" is involved here. You're only talking | ||
39 | > > > about the host and 1 L1 guest, no L2 guest. | ||
40 | > > | ||
41 | > > nesting is the term the iommu side is using to refer to the 2 | ||
42 | > > dimensional paging, ie a guest page table on top of a hypervisor page | ||
43 | > > table. | ||
16 | > | 44 | > |
17 | > > -----Original Message----- | 45 | > Isn't that more usually called "two stage" paging? Calling |
18 | > > From: Daniel P. Berrangé <berrange@redhat.com> | 46 | > that "nesting" seems like it is going to be massively confusing... |
19 | > > Sent: Thursday, January 30, 2025 4:00 PM | 47 | |
20 | > > To: Shameerali Kolothum Thodi | 48 | Yes. This will be renamed in future revisions as arm-smmuv3-accel. |
21 | > <shameerali.kolothum.thodi@huawei.com> | 49 | |
22 | > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; | ||
23 | > > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com; | ||
24 | > > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm | ||
25 | > > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; | ||
26 | > > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron | ||
27 | > > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org | ||
28 | > > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable | ||
29 | > > nested SMMUv3 | ||
30 | > > | ||
31 | > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: | ||
32 | > > > How to use it(Eg:): | ||
33 | > > > | ||
34 | > > > On a HiSilicon platform that has multiple physical SMMUv3s, the ACC | ||
35 | > ZIP | ||
36 | > > VF | ||
37 | > > > devices and HNS VF devices are behind different SMMUv3s. So for a | ||
38 | > > Guest, | ||
39 | > > > specify two smmuv3-nested devices each behind a pxb-pcie as below, | ||
40 | > > > | ||
41 | > > > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass- | ||
42 | > > iommu=on \ | ||
43 | > > > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \ | ||
44 | > > > -object iommufd,id=iommufd0 \ | ||
45 | > > > -bios QEMU_EFI.fd \ | ||
46 | > > > -kernel Image \ | ||
47 | > > > -device virtio-blk-device,drive=fs \ | ||
48 | > > > -drive if=none,file=rootfs.qcow2,id=fs \ | ||
49 | > > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ | ||
50 | > > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ | ||
51 | > > > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \ | ||
52 | > > > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ | ||
53 | > > > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ | ||
54 | > > > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ | ||
55 | > > > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \ | ||
56 | > > > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ | ||
57 | > > > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw | ||
58 | > > earlycon=pl011,0x9000000" \ | ||
59 | > > > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \ | ||
60 | > > > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \ | ||
61 | > > > -net none \ | ||
62 | > > > -nographic | ||
63 | > > | ||
64 | > > Above you say the host has 2 SMMUv3 devices, and you've created 2 | ||
65 | > > SMMUv3 | ||
66 | > > guest devices to match. | ||
67 | > > | ||
68 | > > The various emails in this thread & libvirt thread, indicate that each | ||
69 | > > guest SMMUv3 is associated with a host SMMUv3, but I don't see any | ||
70 | > > property on the command line for 'arm-smmuv3-nested' that tells it which | ||
71 | > > host SMMUv3 it is to be associated with. | ||
72 | > > | ||
73 | > > How does this association work ? | ||
74 | > | 50 | > |
75 | > You are right. The association is not very obvious in Qemu. The association | 51 | > Also, how does it relate to what this series seems to be |
76 | > and checking is done implicitly by kernel at the moment. I will try to | 52 | > doing, where we provide the guest with two separate SMMUs? |
77 | > explain | 53 | > (Are those two SMMUs "nested" in the sense that one is sitting |
78 | > it here. | 54 | > behind the other?) |
79 | > | ||
80 | > Each "arm-smmuv3-nested" instance, when the first device gets attached | ||
81 | > to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel | ||
82 | > SMMUv3 driver. This domain will have a pointer representing the physical | ||
83 | > SMMUv3 that the device belongs to. And any other device which belongs to | ||
84 | > the same physical SMMUv3 can share this S2 domain. | ||
85 | > | ||
86 | > If a device that belongs to a different physical SMMUv3 gets attached to | ||
87 | > the above domain, the HWPT attach will eventually fail as the physical | ||
88 | > smmuv3 in the domains will have a mismatch, | ||
89 | > https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm- | ||
90 | > smmu-v3/arm-smmu-v3.c#L2860 | ||
91 | > | ||
92 | > And as I mentioned in cover letter, Qemu will report, | ||
93 | > | ||
94 | > " | ||
95 | > Attempt to add the HNS VF to a different SMMUv3 will result in, | ||
96 | > | ||
97 | > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: | ||
98 | > Unable to attach viommu | ||
99 | > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio | ||
100 | > 0000:7d:02.2: | ||
101 | > Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) | ||
102 | > to id=11: Invalid argument | ||
103 | > | ||
104 | > At present Qemu is not doing any extra validation other than the above | ||
105 | > failure to make sure the user configuration is correct or not. The | ||
106 | > assumption is libvirt will take care of this. | ||
107 | > " | ||
108 | > So in summary, if libvirt gets it wrong, Qemu will fail with an error. | ||
109 | > | ||
110 | > If a more explicit association is required, some help from kernel is required | ||
111 | > to identify the physical SMMUv3 associated with the device. | ||
112 | 55 | ||
113 | Again thinking about this, to have an explicit association in the Qemu command | 56 | I don't think it requires two SMMUs in Guest. The nested or "two |
114 | line between the vSMMUv3 and the phys smmuv3, | 57 | stage" means the stage 1 page table is owned by Guest and stage 2 |
58 | by host. And this is achieved by IOMMUFD provided IOCTLs. | ||
115 | 59 | ||
116 | We can possibly add something like, | 60 | There is a precursor to this series where the support for hw accelerated |
61 | 2 stage support is added in Qemu SMMUv3 code. | ||
117 | 62 | ||
118 | -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ | 63 | Please see the complete branch here, |
119 | -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ | 64 | https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-rfc-v1/ |
120 | -device arm-smmuv3-accel,bus=pcie.1,phys-smmuv3=smmu3.0x0000000100000000 \ | 65 | And patches prior to this commit add that support: |
121 | -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ | 66 | 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm |
67 | SMMUv3") | ||
122 | 68 | ||
123 | -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ | 69 | Nicolin is soon going to send out those for review. Or I can include |
124 | -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ | 70 | those in this series so that it gives a complete picture. Nicolin? |
125 | -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2,phys-smmuv3=smmu3.0x0000000200000000 \ | ||
126 | -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ | ||
127 | 71 | ||
128 | etc. | 72 | Hope this clarifies any confusion. |
129 | |||
130 | And Qemu does some checking to make sure that the device is indeed associated | ||
131 | with the specified phys-smmuv3. This can be done by walking the sysfs path, | ||
132 | which I guess is what libvirt currently does to populate the topology. So basically | ||
133 | Qemu just replicates that check to validate the configuration again. | ||
134 | |||
135 | Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys | ||
136 | smmuv3 base address which can avoid going through the sysfs. | ||
137 | |||
138 | The only difference between the current approach (the kernel implicitly failing the attach) | ||
139 | and the above is that Qemu can validate the inputs and maybe report a better | ||
140 | error message than just "Unable to attach viommu/: Invalid argument". | ||
141 | |||
142 | If the command line looks Ok, I will go with the sysfs path validation method first in my | ||
143 | next respin. | ||
144 | |||
145 | Please let me know. | ||
146 | 73 | ||
147 | Thanks, | 74 | Thanks, |
148 | Shameer | 75 | Shameer |
149 | 76 | ||
150 | 77 | ||
151 | 78 | ||
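
The sysfs-based validation proposed in the later reply could look roughly like the sketch below. It is illustrative Python, not Qemu code, and rests on assumptions: that the Linux host exposes an `iommu` symlink under `/sys/bus/pci/devices/<BDF>/`, and that the link target's final path component is the SMMUv3 instance name in the `smmu3.<base address>` form used in the proposed `phys-smmuv3=` property. The function names `physical_smmu_name` and `check_association` are hypothetical.

```python
import os


def physical_smmu_name(bdf, sysfs_root="/sys"):
    """Return the IOMMU instance name a PCI device sits behind, or None.

    Assumes the per-device "iommu" symlink under sysfs; the last path
    component of its target is taken as the instance name.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "iommu")
    if not os.path.exists(link):
        return None
    return os.path.basename(os.path.realpath(link))


def check_association(bdf, expected, sysfs_root="/sys"):
    """Raise ValueError unless the device is behind the expected SMMUv3."""
    actual = physical_smmu_name(bdf, sysfs_root)
    if actual is None:
        raise ValueError(f"{bdf}: no IOMMU link found in sysfs")
    if actual != expected:
        raise ValueError(f"{bdf} is behind {actual}, not {expected}")
    return actual
```

A check of this kind would let the user see a message naming both the expected and the actual physical SMMUv3, rather than only the kernel's "Invalid argument" at attach time.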