RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Shameerali Kolothum Thodi via posted 5 patches 2 months, 2 weeks ago
RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Shameerali Kolothum Thodi via 2 months, 2 weeks ago

> -----Original Message-----
> From: Shameerali Kolothum Thodi
> Sent: Thursday, January 30, 2025 6:09 PM
> To: 'Daniel P. Berrangé' <berrange@redhat.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> Hi Daniel,
> 
> > -----Original Message-----
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > Sent: Thursday, January 30, 2025 4:00 PM
> > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> > nested SMMUv3
> >
> > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > How to use it(Eg:):
> > >
> > > On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
> > > devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
> > > specify two smmuv3-nested devices each behind a pxb-pcie as below,
> > >
> > > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> > > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> > > -object iommufd,id=iommufd0 \
> > > -bios QEMU_EFI.fd \
> > > -kernel Image \
> > > -device virtio-blk-device,drive=fs \
> > > -drive if=none,file=rootfs.qcow2,id=fs \
> > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> > > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> > > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> > > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> > > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> > > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> > > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> > > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> > > -net none \
> > > -nographic
> >
> > Above you say the host has 2 SMMUv3 devices, and you've created 2 SMMUv3
> > guest devices to match.
> >
> > The various emails in this thread & libvirt thread, indicate that each
> > guest SMMUv3 is associated with a host SMMUv3, but I don't see any
> > property on the command line for 'arm-smmuv3-nested' that tells it which
> > host SMMUv3 it is to be associated with.
> >
> > How does this association work ?
> 
> You are right. The association is not very obvious in Qemu. The association
> and checking is done implicitly by kernel at the moment.  I will try to
> explain it here.
> 
> Each "arm-smmuv3-nested" instance, when the first device gets attached
> to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel
> SMMUv3 driver. This domain will have a pointer representing the physical
> SMMUv3 that the device belongs. And any other device which belongs to
> the same physical SMMUv3 can share this S2 domain.
> 
> If a device that belongs to a different physical SMMUv3 gets attached to
> the above domain, the HWPT attach will eventually fail as the physical
> smmuv3 in the domains will have a mismatch,
> https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c#L2860
> 
> And as I mentioned in cover letter, Qemu will report,
> 
> "
> Attempt to add the HNS VF to a different SMMUv3 will result in,
> 
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
>    Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
> 
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
> "
> So in summary, if the libvirt gets it wrong, Qemu will fail with error.
> 
> If a more explicit association is required, some help from kernel is required
> to identify the physical SMMUv3 associated with the device.
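
As a rough illustration of the implicit check described above, here is a toy Python model. It is not kernel or QEMU code: the class name `S2Domain`, its `attach` method, and the labels `psmmu-a`/`psmmu-b` are hypothetical. The only behaviour taken from the explanation is that the S2 domain remembers the physical SMMUv3 of the first attached device and rejects devices behind a different one with "Invalid argument".

```python
import errno


class S2Domain:
    """Toy stand-in for the stage-2 domain created per vSMMU instance."""

    def __init__(self):
        self.phys_smmu = None   # set by the first successful attach
        self.devices = []

    def attach(self, bdf, phys_smmu):
        """Attach a device; fail with EINVAL on a physical SMMU mismatch."""
        if self.phys_smmu is None:
            self.phys_smmu = phys_smmu      # first device picks the pSMMU
        elif self.phys_smmu != phys_smmu:
            raise OSError(errno.EINVAL, f"error attach {bdf}: Invalid argument")
        self.devices.append(bdf)


domain = S2Domain()
domain.attach("0000:7d:02.1", "psmmu-a")
domain.attach("0000:7d:02.2", "psmmu-a")    # same pSMMU: domain is shared
try:
    domain.attach("0000:75:00.1", "psmmu-b")
except OSError as err:
    print("attach failed:", err.strerror)
```

The point of the model is that nothing identifies the physical SMMUv3 up front; the mismatch only shows up when the second attach fails.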

Thinking about this again: to have an explicit association on the QEMU command
line between the vSMMUv3 and the physical SMMUv3, we could possibly add
something like,

-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device arm-smmuv3-accel,bus=pcie.1,phys-smmuv3=smmu3.0x0000000100000000 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \

-device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2,phys-smmuv3=smmu3.0x0000000200000000 \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \

etc.

QEMU would then check that the device is indeed associated with the specified
phys-smmuv3. This can be done by walking the sysfs path, which I guess is what
libvirt currently does to populate the topology. So basically QEMU would just
replicate that to validate it again.
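
A minimal sketch of what such a sysfs check could look like, written in Python for brevity rather than as QEMU code. It assumes that `/sys/bus/pci/devices/<BDF>/iommu` is a symlink whose target directory is named after the IOMMU instance (for example `smmu3.0x0000000100000000`); the function names and the `sysfs_root` parameter are hypothetical and exist only to make the sketch testable.

```python
import os


def phys_smmu_of(bdf, sysfs_root="/sys"):
    """Return the name of the IOMMU instance a PCI device sits behind.

    Assumes <sysfs_root>/bus/pci/devices/<bdf>/iommu is a symlink whose
    target directory is named after the IOMMU instance.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "iommu")
    if not os.path.exists(link):
        raise FileNotFoundError(f"no iommu link for {bdf}")
    return os.path.basename(os.path.realpath(link))


def validate_phys_smmu(bdf, expected, sysfs_root="/sys"):
    """Raise ValueError if the device is not behind the expected pSMMU."""
    actual = phys_smmu_of(bdf, sysfs_root)
    if actual != expected:
        raise ValueError(
            f"{bdf} is behind {actual!r}, not {expected!r} as given by phys-smmuv3"
        )
```

With a check like this, a wrong phys-smmuv3 value could be reported by name before any attach is attempted, instead of surfacing later as a bare "Invalid argument".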

Another option is to extend the IOMMU_GET_HW_INFO ioctl to return the physical
SMMUv3 base address, which would avoid going through sysfs.

The only difference between the current approach (the kernel failing the attach
implicitly) and the above is that QEMU can validate its inputs and maybe report a
better error message than just "Unable to attach viommu/: Invalid argument".

If the command line looks Ok, I will go with the sysfs path validation method first in my
next respin.

Please let me know.

Thanks,
Shameer




Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Jason Gunthorpe 2 months, 2 weeks ago
On Fri, Jan 31, 2025 at 09:33:16AM +0000, Shameerali Kolothum Thodi wrote:

> And Qemu does some checking to make sure that the device is indeed associated
> with the specified phys-smmuv3.  This can be done going through the sysfs path checking
> which is what I guess libvirt is currently doing to populate the topology. So basically
> Qemu is just replicating that to validate again.

I would prefer that iommufd users not have to go out to sysfs..
 
> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys
> smmuv3 base address which can avoid going through the sysfs.

It also doesn't seem great to expose a physical address. But we could
have an 'iommu instance id' that was a unique small integer?

Jason
RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Shameerali Kolothum Thodi via 2 months, 2 weeks ago

> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 31, 2025 2:24 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Fri, Jan 31, 2025 at 09:33:16AM +0000, Shameerali Kolothum Thodi
> wrote:
> 
> > And Qemu does some checking to make sure that the device is indeed
> associated
> > with the specified phys-smmuv3.  This can be done going through the
> sysfs path checking
> > which is what I guess libvirt is currently doing to populate the topology.
> So basically
> > Qemu is just replicating that to validate again.
> 
> I would prefer that iommufd users not have to go out to sysfs..
> 
> > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> return the phys
> > smmuv3 base address which can avoid going through the sysfs.
> 
> It also doesn't seem great to expose a physical address. But we could
> have an 'iommu instance id' that was a unique small integer?

OK. But how can user space map that to the device?

Something like,
/sys/bus/pci/devices/0000:7d:00.1/iommu/instance.X ?

Thanks,
Shameer
Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Jason Gunthorpe 2 months, 2 weeks ago
On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi wrote:

> > > And Qemu does some checking to make sure that the device is indeed
> > associated
> > > with the specified phys-smmuv3.  This can be done going through the
> > sysfs path checking
> > > which is what I guess libvirt is currently doing to populate the topology.
> > So basically
> > > Qemu is just replicating that to validate again.
> > 
> > I would prefer that iommufd users not have to go out to sysfs..
> > 
> > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> > return the phys
> > > smmuv3 base address which can avoid going through the sysfs.
> > 
> > It also doesn't seem great to expose a physical address. But we could
> > have an 'iommu instance id' that was a unique small integer?
> 
> Ok. But how the user space can map that to the device?

Why does it need to?

libvirt picks some label for the vsmmu instance, it doesn't matter
what the string is.

qemu validates that each vsmmu instance is only linked to PCI devices
that have the same iommu ID. This is already happening in the
kernel, which will fail attaches to mismatched instances.

Nothing further is needed?

Jason
RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Shameerali Kolothum Thodi via 2 months, 2 weeks ago

> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 31, 2025 2:54 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
> wrote:
> 
> > > > And Qemu does some checking to make sure that the device is indeed
> > > associated
> > > > with the specified phys-smmuv3.  This can be done going through the
> > > sysfs path checking
> > > > which is what I guess libvirt is currently doing to populate the
> topology.
> > > So basically
> > > > Qemu is just replicating that to validate again.
> > >
> > > I would prefer that iommufd users not have to go out to sysfs..
> > >
> > > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> > > return the phys
> > > > smmuv3 base address which can avoid going through the sysfs.
> > >
> > > It also doesn't seem great to expose a physical address. But we could
> > > have an 'iommu instance id' that was a unique small integer?
> >
> > Ok. But how the user space can map that to the device?
> 
> Why does it need to?
> 
> libvirt picks some label for the vsmmu instance, it doesn't matter
> what the string is.
> 
> qemu validates that all of the vsmmu instances are only linked to PCI
> device that have the same iommu ID. This is already happening in the
> kernel, it will fail attaches to mismatched instances.
> 
> Nothing further is needed?

-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \

-device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device arm-smmuv3-accel,pci-bus=pcie.2,id=smmuv2 \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \

I think it works from a functionality point of view. A particular
instance of arm-smmuv3-accel (say id=smmuv1) can only have devices attached
that belong to the same physical SMMUv3 "iommu instance id".
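
To make the constraint concrete, here is a small Python sketch of the check from user space. The "iommu instance id" does not exist yet, so the integer values below are made up, and `check_vsmmu_groups` is a hypothetical helper rather than a QEMU or libvirt function.

```python
def check_vsmmu_groups(assignments, instance_id_of):
    """Return {vsmmu_id: sorted instance ids} for every inconsistent vSMMU.

    assignments    -- iterable of (vsmmu_id, host_bdf) pairs
    instance_id_of -- mapping host_bdf -> hypothetical 'iommu instance id'
    """
    seen = {}
    for vsmmu, bdf in assignments:
        seen.setdefault(vsmmu, set()).add(instance_id_of[bdf])
    # A vSMMU is consistent only if all its devices share one instance id.
    return {v: sorted(ids) for v, ids in seen.items() if len(ids) > 1}


# Made-up instance ids: the two 7d:02.x VFs share one pSMMU, 75:00.1 has another.
instance_ids = {"0000:7d:02.1": 0, "0000:7d:02.2": 0, "0000:75:00.1": 1}
ok = [("smmuv1", "0000:7d:02.1"), ("smmuv2", "0000:75:00.1")]
bad = ok + [("smmuv2", "0000:7d:02.2")]

print(check_vsmmu_groups(ok, instance_ids))    # {}
print(check_vsmmu_groups(bad, instance_ids))   # {'smmuv2': [0, 1]}
```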

But I am not sure whether the concerns from a libvirt/QEMU interface point of
view [0] are addressed. Daniel/Nathan?

Thanks,
Shameer
[0] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/X6R52JRBYDFZ5PSJFR534A655UZ3RHKN/
Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Eric Auger 2 months, 2 weeks ago
Hi,


On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Friday, January 31, 2025 2:54 PM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; eric.auger@redhat.com;
>> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
>> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
>> wrote:
>>
>>>>> And Qemu does some checking to make sure that the device is indeed
>>>> associated
>>>>> with the specified phys-smmuv3.  This can be done going through the
>>>> sysfs path checking
>>>>> which is what I guess libvirt is currently doing to populate the
>> topology.
>>>> So basically
>>>>> Qemu is just replicating that to validate again.
>>>> I would prefer that iommufd users not have to go out to sysfs..
>>>>
>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
>>>> return the phys
>>>>> smmuv3 base address which can avoid going through the sysfs.
>>>> It also doesn't seem great to expose a physical address. But we could
>>>> have an 'iommu instance id' that was a unique small integer?
>>> Ok. But how the user space can map that to the device?
>> Why does it need to?
>>
>> libvirt picks some label for the vsmmu instance, it doesn't matter
>> what the string is.
>>
>> qemu validates that all of the vsmmu instances are only linked to PCI
>> device that have the same iommu ID. This is already happening in the
>> kernel, it will fail attaches to mismatched instances.
>>
>> Nothing further is needed?
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
I don't get what is the point of adding such an id if it is not
referenced anywhere?

Eric
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-accel,pci-bus=pcie.2,id=smmuv2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
>
> I think it works from a functionality point of view. A  particular
> instance of arm-smmuv3-accel(say id=smmuv1) can only have devices attached
> to the same phys smmuv3 "iommu instance id"
>
> But not sure from a libvirt/Qemu interface point of view[0] the concerns
> are addressed. Daniel/Nathan?
>
> Thanks,
> Shameer
> https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/X6R52JRBYDFZ5PSJFR534A655UZ3RHKN/
>


Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Daniel P. Berrangé 2 months, 1 week ago
On Fri, Jan 31, 2025 at 05:08:28PM +0100, Eric Auger wrote:
> Hi,
> 
> 
> On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Jason Gunthorpe <jgg@nvidia.com>
> >> Sent: Friday, January 31, 2025 2:54 PM
> >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> >> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> >> qemu-devel@nongnu.org; eric.auger@redhat.com;
> >> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> >> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> >> Jonathan Cameron <jonathan.cameron@huawei.com>;
> >> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
> >> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> >> nested SMMUv3
> >>
> >> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
> >> wrote:
> >>
> >>>>> And Qemu does some checking to make sure that the device is indeed
> >>>> associated
> >>>>> with the specified phys-smmuv3.  This can be done going through the
> >>>> sysfs path checking
> >>>>> which is what I guess libvirt is currently doing to populate the
> >> topology.
> >>>> So basically
> >>>>> Qemu is just replicating that to validate again.
> >>>> I would prefer that iommufd users not have to go out to sysfs..
> >>>>
> >>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> >>>> return the phys
> >>>>> smmuv3 base address which can avoid going through the sysfs.
> >>>> It also doesn't seem great to expose a physical address. But we could
> >>>> have an 'iommu instance id' that was a unique small integer?
> >>> Ok. But how the user space can map that to the device?
> >> Why does it need to?
> >>
> >> libvirt picks some label for the vsmmu instance, it doesn't matter
> >> what the string is.
> >>
> >> qemu validates that all of the vsmmu instances are only linked to PCI
> >> device that have the same iommu ID. This is already happening in the
> >> kernel, it will fail attaches to mismatched instances.
> >>
> >> Nothing further is needed?
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
> I don't get what is the point of adding such an id if it is not
> referenced anywhere?

Every QDev device instance has an 'id' property - if you don't
set one explicitly, QEMU will generate one internally. Libvirt
will always set the 'id' property to avoid the internal auto-
generated IDs, as it wants full knowledge of naming.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Eric Auger 2 months, 1 week ago


On 2/6/25 9:53 AM, Daniel P. Berrangé wrote:
> On Fri, Jan 31, 2025 at 05:08:28PM +0100, Eric Auger wrote:
>> Hi,
>>
>>
>> On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote:
>>>> -----Original Message-----
>>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>>> Sent: Friday, January 31, 2025 2:54 PM
>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>>> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>>>> qemu-devel@nongnu.org; eric.auger@redhat.com;
>>>> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
>>>> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
>>>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>>>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>>>> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com>
>>>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>>>> nested SMMUv3
>>>>
>>>> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi
>>>> wrote:
>>>>
>>>>>>> And Qemu does some checking to make sure that the device is indeed
>>>>>> associated
>>>>>>> with the specified phys-smmuv3.  This can be done going through the
>>>>>> sysfs path checking
>>>>>>> which is what I guess libvirt is currently doing to populate the
>>>> topology.
>>>>>> So basically
>>>>>>> Qemu is just replicating that to validate again.
>>>>>> I would prefer that iommufd users not have to go out to sysfs..
>>>>>>
>>>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
>>>>>> return the phys
>>>>>>> smmuv3 base address which can avoid going through the sysfs.
>>>>>> It also doesn't seem great to expose a physical address. But we could
>>>>>> have an 'iommu instance id' that was a unique small integer?
>>>>> Ok. But how the user space can map that to the device?
>>>> Why does it need to?
>>>>
>>>> libvirt picks some label for the vsmmu instance, it doesn't matter
>>>> what the string is.
>>>>
>>>> qemu validates that all of the vsmmu instances are only linked to PCI
>>>> device that have the same iommu ID. This is already happening in the
>>>> kernel, it will fail attaches to mismatched instances.
>>>>
>>>> Nothing further is needed?
>>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
>>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
>>> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
>> I don't get what is the point of adding such an id if it is not
>> referenced anywhere?
> Every QDev device instance has an 'id' property - if you don't
> set one explicitly, QEMU will generate one internally. Libvirt
> will always set the 'id' property to avoid the internal auto-
> generated IDs, as it wants full knowledge of naming.

OK, thank you for the explanation.

Eric
>
> With regards,
> Daniel


Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Nathan Chen 2 months, 1 week ago

On 1/31/2025 8:08 AM, Eric Auger wrote:
>>>>>> And Qemu does some checking to make sure that the device is indeed
>>>>> associated
>>>>>> with the specified phys-smmuv3.  This can be done going through the
>>>>> sysfs path checking
>>>>>> which is what I guess libvirt is currently doing to populate the
>>> topology.
>>>>> So basically
>>>>>> Qemu is just replicating that to validate again.
>>>>> I would prefer that iommufd users not have to go out to sysfs..
>>>>>
>>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
>>>>> return the phys
>>>>>> smmuv3 base address which can avoid going through the sysfs.
>>>>> It also doesn't seem great to expose a physical address. But we could
>>>>> have an 'iommu instance id' that was a unique small integer?
>>>> Ok. But how the user space can map that to the device?
>>> Why does it need to?
>>>
>>> libvirt picks some label for the vsmmu instance, it doesn't matter
>>> what the string is.
>>>
>>> qemu validates that all of the vsmmu instances are only linked to PCI
>>> device that have the same iommu ID. This is already happening in the
>>> kernel, it will fail attaches to mismatched instances.
>>>
>>> Nothing further is needed?
>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
>> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
> I don't get what is the point of adding such an id if it is not
> referenced anywhere?
> 
> Eric

Daniel mentions that the host-to-guest SMMU pairing must be chosen such 
that it makes conceptual sense w.r.t. the guest NUMA to host NUMA 
pairing [0]. The current implementation allows for incorrect host-to-guest
NUMA node pairings, e.g. a pSMMU has affinity to host NUMA node 0, but it
is paired with a vSMMU associated with a guest NUMA node pinned to host
NUMA node 1.

By specifying the host SMMU id, we can explicitly pair a host SMMU with
a guest SMMU associated with the correct PXB NUMA node, rather than implying
the host-to-guest SMMU pairing from the devices attached to the PXB. While
this would not completely prevent incorrect pSMMU/vSMMU pairings w.r.t.
host-to-guest NUMA node pairings, specifying the pSMMU id would make the
implications of those pairings clearer when specifying a vSMMU instance.

From the libvirt discussion with Daniel [1], he also states "libvirt's
goal has always been to make everything that's functionally impacting a 
guest device be 100% explicit. So I don't think we should be implying 
mappings to the host SMMU in QEMU at all, QEMU must be told what to map 
to." Specifying the id would be a means of explicitly specifying host to 
guest SMMU mapping instead of implying the mapping.
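
As a hedged sketch of the NUMA consistency concern, the following Python helper flags pairings where the host NUMA node of the pSMMU differs from the host node that the vSMMU's guest NUMA node is pinned to. All names and node numbers are hypothetical; in practice the inputs would come from the host topology and the guest configuration.

```python
def numa_mismatches(pairings, psmmu_host_node, guest_to_host_node):
    """Return (vsmmu, psmmu, psmmu_node, pinned_node) for each bad pairing.

    pairings           -- iterable of (vsmmu_id, psmmu_id, guest_numa_node)
    psmmu_host_node    -- mapping psmmu_id -> host NUMA node of that pSMMU
    guest_to_host_node -- mapping guest NUMA node -> host NUMA node it is pinned to
    """
    bad = []
    for vsmmu, psmmu, guest_node in pairings:
        psmmu_node = psmmu_host_node[psmmu]
        pinned_node = guest_to_host_node[guest_node]
        if psmmu_node != pinned_node:
            bad.append((vsmmu, psmmu, psmmu_node, pinned_node))
    return bad
```

An explicit host SMMU id on the command line is what would make a check like this possible before the guest starts, since the pairing is then stated rather than inferred from the attached devices.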

[0] https://lore.kernel.org/qemu-devel/Z51DmtP83741RAsb@redhat.com/
[1] 
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7GDT6RX5LPAJMPP4ZSC4ACME6GVMG236/#X6R52JRBYDFZ5PSJFR534A655UZ3RHKN

Thanks,
Nathan

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Daniel P. Berrangé 2 months, 1 week ago
On Wed, Feb 05, 2025 at 12:53:42PM -0800, Nathan Chen wrote:
> 
> 
> On 1/31/2025 8:08 AM, Eric Auger wrote:
> > > > > > > And Qemu does some checking to make sure that the device is indeed
> > > > > > associated
> > > > > > > with the specified phys-smmuv3.  This can be done going through the
> > > > > > sysfs path checking
> > > > > > > which is what I guess libvirt is currently doing to populate the
> > > > topology.
> > > > > > So basically
> > > > > > > Qemu is just replicating that to validate again.
> > > > > > I would prefer that iommufd users not have to go out to sysfs..
> > > > > > 
> > > > > > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to
> > > > > > return the phys
> > > > > > > smmuv3 base address which can avoid going through the sysfs.
> > > > > > It also doesn't seem great to expose a physical address. But we could
> > > > > > have an 'iommu instance id' that was a unique small integer?
> > > > > Ok. But how the user space can map that to the device?
> > > > Why does it need to?
> > > > 
> > > > libvirt picks some label for the vsmmu instance, it doesn't matter
> > > > what the string is.
> > > > 
> > > > qemu validates that all of the vsmmu instances are only linked to PCI
> > > > device that have the same iommu ID. This is already happening in the
> > > > kernel, it will fail attaches to mismatched instances.
> > > > 
> > > > Nothing further is needed?
> > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > > -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \
> > I don't get what is the point of adding such an id if it is not
> > referenced anywhere?
> > 
> > Eric
> 
> Daniel mentions that the host-to-guest SMMU pairing must be chosen such that
> it makes conceptual sense w.r.t. the guest NUMA to host NUMA pairing [0].
> The current implementation allows for incorrect host to guest numa node
> pairings, e.g. pSMMU has affinity to host numa node 0, but it’s paired with
> a vSMMU paired with a guest numa node pinned to host numa node 1.
> 
> By specifying the host SMMU id, we can explicitly pair a host SMMU with a
> guest SMMU associated with the correct PXB NUMA node, vs. implying the
> host-to-guest SMMU pairing based on what devices are attached to the PXB.
> While it would not completely prevent the incorrect pSMMU/vSMMU pairing
> w.r.t. host to guest numa node pairings, specifying the pSMMU id would make
> the implications of host to guest numa node pairings more clear when
> specifying a vSMMU instance.

You've not specified any host SMMU id in the above CLI args though,
only the PXB association.

It needs something like

 -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1,host-smmu=XXXXX

where 'XXXXX' is some value that identifies the host SMMU.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3
Posted by Eric Auger 2 months, 2 weeks ago
Hi Shameer,

On 1/31/25 10:33 AM, Shameerali Kolothum Thodi wrote:
>
>> -----Original Message-----
>> From: Shameerali Kolothum Thodi
>> Sent: Thursday, January 30, 2025 6:09 PM
>> To: 'Daniel P. Berrangé' <berrange@redhat.com>
>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> Hi Daniel,
>>
>>> -----Original Message-----
>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>> Sent: Thursday, January 30, 2025 4:00 PM
>>> To: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>
>>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
>>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
>>> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
>>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
>>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
>>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
>>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>>> nested SMMUv3
>>>
>>> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
>>>> How to use it(Eg:):
>>>>
>>>> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
>>>> devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
>>>> specify two smmuv3-nested devices each behind a pxb-pcie as below,
>>>>
>>>> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
>>>> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
>>>> -object iommufd,id=iommufd0 \
>>>> -bios QEMU_EFI.fd \
>>>> -kernel Image \
>>>> -device virtio-blk-device,drive=fs \
>>>> -drive if=none,file=rootfs.qcow2,id=fs \
>>>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
>>>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
>>>> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
>>>> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>>>> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
>>>> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
>>>> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
>>>> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
>>>> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
>>>> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
>>>> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
>>>> -net none \
>>>> -nographic
>>> Above you say the host has 2 SMMUv3 devices, and you've created 2 SMMUv3
>>> guest devices to match.
>>>
>>> The various emails in this thread & libvirt thread, indicate that each
>>> guest SMMUv3 is associated with a host SMMUv3, but I don't see any
>>> property on the command line for 'arm-smmuv3-nested' that tells it which
>>> host SMMUv3 it is to be associated with.
>>>
>>> How does this association work ?
>> You are right. The association is not very obvious in Qemu. The association
>> and checking is done implicitly by kernel at the moment.  I will try to
>> explain it here.
>>
>> Each "arm-smmuv3-nested" instance, when the first device gets attached
>> to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel
>> SMMUv3 driver. This domain will have a pointer representing the physical
>> SMMUv3 that the device belongs. And any other device which belongs to
>> the same physical SMMUv3 can share this S2 domain.
>>
>> If a device that belongs to a different physical SMMUv3 gets attached to
>> the above domain, the HWPT attach will eventually fail as the physical
>> smmuv3 in the domains will have a mismatch,
>> https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c#L2860
>>
>> And as I mentioned in cover letter, Qemu will report,
>>
>> "
>> Attempt to add the HNS VF to a different SMMUv3 will result in,
>>
>> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
>> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
>>    Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
>>
>> At present Qemu is not doing any extra validation other than the above
>> failure to make sure the user configuration is correct or not. The
>> assumption is libvirt will take care of this.
>> "
>> So in summary, if the libvirt gets it wrong, Qemu will fail with error.
>>
>> If a more explicit association is required, some help from kernel is required
>> to identify the physical SMMUv3 associated with the device.
> Again thinking about this, to have an explicit association in the Qemu command 
> line between the vSMMUv3 and the phys smmuv3,
>
> We can possibly add something like,
>
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-accel,bus=pcie.1,phys-smmuv3=smmu3.0x0000000100000000 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2,phys-smmuv3=smmu3.0x0000000200000000 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
>
> etc.
>
> And Qemu does some checking to make sure that the device is indeed associated
> with the specified phys-smmuv3.  This can be done going through the sysfs path checking
> which is what I guess libvirt is currently doing to populate the topology. So basically
> Qemu is just replicating that to validate again.
>
> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys
> smmuv3 base address which can avoid going through the sysfs.
>
> The only difference between the current approach(kernel failing the attach implicitly)
> and the above is, Qemu can provide a validation of inputs and may be report a  better
> error message than just saying " Unable to attach viommu/: Invalid argument".
>
> If the command line looks Ok, I will go with the sysfs path validation method first in my
> next respin.
The command line looks sensible to me. On vfio we use
host=6810000.ethernet. Maybe reuse this instead of phys-smmuv3?

Thanks
Eric
>
> Please let me know.
>
> Thanks,
> Shameer
>
>
>
>