[v1] RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Shameerali Kolothum Thodi via 1 month ago


> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, February 6, 2025 2:47 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Thu, Feb 06, 2025 at 01:51:15PM +0000, Shameerali Kolothum Thodi
> wrote:
> > Hmm..I don’t think just swapping the order will change the association
> with
> > Guest SMMU here. Because, we have,
> >
> > >  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> >
> > During smmuv3-accel realize time, this will result in,
> >  pci_setup_iommu(primary_bus, ops, smmu_state);
> >
> > And when the vfio dev realization happens,
> >  set_iommu_device()
> >    smmu_dev_set_iommu_device(bus, smmu_state, ,)
> >       --> this is where the guest smmuv3-->host smmuv3 association is first
> >             established. And any further vfio dev to this Guest SMMU will
> >             only succeeds if it belongs to the same phys SMMU.
> >
> > ie, the Guest SMMU to pci bus association, actually make sure you have
> the
> > same Guest SMMU for the device.
> 
> Ok, so at time of VFIO device realize, QEMU is telling the kernel
> to associate a physical SMMU, and its doing this with the virtual
> SMMU attached to PXB parenting the VFIO device.
> 
> > smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
> > 0000:dev2 -->  pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)
> >
> > Hence the association of 0000:dev2 to Guest SMMUv2 remain same.
> 
> Yes, I concur the SMMU physical <-> virtual association should
> be fixed, as long as the same VFIO device is always added to
> the same virtual SMMU.
> 
> > I hope this is clear. And I am not sure the association will be broken in
> any
> > other way unless Qemu CLI specify the dev to a different PXB.
> 
> Although the ordering is at least predictable, I remain uncomfortable
> about the idea of the virtual SMMU association with the physical SMMU
> being a side effect of the VFIO device placement.
> 
> There is still the open door for admin mis-configuration that will not
> be diagnosed. eg consider we attached VFIO device 1 from the host NUMA
> node 1 to  a PXB associated with host NUMA node 0. As long as that's
> the first VFIO device, the kernel will happily associate the physical
> and guest SMMUs.

Yes. A mis-configuration can place it on a wrong one. 
 
> If we set the physical/guest SMMU relationship directly, then at the
> time the VFIO device is plugged, we can diagnose the incorrectly
> placed VFIO device, and better reason about behaviour.

Agree.

> I've another question about unplug behaviour..
> 
>  1. Plug a VFIO device for host SMMU 1 into a PXB with guest SMMU 1.
>       => Kernel associates host SMMU 1 and guest SMMU 1 together
>  2. Unplug this VFIO device
>  3. Plug a VFIO device for host SMMU 2 into a PXB with guest SMMU 1.
> 
> Does the host/guest SMMU 1<-> 1 association remain set after step 2,
> implying step 3 will fail ? Or does it get unset, allowing step 3
> to succeed, and establish a new mapping host SMMU 2 to guest SMMU 1.

At the moment the first association is not persistent. So a new mapping 
is possible.
 
> If step 2 does NOT break the association, do we preserve that
> across a savevm+loadvm sequence of QEMU. If we don't, then step
> 3 would fail before the savevm, but succeed after the loadvm.

Right. I haven't attempted migration tests yet. But agree that an 
explicit association is better to make migration compatible. Also
I am not sure if the target has a different phys SMMUV3<--> dev
mapping how we handle that.

> Explicitly representing the host SMMU association on the guest SMMU
> config makes this behaviour unambiguous. The host / guest SMMU
> relationship is fixed for the lifetime of the VM and invariant of
> whatever VFIO device is (or was previously) plugged.
> 
> So I still go back to my general principle that automatic side effects
> are an undesirable idea in QEMU configuration. We have a long tradition
> of making everything entirely explicit to produce easily predictable
> behaviour.

Ok. Convinced 😊. Thanks for explaining.

Shameer

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > If we set the physical/guest SMMU relationship directly, then at the
> > time the VFIO device is plugged, we can diagnose the incorrectly
> > placed VFIO device, and better reason about behaviour.
> 
> Agree.

Can you just take in a VFIO cdev FD reference on this command line:

 -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2

And that will lock the pSMMU/vSMMU relationship?

Jason

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Daniel P. Berrangé 1 month ago

On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > > If we set the physical/guest SMMU relationship directly, then at the
> > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > placed VFIO device, and better reason about behaviour.
> > 
> > Agree.
> 
> Can you just take in a VFIO cdev FD reference on this command line:
> 
>  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> 
> And that will lock the pSMMU/vSMMU relationship?

We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
the VFIO devices may be hot plugged arbitrarly later, and we should have
the association initialized the SMMU is realized.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > > > If we set the physical/guest SMMU relationship directly, then at the
> > > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > > placed VFIO device, and better reason about behaviour.
> > > 
> > > Agree.
> > 
> > Can you just take in a VFIO cdev FD reference on this command line:
> > 
> >  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> > 
> > And that will lock the pSMMU/vSMMU relationship?
> 
> We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> the VFIO devices may be hot plugged arbitrarly later, and we should have
> the association initialized the SMMU is realized.

This is not supported kernel side, you can't instantiate a vIOMMU
without a VFIO device that uses it. For security.

Jason

RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Shameerali Kolothum Thodi via 1 month ago


> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 5:47 PM
> To: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > > > If we set the physical/guest SMMU relationship directly, then at the
> > > > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > > > placed VFIO device, and better reason about behaviour.
> > > >
> > > > Agree.
> > >
> > > Can you just take in a VFIO cdev FD reference on this command line:
> > >
> > >  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> > >
> > > And that will lock the pSMMU/vSMMU relationship?
> >
> > We shouldn't assume any VFIO device exists in the QEMU cnofig at the
> time
> > we realize the virtual ssmu. I expect the SMMU may be cold plugged,
> while
> > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > the association initialized the SMMU is realized.
> 
> This is not supported kernel side, you can't instantiate a vIOMMU
> without a VFIO device that uses it. For security.

I think that is fine if Qemu knows about association beforehand. During 
vIOMMU instantiation it can cross check whether the user specified
pSMMU <->vSMMU is correct for the device.

Also how do we do it with multiple VF devices under a pSUMMU ? Which
cdev fd in that case? 

Thanks,
Shameer

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 05:57:38PM +0000, Shameerali Kolothum Thodi wrote:

> Also how do we do it with multiple VF devices under a pSUMMU ? Which
> cdev fd in that case? 

It doesn't matter, they are all interchangeable. Creating the VIOMMU
object just requires any vfio device that is attached to the physical
smmu.

Jason

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Daniel P. Berrangé 1 month ago

On Thu, Feb 06, 2025 at 01:46:47PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > > > > If we set the physical/guest SMMU relationship directly, then at the
> > > > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > > > placed VFIO device, and better reason about behaviour.
> > > > 
> > > > Agree.
> > > 
> > > Can you just take in a VFIO cdev FD reference on this command line:
> > > 
> > >  -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> > > 
> > > And that will lock the pSMMU/vSMMU relationship?
> > 
> > We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> > we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > the association initialized the SMMU is realized.
> 
> This is not supported kernel side, you can't instantiate a vIOMMU
> without a VFIO device that uses it. For security.

What are the security concerns here ?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 05:54:57PM +0000, Daniel P. Berrangé wrote:
> > > We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> > > we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> > > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > > the association initialized the SMMU is realized.
> > 
> > This is not supported kernel side, you can't instantiate a vIOMMU
> > without a VFIO device that uses it. For security.
> 
> What are the security concerns here ?

You should not be able to open iommufd and manipulate iommu HW that
you don't have a VFIO descriptor for, including creating physical
vIOMMU resources, allocating command queues and whatever else.

Some kind of hot plug smmu would have to create a vSMMU without any
kernel backing and then later bind it to a kernel implementation.

Jason

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Daniel P. Berrangé 1 month ago

On Thu, Feb 06, 2025 at 01:58:43PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 05:54:57PM +0000, Daniel P. Berrangé wrote:
> > > > We shouldn't assume any VFIO device exists in the QEMU cnofig at the time
> > > > we realize the virtual ssmu. I expect the SMMU may be cold plugged, while
> > > > the VFIO devices may be hot plugged arbitrarly later, and we should have
> > > > the association initialized the SMMU is realized.
> > > 
> > > This is not supported kernel side, you can't instantiate a vIOMMU
> > > without a VFIO device that uses it. For security.
> > 
> > What are the security concerns here ?
> 
> You should not be able to open iommufd and manipulate iommu HW that
> you don't have a VFIO descriptor for, including creating physical
> vIOMMU resources, allocating command queues and whatever else.
> 
> Some kind of hot plug smmu would have to create a vSMMU without any
> kernel backing and then later bind it to a kernel implementation.

Ok, so if we give the info about the vSMMU <-> pSMMU binding to
QEMU upfront, it can delay using it until the point where the kernel
accepts it. This at least gives a clear design to applications outside
QEMU, and hides the low level impl details to inside QEMU.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Shameerali Kolothum Thodi via 1 month ago


> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 5:59 PM
> To: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Thu, Feb 06, 2025 at 05:54:57PM +0000, Daniel P. Berrangé wrote:
> > > > We shouldn't assume any VFIO device exists in the QEMU cnofig at the
> time
> > > > we realize the virtual ssmu. I expect the SMMU may be cold plugged,
> while
> > > > the VFIO devices may be hot plugged arbitrarly later, and we should
> have
> > > > the association initialized the SMMU is realized.
> > >
> > > This is not supported kernel side, you can't instantiate a vIOMMU
> > > without a VFIO device that uses it. For security.
> >
> > What are the security concerns here ?
> 
> You should not be able to open iommufd and manipulate iommu HW that
> you don't have a VFIO descriptor for, including creating physical
> vIOMMU resources, allocating command queues and whatever else.
> 
> Some kind of hot plug smmu would have to create a vSMMU without any
> kernel backing and then later bind it to a kernel implementation.

Not sure I get the problem with associating vSMMU with a pSMMU. Something
like an iommu instance id mentioned before,

-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1

This can realize the vSMMU without actually creating a vIOMMU in kernel.
And when the dev gets attached/realized, check (GET_HW_INFO)the specified
iommu instance id matches or not.

Or the concern here is exporting an iommu instance id to user space?

Thanks,
Shameer

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 06:04:57PM +0000, Shameerali Kolothum Thodi wrote:
> > Some kind of hot plug smmu would have to create a vSMMU without any
> > kernel backing and then later bind it to a kernel implementation.
> 
> Not sure I get the problem with associating vSMMU with a pSMMU. Something
> like an iommu instance id mentioned before,
> 
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1
> 
> This can realize the vSMMU without actually creating a vIOMMU in kernel.
> And when the dev gets attached/realized, check (GET_HW_INFO)the specified
> iommu instance id matches or not.
> 
> Or the concern here is exporting an iommu instance id to user space?

Philisophically we do not permit any HW access through iommufd without
a VFIO fd to "prove" the process has rights to touch hardware.

We don't have any way to prove the process has rights to touch the
iommu hardware seperately from VFIO.

So even if you invent an iommu ID we cannot accept it as a handle to
create viommu in iommufd.

Jason

RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Shameerali Kolothum Thodi via 1 month ago


> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 6:13 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com;
> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Thu, Feb 06, 2025 at 06:04:57PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > Some kind of hot plug smmu would have to create a vSMMU without
> any
> > > kernel backing and then later bind it to a kernel implementation.
> >
> > Not sure I get the problem with associating vSMMU with a pSMMU.
> Something
> > like an iommu instance id mentioned before,
> >
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1
> >
> > This can realize the vSMMU without actually creating a vIOMMU in kernel.
> > And when the dev gets attached/realized, check (GET_HW_INFO)the
> specified
> > iommu instance id matches or not.
> >
> > Or the concern here is exporting an iommu instance id to user space?
> 
> Philisophically we do not permit any HW access through iommufd without
> a VFIO fd to "prove" the process has rights to touch hardware.
> 
> We don't have any way to prove the process has rights to touch the
> iommu hardware seperately from VFIO.

It is not. Qemu just instantiates a vSMMU and assigns the IOMMU 
instance id to it.

> 
> So even if you invent an iommu ID we cannot accept it as a handle to
> create viommu in iommufd.

Creating the vIOMMU only happens when the user does a  cold/hot plug of
a VFIO device. At that time Qemu checks whether the assigned id matches
with whatever the kernel tell it. 

Thanks,
Shameer

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:

> > So even if you invent an iommu ID we cannot accept it as a handle to
> > create viommu in iommufd.
> 
> Creating the vIOMMU only happens when the user does a  cold/hot plug of
> a VFIO device. At that time Qemu checks whether the assigned id matches
> with whatever the kernel tell it. 

This is not hard up until the guest is started. If you boot a guest
without a backing viommu iommufd object then there will be some more
complexities.

Jason

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Nicolin Chen 1 month ago

On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> 
> > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > create viommu in iommufd.
> > 
> > Creating the vIOMMU only happens when the user does a  cold/hot plug of
> > a VFIO device. At that time Qemu checks whether the assigned id matches
> > with whatever the kernel tell it. 
> 
> This is not hard up until the guest is started. If you boot a guest
> without a backing viommu iommufd object then there will be some more
> complexities.

Yea, I imagined that things would be complicated with hotplugs..

On one hand, I got the part that we need some fixed link forehand
to ease migration/hotplugs.

On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
brings the immediate attention that we cannot even decide vSMMU's
capabilities being reflected in its IDR/IIDR registers, without a
coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest
kernel probing vSMMU instance. So we would have to reset the vSMMU
"HW" after the device hotplug?

Nicolin

RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Shameerali Kolothum Thodi via 1 month ago


> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 6, 2025 8:33 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; Daniel P. Berrangé
> <berrange@redhat.com>; Jason Gunthorpe <jgg@nvidia.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > create viommu in iommufd.
> > >
> > > Creating the vIOMMU only happens when the user does a  cold/hot
> plug of
> > > a VFIO device. At that time Qemu checks whether the assigned id
> matches
> > > with whatever the kernel tell it.
> >
> > This is not hard up until the guest is started. If you boot a guest
> > without a backing viommu iommufd object then there will be some more
> > complexities.
> 
> Yea, I imagined that things would be complicated with hotplugs..
> 
> On one hand, I got the part that we need some fixed link forehand
> to ease migration/hotplugs.
> 
> On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> brings the immediate attention that we cannot even decide vSMMU's
> capabilities being reflected in its IDR/IIDR registers, without a
> coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
> hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest

Right. I forgot about the call to smmu_dev_get_info() during the reset.
That means we need at least one dev per Guest SMMU during Guest
boot :(

Thanks,
Shameer

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Daniel P. Berrangé 1 month ago

On Fri, Feb 07, 2025 at 10:21:17AM +0000, Shameerali Kolothum Thodi wrote:
> 
> 
> > -----Original Message-----
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 6, 2025 8:33 PM
> > To: Shameerali Kolothum Thodi
> > <shameerali.kolothum.thodi@huawei.com>; Daniel P. Berrangé
> > <berrange@redhat.com>; Jason Gunthorpe <jgg@nvidia.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; ddutile@redhat.com;
> > Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> > Jonathan Cameron <jonathan.cameron@huawei.com>;
> > zhangfei.gao@linaro.org; nathanc@nvidia.com
> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> > nested SMMUv3
> > 
> > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi
> > wrote:
> > >
> > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > create viommu in iommufd.
> > > >
> > > > Creating the vIOMMU only happens when the user does a  cold/hot
> > plug of
> > > > a VFIO device. At that time Qemu checks whether the assigned id
> > matches
> > > > with whatever the kernel tell it.
> > >
> > > This is not hard up until the guest is started. If you boot a guest
> > > without a backing viommu iommufd object then there will be some more
> > > complexities.
> > 
> > Yea, I imagined that things would be complicated with hotplugs..
> > 
> > On one hand, I got the part that we need some fixed link forehand
> > to ease migration/hotplugs.
> > 
> > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > brings the immediate attention that we cannot even decide vSMMU's
> > capabilities being reflected in its IDR/IIDR registers, without a
> > coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
> > hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest
> 
> Right. I forgot about the call to smmu_dev_get_info() during the reset.
> That means we need at least one dev per Guest SMMU during Guest
> boot :(

That's pretty unpleasant as a usage restriction. It sounds like there
needs to be a way to configure & control the vIOMMU independantly of
attaching a specific VFIO device.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Shameerali Kolothum Thodi via 1 month ago


> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Friday, February 7, 2025 10:32 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Nicolin Chen <nicolinc@nvidia.com>; Jason Gunthorpe
> <jgg@nvidia.com>; qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; ddutile@redhat.com;
> Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
> 
> On Fri, Feb 07, 2025 at 10:21:17AM +0000, Shameerali Kolothum Thodi
> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, February 6, 2025 8:33 PM
> > > To: Shameerali Kolothum Thodi
> > > <shameerali.kolothum.thodi@huawei.com>; Daniel P. Berrangé
> > > <berrange@redhat.com>; Jason Gunthorpe <jgg@nvidia.com>
> > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > > eric.auger@redhat.com; peter.maydell@linaro.org;
> ddutile@redhat.com;
> > > Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> > > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> > > Jonathan Cameron <jonathan.cameron@huawei.com>;
> > > zhangfei.gao@linaro.org; nathanc@nvidia.com
> > > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-
> creatable
> > > nested SMMUv3
> > >
> > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum
> Thodi
> > > wrote:
> > > >
> > > > > > So even if you invent an iommu ID we cannot accept it as a handle
> to
> > > > > > create viommu in iommufd.
> > > > >
> > > > > Creating the vIOMMU only happens when the user does a  cold/hot
> > > plug of
> > > > > a VFIO device. At that time Qemu checks whether the assigned id
> > > matches
> > > > > with whatever the kernel tell it.
> > > >
> > > > This is not hard up until the guest is started. If you boot a guest
> > > > without a backing viommu iommufd object then there will be some
> more
> > > > complexities.
> > >
> > > Yea, I imagined that things would be complicated with hotplugs..
> > >
> > > On one hand, I got the part that we need some fixed link forehand
> > > to ease migration/hotplugs.
> > >
> > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > > brings the immediate attention that we cannot even decide vSMMU's
> > > capabilities being reflected in its IDR/IIDR registers, without a
> > > coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
> > > hotplug device, the IOMMU_GET_HW_INFO cannot be done during
> guest
> >
> > Right. I forgot about the call to smmu_dev_get_info() during the reset.
> > That means we need at least one dev per Guest SMMU during Guest
> > boot :(
> 
> That's pretty unpleasant as a usage restriction. It sounds like there
> needs to be a way to configure & control the vIOMMU independantly of
> attaching a specific VFIO device.

Yes, that would be ideal.  

Just wondering whether we can have something like the
vfio_register_iommu_driver() for iommufd subsystem by which it can directly
access iommu drivers ops(may be a restricted set). 

Not sure about the layering violations and other security issues with that...

Thanks,
Shameer

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Fri, Feb 07, 2025 at 12:21:54PM +0000, Shameerali Kolothum Thodi wrote:

> Just wondering whether we can have something like the
> vfio_register_iommu_driver() for iommufd subsystem by which it can directly
> access iommu drivers ops(may be a restricted set). 

I very much want to try hard to avoid that.

AFAICT you do not need a VFIO device, or access to the HW_INFO of the
smmu to start up a SMMU driver.

Yes, you cannot later attach a VFIO device with a pSMMU that
materially differs from vSMMU setup, but that is fine.

qemu has long had a duality where you can either "inherit from host"
for an easy setup or be "fully specified" and support live
migration/etc. CPUID as a simple example.

So, what the smmu patches are doing now is "inherit from host" and
that requires a VFIO device to work. I think that is fine.

If you want to do full hotplug then you need to "fully specified" on
the command line so a working vSMMU can be shown to the guest with no
devices, and no kernel involvement.

Obviously this is a highly advanced operating mode as things like IIDR
and errata need to be considered, but I would guess booting with no
vPCI devices is already abnormal.

Jason

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > 
> > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > create viommu in iommufd.
> > > 
> > > Creating the vIOMMU only happens when the user does a  cold/hot plug of
> > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > with whatever the kernel tell it. 
> > 
> > This is not hard up until the guest is started. If you boot a guest
> > without a backing viommu iommufd object then there will be some more
> > complexities.
> 
> Yea, I imagined that things would be complicated with hotplugs..
> 
> On one hand, I got the part that we need some fixed link forehand
> to ease migration/hotplugs.
> 
> On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> brings the immediate attention that we cannot even decide vSMMU's
> capabilities being reflected in its IDR/IIDR registers, without a
> coldplug device

As Daniel was saying this all has to be specifiable on the command
line.

IMHO if the vSMMU is not fully specified by the time the boot happens
(either explicity via command line or implicitly by querying the live
HW) then it qemu should fail.

Jason

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Nicolin Chen 1 month ago

On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > > 
> > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > create viommu in iommufd.
> > > > 
> > > > Creating the vIOMMU only happens when the user does a  cold/hot plug of
> > > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > > with whatever the kernel tell it. 
> > > 
> > > This is not hard up until the guest is started. If you boot a guest
> > > without a backing viommu iommufd object then there will be some more
> > > complexities.
> > 
> > Yea, I imagined that things would be complicated with hotplugs..
> > 
> > On one hand, I got the part that we need some fixed link forehand
> > to ease migration/hotplugs.
> > 
> > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > brings the immediate attention that we cannot even decide vSMMU's
> > capabilities being reflected in its IDR/IIDR registers, without a
> > coldplug device
> 
> As Daniel was saying this all has to be specifiable on the command
> line.
> 
> IMHO if the vSMMU is not fully specified by the time the boot happens
> (either explicity via command line or implicitly by querying the live
> HW) then it qemu should fail.

Though that makes sense, that would assume we could only support
the case where a VM has at least one cold plug device per vSMMU?

Otherwise, even if we specify vSMMU to which pSMMU via a command
line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO..

Thanks
Nicolin

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 12:48:40PM -0800, Nicolin Chen wrote:
> On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > > > 
> > > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > > create viommu in iommufd.
> > > > > 
> > > > > Creating the vIOMMU only happens when the user does a  cold/hot plug of
> > > > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > > > with whatever the kernel tell it. 
> > > > 
> > > > This is not hard up until the guest is started. If you boot a guest
> > > > without a backing viommu iommufd object then there will be some more
> > > > complexities.
> > > 
> > > Yea, I imagined that things would be complicated with hotplugs..
> > > 
> > > On one hand, I got the part that we need some fixed link forehand
> > > to ease migration/hotplugs.
> > > 
> > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > > brings the immediate attention that we cannot even decide vSMMU's
> > > capabilities being reflected in its IDR/IIDR registers, without a
> > > coldplug device
> > 
> > As Daniel was saying this all has to be specifiable on the command
> > line.
> > 
> > IMHO if the vSMMU is not fully specified by the time the boot happens
> > (either explicity via command line or implicitly by querying the live
> > HW) then it qemu should fail.
> 
> Though that makes sense, that would assume we could only support
> the case where a VM has at least one cold plug device per vSMMU?
> 
> Otherwise, even if we specify vSMMU to which pSMMU via a command
> line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO..

You'd use the command line information and wouldn't need GET_HW_INFO,
it would be complicated

Jason

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Nicolin Chen 1 month ago

On Thu, Feb 06, 2025 at 05:11:13PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 12:48:40PM -0800, Nicolin Chen wrote:
> > On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> > > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > > > > 
> > > > > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > > > > create viommu in iommufd.
> > > > > > 
> > > > > > Creating the vIOMMU only happens when the user does a  cold/hot plug of
> > > > > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > > > > with whatever the kernel tell it. 
> > > > > 
> > > > > This is not hard up until the guest is started. If you boot a guest
> > > > > without a backing viommu iommufd object then there will be some more
> > > > > complexities.
> > > > 
> > > > Yea, I imagined that things would be complicated with hotplugs..
> > > > 
> > > > On one hand, I got the part that we need some fixed link forehand
> > > > to ease migration/hotplugs.
> > > > 
> > > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> > > > brings the immediate attention that we cannot even decide vSMMU's
> > > > capabilities being reflected in its IDR/IIDR registers, without a
> > > > coldplug device
> > > 
> > > As Daniel was saying this all has to be specifiable on the command
> > > line.
> > > 
> > > IMHO if the vSMMU is not fully specified by the time the boot happens
> > > (either explicity via command line or implicitly by querying the live
> > > HW) then it qemu should fail.
> > 
> > Though that makes sense, that would assume we could only support
> > the case where a VM has at least one cold plug device per vSMMU?
> > 
> > Otherwise, even if we specify vSMMU to which pSMMU via a command
> > line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO..
> 
> You'd use the command line information and wouldn't need GET_HW_INFO,
> it would be complicated

Do you mean the "-device arm-smmuv3-accel,id=xx" line? This still
won't give us the host IDR/IIDR register values to probe a vSMMU,
unless it has a VFIO device assigned to vSMMU's associated PXB in
that command line?

Nicolin

Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested SMMUv3

Posted by Jason Gunthorpe 1 month ago

On Thu, Feb 06, 2025 at 02:46:42PM -0800, Nicolin Chen wrote:
> > You'd use the command line information and wouldn't need GET_HW_INFO,
> > it would be complicated
> 
> Do you mean the "-device arm-smmuv3-accel,id=xx" line? This still
> won't give us the host IDR/IIDR register values to probe a vSMMU,
> unless it has a VFIO device assigned to vSMMU's associated PXB in
> that command line?

Yes, put the IDR registers on the command line too.

Nothing from the host should be copied to the guest without the option
to control it through the command line.

Jason