Hi,
This series adds initial support for a user-creatable "arm-smmuv3-nested"
device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
and cannot support multiple SMMUv3s.
In order to support vfio-pci device assignment with vSMMUv3, the physical
SMMUv3 has to be configured in nested mode. Having a pluggable
"arm-smmuv3-nested" device enables us to have multiple vSMMUv3s for Guests
running on a host with multiple physical SMMUv3s. A few benefits of doing
this are:
1. It avoids invalidation broadcast or lookup when devices are behind
   multiple phys SMMUv3s.
2. It makes it easier to handle phys SMMUv3s that differ in features.
3. It makes it easier to handle future requirements such as vCMDQ support.
This is based on discussions/suggestions received for a previous RFC by
Nicolin here[0].
This series includes:
- Support for an "arm-smmuv3-nested" device. At present only virt is
  supported, and the _plug_cb() callback is used to hook up the sysbus mem
  and irq (not sure whether this has any negative repercussions). Patch #3.
- A way to associate a pci-bus (pxb-pcie) with the above device. Patch #3.
- The last patch adds RMR support for MSI doorbell handling (Patch #5).
  This may change in the future[1].
This RFC is for initial discussion/test purposes only and includes only the
patches relevant for adding the "arm-smmuv3-nested" support. For the
complete branch, please see:
https://github.com/hisilicon/qemu/tree/private-smmuv3-nested-dev-rfc-v1
A few ToDos to note:
1. At present, default-bus-bypass-iommu=on should be set when an
   arm-smmuv3-nested device is specified; otherwise you may get an
   IORT-related boot error. Requires fixing.
2. Hot-adding a device is not working at the moment. It looks like a pcihp
   IRQ issue, possibly a bug in the IORT id mappings.
3. The above branch doesn't support vSVA yet.
Hopefully this is helpful in taking the discussion forward. Please take a
look and let me know.
How to use it (example):
On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
specify two smmuv3-nested devices each behind a pxb-pcie as below,
./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
-enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
-object iommufd,id=iommufd0 \
-bios QEMU_EFI.fd \
-kernel Image \
-device virtio-blk-device,drive=fs \
-drive if=none,file=rootfs.qcow2,id=fs \
-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
-device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
-append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
-device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
-fsdev local,id=p9fs2,path=p9root,security_model=mapped \
-net none \
-nographic
The Guest will boot with two SMMUv3s:
[ 1.608130] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[ 1.609655] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
[ 1.612475] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
[ 1.614444] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
[ 1.617451] arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
[ 1.618842] arm-smmu-v3 arm-smmu-v3.1.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
[ 1.621366] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
[ 1.623225] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
With a PCI topology like below:
[root@localhost ~]# lspci -tv
-+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
| +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
| +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
| \-03.0 Virtio: Virtio filesystem
+-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
\-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
[root@localhost ~]#
And if you want to add another HNS VF, it should be added to the same SMMUv3
as the first HNS device:
-device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
[root@localhost ~]# lspci -tv
-+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
| +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
| +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
| \-03.0 Virtio: Virtio filesystem
+-[0000:08]-+-00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
| \-01.0-[0a]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
\-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
[root@localhost ~]#
Attempting to add the HNS VF to a different SMMUv3 will result in:
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
At present Qemu is not doing any extra validation beyond the above failure
to make sure the user configuration is correct. The assumption is that
libvirt will take care of this.
Thanks,
Shameer
[0] https://lore.kernel.org/qemu-devel/cover.1719361174.git.nicolinc@nvidia.com/
[1] https://lore.kernel.org/linux-iommu/ZrVN05VylFq8lK4q@Asurada-Nvidia/
Eric Auger (1):
hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested
binding
Nicolin Chen (2):
hw/arm/virt: Add an SMMU_IO_LEN macro
hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
Shameer Kolothum (2):
hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
hw/arm/smmuv3.c | 61 ++++++++++++++++++++++
hw/arm/virt-acpi-build.c | 109 ++++++++++++++++++++++++++++++++-------
hw/arm/virt.c | 33 ++++++++++--
hw/core/sysbus-fdt.c | 1 +
include/hw/arm/smmuv3.h | 17 ++++++
include/hw/arm/virt.h | 15 ++++++
6 files changed, 215 insertions(+), 21 deletions(-)
--
2.34.1
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> How to use it (example):
>
> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
> specify two smmuv3-nested devices each behind a pxb-pcie as below,
>
> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> -net none \
> -nographic

Above you say the host has 2 SMMUv3 devices, and you've created 2 SMMUv3
guest devices to match.

The various emails in this thread & the libvirt thread indicate that each
guest SMMUv3 is associated with a host SMMUv3, but I don't see any
property on the command line for 'arm-smmuv3-nested' that tells it which
host SMMUv3 it is to be associated with.

How does this association work?

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Hi Daniel,

> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, January 30, 2025 4:00 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> [...]
>
> Above you say the host has 2 SMMUv3 devices, and you've created 2 SMMUv3
> guest devices to match.
>
> The various emails in this thread & the libvirt thread indicate that each
> guest SMMUv3 is associated with a host SMMUv3, but I don't see any
> property on the command line for 'arm-smmuv3-nested' that tells it which
> host SMMUv3 it is to be associated with.
>
> How does this association work?

You are right. The association is not very obvious in Qemu. The association
and checking is done implicitly by the kernel at the moment. I will try to
explain it here.

Each "arm-smmuv3-nested" instance, when the first device gets attached to it,
will create an S2 HWPT and a corresponding SMMUv3 domain in the kernel SMMUv3
driver. This domain holds a pointer to the physical SMMUv3 that the device
belongs to, and any other device that belongs to the same physical SMMUv3 can
share this S2 domain.

If a device that belongs to a different physical SMMUv3 gets attached to the
above domain, the HWPT attach will eventually fail because the physical
SMMUv3 recorded in the domain will not match:
https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c#L2860

And as I mentioned in the cover letter, Qemu will report:

"
Attempting to add the HNS VF to a different SMMUv3 will result in:

-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument

At present Qemu is not doing any extra validation beyond the above failure
to make sure the user configuration is correct. The assumption is that
libvirt will take care of this.
"

So in summary, if libvirt gets it wrong, Qemu will fail with an error.

If a more explicit association is required, some help from the kernel is
needed to identify the physical SMMUv3 associated with the device.

Jason/Nicolin, any thoughts on this?

Thanks,
Shameer
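To make the implicit pairing concrete, below is a minimal, self-contained C
sketch of the idea (hypothetical types and names, not the actual kernel code;
the real check is in arm-smmu-v3.c behind the link above): the S2 domain
records the physical SMMU it was first created on, and a later attach from a
device behind a different physical SMMU fails with -EINVAL, which QEMU
surfaces as the "Invalid argument" error quoted above.

#include <errno.h>
#include <stddef.h>

struct phys_smmu { int id; };                  /* one per physical SMMUv3 */
struct s2_domain { struct phys_smmu *smmu; };  /* pinned on first attach */
struct vfio_dev  { struct phys_smmu *smmu; };  /* pSMMU the device sits behind */

static int attach_dev(struct s2_domain *dom, struct vfio_dev *dev)
{
    if (!dom->smmu) {
        /* the first attached device fixes the vSMMU <-> pSMMU pairing */
        dom->smmu = dev->smmu;
        return 0;
    }
    /* a device behind a different physical SMMU cannot share this domain */
    if (dom->smmu != dev->smmu)
        return -EINVAL;
    return 0;
}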
On Thu, Jan 30, 2025 at 06:09:24PM +0000, Shameerali Kolothum Thodi wrote:
>
> Each "arm-smmuv3-nested" instance, when the first device gets attached
> to it, will create an S2 HWPT and a corresponding SMMUv3 domain in the
> kernel SMMUv3 driver. This domain holds a pointer to the physical SMMUv3
> that the device belongs to, and any other device that belongs to the
> same physical SMMUv3 can share this S2 domain.

Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s, C and D,
we could end up with A&C and B&D paired, or we could end up with A&D and
B&C paired, depending on whether we plug the first VFIO device into guest
SMMUv3 A or B.

This is bad. Behaviour must not vary depending on the order in which we
create devices.

A guest SMMUv3 is paired to a guest PXB. A guest PXB is liable to be
paired to a guest NUMA node. A guest NUMA node is liable to be paired to
a host NUMA node. The guest/host SMMU pairing must be chosen such that it
makes conceptual sense wrt the guest PXB NUMA to host NUMA pairing.

If the kernel picks guest<->host SMMU pairings on a first-device
first-paired basis, this can end up with incorrect guest NUMA
configurations.

The mgmt apps need to be able to tell QEMU exactly which host SMMU to
pair with each guest SMMU, and QEMU needs to then tell the kernel.

> And as I mentioned in the cover letter, Qemu will report:
> [...]
> So in summary, if libvirt gets it wrong, Qemu will fail with an error.

That's good error checking, and required, but also insufficient as
illustrated above IMHO.

> If a more explicit association is required, some help from the kernel is
> needed to identify the physical SMMUv3 associated with the device.

Yep, I think SMMUv3 info for devices needs to be exposed to userspace,
as well as a mechanism for QEMU to tell the kernel the SMMU mapping.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Hi Daniel,

> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Friday, January 31, 2025 9:42 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s,
> C and D, we could end up with A&C and B&D paired, or we could
> end up with A&D and B&C paired, depending on whether we plug
> the first VFIO device into guest SMMUv3 A or B.
>
> This is bad. Behaviour must not vary depending on the order
> in which we create devices.
>
> A guest SMMUv3 is paired to a guest PXB. A guest PXB is liable
> to be paired to a guest NUMA node. A guest NUMA node is liable
> to be paired to a host NUMA node. The guest/host SMMU pairing
> must be chosen such that it makes conceptual sense wrt the
> guest PXB NUMA to host NUMA pairing.
>
> If the kernel picks guest<->host SMMU pairings on a first-device
> first-paired basis, this can end up with incorrect guest NUMA
> configurations.

Ok. I am trying to understand how this can happen, as I assume the
Guest PXB numa node is picked based on whatever device we are attaching
to it, i.e. on which numa_id that device belongs to on the physical host.

And the physical SMMUv3's numa id will be the same as the numa_id of the
device it is associated with, won't it?

For example, I have a system here that has 8 phys SMMUv3s, and the numa
assignments on it look something like below:

Phys SMMUv3.0 --> node 0
 \..dev1 --> node 0
Phys SMMUv3.1 --> node 0
 \..dev2 --> node 0
Phys SMMUv3.2 --> node 0
Phys SMMUv3.3 --> node 0

Phys SMMUv3.4 --> node 1
Phys SMMUv3.5 --> node 1
 \..dev5 --> node 1
Phys SMMUv3.6 --> node 1
Phys SMMUv3.7 --> node 1

If I have to assign, say, dev 1, 2 and 5 to a Guest, we need to specify 3
"arm-smmuv3-accel" instances as they belong to different phys SMMUv3s.

-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0 \
-device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=0 \
-device pxb-pcie,id=pcie.3,bus_nr=3,bus=pcie.0,numa_id=1 \
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2 \
-device arm-smmuv3-accel,id=smmuv3,bus=pcie.3 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
-device pcie-root-port,id=pcie.port3,bus=pcie.3,chassis=3 \
-device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0 \
-device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0 \
-device vfio-pci,host=0000:dev5,bus=pcie.port3,iommufd=iommufd0

So I guess even if we don't specify the physical SMMUv3 association
explicitly, the kernel will check that based on the devices the Guest
SMMUv3 is attached to (and hence the Numa association), right?

In other words, how does an explicit association help us here?

Or is it that the Guest PXB numa_id allocation is not always based
on the device numa_id? (Maybe I am missing something here. Sorry.)

Thanks,
Shameer
On Thu, Feb 06, 2025 at 10:02:25AM +0000, Shameerali Kolothum Thodi wrote:
> Ok. I am trying to understand how this can happen, as I assume the
> Guest PXB numa node is picked based on whatever device we are attaching
> to it, i.e. on which numa_id that device belongs to on the physical host.
>
> And the physical SMMUv3's numa id will be the same as the numa_id of the
> device it is associated with, won't it?
>
> [...]
>
> So I guess even if we don't specify the physical SMMUv3 association
> explicitly, the kernel will check that based on the devices the Guest
> SMMUv3 is attached to (and hence the Numa association), right?

It isn't about checking the devices, it is about the guest SMMU
getting differing host SMMU associations.

> In other words, how does an explicit association help us here?
>
> Or is it that the Guest PXB numa_id allocation is not always based
> on the device numa_id?

Lets simplify to 2 SMMUs for shorter CLIs.

So to start with we assume a physical host with two SMMUs, and
two PCI devices we want to assign:

0000:dev1 - associated with host SMMU 1, and host NUMA node 0
0000:dev2 - associated with host SMMU 2, and host NUMA node 1

So now we configure QEMU like this:

-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
-device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
-device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0
-device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0

For brevity I'm not going to show the config for host/guest NUMA mappings,
but assume that guest NUMA node 0 has been configured to map to host NUMA
node 0 and guest node 1 to host node 1.

In this order of QEMU CLI args we get:

VFIO device 0000:dev1 causes the kernel to associate guest smmuv1 with
host SMMU 1.

VFIO device 0000:dev2 causes the kernel to associate guest smmuv2 with
host SMMU 2.

Now consider we swap the ordering of the VFIO devices on the QEMU CLI:

-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
-device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
-device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
-device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0
-device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0

In this order of QEMU CLI args we get:

VFIO device 0000:dev2 causes the kernel to associate guest smmuv1 with
host SMMU 2.

VFIO device 0000:dev1 causes the kernel to associate guest smmuv2 with
host SMMU 1.

This is broken, as now we have inconsistent NUMA mappings between host
and guest. 0000:dev2 is associated with a PXB on NUMA node 1, but
associated with a guest SMMU that was paired with a PXB on NUMA node 0.

This is because the kernel is doing first-come first-matched logic for
mapping guest and host SMMUs, and thus is sensitive to ordering of the
VFIO devices on the CLI.

We need to be ordering invariant, which means libvirt must tell QEMU
which host + guest SMMUs to pair together, and QEMU must in turn tell
the kernel.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, February 6, 2025 10:37 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org;
> nathanc@nvidia.com
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 10:02:25AM +0000, Shameerali Kolothum Thodi
> wrote:
> > Hi Daniel,
> >
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > Sent: Friday, January 31, 2025 9:42 PM
> > > To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>
> > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> > > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-
> creatable
> > > nested SMMUv3
> > >
> > > On Thu, Jan 30, 2025 at 06:09:24PM +0000, Shameerali Kolothum Thodi
> > > wrote:
> > > >
> > > > Each "arm-smmuv3-nested" instance, when the first device gets attached
> > > > to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel
> > > > SMMUv3 driver. This domain will have a pointer representing the physical
> > > > SMMUv3 that the device belongs. And any other device which belongs to
> > > > the same physical SMMUv3 can share this S2 domain.
> > >
> > > Ok, so given two guest SMMUv3s, A and B, and two host SMMUv3s,
> > > C and D, we could end up with A&C and B&D paired, or we could
> > > end up with A&D and B&C paired, depending on whether we plug
> > > the first VFIO device into guest SMMUv3 A or B.
> > >
> > > This is bad. Behaviour must not vary depending on the order
> > > in which we create devices.
> > >
> > > A guest SMMUv3 is paired to a guest PXB. A guest PXB is liable
> > > to be paired to a guest NUMA node. A guest NUMA node is liable
> > > to be paired to a host NUMA node. The guest/host SMMU pairing
> > > must be chosen such that it makes conceptual sense wrt the
> > > guest PXB NUMA to host NUMA pairing.
> > >
> > > If the kernel picks guest<->host SMMU pairings on a first-device
> > > first-paired basis, this can end up with incorrect guest NUMA
> > > configurations.
> >
> > Ok. I am trying to understand how this can happen as I assume the
> > Guest PXB numa node is picked up by whatever device we are
> > attaching to it and based on which numa_id that device belongs to
> > in physical host.
> >
> > And the physical smmuv3 numa id will be the same to that of the
> > device numa_id it is associated with. Isn't it?
> >
> > For example I have a system here, that has 8 phys SMMUv3s and numa
> > assignments on this is something like below,
> >
> > Phys SMMUv3.0 --> node 0
> > \..dev1 --> node0
> > Phys SMMUv3.1 --> node 0
> > \..dev2 -->node0
> > Phys SMMUv3.2 --> node 0
> > Phys SMMUv3.3 --> node 0
> >
> > Phys SMMUv3.4 --> node 1
> > Phys SMMUv3.5 --> node 1
> > \..dev5 --> node1
> > Phys SMMUv3.6 --> node 1
> > Phys SMMUv3.7 --> node 1
> >
> >
> > If I have to assign say dev 1, 2 and 5 to a Guest, we need to specify 3
> > "arm-smmuv3-accel" instances as they belong to different phys
> SMMUv3s.
> >
> > -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0 \
> > -device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=0 \
> > -device pxb-pcie,id=pcie.3,bus_nr=3,bus=pcie.0,numa_id=1 \
> > -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1 \
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2 \
> > -device arm-smmuv3-accel,id=smmuv3,bus=pcie.3 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> > -device pcie-root-port,id=pcie.port3,bus=pcie.3,chassis=3 \
> > -device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0 \
> > -device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0 \
> > -device vfio-pci,host=0000:dev5,bus=pcie.port3,iommufd=iommufd0
> >
> > So I guess even if we don't specify the physical SMMUv3 association
> > explicitly, the kernel will check that based on the devices the Guest
> > SMMUv3 is attached to (and hence the Numa association), right?
>
> It isn't about checking the devices, it is about the guest SMMU
> getting differing host SMMU associations.
>
> > In other words how an explicit association helps us here?
> >
> > Or is it that the Guest PXB numa_id allocation is not always based
> > on device numa_id?
>
> Lets simplify to 2 SMMUs for shorter CLIs.
>
> So to start with we assume physical host with two SMMUs, and
> two PCI devices we want to assign
>
> 0000:dev1 - associated with host SMMU 1, and host NUMA node 0
> 0000:dev2 - associated with host SMMU 2, and host NUMA node 1
>
> So now we configure QEMU like this:
>
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
> -device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
> -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
> -device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0
> -device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0
>
> For brevity I'm not going to show the config for host/guest NUMA mappings,
> but assume that guest NUMA node 0 has been configured to map to host NUMA
> node 0 and guest node 1 to host node 1.
>
> In this order of QEMU CLI args we get
>
> VFIO device 0000:dev1 causes the kernel to associate guest smmuv1 with
> host SMMU 1.
>
> VFIO device 0000:dev2 causes the kernel to associate guest smmuv2 with
> host SMMU 2.
>
> Now consider we swap the ordering of the VFIO Devices on the QEMU cli
>
>
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0,numa_id=0
> -device pxb-pcie,id=pcie.2,bus_nr=2,bus=pcie.0,numa_id=1
> -device arm-smmuv3-accel,id=smmuv1,bus=pcie.1
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2
> -device vfio-pci,host=0000:dev2,bus=pcie.port2,iommufd=iommufd0
> -device vfio-pci,host=0000:dev1,bus=pcie.port1,iommufd=iommufd0
>
> In this order of QEMU CLI args we get
>
> VFIO device 0000:dev2 causes the kernel to associate guest smmuv1 with
> host SMMU 2.
>
> VFIO device 0000:dev1 causes the kernel to associate guest smmuv2 with
> host SMMU 1.
>
> This is broken, as now we have inconsistent NUMA mappings between host
> and guest. 0000:dev2 is associated with a PXB on NUMA node 1, but
> associated with a guest SMMU that was paired with a PXB on NUMA node
> 0.
Hmm.. I don't think just swapping the order will change the association with
the Guest SMMU here, because we have:

> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2

At smmuv3-accel realize time, this results in:

pci_setup_iommu(primary_bus, ops, smmu_state);

And when the vfio device realization happens:

set_iommu_device()
  smmu_dev_set_iommu_device(bus, smmu_state, ,)
  --> this is where the guest smmuv3 --> host smmuv3 association is first
      established. And any further vfio device added to this Guest SMMU will
      only succeed if it belongs to the same phys SMMU.

i.e., the Guest SMMU to PCI bus association actually makes sure you have the
same Guest SMMU for the device:

smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
0000:dev2 --> pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)

Hence the association of 0000:dev2 to Guest SMMUv2 remains the same.
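To illustrate why the ordering does not matter, here is a minimal,
self-contained C sketch (hypothetical types and names, not the actual Qemu
code): the vSMMU for a device is looked up from the PCI bus it is plugged
into, which is fixed by the -device topology, so the order in which the
vfio-pci devices are realized cannot change the pairing.

#include <stdio.h>

struct vsmmu    { const char *id; };
struct pci_bus  { const char *id; struct vsmmu *smmu; };  /* set at smmuv3 realize */
struct vfio_dev { const char *bdf; struct pci_bus *bus; };

/* models the lookup done at vfio device realize: the pairing comes from
 * dev->bus, not from the order in which devices are realized */
static struct vsmmu *vsmmu_for_dev(struct vfio_dev *dev)
{
    return dev->bus->smmu;
}

int main(void)
{
    struct vsmmu smmuv1 = { "smmuv1" }, smmuv2 = { "smmuv2" };
    struct pci_bus pcie1 = { "pcie.1", &smmuv1 }, pcie2 = { "pcie.2", &smmuv2 };
    struct vfio_dev dev1 = { "0000:dev1", &pcie1 }, dev2 = { "0000:dev2", &pcie2 };

    /* realize dev2 first, then dev1: the result is the same either way */
    printf("%s -> %s\n", dev2.bdf, vsmmu_for_dev(&dev2)->id);  /* 0000:dev2 -> smmuv2 */
    printf("%s -> %s\n", dev1.bdf, vsmmu_for_dev(&dev1)->id);  /* 0000:dev1 -> smmuv1 */
    return 0;
}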
I hope this is clear. And I am not sure the association can be broken in any
other way unless the Qemu CLI assigns the device to a different PXB.

Maybe one of my earlier replies caused the confusion that the ordering of the
VFIO devices on the QEMU CLI will affect the association.
Thanks,
Shameer
On Thu, Feb 06, 2025 at 01:51:15PM +0000, Shameerali Kolothum Thodi wrote:
> Hmm..I don’t think just swapping the order will change the association with
> Guest SMMU here. Because, we have,
>
> > -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
>
> During smmuv3-accel realize time, this will result in,
> pci_setup_iommu(primary_bus, ops, smmu_state);
>
> And when the vfio dev realization happens,
> set_iommu_device()
> smmu_dev_set_iommu_device(bus, smmu_state, ,)
> --> this is where the guest smmuv3-->host smmuv3 association is first
> established. And any further vfio dev to this Guest SMMU will
> only succeeds if it belongs to the same phys SMMU.
>
> ie, the Guest SMMU to pci bus association, actually make sure you have the
> same Guest SMMU for the device.
Ok, so at the time of VFIO device realize, QEMU is telling the kernel
to associate a physical SMMU, and it's doing this with the virtual
SMMU attached to the PXB parenting the VFIO device.
> smmuv2 --> pcie.2 --> (pxb-pcie, numa_id = 1)
> 0000:dev2 --> pcie.port2 --> pcie.2 --> smmuv2 (pxb-pcie, numa_id = 1)
>
> Hence the association of 0000:dev2 to Guest SMMUv2 remain same.
Yes, I concur the SMMU physical <-> virtual association should
be fixed, as long as the same VFIO device is always added to
the same virtual SMMU.
> I hope this is clear. And I am not sure the association will be broken in any
> other way unless Qemu CLI specify the dev to a different PXB.
Although the ordering is at least predictable, I remain uncomfortable
about the idea of the virtual SMMU association with the physical SMMU
being a side effect of the VFIO device placement.
There is still the open door for admin mis-configuration that will not
be diagnosed. E.g. consider we attach VFIO device 1 from host NUMA
node 1 to a PXB associated with host NUMA node 0. As long as that's
the first VFIO device, the kernel will happily associate the physical
and guest SMMUs.
If we set the physical/guest SMMU relationship directly, then at the
time the VFIO device is plugged, we can diagnose the incorrectly
placed VFIO device, and better reason about behaviour.
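For illustration only, an explicit pairing could look something like the
property spelling floated later in this thread (hypothetical syntax, not an
implemented option; "iommu.0"/"iommu.1" are assumed host SMMU identifiers):

-device arm-smmuv3-accel,id=smmuv1,bus=pcie.1,host-smmu=iommu.0 \
-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1 \

With something like that, a vfio-pci device plugged under pcie.1 but sitting
behind a different physical SMMU could be rejected at plug time.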
I've another question about unplug behaviour..
1. Plug a VFIO device for host SMMU 1 into a PXB with guest SMMU 1.
=> Kernel associates host SMMU 1 and guest SMMU 1 together
2. Unplug this VFIO device
3. Plug a VFIO device for host SMMU 2 into a PXB with guest SMMU 1.
Does the host/guest SMMU 1 <-> 1 association remain set after step 2,
implying step 3 will fail? Or does it get unset, allowing step 3
to succeed and establish a new mapping from host SMMU 2 to guest SMMU 1?

If step 2 does NOT break the association, do we preserve that
across a savevm+loadvm sequence of QEMU? If we don't, then step
3 would fail before the savevm, but succeed after the loadvm.
Explicitly representing the host SMMU association on the guest SMMU
config makes this behaviour unambiguous. The host / guest SMMU
relationship is fixed for the lifetime of the VM and invariant of
whatever VFIO device is (or was previously) plugged.
So I still go back to my general principle that automatic side effects
are an undesirable idea in QEMU configuration. We have a long tradition
of making everything entirely explicit to produce easily predictable
behaviour.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, February 6, 2025 2:47 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Although the ordering is at least predictable, I remain uncomfortable
> about the idea of the virtual SMMU association with the physical SMMU
> being a side effect of the VFIO device placement.
>
> There is still the open door for admin mis-configuration that will not
> be diagnosed. E.g. consider we attach VFIO device 1 from host NUMA
> node 1 to a PXB associated with host NUMA node 0. As long as that's
> the first VFIO device, the kernel will happily associate the physical
> and guest SMMUs.

Yes. A mis-configuration can place it on a wrong one.

> If we set the physical/guest SMMU relationship directly, then at the
> time the VFIO device is plugged, we can diagnose the incorrectly
> placed VFIO device, and better reason about behaviour.

Agree.

> I've another question about unplug behaviour..
>
> 1. Plug a VFIO device for host SMMU 1 into a PXB with guest SMMU 1.
>    => Kernel associates host SMMU 1 and guest SMMU 1 together
> 2. Unplug this VFIO device
> 3. Plug a VFIO device for host SMMU 2 into a PXB with guest SMMU 1.
>
> Does the host/guest SMMU 1 <-> 1 association remain set after step 2,
> implying step 3 will fail? Or does it get unset, allowing step 3
> to succeed and establish a new mapping from host SMMU 2 to guest SMMU 1?

At the moment the first association is not persistent. So a new mapping
is possible.

> If step 2 does NOT break the association, do we preserve that
> across a savevm+loadvm sequence of QEMU? If we don't, then step
> 3 would fail before the savevm, but succeed after the loadvm.

Right. I haven't attempted migration tests yet. But I agree that an explicit
association is better for migration compatibility. Also, I am not sure how we
handle it if the target has a different phys SMMUv3 <--> dev mapping.

> Explicitly representing the host SMMU association on the guest SMMU
> config makes this behaviour unambiguous. The host / guest SMMU
> relationship is fixed for the lifetime of the VM and invariant of
> whatever VFIO device is (or was previously) plugged.
>
> So I still go back to my general principle that automatic side effects
> are an undesirable idea in QEMU configuration. We have a long tradition
> of making everything entirely explicit to produce easily predictable
> behaviour.

Ok. Convinced 😊. Thanks for explaining.

Shameer
On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > If we set the physical/guest SMMU relationship directly, then at the
> > time the VFIO device is plugged, we can diagnose the incorrectly
> > placed VFIO device, and better reason about behaviour.
>
> Agree.

Can you just take in a VFIO cdev FD reference on this command line:

-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2

And that will lock the pSMMU/vSMMU relationship?

Jason
On Thu, Feb 06, 2025 at 01:02:38PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 03:07:06PM +0000, Shameerali Kolothum Thodi wrote:
> > > If we set the physical/guest SMMU relationship directly, then at the
> > > time the VFIO device is plugged, we can diagnose the incorrectly
> > > placed VFIO device, and better reason about behaviour.
> >
> > Agree.
>
> Can you just take in a VFIO cdev FD reference on this command line:
>
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2
>
> And that will lock the pSMMU/vSMMU relationship?

We shouldn't assume any VFIO device exists in the QEMU config at the time
we realize the virtual SMMU. I expect the SMMU may be cold plugged, while
the VFIO devices may be hot plugged arbitrarily later, and we should have
the association initialized when the SMMU is realized.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> We shouldn't assume any VFIO device exists in the QEMU config at the time
> we realize the virtual SMMU. I expect the SMMU may be cold plugged, while
> the VFIO devices may be hot plugged arbitrarily later, and we should have
> the association initialized when the SMMU is realized.

This is not supported kernel side, you can't instantiate a vIOMMU
without a VFIO device that uses it. For security.

Jason
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 5:47 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Thu, Feb 06, 2025 at 05:10:32PM +0000, Daniel P. Berrangé wrote:
> > We shouldn't assume any VFIO device exists in the QEMU config at the time
> > we realize the virtual SMMU. I expect the SMMU may be cold plugged, while
> > the VFIO devices may be hot plugged arbitrarily later, and we should have
> > the association initialized when the SMMU is realized.
>
> This is not supported kernel side, you can't instantiate a vIOMMU
> without a VFIO device that uses it. For security.

I think that is fine if Qemu knows about the association beforehand. During
vIOMMU instantiation it can cross-check whether the user-specified
pSMMU <-> vSMMU is correct for the device.

Also, how do we do it with multiple VF devices under a pSMMU? Which
cdev fd in that case?

Thanks,
Shameer
On Thu, Feb 06, 2025 at 05:57:38PM +0000, Shameerali Kolothum Thodi wrote:
> Also, how do we do it with multiple VF devices under a pSMMU? Which
> cdev fd in that case?

It doesn't matter, they are all interchangeable. Creating the vIOMMU
object just requires any vfio device that is attached to the physical
SMMU.

Jason
On Thu, Feb 06, 2025 at 01:46:47PM -0400, Jason Gunthorpe wrote:
> This is not supported kernel side, you can't instantiate a vIOMMU
> without a VFIO device that uses it. For security.

What are the security concerns here?

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
On Thu, Feb 06, 2025 at 05:54:57PM +0000, Daniel P. Berrangé wrote:
> > This is not supported kernel side, you can't instantiate a vIOMMU
> > without a VFIO device that uses it. For security.
>
> What are the security concerns here?

You should not be able to open iommufd and manipulate iommu HW that
you don't have a VFIO descriptor for, including creating physical
vIOMMU resources, allocating command queues and whatever else.

Some kind of hot plug SMMU would have to create a vSMMU without any
kernel backing and then later bind it to a kernel implementation.

Jason
On Thu, Feb 06, 2025 at 01:58:43PM -0400, Jason Gunthorpe wrote:
> You should not be able to open iommufd and manipulate iommu HW that
> you don't have a VFIO descriptor for, including creating physical
> vIOMMU resources, allocating command queues and whatever else.
>
> Some kind of hot plug SMMU would have to create a vSMMU without any
> kernel backing and then later bind it to a kernel implementation.

Ok, so if we give the info about the vSMMU <-> pSMMU binding to QEMU
upfront, it can delay using it until the point where the kernel accepts
it. This at least gives a clear design to applications outside QEMU, and
keeps the low level impl details inside QEMU.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 5:59 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> You should not be able to open iommufd and manipulate iommu HW that
> you don't have a VFIO descriptor for, including creating physical
> vIOMMU resources, allocating command queues and whatever else.
>
> Some kind of hot plug SMMU would have to create a vSMMU without any
> kernel backing and then later bind it to a kernel implementation.

Not sure I get the problem with associating a vSMMU with a pSMMU. Something
like an iommu instance id mentioned before:

-device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1

This can realize the vSMMU without actually creating a vIOMMU in the kernel.
And when the device gets attached/realized, check (GET_HW_INFO) whether the
specified iommu instance id matches or not.

Or is the concern here exporting an iommu instance id to user space?

Thanks,
Shameer
On Thu, Feb 06, 2025 at 06:04:57PM +0000, Shameerali Kolothum Thodi wrote:
> Not sure I get the problem with associating a vSMMU with a pSMMU. Something
> like an iommu instance id mentioned before:
>
> -device arm-smmuv3-accel,id=smmuv2,bus=pcie.2,host-smmu=iommu.1
>
> This can realize the vSMMU without actually creating a vIOMMU in the kernel.
> And when the device gets attached/realized, check (GET_HW_INFO) whether the
> specified iommu instance id matches or not.
>
> Or is the concern here exporting an iommu instance id to user space?

Philosophically we do not permit any HW access through iommufd without
a VFIO fd to "prove" the process has rights to touch hardware.

We don't have any way to prove the process has rights to touch the
iommu hardware separately from VFIO.

So even if you invent an iommu ID we cannot accept it as a handle to
create a viommu in iommufd.

Jason
> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 6, 2025 6:13 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Philosophically we do not permit any HW access through iommufd without
> a VFIO fd to "prove" the process has rights to touch hardware.
>
> We don't have any way to prove the process has rights to touch the
> iommu hardware separately from VFIO.

It is not touching the hardware. Qemu just instantiates a vSMMU and assigns
the IOMMU instance id to it.

> So even if you invent an iommu ID we cannot accept it as a handle to
> create a viommu in iommufd.

Creating the vIOMMU only happens when the user does a cold/hot plug of
a VFIO device. At that time Qemu checks whether the assigned id matches
with whatever the kernel tells it.

Thanks,
Shameer
On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> > So even if you invent an iommu ID we cannot accept it as a handle to
> > create a viommu in iommufd.
>
> Creating the vIOMMU only happens when the user does a cold/hot plug of
> a VFIO device. At that time Qemu checks whether the assigned id matches
> with whatever the kernel tell it.

This is not hard up until the guest is started. If you boot a guest
without a backing viommu iommufd object then there will be some more
complexities.

Jason
On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
>
> > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > create a viommu in iommufd.
> >
> > Creating the vIOMMU only happens when the user does a cold/hot plug of
> > a VFIO device. At that time Qemu checks whether the assigned id matches
> > with whatever the kernel tell it.
>
> This is not hard up until the guest is started. If you boot a guest
> without a backing viommu iommufd object then there will be some more
> complexities.

Yea, I imagined that things would be complicated with hotplugs..

On one hand, I got the part that we need some fixed link beforehand
to ease migration/hotplugs.

On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
brings the immediate attention that we cannot even decide vSMMU's
capabilities being reflected in its IDR/IIDR registers, without a
coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a
hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest
kernel probing of the vSMMU instance. So we would have to reset the
vSMMU "HW" after the device hotplug?

Nicolin
> -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Thursday, February 6, 2025 8:33 PM > To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Daniel P. Berrangé > <berrange@redhat.com>; Jason Gunthorpe <jgg@nvidia.com> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; > eric.auger@redhat.com; peter.maydell@linaro.org; ddutile@redhat.com; > Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > Jonathan Cameron <jonathan.cameron@huawei.com>; > zhangfei.gao@linaro.org; nathanc@nvidia.com > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote: > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi > wrote: > > > > > > So even if you invent an iommu ID we cannot accept it as a handle to > > > > create viommu in iommufd. > > > > > > Creating the vIOMMU only happens when the user does a cold/hot > plug of > > > a VFIO device. At that time Qemu checks whether the assigned id > matches > > > with whatever the kernel tell it. > > > > This is not hard up until the guest is started. If you boot a guest > > without a backing viommu iommufd object then there will be some more > > complexities. > > Yea, I imagined that things would be complicated with hotplugs.. > > On one hand, I got the part that we need some fixed link forehand > to ease migration/hotplugs. > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which > brings the immediate attention that we cannot even decide vSMMU's > capabilities being reflected in its IDR/IIDR registers, without a > coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a > hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest Right. I forgot about the call to smmu_dev_get_info() during the reset. That means we need at least one dev per Guest SMMU during Guest boot :( Thanks, Shameer
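[Editor's note: for reference, the probe being talked about here boils down to
a single iommufd ioctl. The sketch below (6.7+ uAPI, error handling trimmed)
assumes dev_id came from an earlier VFIO_DEVICE_BIND_IOMMUFD, which is exactly
why at least one cold-plugged device is needed per guest SMMU at boot.]

/*
 * Minimal sketch of the IOMMU_GET_HW_INFO probe. The iommufd dev_id only
 * exists after a VFIO device has been bound to the iommufd, which is the
 * restriction being discussed above.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int read_host_smmu_idr(int iommufd, __u32 dev_id)
{
    struct iommu_hw_info_arm_smmuv3 smmu = {};
    struct iommu_hw_info cmd = {
        .size = sizeof(cmd),
        .dev_id = dev_id,
        .data_len = sizeof(smmu),
        .data_uptr = (uintptr_t)&smmu,
    };

    if (ioctl(iommufd, IOMMU_GET_HW_INFO, &cmd)) {
        return -1;
    }
    if (cmd.out_data_type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
        return -1;          /* device is not behind an SMMUv3 */
    }
    /* These are the values a vSMMU would mirror into its IDR/IIDR space. */
    printf("IDR0=%#x IDR5=%#x IIDR=%#x\n",
           smmu.idr[0], smmu.idr[5], smmu.iidr);
    return 0;
}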
On Fri, Feb 07, 2025 at 10:21:17AM +0000, Shameerali Kolothum Thodi wrote: > > > > -----Original Message----- > > From: Nicolin Chen <nicolinc@nvidia.com> > > Sent: Thursday, February 6, 2025 8:33 PM > > To: Shameerali Kolothum Thodi > > <shameerali.kolothum.thodi@huawei.com>; Daniel P. Berrangé > > <berrange@redhat.com>; Jason Gunthorpe <jgg@nvidia.com> > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; > > eric.auger@redhat.com; peter.maydell@linaro.org; ddutile@redhat.com; > > Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > > Jonathan Cameron <jonathan.cameron@huawei.com>; > > zhangfei.gao@linaro.org; nathanc@nvidia.com > > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > > nested SMMUv3 > > > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote: > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi > > wrote: > > > > > > > > So even if you invent an iommu ID we cannot accept it as a handle to > > > > > create viommu in iommufd. > > > > > > > > Creating the vIOMMU only happens when the user does a cold/hot > > plug of > > > > a VFIO device. At that time Qemu checks whether the assigned id > > matches > > > > with whatever the kernel tell it. > > > > > > This is not hard up until the guest is started. If you boot a guest > > > without a backing viommu iommufd object then there will be some more > > > complexities. > > > > Yea, I imagined that things would be complicated with hotplugs.. > > > > On one hand, I got the part that we need some fixed link forehand > > to ease migration/hotplugs. > > > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which > > brings the immediate attention that we cannot even decide vSMMU's > > capabilities being reflected in its IDR/IIDR registers, without a > > coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a > > hotplug device, the IOMMU_GET_HW_INFO cannot be done during guest > > Right. I forgot about the call to smmu_dev_get_info() during the reset. > That means we need at least one dev per Guest SMMU during Guest > boot :( That's pretty unpleasant as a usage restriction. It sounds like there needs to be a way to configure & control the vIOMMU independantly of attaching a specific VFIO device. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
> -----Original Message----- > From: Daniel P. Berrangé <berrange@redhat.com> > Sent: Friday, February 7, 2025 10:32 AM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Nicolin Chen <nicolinc@nvidia.com>; Jason Gunthorpe > <jgg@nvidia.com>; qemu-arm@nongnu.org; qemu-devel@nongnu.org; > eric.auger@redhat.com; peter.maydell@linaro.org; ddutile@redhat.com; > Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > Jonathan Cameron <jonathan.cameron@huawei.com>; > zhangfei.gao@linaro.org; nathanc@nvidia.com > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Fri, Feb 07, 2025 at 10:21:17AM +0000, Shameerali Kolothum Thodi > wrote: > > > > > > > -----Original Message----- > > > From: Nicolin Chen <nicolinc@nvidia.com> > > > Sent: Thursday, February 6, 2025 8:33 PM > > > To: Shameerali Kolothum Thodi > > > <shameerali.kolothum.thodi@huawei.com>; Daniel P. Berrangé > > > <berrange@redhat.com>; Jason Gunthorpe <jgg@nvidia.com> > > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; > > > eric.auger@redhat.com; peter.maydell@linaro.org; > ddutile@redhat.com; > > > Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > > > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > > > Jonathan Cameron <jonathan.cameron@huawei.com>; > > > zhangfei.gao@linaro.org; nathanc@nvidia.com > > > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user- > creatable > > > nested SMMUv3 > > > > > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote: > > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum > Thodi > > > wrote: > > > > > > > > > > So even if you invent an iommu ID we cannot accept it as a handle > to > > > > > > create viommu in iommufd. > > > > > > > > > > Creating the vIOMMU only happens when the user does a cold/hot > > > plug of > > > > > a VFIO device. At that time Qemu checks whether the assigned id > > > matches > > > > > with whatever the kernel tell it. > > > > > > > > This is not hard up until the guest is started. If you boot a guest > > > > without a backing viommu iommufd object then there will be some > more > > > > complexities. > > > > > > Yea, I imagined that things would be complicated with hotplugs.. > > > > > > On one hand, I got the part that we need some fixed link forehand > > > to ease migration/hotplugs. > > > > > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which > > > brings the immediate attention that we cannot even decide vSMMU's > > > capabilities being reflected in its IDR/IIDR registers, without a > > > coldplug device -- if we boot a VM (one vSMMU<->pSMMU) with only a > > > hotplug device, the IOMMU_GET_HW_INFO cannot be done during > guest > > > > Right. I forgot about the call to smmu_dev_get_info() during the reset. > > That means we need at least one dev per Guest SMMU during Guest > > boot :( > > That's pretty unpleasant as a usage restriction. It sounds like there > needs to be a way to configure & control the vIOMMU independantly of > attaching a specific VFIO device. Yes, that would be ideal. Just wondering whether we can have something like the vfio_register_iommu_driver() for iommufd subsystem by which it can directly access iommu drivers ops(may be a restricted set). Not sure about the layering violations and other security issues with that... Thanks, Shameer
On Fri, Feb 07, 2025 at 12:21:54PM +0000, Shameerali Kolothum Thodi wrote:
> Just wondering whether we can have something like the
> vfio_register_iommu_driver() for iommufd subsystem by which it can directly
> access iommu drivers ops(may be a restricted set).

I very much want to try hard to avoid that.

AFAICT you do not need a VFIO device, or access to the HW_INFO of the
smmu, to start up a SMMU driver.

Yes, you cannot later attach a VFIO device with a pSMMU that materially
differs from the vSMMU setup, but that is fine.

qemu has long had a duality where you can either "inherit from host"
for an easy setup or be "fully specified" and support live
migration/etc. CPUID as a simple example.

So, what the smmu patches are doing now is "inherit from host" and
that requires a VFIO device to work. I think that is fine.

If you want to do full hotplug then you need to be "fully specified" on
the command line so a working vSMMU can be shown to the guest with no
devices, and no kernel involvement. Obviously this is a highly advanced
operating mode as things like IIDR and errata need to be considered,
but I would guess booting with no vPCI devices is already abnormal.

Jason
On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote:
> On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote:
> >
> > > > So even if you invent an iommu ID we cannot accept it as a handle to
> > > > create a viommu in iommufd.
> > >
> > > Creating the vIOMMU only happens when the user does a cold/hot plug of
> > > a VFIO device. At that time Qemu checks whether the assigned id matches
> > > with whatever the kernel tell it.
> >
> > This is not hard up until the guest is started. If you boot a guest
> > without a backing viommu iommufd object then there will be some more
> > complexities.
>
> Yea, I imagined that things would be complicated with hotplugs..
>
> On one hand, I got the part that we need some fixed link beforehand
> to ease migration/hotplugs.
>
> On the other hand, all IOMMUFD ioctls need a VFIO device FD, which
> brings the immediate attention that we cannot even decide vSMMU's
> capabilities being reflected in its IDR/IIDR registers, without a
> coldplug device

As Daniel was saying this all has to be specifiable on the command
line.

IMHO if the vSMMU is not fully specified by the time the boot happens
(either explicitly via the command line or implicitly by querying the
live HW) then qemu should fail.

Jason
On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote: > On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote: > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote: > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote: > > > > > > > > So even if you invent an iommu ID we cannot accept it as a handle to > > > > > create viommu in iommufd. > > > > > > > > Creating the vIOMMU only happens when the user does a cold/hot plug of > > > > a VFIO device. At that time Qemu checks whether the assigned id matches > > > > with whatever the kernel tell it. > > > > > > This is not hard up until the guest is started. If you boot a guest > > > without a backing viommu iommufd object then there will be some more > > > complexities. > > > > Yea, I imagined that things would be complicated with hotplugs.. > > > > On one hand, I got the part that we need some fixed link forehand > > to ease migration/hotplugs. > > > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which > > brings the immediate attention that we cannot even decide vSMMU's > > capabilities being reflected in its IDR/IIDR registers, without a > > coldplug device > > As Daniel was saying this all has to be specifiable on the command > line. > > IMHO if the vSMMU is not fully specified by the time the boot happens > (either explicity via command line or implicitly by querying the live > HW) then it qemu should fail. Though that makes sense, that would assume we could only support the case where a VM has at least one cold plug device per vSMMU? Otherwise, even if we specify vSMMU to which pSMMU via a command line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO.. Thanks Nicolin
On Thu, Feb 06, 2025 at 12:48:40PM -0800, Nicolin Chen wrote: > On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote: > > On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote: > > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote: > > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote: > > > > > > > > > > So even if you invent an iommu ID we cannot accept it as a handle to > > > > > > create viommu in iommufd. > > > > > > > > > > Creating the vIOMMU only happens when the user does a cold/hot plug of > > > > > a VFIO device. At that time Qemu checks whether the assigned id matches > > > > > with whatever the kernel tell it. > > > > > > > > This is not hard up until the guest is started. If you boot a guest > > > > without a backing viommu iommufd object then there will be some more > > > > complexities. > > > > > > Yea, I imagined that things would be complicated with hotplugs.. > > > > > > On one hand, I got the part that we need some fixed link forehand > > > to ease migration/hotplugs. > > > > > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which > > > brings the immediate attention that we cannot even decide vSMMU's > > > capabilities being reflected in its IDR/IIDR registers, without a > > > coldplug device > > > > As Daniel was saying this all has to be specifiable on the command > > line. > > > > IMHO if the vSMMU is not fully specified by the time the boot happens > > (either explicity via command line or implicitly by querying the live > > HW) then it qemu should fail. > > Though that makes sense, that would assume we could only support > the case where a VM has at least one cold plug device per vSMMU? > > Otherwise, even if we specify vSMMU to which pSMMU via a command > line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO.. You'd use the command line information and wouldn't need GET_HW_INFO, it would be complicated Jason
On Thu, Feb 06, 2025 at 05:11:13PM -0400, Jason Gunthorpe wrote: > On Thu, Feb 06, 2025 at 12:48:40PM -0800, Nicolin Chen wrote: > > On Thu, Feb 06, 2025 at 04:38:55PM -0400, Jason Gunthorpe wrote: > > > On Thu, Feb 06, 2025 at 12:33:19PM -0800, Nicolin Chen wrote: > > > > On Thu, Feb 06, 2025 at 02:22:01PM -0400, Jason Gunthorpe wrote: > > > > > On Thu, Feb 06, 2025 at 06:18:14PM +0000, Shameerali Kolothum Thodi wrote: > > > > > > > > > > > > So even if you invent an iommu ID we cannot accept it as a handle to > > > > > > > create viommu in iommufd. > > > > > > > > > > > > Creating the vIOMMU only happens when the user does a cold/hot plug of > > > > > > a VFIO device. At that time Qemu checks whether the assigned id matches > > > > > > with whatever the kernel tell it. > > > > > > > > > > This is not hard up until the guest is started. If you boot a guest > > > > > without a backing viommu iommufd object then there will be some more > > > > > complexities. > > > > > > > > Yea, I imagined that things would be complicated with hotplugs.. > > > > > > > > On one hand, I got the part that we need some fixed link forehand > > > > to ease migration/hotplugs. > > > > > > > > On the other hand, all IOMMUFD ioctls need a VFIO device FD, which > > > > brings the immediate attention that we cannot even decide vSMMU's > > > > capabilities being reflected in its IDR/IIDR registers, without a > > > > coldplug device > > > > > > As Daniel was saying this all has to be specifiable on the command > > > line. > > > > > > IMHO if the vSMMU is not fully specified by the time the boot happens > > > (either explicity via command line or implicitly by querying the live > > > HW) then it qemu should fail. > > > > Though that makes sense, that would assume we could only support > > the case where a VM has at least one cold plug device per vSMMU? > > > > Otherwise, even if we specify vSMMU to which pSMMU via a command > > line, we can't get access to the pSMMU via IOMMU_GET_HW_INFO.. > > You'd use the command line information and wouldn't need GET_HW_INFO, > it would be complicated Do you mean the "-device arm-smmuv3-accel,id=xx" line? This still won't give us the host IDR/IIDR register values to probe a vSMMU, unless it has a VFIO device assigned to vSMMU's associated PXB in that command line? Nicolin
On Thu, Feb 06, 2025 at 02:46:42PM -0800, Nicolin Chen wrote:
> > You'd use the command line information and wouldn't need GET_HW_INFO,
> > it would be complicated
>
> Do you mean the "-device arm-smmuv3-accel,id=xx" line? This still
> won't give us the host IDR/IIDR register values to probe a vSMMU,
> unless it has a VFIO device assigned to vSMMU's associated PXB in
> that command line?

Yes, put the IDR registers on the command line too.

Nothing from the host should be copied to the guest without the option
to control it through the command line.

Jason
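[Editor's note: as a rough illustration of what "IDR registers on the command
line" could mean on the QEMU side, here is a hedged sketch. The idr*/iidr
property names, the state layout and the "fully specified or fail" check are
assumptions for discussion, not code from the posted series.]

/*
 * Hypothetical sketch: expose the registers a vSMMU mirrors as plain device
 * properties, so a "fully specified" instance needs no host query at all.
 * Falling back to IOMMU_GET_HW_INFO ("inherit from host") only works when a
 * cold-plugged VFIO device exists; otherwise startup should fail.
 */
#include "qemu/osdep.h"
#include "hw/qdev-properties.h"
#include "qapi/error.h"

typedef struct SMMUv3AccelState {
    uint32_t idr[6];
    uint32_t iidr;
    bool inherited_from_host;   /* filled in via IOMMU_GET_HW_INFO */
} SMMUv3AccelState;

static Property smmuv3_accel_idr_properties[] = {
    DEFINE_PROP_UINT32("idr0", SMMUv3AccelState, idr[0], 0),
    DEFINE_PROP_UINT32("idr1", SMMUv3AccelState, idr[1], 0),
    DEFINE_PROP_UINT32("idr3", SMMUv3AccelState, idr[3], 0),
    DEFINE_PROP_UINT32("idr5", SMMUv3AccelState, idr[5], 0),
    DEFINE_PROP_UINT32("iidr", SMMUv3AccelState, iidr, 0),
    DEFINE_PROP_END_OF_LIST(),
};

/*
 * Crude "fully specified or fail" check at machine-done time: if nothing
 * pinned the registers down (no command-line values, no cold-plugged VFIO
 * device to inherit from), refuse to show the guest a made-up SMMU.
 */
static void smmuv3_accel_check_specified(SMMUv3AccelState *s, Error **errp)
{
    if (!s->inherited_from_host && !s->iidr) {
        error_setg(errp, "vSMMU not fully specified: set idr*/iidr on the "
                   "command line or cold-plug a VFIO device behind it");
    }
}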
> -----Original Message----- > From: Shameerali Kolothum Thodi > Sent: Thursday, January 30, 2025 6:09 PM > To: 'Daniel P. Berrangé' <berrange@redhat.com> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com; > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org > Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > Hi Daniel, > > > -----Original Message----- > > From: Daniel P. Berrangé <berrange@redhat.com> > > Sent: Thursday, January 30, 2025 4:00 PM > > To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com> > > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; > > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com; > > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm > > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; > > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron > > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org > > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > > nested SMMUv3 > > > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: > > > How to use it(Eg:): > > > > > > On a HiSilicon platform that has multiple physical SMMUv3s, the ACC > ZIP > > VF > > > devices and HNS VF devices are behind different SMMUv3s. So for a > > Guest, > > > specify two smmuv3-nested devices each behind a pxb-pcie as below, > > > > > > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass- > > iommu=on \ > > > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \ > > > -object iommufd,id=iommufd0 \ > > > -bios QEMU_EFI.fd \ > > > -kernel Image \ > > > -device virtio-blk-device,drive=fs \ > > > -drive if=none,file=rootfs.qcow2,id=fs \ > > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ > > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ > > > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \ > > > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ > > > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ > > > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ > > > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \ > > > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ > > > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw > > earlycon=pl011,0x9000000" \ > > > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \ > > > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \ > > > -net none \ > > > -nographic > > > > Above you say the host has 2 SMMUv3 devices, and you've created 2 > > SMMUv3 > > guest devices to match. > > > > The various emails in this thread & libvirt thread, indicate that each > > guest SMMUv3 is associated with a host SMMUv3, but I don't see any > > property on the command line for 'arm-ssmv3-nested' that tells it which > > host eSMMUv3 it is to be associated with. > > > > How does this association work ? > > You are right. The association is not very obvious in Qemu. The association > and checking is done implicitly by kernel at the moment. I will try to > explain > it here. > > Each "arm-smmuv3-nested" instance, when the first device gets attached > to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel > SMMUv3 driver. 
This domain will have a pointer representing the physical > SMMUv3 that the device belongs. And any other device which belongs to > the same physical SMMUv3 can share this S2 domain. > > If a device that belongs to a different physical SMMUv3 gets attached to > the above domain, the HWPT attach will eventually fail as the physical > smmuv3 in the domains will have a mismatch, > https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm- > smmu-v3/arm-smmu-v3.c#L2860 > > And as I mentioned in cover letter, Qemu will report, > > " > Attempt to add the HNS VF to a different SMMUv3 will result in, > > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: > Unable to attach viommu > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio > 0000:7d:02.2: > Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) > to id=11: Invalid argument > > At present Qemu is not doing any extra validation other than the above > failure to make sure the user configuration is correct or not. The > assumption is libvirt will take care of this. > " > So in summary, if the libvirt gets it wrong, Qemu will fail with error. > > If a more explicit association is required, some help from kernel is required > to identify the physical SMMUv3 associated with the device. Again thinking about this, to have an explicit association in the Qemu command line between the vSMMUv3 and the phys smmuv3, We can possibly add something like, -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ -device arm-smmuv3-accel,bus=pcie.1,phys-smmuv3= smmu3.0x0000000100000000 \ -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2, phys-smmuv3= smmu3.0x0000000200000000 \ -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ etc. And Qemu does some checking to make sure that the device is indeed associated with the specified phys-smmuv3. This can be done going through the sysfs path checking which is what I guess libvirt is currently doing to populate the topology. So basically Qemu is just replicating that to validate again. Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys smmuv3 base address which can avoid going through the sysfs. The only difference between the current approach(kernel failing the attach implicitly) and the above is, Qemu can provide a validation of inputs and may be report a better error message than just saying " Unable to attach viommu/: Invalid argument". If the command line looks Ok, I will go with the sysfs path validation method first in my next respin. Please let me know. Thanks, Shameer
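[Editor's note: for what it is worth, the sysfs walk proposed above could be as
small as resolving the per-device "iommu" link that the iommu core publishes.
Treat the path layout and link name as assumptions about the host kernel, not
as code from the series.]

/*
 * Sketch of the sysfs validation idea: resolve the "iommu" symlink of the
 * assigned PCI device and compare its basename against the phys-smmuv3
 * property (e.g. "smmu3.0x0000000100000000").
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <libgen.h>
#include <limits.h>

/* Return 0 if the device at @bdf sits behind the physical SMMU @phys_smmu. */
static int check_phys_smmu(const char *bdf, const char *phys_smmu)
{
    char link[PATH_MAX], target[PATH_MAX];
    ssize_t n;

    snprintf(link, sizeof(link), "/sys/bus/pci/devices/%s/iommu", bdf);
    n = readlink(link, target, sizeof(target) - 1);
    if (n < 0) {
        return -1;                  /* no IOMMU behind this device */
    }
    target[n] = '\0';
    /* e.g. target ends in ".../iommu/smmu3.0x0000000100000000" */
    return strcmp(basename(target), phys_smmu) == 0 ? 0 : -1;
}

/* usage: check_phys_smmu("0000:7d:02.1", "smmu3.0x0000000100000000") */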
On Fri, Jan 31, 2025 at 09:33:16AM +0000, Shameerali Kolothum Thodi wrote: > And Qemu does some checking to make sure that the device is indeed associated > with the specified phys-smmuv3. This can be done going through the sysfs path checking > which is what I guess libvirt is currently doing to populate the topology. So basically > Qemu is just replicating that to validate again. I would prefer that iommufd users not have to go out to sysfs.. > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys > smmuv3 base address which can avoid going through the sysfs. It also doesn't seem great to expose a physical address. But we could have an 'iommu instance id' that was a unique small integer? Jason
> -----Original Message----- > From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Friday, January 31, 2025 2:24 PM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org; > qemu-devel@nongnu.org; eric.auger@redhat.com; > peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com; > Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > Jonathan Cameron <jonathan.cameron@huawei.com>; > zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Fri, Jan 31, 2025 at 09:33:16AM +0000, Shameerali Kolothum Thodi > wrote: > > > And Qemu does some checking to make sure that the device is indeed > associated > > with the specified phys-smmuv3. This can be done going through the > sysfs path checking > > which is what I guess libvirt is currently doing to populate the topology. > So basically > > Qemu is just replicating that to validate again. > > I would prefer that iommufd users not have to go out to sysfs.. > > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to > return the phys > > smmuv3 base address which can avoid going through the sysfs. > > It also doesn't seem great to expose a physical address. But we could > have an 'iommu instance id' that was a unique small integer? Ok. But how the user space can map that to the device? Something like, /sys/bus/pci/devices/0000:7d:00.1/iommu/instance.X ? Thanks, Shameer
On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi wrote:
> > > And Qemu does some checking to make sure that the device is indeed
> > > associated with the specified phys-smmuv3. This can be done going
> > > through the sysfs path checking which is what I guess libvirt is
> > > currently doing to populate the topology. So basically Qemu is just
> > > replicating that to validate again.
> >
> > I would prefer that iommufd users not have to go out to sysfs..
> >
> > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return
> > > the phys smmuv3 base address which can avoid going through the sysfs.
> >
> > It also doesn't seem great to expose a physical address. But we could
> > have an 'iommu instance id' that was a unique small integer?
>
> Ok. But how the user space can map that to the device?

Why does it need to?

libvirt picks some label for the vsmmu instance, it doesn't matter
what the string is.

qemu validates that all of the vsmmu instances are only linked to PCI
devices that have the same iommu ID. This is already happening in the
kernel, it will fail attaches to mismatched instances.

Nothing further is needed?

Jason
> -----Original Message----- > From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Friday, January 31, 2025 2:54 PM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org; > qemu-devel@nongnu.org; eric.auger@redhat.com; > peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com; > Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > Jonathan Cameron <jonathan.cameron@huawei.com>; > zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi > wrote: > > > > > And Qemu does some checking to make sure that the device is indeed > > > associated > > > > with the specified phys-smmuv3. This can be done going through the > > > sysfs path checking > > > > which is what I guess libvirt is currently doing to populate the > topology. > > > So basically > > > > Qemu is just replicating that to validate again. > > > > > > I would prefer that iommufd users not have to go out to sysfs.. > > > > > > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to > > > return the phys > > > > smmuv3 base address which can avoid going through the sysfs. > > > > > > It also doesn't seem great to expose a physical address. But we could > > > have an 'iommu instance id' that was a unique small integer? > > > > Ok. But how the user space can map that to the device? > > Why does it need to? > > libvirt picks some label for the vsmmu instance, it doesn't matter > what the string is. > > qemu validates that all of the vsmmu instances are only linked to PCI > device that have the same iommu ID. This is already happening in the > kernel, it will fail attaches to mismatched instances. > > Nothing further is needed? -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \ -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ -device arm-smmuv3-accel,pci-bus=pcie.2,id=smmuv2 \ -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ I think it works from a functionality point of view. A particular instance of arm-smmuv3-accel(say id=smmuv1) can only have devices attached to the same phys smmuv3 "iommu instance id" But not sure from a libvirt/Qemu interface point of view[0] the concerns are addressed. Daniel/Nathan? Thanks, Shameer https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/X6R52JRBYDFZ5PSJFR534A655UZ3RHKN/
Hi, On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote: > >> -----Original Message----- >> From: Jason Gunthorpe <jgg@nvidia.com> >> Sent: Friday, January 31, 2025 2:54 PM >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> >> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org; >> qemu-devel@nongnu.org; eric.auger@redhat.com; >> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com; >> Linuxarm <linuxarm@huawei.com>; Wangzhou (B) >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; >> Jonathan Cameron <jonathan.cameron@huawei.com>; >> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com> >> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable >> nested SMMUv3 >> >> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi >> wrote: >> >>>>> And Qemu does some checking to make sure that the device is indeed >>>> associated >>>>> with the specified phys-smmuv3. This can be done going through the >>>> sysfs path checking >>>>> which is what I guess libvirt is currently doing to populate the >> topology. >>>> So basically >>>>> Qemu is just replicating that to validate again. >>>> I would prefer that iommufd users not have to go out to sysfs.. >>>> >>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to >>>> return the phys >>>>> smmuv3 base address which can avoid going through the sysfs. >>>> It also doesn't seem great to expose a physical address. But we could >>>> have an 'iommu instance id' that was a unique small integer? >>> Ok. But how the user space can map that to the device? >> Why does it need to? >> >> libvirt picks some label for the vsmmu instance, it doesn't matter >> what the string is. >> >> qemu validates that all of the vsmmu instances are only linked to PCI >> device that have the same iommu ID. This is already happening in the >> kernel, it will fail attaches to mismatched instances. >> >> Nothing further is needed? > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ > -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \ I don't get what is the point of adding such an id if it is not referenced anywhere? Eric > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ > > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ > -device arm-smmuv3-accel,pci-bus=pcie.2,id=smmuv2 \ > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ > > I think it works from a functionality point of view. A particular > instance of arm-smmuv3-accel(say id=smmuv1) can only have devices attached > to the same phys smmuv3 "iommu instance id" > > But not sure from a libvirt/Qemu interface point of view[0] the concerns > are addressed. Daniel/Nathan? > > Thanks, > Shameer > https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/X6R52JRBYDFZ5PSJFR534A655UZ3RHKN/ >
On Fri, Jan 31, 2025 at 05:08:28PM +0100, Eric Auger wrote: > Hi, > > > On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote: > > > >> -----Original Message----- > >> From: Jason Gunthorpe <jgg@nvidia.com> > >> Sent: Friday, January 31, 2025 2:54 PM > >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > >> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org; > >> qemu-devel@nongnu.org; eric.auger@redhat.com; > >> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com; > >> Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > >> Jonathan Cameron <jonathan.cameron@huawei.com>; > >> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com> > >> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > >> nested SMMUv3 > >> > >> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi > >> wrote: > >> > >>>>> And Qemu does some checking to make sure that the device is indeed > >>>> associated > >>>>> with the specified phys-smmuv3. This can be done going through the > >>>> sysfs path checking > >>>>> which is what I guess libvirt is currently doing to populate the > >> topology. > >>>> So basically > >>>>> Qemu is just replicating that to validate again. > >>>> I would prefer that iommufd users not have to go out to sysfs.. > >>>> > >>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to > >>>> return the phys > >>>>> smmuv3 base address which can avoid going through the sysfs. > >>>> It also doesn't seem great to expose a physical address. But we could > >>>> have an 'iommu instance id' that was a unique small integer? > >>> Ok. But how the user space can map that to the device? > >> Why does it need to? > >> > >> libvirt picks some label for the vsmmu instance, it doesn't matter > >> what the string is. > >> > >> qemu validates that all of the vsmmu instances are only linked to PCI > >> device that have the same iommu ID. This is already happening in the > >> kernel, it will fail attaches to mismatched instances. > >> > >> Nothing further is needed? > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ > > -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \ > I don't get what is the point of adding such an id if it is not > referenced anywhere? Every QDev device instance has an 'id' property - if you don't set one explicitly, QEMU will generate one internally. Libvirt will always set the 'id' property to avoid the internal auto- generated IDs, as it wants full knowledge of naming. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 2/6/25 9:53 AM, Daniel P. Berrangé wrote: > On Fri, Jan 31, 2025 at 05:08:28PM +0100, Eric Auger wrote: >> Hi, >> >> >> On 1/31/25 4:23 PM, Shameerali Kolothum Thodi wrote: >>>> -----Original Message----- >>>> From: Jason Gunthorpe <jgg@nvidia.com> >>>> Sent: Friday, January 31, 2025 2:54 PM >>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> >>>> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org; >>>> qemu-devel@nongnu.org; eric.auger@redhat.com; >>>> peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com; >>>> Linuxarm <linuxarm@huawei.com>; Wangzhou (B) >>>> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; >>>> Jonathan Cameron <jonathan.cameron@huawei.com>; >>>> zhangfei.gao@linaro.org; Nathan Chen <nathanc@nvidia.com> >>>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable >>>> nested SMMUv3 >>>> >>>> On Fri, Jan 31, 2025 at 02:39:53PM +0000, Shameerali Kolothum Thodi >>>> wrote: >>>> >>>>>>> And Qemu does some checking to make sure that the device is indeed >>>>>> associated >>>>>>> with the specified phys-smmuv3. This can be done going through the >>>>>> sysfs path checking >>>>>>> which is what I guess libvirt is currently doing to populate the >>>> topology. >>>>>> So basically >>>>>>> Qemu is just replicating that to validate again. >>>>>> I would prefer that iommufd users not have to go out to sysfs.. >>>>>> >>>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to >>>>>> return the phys >>>>>>> smmuv3 base address which can avoid going through the sysfs. >>>>>> It also doesn't seem great to expose a physical address. But we could >>>>>> have an 'iommu instance id' that was a unique small integer? >>>>> Ok. But how the user space can map that to the device? >>>> Why does it need to? >>>> >>>> libvirt picks some label for the vsmmu instance, it doesn't matter >>>> what the string is. >>>> >>>> qemu validates that all of the vsmmu instances are only linked to PCI >>>> device that have the same iommu ID. This is already happening in the >>>> kernel, it will fail attaches to mismatched instances. >>>> >>>> Nothing further is needed? >>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ >>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ >>> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \ >> I don't get what is the point of adding such an id if it is not >> referenced anywhere? > Every QDev device instance has an 'id' property - if you don't > set one explicitly, QEMU will generate one internally. Libvirt > will always set the 'id' property to avoid the internal auto- > generated IDs, as it wants full knowledge of naming. OK thank you for the explanation Eric > > With regards, > Daniel
On 1/31/2025 8:08 AM, Eric Auger wrote: >>>>>> And Qemu does some checking to make sure that the device is indeed >>>>> associated >>>>>> with the specified phys-smmuv3. This can be done going through the >>>>> sysfs path checking >>>>>> which is what I guess libvirt is currently doing to populate the >>> topology. >>>>> So basically >>>>>> Qemu is just replicating that to validate again. >>>>> I would prefer that iommufd users not have to go out to sysfs.. >>>>> >>>>>> Or another option is extending the IOMMU_GET_HW_INFO IOCTL to >>>>> return the phys >>>>>> smmuv3 base address which can avoid going through the sysfs. >>>>> It also doesn't seem great to expose a physical address. But we could >>>>> have an 'iommu instance id' that was a unique small integer? >>>> Ok. But how the user space can map that to the device? >>> Why does it need to? >>> >>> libvirt picks some label for the vsmmu instance, it doesn't matter >>> what the string is. >>> >>> qemu validates that all of the vsmmu instances are only linked to PCI >>> device that have the same iommu ID. This is already happening in the >>> kernel, it will fail attaches to mismatched instances. >>> >>> Nothing further is needed? >> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ >> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ >> -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \ > I don't get what is the point of adding such an id if it is not > referenced anywhere? > > Eric Daniel mentions that the host-to-guest SMMU pairing must be chosen such that it makes conceptual sense w.r.t. the guest NUMA to host NUMA pairing [0]. The current implementation allows for incorrect host to guest numa node pairings, e.g. pSMMU has affinity to host numa node 0, but it’s paired with a vSMMU paired with a guest numa node pinned to host numa node 1. By specifying the host SMMU id, we can explicitly pair a host SMMU with a guest SMMU associated with the correct PXB NUMA node, vs. implying the host-to-guest SMMU pairing based on what devices are attached to the PXB. While it would not completely prevent the incorrect pSMMU/vSMMU pairing w.r.t. host to guest numa node pairings, specifying the pSMMU id would make the implications of host to guest numa node pairings more clear when specifying a vSMMU instance. From the libvirt discussion with Daniel [1], he also states "libvirt's goal has always been to make everything that's functionally impacting a guest device be 100% explicit. So I don't think we should be implying mappings to the host SMMU in QEMU at all, QEMU must be told what to map to." Specifying the id would be a means of explicitly specifying host to guest SMMU mapping instead of implying the mapping. [0] https://lore.kernel.org/qemu-devel/Z51DmtP83741RAsb@redhat.com/ [1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7GDT6RX5LPAJMPP4ZSC4ACME6GVMG236/#X6R52JRBYDFZ5PSJFR534A655UZ3RHKN Thanks, Nathan
On Wed, Feb 05, 2025 at 12:53:42PM -0800, Nathan Chen wrote: > > > On 1/31/2025 8:08 AM, Eric Auger wrote: > > > > > > > And Qemu does some checking to make sure that the device is indeed > > > > > > associated > > > > > > > with the specified phys-smmuv3. This can be done going through the > > > > > > sysfs path checking > > > > > > > which is what I guess libvirt is currently doing to populate the > > > > topology. > > > > > > So basically > > > > > > > Qemu is just replicating that to validate again. > > > > > > I would prefer that iommufd users not have to go out to sysfs.. > > > > > > > > > > > > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to > > > > > > return the phys > > > > > > > smmuv3 base address which can avoid going through the sysfs. > > > > > > It also doesn't seem great to expose a physical address. But we could > > > > > > have an 'iommu instance id' that was a unique small integer? > > > > > Ok. But how the user space can map that to the device? > > > > Why does it need to? > > > > > > > > libvirt picks some label for the vsmmu instance, it doesn't matter > > > > what the string is. > > > > > > > > qemu validates that all of the vsmmu instances are only linked to PCI > > > > device that have the same iommu ID. This is already happening in the > > > > kernel, it will fail attaches to mismatched instances. > > > > > > > > Nothing further is needed? > > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ > > > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ > > > -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1 \ > > I don't get what is the point of adding such an id if it is not > > referenced anywhere? > > > > Eric > > Daniel mentions that the host-to-guest SMMU pairing must be chosen such that > it makes conceptual sense w.r.t. the guest NUMA to host NUMA pairing [0]. > The current implementation allows for incorrect host to guest numa node > pairings, e.g. pSMMU has affinity to host numa node 0, but it’s paired with > a vSMMU paired with a guest numa node pinned to host numa node 1. > > By specifying the host SMMU id, we can explicitly pair a host SMMU with a > guest SMMU associated with the correct PXB NUMA node, vs. implying the > host-to-guest SMMU pairing based on what devices are attached to the PXB. > While it would not completely prevent the incorrect pSMMU/vSMMU pairing > w.r.t. host to guest numa node pairings, specifying the pSMMU id would make > the implications of host to guest numa node pairings more clear when > specifying a vSMMU instance. You've not specified any host SMMU id in the above CLI args though, only the PXB association. It needs something like -device arm-smmuv3-accel,bus=pcie.1,id=smmuv1,host-smmu=XXXXX where 'XXXX' is some value to identify the host SMMU With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Hi Shameer, On 1/31/25 10:33 AM, Shameerali Kolothum Thodi wrote: > >> -----Original Message----- >> From: Shameerali Kolothum Thodi >> Sent: Thursday, January 30, 2025 6:09 PM >> To: 'Daniel P. Berrangé' <berrange@redhat.com> >> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; >> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com; >> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm >> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; >> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron >> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org >> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable >> nested SMMUv3 >> >> Hi Daniel, >> >>> -----Original Message----- >>> From: Daniel P. Berrangé <berrange@redhat.com> >>> Sent: Thursday, January 30, 2025 4:00 PM >>> To: Shameerali Kolothum Thodi >> <shameerali.kolothum.thodi@huawei.com> >>> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org; >>> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com; >>> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm >>> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; >>> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron >>> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org >>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable >>> nested SMMUv3 >>> >>> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: >>>> How to use it(Eg:): >>>> >>>> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC >> ZIP >>> VF >>>> devices and HNS VF devices are behind different SMMUv3s. So for a >>> Guest, >>>> specify two smmuv3-nested devices each behind a pxb-pcie as below, >>>> >>>> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass- >>> iommu=on \ >>>> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \ >>>> -object iommufd,id=iommufd0 \ >>>> -bios QEMU_EFI.fd \ >>>> -kernel Image \ >>>> -device virtio-blk-device,drive=fs \ >>>> -drive if=none,file=rootfs.qcow2,id=fs \ >>>> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ >>>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ >>>> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \ >>>> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ >>>> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ >>>> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ >>>> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \ >>>> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ >>>> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw >>> earlycon=pl011,0x9000000" \ >>>> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \ >>>> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \ >>>> -net none \ >>>> -nographic >>> Above you say the host has 2 SMMUv3 devices, and you've created 2 >>> SMMUv3 >>> guest devices to match. >>> >>> The various emails in this thread & libvirt thread, indicate that each >>> guest SMMUv3 is associated with a host SMMUv3, but I don't see any >>> property on the command line for 'arm-ssmv3-nested' that tells it which >>> host eSMMUv3 it is to be associated with. >>> >>> How does this association work ? >> You are right. The association is not very obvious in Qemu. The association >> and checking is done implicitly by kernel at the moment. I will try to >> explain >> it here. >> >> Each "arm-smmuv3-nested" instance, when the first device gets attached >> to it, will create a S2 HWPT and a corresponding SMMUv3 domain in kernel >> SMMUv3 driver. 
This domain will have a pointer representing the physical >> SMMUv3 that the device belongs. And any other device which belongs to >> the same physical SMMUv3 can share this S2 domain. >> >> If a device that belongs to a different physical SMMUv3 gets attached to >> the above domain, the HWPT attach will eventually fail as the physical >> smmuv3 in the domains will have a mismatch, >> https://elixir.bootlin.com/linux/v6.13/source/drivers/iommu/arm/arm- >> smmu-v3/arm-smmu-v3.c#L2860 >> >> And as I mentioned in cover letter, Qemu will report, >> >> " >> Attempt to add the HNS VF to a different SMMUv3 will result in, >> >> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: >> Unable to attach viommu >> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio >> 0000:7d:02.2: >> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) >> to id=11: Invalid argument >> >> At present Qemu is not doing any extra validation other than the above >> failure to make sure the user configuration is correct or not. The >> assumption is libvirt will take care of this. >> " >> So in summary, if the libvirt gets it wrong, Qemu will fail with error. >> >> If a more explicit association is required, some help from kernel is required >> to identify the physical SMMUv3 associated with the device. > Again thinking about this, to have an explicit association in the Qemu command > line between the vSMMUv3 and the phys smmuv3, > > We can possibly add something like, > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \ > -device arm-smmuv3-accel,bus=pcie.1,phys-smmuv3= smmu3.0x0000000100000000 \ > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ > > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \ > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \ > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2, phys-smmuv3= smmu3.0x0000000200000000 \ > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \ > > etc. > > And Qemu does some checking to make sure that the device is indeed associated > with the specified phys-smmuv3. This can be done going through the sysfs path checking > which is what I guess libvirt is currently doing to populate the topology. So basically > Qemu is just replicating that to validate again. > > Or another option is extending the IOMMU_GET_HW_INFO IOCTL to return the phys > smmuv3 base address which can avoid going through the sysfs. > > The only difference between the current approach(kernel failing the attach implicitly) > and the above is, Qemu can provide a validation of inputs and may be report a better > error message than just saying " Unable to attach viommu/: Invalid argument". > > If the command line looks Ok, I will go with the sysfs path validation method first in my > next respin. The command line looks sensible to me. on vfio we use host=6810000.ethernet. Maybe reuse this instead of phys-smmuv3? Thanks Eric > > Please let me know. > > Thanks, > Shameer > > > >
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: > Hi, > > This series adds initial support for a user-creatable "arm-smmuv3-nested" > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine > and cannot support multiple SMMUv3s. > > In order to support vfio-pci dev assignment with vSMMUv3, the physical > SMMUv3 has to be configured in nested mode. Having a pluggable > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests > running on a host with multiple physical SMMUv3s. A few benefits of doing > this are, I'm not very familiar with arm, but from this description I'm not really seeing how "nesting" is involved here. You're only talking about the host and 1 L1 guest, no L2 guest. Also what is the relation between the physical SMMUv3 and the guest SMMUv3 that's referenced ? Is this in fact some form of host device passthrough rather than nesting ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote: > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: > > Hi, > > > > This series adds initial support for a user-creatable "arm-smmuv3-nested" > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine > > and cannot support multiple SMMUv3s. > > > > In order to support vfio-pci dev assignment with vSMMUv3, the physical > > SMMUv3 has to be configured in nested mode. Having a pluggable > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests > > running on a host with multiple physical SMMUv3s. A few benefits of doing > > this are, > > I'm not very familiar with arm, but from this description I'm not > really seeing how "nesting" is involved here. You're only talking > about the host and 1 L1 guest, no L2 guest. nesting is the term the iommu side is using to refer to the 2 dimensional paging, ie a guest page table on top of a hypervisor page table. Nothing to do with vm nesting. > Also what is the relation between the physical SMMUv3 and the guest > SMMUv3 that's referenced ? Is this in fact some form of host device > passthrough rather than nesting ? It is an acceeleration feature, the iommu HW does more work instead of the software emulating things. Similar to how the 2d paging option in KVM is an acceleration feature. All of the iommu series on vfio are creating paravirtualized iommu models inside the VM. They access various levels of HW acceleration to speed up the paravirtualization. Jason
On Fri, 13 Dec 2024 at 12:46, Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote: > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: > > > Hi, > > > > > > This series adds initial support for a user-creatable "arm-smmuv3-nested" > > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine > > > and cannot support multiple SMMUv3s. > > > > > > In order to support vfio-pci dev assignment with vSMMUv3, the physical > > > SMMUv3 has to be configured in nested mode. Having a pluggable > > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests > > > running on a host with multiple physical SMMUv3s. A few benefits of doing > > > this are, > > > > I'm not very familiar with arm, but from this description I'm not > > really seeing how "nesting" is involved here. You're only talking > > about the host and 1 L1 guest, no L2 guest. > > nesting is the term the iommu side is using to refer to the 2 > dimensional paging, ie a guest page table on top of a hypervisor page > table. Isn't that more usually called "two stage" paging? Calling that "nesting" seems like it is going to be massively confusing... Also, how does it relate to what this series seems to be doing, where we provide the guest with two separate SMMUs? (Are those two SMMUs "nested" in the sense that one is sitting behind the other?) thanks -- PMM
> -----Original Message-----
> From: Peter Maydell <peter.maydell@linaro.org>
> Sent: Friday, December 13, 2024 1:33 PM
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; Shameerali Kolothum
> Thodi <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; eric.auger@redhat.com; nicolinc@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, 13 Dec 2024 at 12:46, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote:
> > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > > Hi,
> > > >
> > > > This series adds initial support for a user-creatable "arm-smmuv3-
> nested"
> > > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per
> machine
> > > > and cannot support multiple SMMUv3s.
> > > >
> > > > In order to support vfio-pci dev assignment with vSMMUv3, the
> physical
> > > > SMMUv3 has to be configured in nested mode. Having a pluggable
> > > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3
> for Guests
> > > > running on a host with multiple physical SMMUv3s. A few benefits of
> doing
> > > > this are,
> > >
> > > I'm not very familiar with arm, but from this description I'm not
> > > really seeing how "nesting" is involved here. You're only talking
> > > about the host and 1 L1 guest, no L2 guest.
> >
> > nesting is the term the iommu side is using to refer to the 2
> > dimensional paging, ie a guest page table on top of a hypervisor page
> > table.
>
> Isn't that more usually called "two stage" paging? Calling
> that "nesting" seems like it is going to be massively confusing...
Yes. This will be renamed to "arm-smmuv3-accel" in future revisions.
>
> Also, how does it relate to what this series seems to be
> doing, where we provide the guest with two separate SMMUs?
> (Are those two SMMUs "nested" in the sense that one is sitting
> behind the other?)
I don't think it requires two SMMUs in the Guest. The nested or "two
stage" mode means the stage 1 page table is owned by the Guest and stage 2
by the host. And this is achieved by IOMMUFD-provided IOCTLs (a rough
sketch follows below).
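For readers less familiar with the iommufd side, the split described above looks roughly like the sketch below. This is only an illustrative sequence against the Linux iommufd uAPI (include/uapi/linux/iommufd.h); the function name and parameters are invented for the example, it is not the code from the branches referenced in this thread, and error handling, the vIOMMU object and the final attach step are omitted.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>  /* struct iommu_hwpt_alloc and friends */

/* Illustrative only: allocate a host-owned stage-2 "nesting parent" HWPT and
 * a guest-owned stage-1 HWPT nested on top of it, for one assigned device. */
static int alloc_nested_hwpts(int iommufd, uint32_t dev_id, uint32_t ioas_id,
                              uint32_t *out_s1_hwpt)
{
    struct iommu_hwpt_alloc s2 = {
        .size   = sizeof(s2),
        .flags  = IOMMU_HWPT_ALLOC_NEST_PARENT,  /* stage 2, owned by the host */
        .dev_id = dev_id,
        .pt_id  = ioas_id,                       /* backed by the guest-memory IOAS */
    };
    if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &s2))
        return -1;

    struct iommu_hwpt_arm_smmuv3 ste = { .ste = { 0, 0 } }; /* STE words mirrored from the guest */
    struct iommu_hwpt_alloc s1 = {
        .size      = sizeof(s1),
        .dev_id    = dev_id,
        .pt_id     = s2.out_hwpt_id,             /* nested on the stage-2 parent */
        .data_type = IOMMU_HWPT_DATA_ARM_SMMUV3,
        .data_len  = sizeof(ste),
        .data_uptr = (uintptr_t)&ste,            /* stage 1, configured by the guest */
    };
    if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &s1))
        return -1;

    *out_s1_hwpt = s1.out_hwpt_id;
    return 0;
}

The assigned device is then attached to the stage-1 HWPT (e.g. via VFIO_DEVICE_ATTACH_IOMMUFD_PT), so the guest programs its stage-1 tables directly while the host keeps ownership of stage 2.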
There is a precursor to this series where support for hw-accelerated
two-stage translation is added to the Qemu SMMUv3 code.
Please see the complete branch here,
https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-rfc-v1/
And patches prior to this commit adds that support:
4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
SMMUv3")
Nicolin is soon going to send out those for review. Or I can include
those in this series so that it gives a complete picture. Nicolin?
Hope this clarifies any confusion.
Thanks,
Shameer
On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
> And patches prior to this commit adds that support:
> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> SMMUv3")
>
> Nicolin is soon going to send out those for review. Or I can include
> those in this series so that it gives a complete picture. Nicolin?
Just found that I forgot to reply this one...sorry
I asked Don/Eric to take over that vSMMU series:
https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
(The majority of my effort has been still on the kernel side:
previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
Don/Eric, is there any update from your side?
I think it's also a good time to align with each other so we
can take our next step in the new year :)
Thanks
Nicolin
Hi Nicolin,
On 1/9/25 5:45 AM, Nicolin Chen wrote:
> On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
>> And patches prior to this commit adds that support:
>> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>> SMMUv3")
>>
>> Nicolin is soon going to send out those for review. Or I can include
>> those in this series so that it gives a complete picture. Nicolin?
> Just found that I forgot to reply this one...sorry
>
> I asked Don/Eric to take over that vSMMU series:
> https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> (The majority of my effort has been still on the kernel side:
> previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>
> Don/Eric, is there any update from your side?
To be honest we have not much progressed so far. On my end I can
dedicate some cycles now. I currently try to understand how and what
subset I can respin and which test setup can be used. I will come back
to you next week.
Eric
>
> I think it's also a good time to align with each other so we
> can take our next step in the new year :)
>
> Thanks
> Nicolin
>
On Fri, Jan 31, 2025 at 05:54:56PM +0100, Eric Auger wrote:
> On 1/9/25 5:45 AM, Nicolin Chen wrote:
> > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
> >> And patches prior to this commit adds that support:
> >> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> >> SMMUv3")
> >>
> >> Nicolin is soon going to send out those for review. Or I can include
> >> those in this series so that it gives a complete picture. Nicolin?
> > Just found that I forgot to reply this one...sorry
> >
> > I asked Don/Eric to take over that vSMMU series:
> > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> > (The majority of my effort has been still on the kernel side:
> > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
> >
> > Don/Eric, is there any update from your side?
> To be honest we have not much progressed so far. On my end I can
> dedicate some cycles now. I currently try to understand how and what
> subset I can respin and which test setup can be used. I will come back
> to you next week.
In summary, we will have the following series:
1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/
2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over)
4) Shameer's work on "-device" in ARM virt.c
5) vEVENTQ for fault injection (if time is right, squash into 2/3)
Perhaps, 3/4 would come in a different order, or maybe 4 could split
into a few patches changing "-device" (sending before 3) and then a
few other patches adding multi-vSMMU support (sending after 3).
My latest QEMU branch for reference:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v6
It hasn't integrated Shameer's and Nathan's work though..
For testing, use this kernel branch:
https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v6-with-rmr
I think we'd need to build a shared branch by integrating the latest
series in the list above.
Thanks
Nicolin
Hi Nicolin, Shameer,
On 2/3/25 7:50 PM, Nicolin Chen wrote:
> On Fri, Jan 31, 2025 at 05:54:56PM +0100, Eric Auger wrote:
>> On 1/9/25 5:45 AM, Nicolin Chen wrote:
>>> On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
>>>> And patches prior to this commit adds that support:
>>>> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>>>> SMMUv3")
>>>>
>>>> Nicolin is soon going to send out those for review. Or I can include
>>>> those in this series so that it gives a complete picture. Nicolin?
>>> Just found that I forgot to reply this one...sorry
>>>
>>> I asked Don/Eric to take over that vSMMU series:
>>> https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
>>> (The majority of my effort has been still on the kernel side:
>>> previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>>>
>>> Don/Eric, is there any update from your side?
>> To be honest we have not much progressed so far. On my end I can
>> dedicate some cycles now. I currently try to understand how and what
>> subset I can respin and which test setup can be used. I will come back
>> to you next week.
> In summary, we will have the following series:
> 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
> https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/
> 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
for 1 and 2, are you talking about the "Add VIOMMU infrastructure support"
series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1?
Sorry, I may instead be referring to NVidia's or Intel's branch, but I am
not sure about the latter ones.
> 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over)
We can start sending it upstream assuming we have a decent test environment.
However in
https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.com/
Shameer suggested he may include it in his SMMU multi instance series.
What do you both prefer?
Eric
> 4) Shameer's work on "-device" in ARM virt.c
> 5) vEVENTQ for fault injection (if time is right, squash into 2/3)
>
> Perhaps, 3/4 would come in a different order, or maybe 4 could split
> into a few patches changing "-device" (sending before 3) and then a
> few other patches adding multi-vSMMU support (sending after 3).
>
> My latest QEMU branch for reference:
> https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v6
> It hasn't integrated Shameer's and Nathan's work though..
> For testing, use this kernel branch:
> https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v6-with-rmr
>
> I think we'd need to build a shared branch by integrating the latest
> series in the list above.
>
> Thanks
> Nicolin
>
On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote:
> > In summary, we will have the following series:
> > 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer)
> > https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/
> > 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send)
> for 1 and 2, are you talking about the "Add VIOMMU infrastructure support"
> series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1.
> Sorry I may instead refer to NVidia or Intel's branch but I am not sure
> about the last ones.

That "vIOMMU infrastructure" is for 2, yes.

For 1, it's inside Intel's series:
"cover-letter: intel_iommu: Enable stage-1 translation for passthrough device"

So, we need to extract them out and send them separately.

> > 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over)
> We can start sending it upstream assuming we have a decent test environment.
>
> However in
> https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.com/
>
> Shameer suggested he may include it in his SMMU multi instance series.
> What do you both prefer?

Sure, I think it's good to include those patches, though I believe
we need to build a new shared branch as Shameer's branch might not
reflect the latest kernel uAPI header.

Here is a new branch on top of latest master tree (v9.2.50):
https://github.com/nicolinc/qemu/commits/wip/for_shameer_02042025

I took HWPT patches from Zhenzhong's series and rebased all related
changes from my tree. I did some sanity testing and it should work with RMR.

Shameer, would you please try this branch and then integrate your
series on top of the following series?
  cover-letter: Add HW accelerated nesting support for arm SMMUv3
  cover-letter: Add vIOMMU-based nesting infrastructure support
  cover-letter: Add HWPT-based nesting infrastructure support
Basically, just replace my old multi-instance series with yours, to
create a shared branch for all of us.

Eric, perhaps you can start to look at these series. Even the
first two iommufd series are a bit of a rough integration :)

Thanks
Nicolin
Hi Nicolin, > -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Wednesday, February 5, 2025 12:09 AM > To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Eric Auger > <eric.auger@redhat.com> > Cc: ddutile@redhat.com; Peter Maydell <peter.maydell@linaro.org>; Jason > Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>; > qemu-arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote: > > > In summary, we will have the following series: > > > 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer) > > > https://lore.kernel.org/qemu- > devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.nam > prd11.prod.outlook.com/ > > > 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send) > > > for 1 and 2, are you taking about the "Add VIOMMU infrastructure > support > > " series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1. > > Sorry I may instead refer to NVidia or Intel's branch but I am not sure > > about the last ones. > > That "vIOMMU infrastructure" is for 2, yes. > > For 1, it's inside the Intel's series: > "cover-letter: intel_iommu: Enable stage-1 translation for passthrough > device" > > So, we need to extract them out and make it separately.. > > > > 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take > over) > > We can start sending it upstream assuming we have a decent test > environment. > > > > However in > > > https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.c > om/ > > > > Shameer suggested he may include it in his SMMU multi instance series. > > What do you both prefer? > > Sure, I think it's good to include those patches, One of the feedback I received on my series was to rename "arm-smmuv3-nested" to "arm-smmuv3-accel" and possibly rename function names to include "accel' as well and move those functions to a separate "smmuv3-accel.c" file. I suppose that applies to the " Add HW accelerated nesting support for arm SMMUv3" series as well. Is that fine with you? Thanks, Shameer
On Thu, Feb 06, 2025 at 10:34:15AM +0000, Shameerali Kolothum Thodi wrote:
> > -----Original Message-----
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote:
> > > However in
> > >
> > > Shameer suggested he may include it in his SMMU multi instance series.
> > > What do you both prefer?
> >
> > Sure, I think it's good to include those patches,
>
> One of the feedback I received on my series was to rename "arm-smmuv3-nested"
> to "arm-smmuv3-accel" and possibly rename function names to include "accel" as
> well and move those functions to a separate "smmuv3-accel.c" file. I suppose
> that applies to the "Add HW accelerated nesting support for arm SMMUv3" series
> as well.
>
> Is that fine with you?

Oh, no problem. If you want to rename the whole thing, please feel
free. I do see the naming conflict between the "nested" stage and
the "nested" HW feature, which are both supported by the vSMMU now.

Thanks
Nicolin
Hi Nicolin,

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 6, 2025 6:58 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: Eric Auger <eric.auger@redhat.com>; ddutile@redhat.com;
> Peter Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>;
> Wangzhou (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> [..]
> > One of the feedback I received on my series was to rename "arm-smmuv3-nested"
> > to "arm-smmuv3-accel" and possibly rename function names to include "accel" as
> > well and move those functions to a separate "smmuv3-accel.c" file. I suppose
> > that applies to the "Add HW accelerated nesting support for arm SMMUv3" series
> > as well.
> >
> > Is that fine with you?
>
> Oh, no problem. If you want to rename the whole thing, please feel
> free. I do see the naming conflict between the "nested" stage and
> the "nested" HW feature, which are both supported by the vSMMU now.

I am working on the above now and have a quick question for you 😊.

Looking at the smmu_dev_attach_viommu() fn here[0], it appears to do
the following:

1. Alloc a s2_hwpt if not allocated already and attach it.
2. Allocate abort and bypass hwpt.
3. Attach bypass hwpt.

I didn't get why we are doing step 3 here. To me it looks like, when we
attach the s2_hwpt (ie, the nested parent domain attach), the kernel
will do,

arm_smmu_attach_dev()
  arm_smmu_make_s2_domain_ste()

It appears that through step 3 we achieve the same thing again.

Or it is possible I missed something obvious here. Please let me know.

Thanks,
Shameer

[0] https://github.com/nicolinc/qemu/blob/wip/for_shameer_02042025/hw/arm/smmu-common.c#L910C13-L910C35
On Mon, Mar 03, 2025 at 03:21:57PM +0000, Shameerali Kolothum Thodi wrote:
> I am working on the above now and have a quick question for you 😊.
>
> Looking at the smmu_dev_attach_viommu() fn here[0], it appears to do
> the following:
>
> 1. Alloc a s2_hwpt if not allocated already and attach it.
> 2. Allocate abort and bypass hwpt.
> 3. Attach bypass hwpt.
>
> I didn't get why we are doing step 3 here. To me it looks like, when we
> attach the s2_hwpt (ie, the nested parent domain attach), the kernel
> will do,
>
> arm_smmu_attach_dev()
>   arm_smmu_make_s2_domain_ste()
>
> It appears that through step 3 we achieve the same thing again.
>
> Or it is possible I missed something obvious here.

Because a device cannot attach to a vIOMMU object directly, but
only via a proxy hwpt_nested. So, this bypass hwpt gives us the
port to associate the device to the vIOMMU, before a vDEVICE or
a "translate" hwpt_nested is allocated.

Currently it's the same because an S2 parent hwpt holds a VMID,
so we could just attach the device to the S2 hwpt for the same
STE configuration as attaching the device to the proxy bypass
hwpt. Yet, this will change in the future after letting vIOMMU
objects hold their own VMIDs to share a common S2 parent hwpt
that won't have a VMID, i.e. arm_smmu_make_s2_domain_ste() will
need the vIOMMU object to get the VMID for STE.

I should have added a few lines of comments there :)

Thanks
Nicolin
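Putting Shameer's three steps and Nicolin's explanation together, the flow is roughly the pseudocode below. The types and helper names are placeholders invented for this sketch, not the actual functions in the referenced smmu-common.c:

/* Placeholder types/helpers, just to make the sketch self-contained. */
typedef struct { int id; } HWPT;
enum { CFG_ABORT, CFG_BYPASS };

static HWPT *hwpt_alloc_s2_parent(void) { static HWPT h = { 1 }; return &h; }
static HWPT *hwpt_alloc_nested(HWPT *parent, int cfg)
{
    static HWPT h[2];
    (void)parent;           /* nested under the S2 parent / vIOMMU object */
    h[cfg].id = 10 + cfg;
    return &h[cfg];
}
static int dev_attach_hwpt(int devid, HWPT *hwpt) { (void)devid; (void)hwpt; return 0; }

static HWPT *s2_hwpt, *abort_hwpt, *bypass_hwpt;

static int smmu_dev_attach_viommu_sketch(int devid)
{
    /* 1. Allocate the stage-2 "nesting parent" hwpt once per vSMMU and
     *    attach the device to it. */
    if (!s2_hwpt)
        s2_hwpt = hwpt_alloc_s2_parent();
    dev_attach_hwpt(devid, s2_hwpt);

    /* 2. Allocate the abort and bypass proxy hwpt_nesteds: a device cannot
     *    attach to the vIOMMU object directly, only via a nested hwpt. */
    abort_hwpt  = hwpt_alloc_nested(s2_hwpt, CFG_ABORT);
    bypass_hwpt = hwpt_alloc_nested(s2_hwpt, CFG_BYPASS);

    /* 3. Park the device on the bypass proxy until a vDEVICE / "translate"
     *    hwpt_nested is allocated. Today the resulting STE matches a plain
     *    S2 attach, but it won't once vIOMMU objects own the VMID. */
    return dev_attach_hwpt(devid, bypass_hwpt);
}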
> -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Monday, March 3, 2025 5:05 PM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Eric Auger <eric.auger@redhat.com>; ddutile@redhat.com; Peter > Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>; > Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org; > qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou > (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > Jonathan Cameron <jonathan.cameron@huawei.com>; > zhangfei.gao@linaro.org > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Mon, Mar 03, 2025 at 03:21:57PM +0000, Shameerali Kolothum Thodi > wrote: > > I am working on the above now and have quick question to you😊. > > > > Looking at the smmu_dev_attach_viommu() fn here[0], > > it appears to do the following: > > > > 1. Alloc a s2_hwpt if not allocated already and attach it. > > 2. Allocate abort and bypass hwpt > > 3. Attach bypass hwpt. > > > > I didn't get why we are doing the step 3 here. To me it looks like, > > when we attach the s2_hwpt(ie, the nested parent domain attach), > > the kernel will do, > > > > arm_smmu_attach_dev() > > arm_smmu_make_s2_domain_ste() > > > > It appears through step 3, we achieve the same thing again. > > > > Or it is possible I missed something obvious here. > > Because a device cannot attach to a vIOMMU object directly, but > only via a proxy hwpt_nested. So, this bypass hwpt gives us the > port to associate the device to the vIOMMU, before a vDEVICE or > a "translate" hwpt_nested is allocated. > > Currently it's the same because an S2 parent hwpt holds a VMID, > so we could just attach the device to the S2 hwpt for the same > STE configuration as attaching the device to the proxy bypass > hwpt. Yet, this will change in the future after letting vIOMMU > objects hold their own VMIDs to share a common S2 parent hwpt > that won't have a VMID, i.e. arm_smmu_make_s2_domain_ste() will > need the vIOMMU object to get the VMID for STE. > > I should have added a few lines of comments there :) Ok. Thanks for the explanation. I will keep it then and add few comments to make it clear. Do you have an initial implementation of the above with vIOMMU object holding the VMIDs to share? Actually I do have a dependency on that for my KVM pinned VMID series[0] where it was suggested that the VMID should associated with a vIOMMU object rather than the IOMMUFD context I used in there. And Jason mentioned about the work involved to do that here[1]. Appreciate if you could share if any progress is made on that so that I can try to rebase that KVM Pinned series on top of that and give it a try. Thanks, Shameer [0] https://lore.kernel.org/linux-iommu/20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com/ [1] https://lore.kernel.org/linux-arm-kernel/20241129150628.GG1253388@nvidia.com/
Hi Nicolin, On 2/5/25 1:08 AM, Nicolin Chen wrote: > On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote: >>> In summary, we will have the following series: >>> 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer) >>> https://lore.kernel.org/qemu-devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.namprd11.prod.outlook.com/ >>> 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send) >> for 1 and 2, are you taking about the "Add VIOMMU infrastructure support >> " series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1. >> Sorry I may instead refer to NVidia or Intel's branch but I am not sure >> about the last ones. > That "vIOMMU infrastructure" is for 2, yes. > > For 1, it's inside the Intel's series: > "cover-letter: intel_iommu: Enable stage-1 translation for passthrough device" > > So, we need to extract them out and make it separately.. OK > >>> 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take over) >> We can start sending it upstream assuming we have a decent test environment. >> >> However in >> https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.com/ >> >> Shameer suggested he may include it in his SMMU multi instance series. >> What do you both prefer? > Sure, I think it's good to include those patches, though I believe > we need to build a new shared branch as Shameer's branch might not > reflect the latest kernel uAPI header. > > Here is a new branch on top of latest master tree (v9.2.50): > https://github.com/nicolinc/qemu/commits/wip/for_shameer_02042025 > > I took HWPT patches from Zhenzhong's series and rebased all related > changes from my tree. I did some sanity and it should work with RMR. > > Shameer, would you please try this branch and then integrate your > series on top of the following series? > cover-letter: Add HW accelerated nesting support for arm SMMUv3 > cover-letter: Add vIOMMU-based nesting infrastructure support > cover-letter: Add HWPT-based nesting infrastructure support > Basically, just replace my old multi-instance series with yours, to > create a shared branch for all of us. > > Eric, perhaps you can start to look at the these series. Even the > first two iommufd series are a bit of rough integrations :) OK I am starting this week Eric > > Thanks > Nicolin >
> -----Original Message----- > From: Nicolin Chen <nicolinc@nvidia.com> > Sent: Wednesday, February 5, 2025 12:09 AM > To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Eric Auger > <eric.auger@redhat.com> > Cc: ddutile@redhat.com; Peter Maydell <peter.maydell@linaro.org>; Jason > Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>; > qemu-arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Tue, Feb 04, 2025 at 06:49:15PM +0100, Eric Auger wrote: > > > In summary, we will have the following series: > > > 1) HWPT uAPI patches in backends/iommufd.c (Zhenzhong or Shameer) > > > https://lore.kernel.org/qemu- > devel/SJ0PR11MB6744943702EB5798EC9B3B9992E02@SJ0PR11MB6744.nam > prd11.prod.outlook.com/ > > > 2) vIOMMU uAPI patches in backends/iommufd.c (I will rebase/send) > > > for 1 and 2, are you taking about the "Add VIOMMU infrastructure > support > > " series in Shameer's branch: private-smmuv3-nested-dev-rfc-v1. > > Sorry I may instead refer to NVidia or Intel's branch but I am not sure > > about the last ones. > > That "vIOMMU infrastructure" is for 2, yes. > > For 1, it's inside the Intel's series: > "cover-letter: intel_iommu: Enable stage-1 translation for passthrough > device" > > So, we need to extract them out and make it separately.. > > > > 3) vSMMUv3 patches for HW-acc/nesting (Hoping Don/you could take > over) > > We can start sending it upstream assuming we have a decent test > environment. > > > > However in > > > https://lore.kernel.org/all/329445b2f68a47269292aefb34584375@huawei.c > om/ > > > > Shameer suggested he may include it in his SMMU multi instance series. > > What do you both prefer? > > Sure, I think it's good to include those patches, though I believe > we need to build a new shared branch as Shameer's branch might not > reflect the latest kernel uAPI header. > > Here is a new branch on top of latest master tree (v9.2.50): > https://github.com/nicolinc/qemu/commits/wip/for_shameer_02042025 > > I took HWPT patches from Zhenzhong's series and rebased all related > changes from my tree. I did some sanity and it should work with RMR. > > Shameer, would you please try this branch and then integrate your > series on top of the following series? > cover-letter: Add HW accelerated nesting support for arm SMMUv3 > cover-letter: Add vIOMMU-based nesting infrastructure support > cover-letter: Add HWPT-based nesting infrastructure support > Basically, just replace my old multi-instance series with yours, to > create a shared branch for all of us. Ok. I will take a look at that and rebase. Thanks, Shameer
Nicolin,
Hi!
On 1/8/25 11:45 PM, Nicolin Chen wrote:
> On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
>> And patches prior to this commit adds that support:
>> 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>> SMMUv3")
>>
>> Nicolin is soon going to send out those for review. Or I can include
>> those in this series so that it gives a complete picture. Nicolin?
>
> Just found that I forgot to reply this one...sorry
>
> I asked Don/Eric to take over that vSMMU series:
> https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> (The majority of my effort has been still on the kernel side:
> previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>
> Don/Eric, is there any update from your side?
>
Apologies for delayed response, been at customer site, and haven't been keeping up w/biz email.
Eric is probably waiting for me to get back and chat as well.
Will look to reply early next week.
- Don
> I think it's also a good time to align with each other so we
> can take our next step in the new year :)
>
> Thanks
> Nicolin
>
Hi Don,
On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
> On 1/8/25 11:45 PM, Nicolin Chen wrote:
> > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi wrote:
> > > And patches prior to this commit adds that support:
> > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> > > SMMUv3")
> > >
> > > Nicolin is soon going to send out those for review. Or I can include
> > > those in this series so that it gives a complete picture. Nicolin?
> >
> > Just found that I forgot to reply this one...sorry
> >
> > I asked Don/Eric to take over that vSMMU series:
> > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> > (The majority of my effort has been still on the kernel side:
> > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
> >
> > Don/Eric, is there any update from your side?
> >
> Apologies for delayed response, been at customer site, and haven't been keeping up w/biz email.
> Eric is probably waiting for me to get back and chat as well.
> Will look to reply early next week.
I wonder if we can make some progress in Feb? If so, we can start
to wrap up the iommufd uAPI patches for HWPT, which was a part of
intel's series but never got sent since their emulated series is
seemingly still pending?
One detail for the uAPI patches is to decide how vIOMMU code will
interact with those backend APIs.. Hopefully, you and Eric should
have something in mind :)
Thanks
Nicolin
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, January 23, 2025 4:10 AM
> To: Donald Dutile <ddutile@redhat.com>
> Cc: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; eric.auger@redhat.com; Peter
> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Don,
>
> On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
> > On 1/8/25 11:45 PM, Nicolin Chen wrote:
> > > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi
> wrote:
> > > > And patches prior to this commit adds that support:
> > > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
> > > > SMMUv3")
> > > >
> > > > Nicolin is soon going to send out those for review. Or I can include
> > > > those in this series so that it gives a complete picture. Nicolin?
> > >
> > > Just found that I forgot to reply this one...sorry
> > >
> > > I asked Don/Eric to take over that vSMMU series:
> > > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
> > > (The majority of my effort has been still on the kernel side:
> > > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
> > >
> > > Don/Eric, is there any update from your side?
> > >
> > Apologies for delayed response, been at customer site, and haven't been
> keeping up w/biz email.
> > Eric is probably waiting for me to get back and chat as well.
> > Will look to reply early next week.
>
> I wonder if we can make some progress in Feb? If so, we can start
> to wrap up the iommufd uAPI patches for HWPT, which was a part of
> intel's series but never got sent since their emulated series is
> seemingly still pending?
I think these are the 5 patches that we require from Intel pass-through series,
vfio/iommufd: Implement [at|de]tach_hwpt handlers
vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler
HostIOMMUDevice: Introduce realize_late callback
vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD
backends/iommufd: Add helpers for invalidating user-managed HWPT
See the commits from here,
https://github.com/hisilicon/qemu/commit/bbdc65af38fa5723f1bd9b026e292730901f57b5
[CC Zhenzhong]
Hi Zhenzhong,
Just wondering what your plans are for the above patches. If it makes sense and you
are fine with it, I think it is a good idea for one of us to pick those up from that
series and send them out separately so that they can get some review and we can take
them forward.
Thanks,
Shameer
Hi Shameer,
>-----Original Message-----
>From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested
>SMMUv3
>
>
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Thursday, January 23, 2025 4:10 AM
>> To: Donald Dutile <ddutile@redhat.com>
>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@huawei.com>; eric.auger@redhat.com; Peter
>> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
>> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
>> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>> zhangfei.gao@linaro.org
>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>> nested SMMUv3
>>
>> Hi Don,
>>
>> On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
>> > On 1/8/25 11:45 PM, Nicolin Chen wrote:
>> > > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi
>> wrote:
>> > > > And patches prior to this commit adds that support:
>> > > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>> > > > SMMUv3")
>> > > >
>> > > > Nicolin is soon going to send out those for review. Or I can include
>> > > > those in this series so that it gives a complete picture. Nicolin?
>> > >
>> > > Just found that I forgot to reply this one...sorry
>> > >
>> > > I asked Don/Eric to take over that vSMMU series:
>> > > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
>> > > (The majority of my effort has been still on the kernel side:
>> > > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>> > >
>> > > Don/Eric, is there any update from your side?
>> > >
>> > Apologies for delayed response, been at customer site, and haven't been
>> keeping up w/biz email.
>> > Eric is probably waiting for me to get back and chat as well.
>> > Will look to reply early next week.
>>
>> I wonder if we can make some progress in Feb? If so, we can start
>> to wrap up the iommufd uAPI patches for HWPT, which was a part of
>> intel's series but never got sent since their emulated series is
>> seemingly still pending?
>
>I think these are the 5 patches that we require from Intel pass-through series,
>
>vfio/iommufd: Implement [at|de]tach_hwpt handlers
>vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler
>HostIOMMUDevice: Introduce realize_late callback
>vfio/iommufd: Add properties and handlers to
>TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>backends/iommufd: Add helpers for invalidating user-managed HWPT
>
>See the commits from here,
>https://github.com/hisilicon/qemu/commit/bbdc65af38fa5723f1bd9b026e29273
>0901f57b5
>
>[CC Zhenzhong]
>
>Hi Zhenzhong,
>
>Just wondering what your plans are for the above patches. If it make sense and
>you
>are fine with it, I think it is a good idea one of us can pick up those from that
>series
>and sent out separately so that it can get some review and take it forward.
The emulated series is merged. I plan to send the Intel pass-through series after
the Chinese festival vacation, but that is at least half a month away. So feel free
to pick the patches you need and send them out for comments.
Thanks
Zhenzhong
Hi Shameer, Nicolin,
>-----Original Message-----
>From: Duan, Zhenzhong
>Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable nested
>SMMUv3
>
>Hi Shameer,
>
>>-----Original Message-----
>>From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
>>Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>nested
>>SMMUv3
>>
>>
>>
>>> -----Original Message-----
>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>> Sent: Thursday, January 23, 2025 4:10 AM
>>> To: Donald Dutile <ddutile@redhat.com>
>>> Cc: Shameerali Kolothum Thodi
>>> <shameerali.kolothum.thodi@huawei.com>; eric.auger@redhat.com; Peter
>>> Maydell <peter.maydell@linaro.org>; Jason Gunthorpe <jgg@nvidia.com>;
>>> Daniel P. Berrangé <berrange@redhat.com>; qemu-arm@nongnu.org;
>>> qemu-devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou
>>> (B) <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
>>> Jonathan Cameron <jonathan.cameron@huawei.com>;
>>> zhangfei.gao@linaro.org
>>> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
>>> nested SMMUv3
>>>
>>> Hi Don,
>>>
>>> On Fri, Jan 10, 2025 at 11:05:24PM -0500, Donald Dutile wrote:
>>> > On 1/8/25 11:45 PM, Nicolin Chen wrote:
>>> > > On Mon, Dec 16, 2024 at 10:01:29AM +0000, Shameerali Kolothum Thodi
>>> wrote:
>>> > > > And patches prior to this commit adds that support:
>>> > > > 4ccdbe3: ("cover-letter: Add HW accelerated nesting support for arm
>>> > > > SMMUv3")
>>> > > >
>>> > > > Nicolin is soon going to send out those for review. Or I can include
>>> > > > those in this series so that it gives a complete picture. Nicolin?
>>> > >
>>> > > Just found that I forgot to reply this one...sorry
>>> > >
>>> > > I asked Don/Eric to take over that vSMMU series:
>>> > > https://lore.kernel.org/qemu-devel/Zy0jiPItu8A3wNTL@Asurada-Nvidia/
>>> > > (The majority of my effort has been still on the kernel side:
>>> > > previously vIOMMU/vDEVICE, and now vEVENTQ/MSI/vCMDQ..)
>>> > >
>>> > > Don/Eric, is there any update from your side?
>>> > >
>>> > Apologies for delayed response, been at customer site, and haven't been
>>> keeping up w/biz email.
>>> > Eric is probably waiting for me to get back and chat as well.
>>> > Will look to reply early next week.
>>>
>>> I wonder if we can make some progress in Feb? If so, we can start
>>> to wrap up the iommufd uAPI patches for HWPT, which was a part of
>>> intel's series but never got sent since their emulated series is
>>> seemingly still pending?
>>
>>I think these are the 5 patches that we require from Intel pass-through series,
>>
>>vfio/iommufd: Implement [at|de]tach_hwpt handlers
>>vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler
>>HostIOMMUDevice: Introduce realize_late callback
>>vfio/iommufd: Add properties and handlers to
>>TYPE_HOST_IOMMU_DEVICE_IOMMUFD
>>backends/iommufd: Add helpers for invalidating user-managed HWPT
>>
>>See the commits from here,
>>https://github.com/hisilicon/qemu/commit/bbdc65af38fa5723f1bd9b026e2927
>3
>>0901f57b5
>>
>>[CC Zhenzhong]
>>
>>Hi Zhenzhong,
>>
>>Just wondering what your plans are for the above patches. If it make sense and
>>you
>>are fine with it, I think it is a good idea one of us can pick up those from that
>>series
>>and sent out separately so that it can get some review and take it forward.
>
>Emulated series is merged, I plan to send Intel pass-through series after
>Chinese festival vacation, but at least half a month later. So feel free to
>pick those patches you need and send for comments.
I plan to send the vtd nesting series out this week and want to ask about the status
of the "1) HWPT uAPI patches in backends/iommufd.c" series.
If you have already sent them out, I will rebase and skip them to avoid duplicate
review effort in the community. Otherwise I can send them as part of the vtd nesting series.
Thanks
Zhenzhong
Hi Zhenzhong, > -----Original Message----- > From: Duan, Zhenzhong <zhenzhong.duan@intel.com> > Sent: Monday, February 17, 2025 9:17 AM > To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen > <nicolinc@nvidia.com>; Donald Dutile <ddutile@redhat.com> > Cc: eric.auger@redhat.com; Peter Maydell <peter.maydell@linaro.org>; > Jason Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé > <berrange@redhat.com>; qemu-arm@nongnu.org; qemu- > devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > Jonathan Cameron <jonathan.cameron@huawei.com>; > zhangfei.gao@linaro.org; Peng, Chao P <chao.p.peng@intel.com> > Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > Hi Shameer, Nicolin, > [...] > >>Hi Zhenzhong, > >> > >>Just wondering what your plans are for the above patches. If it make > sense and > >>you > >>are fine with it, I think it is a good idea one of us can pick up those from > that > >>series > >>and sent out separately so that it can get some review and take it > forward. > > > >Emulated series is merged, I plan to send Intel pass-through series after > >Chinese festival vacation, but at least half a month later. So feel free to > >pick those patches you need and send for comments. > > I plan to send vtd nesting series out this week and want to ask about status > of "1) HWPT uAPI patches in backends/iommufd.c" series. > > If you had sent it out, I will do a rebase and bypass them to avoid duplicate > review effort in community. Or I can send them in vtd nesting series if you > not yet. No. It is not send out yet. Please include it in your vtd nesting series. Thanks. I am currently working on refactoring the SMMUv3 accel series and the "Add HW accelerated nesting support for arm SMMUv3" series from Nicolin. Thanks, Shameer.
Hi Shammeer, On 2/18/25 7:52 AM, Shameerali Kolothum Thodi wrote: > Hi Zhenzhong, > >> -----Original Message----- >> From: Duan, Zhenzhong <zhenzhong.duan@intel.com> >> Sent: Monday, February 17, 2025 9:17 AM >> To: Shameerali Kolothum Thodi >> <shameerali.kolothum.thodi@huawei.com>; Nicolin Chen >> <nicolinc@nvidia.com>; Donald Dutile <ddutile@redhat.com> >> Cc: eric.auger@redhat.com; Peter Maydell <peter.maydell@linaro.org>; >> Jason Gunthorpe <jgg@nvidia.com>; Daniel P. Berrangé >> <berrange@redhat.com>; qemu-arm@nongnu.org; qemu- >> devel@nongnu.org; Linuxarm <linuxarm@huawei.com>; Wangzhou (B) >> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; >> Jonathan Cameron <jonathan.cameron@huawei.com>; >> zhangfei.gao@linaro.org; Peng, Chao P <chao.p.peng@intel.com> >> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable >> nested SMMUv3 >> >> Hi Shameer, Nicolin, >> > [...] > >>>> Hi Zhenzhong, >>>> >>>> Just wondering what your plans are for the above patches. If it make >> sense and >>>> you >>>> are fine with it, I think it is a good idea one of us can pick up those from >> that >>>> series >>>> and sent out separately so that it can get some review and take it >> forward. >>> Emulated series is merged, I plan to send Intel pass-through series after >>> Chinese festival vacation, but at least half a month later. So feel free to >>> pick those patches you need and send for comments. >> I plan to send vtd nesting series out this week and want to ask about status >> of "1) HWPT uAPI patches in backends/iommufd.c" series. >> >> If you had sent it out, I will do a rebase and bypass them to avoid duplicate >> review effort in community. Or I can send them in vtd nesting series if you >> not yet. > No. It is not send out yet. Please include it in your vtd nesting series. Thanks. > > I am currently working on refactoring the SMMUv3 accel series and the > "Add HW accelerated nesting support for arm SMMUv3" series so will you send "Add HW accelerated nesting support for arm SMMUv3" or do you want me to do it? Thanks Eric > from Nicolin. > > Thanks, > Shameer. > >
> -----Original Message----- > From: Eric Auger <eric.auger@redhat.com> > Sent: Thursday, March 6, 2025 6:00 PM > To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Duan, Zhenzhong > <zhenzhong.duan@intel.com>; Nicolin Chen <nicolinc@nvidia.com>; > Donald Dutile <ddutile@redhat.com> > Cc: Peter Maydell <peter.maydell@linaro.org>; Jason Gunthorpe > <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>; qemu- > arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org; Peng, Chao P > <chao.p.peng@intel.com> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > Hi Shammeer, > Hi Eric, > > > > I am currently working on refactoring the SMMUv3 accel series and the > > "Add HW accelerated nesting support for arm SMMUv3" series > so will you send "Add HW accelerated nesting support for arm SMMUv3" or > do you want me to do it? Thanks Eric Yes. I am on it. Hopefully I will be able to send out everything next week. Thanks, Shameer
Hi Shameer, On 3/6/25 7:27 PM, Shameerali Kolothum Thodi wrote: > >> -----Original Message----- >> From: Eric Auger <eric.auger@redhat.com> >> Sent: Thursday, March 6, 2025 6:00 PM >> To: Shameerali Kolothum Thodi >> <shameerali.kolothum.thodi@huawei.com>; Duan, Zhenzhong >> <zhenzhong.duan@intel.com>; Nicolin Chen <nicolinc@nvidia.com>; >> Donald Dutile <ddutile@redhat.com> >> Cc: Peter Maydell <peter.maydell@linaro.org>; Jason Gunthorpe >> <jgg@nvidia.com>; Daniel P. Berrangé <berrange@redhat.com>; qemu- >> arm@nongnu.org; qemu-devel@nongnu.org; Linuxarm >> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>; >> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron >> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org; Peng, Chao P >> <chao.p.peng@intel.com> >> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable >> nested SMMUv3 >> >> Hi Shammeer, >> > Hi Eric, > >>> I am currently working on refactoring the SMMUv3 accel series and the >>> "Add HW accelerated nesting support for arm SMMUv3" series >> so will you send "Add HW accelerated nesting support for arm SMMUv3" or >> do you want me to do it? Thanks Eric > Yes. I am on it. Hopefully I will be able to send out everything next week. Sure. No pressure. I will continue reviewing Zhenzhong's series then. Looking forward to seeing your respin. Eric > > Thanks, > Shameer
On Thu, Jan 23, 2025 at 08:28:34AM +0000, Shameerali Kolothum Thodi wrote: > > -----Original Message----- > > From: Nicolin Chen <nicolinc@nvidia.com> > > I wonder if we can make some progress in Feb? If so, we can start > > to wrap up the iommufd uAPI patches for HWPT, which was a part of > > intel's series but never got sent since their emulated series is > > seemingly still pending? > > I think these are the 5 patches that we require from Intel pass-through series, > > vfio/iommufd: Implement [at|de]tach_hwpt handlers > vfio/iommufd: Implement HostIOMMUDeviceClass::realize_late() handler > HostIOMMUDevice: Introduce realize_late callback > vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD > backends/iommufd: Add helpers for invalidating user-managed HWPT > Hi Zhenzhong, > > Just wondering what your plans are for the above patches. If it make sense and you > are fine with it, I think it is a good idea one of us can pick up those from that series > and sent out separately so that it can get some review and take it forward. +1 These uAPI/backend patches can be sent in a smaller series to get reviewed prior to the intel/arm series. It can merge with either of the intel/arm series that runs faster at the end of the day :) Nicolin
On Fri, Dec 13, 2024 at 08:46:42AM -0400, Jason Gunthorpe wrote: > On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote: > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: > > > Hi, > > > > > > This series adds initial support for a user-creatable "arm-smmuv3-nested" > > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine > > > and cannot support multiple SMMUv3s. > > > > > > In order to support vfio-pci dev assignment with vSMMUv3, the physical > > > SMMUv3 has to be configured in nested mode. Having a pluggable > > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests > > > running on a host with multiple physical SMMUv3s. A few benefits of doing > > > this are, > > > > I'm not very familiar with arm, but from this description I'm not > > really seeing how "nesting" is involved here. You're only talking > > about the host and 1 L1 guest, no L2 guest. > > nesting is the term the iommu side is using to refer to the 2 > dimensional paging, ie a guest page table on top of a hypervisor page > table. > > Nothing to do with vm nesting. Ok, that naming is destined to cause confusion for many, given the commonly understood use of 'nesting' in the context of VMs... > > > Also what is the relation between the physical SMMUv3 and the guest > > SMMUv3 that's referenced ? Is this in fact some form of host device > > passthrough rather than nesting ? > > It is an acceeleration feature, the iommu HW does more work instead of > the software emulating things. Similar to how the 2d paging option in > KVM is an acceleration feature. > > All of the iommu series on vfio are creating paravirtualized iommu > models inside the VM. They access various levels of HW acceleration to > speed up the paravirtualization. ... describing it as a HW accelerated iommu makes it significantly clearer to me what this proposal is about. Perhaps the device is better named as "arm-smmuv3-accel" ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 12/13/24 8:19 AM, Daniel P. Berrangé wrote: > On Fri, Dec 13, 2024 at 08:46:42AM -0400, Jason Gunthorpe wrote: >> On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote: >>> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: >>>> Hi, >>>> >>>> This series adds initial support for a user-creatable "arm-smmuv3-nested" >>>> device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine >>>> and cannot support multiple SMMUv3s. >>>> >>>> In order to support vfio-pci dev assignment with vSMMUv3, the physical >>>> SMMUv3 has to be configured in nested mode. Having a pluggable >>>> "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests >>>> running on a host with multiple physical SMMUv3s. A few benefits of doing >>>> this are, >>> >>> I'm not very familiar with arm, but from this description I'm not >>> really seeing how "nesting" is involved here. You're only talking >>> about the host and 1 L1 guest, no L2 guest. >> >> nesting is the term the iommu side is using to refer to the 2 >> dimensional paging, ie a guest page table on top of a hypervisor page >> table. >> >> Nothing to do with vm nesting. > > Ok, that naming is destined to cause confusion for many, given the > commonly understood use of 'nesting' in the context of VMs... > >> >>> Also what is the relation between the physical SMMUv3 and the guest >>> SMMUv3 that's referenced ? Is this in fact some form of host device >>> passthrough rather than nesting ? >> >> It is an acceeleration feature, the iommu HW does more work instead of >> the software emulating things. Similar to how the 2d paging option in >> KVM is an acceleration feature. >> >> All of the iommu series on vfio are creating paravirtualized iommu >> models inside the VM. They access various levels of HW acceleration to >> speed up the paravirtualization. > > ... describing it as a HW accelerated iommu makes it significantly clearer > to me what this proposal is about. Perhaps the device is better named as > "arm-smmuv3-accel" ? > I'm having deja-vu! ;-) Thanks for echo-ing my earlier statements in this patch series about the use of 'nested'. and the better use of 'accel' in these circumstances. Even 'accel' on an 'arm-smmuv3' is a bit of a hammer, as there can be multiple accel's features &/or implementations... I would like to see the 'accel' as a parameter to 'arm-smmuv3', and not a complete name-space onto itself, so we can do things like 'accel=cmdvq', accel='2-level', ... and for libvirt's sanity, a way to get those hw features from sysfs for (possible) migration-compatibility testing. > > With regards, > Daniel
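Purely to illustrate the property-style syntax Don is suggesting above — hypothetical, since neither this series nor upstream QEMU implements such an option, and the value names are simply taken from his examples — the invocation would look something like:

-device arm-smmuv3,id=smmu1,pci-bus=pcie.1,accel=on \
-device arm-smmuv3,id=smmu2,pci-bus=pcie.2,accel=cmdvq \

rather than a separate "arm-smmuv3-accel" device type.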
> -----Original Message----- > From: Daniel P. Berrangé <berrange@redhat.com> > Sent: Friday, December 13, 2024 1:20 PM > To: Jason Gunthorpe <jgg@nvidia.com> > Cc: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org; > qemu-devel@nongnu.org; eric.auger@redhat.com; > peter.maydell@linaro.org; nicolinc@nvidia.com; ddutile@redhat.com; > Linuxarm <linuxarm@huawei.com>; Wangzhou (B) > <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>; > Jonathan Cameron <jonathan.cameron@huawei.com>; > zhangfei.gao@linaro.org > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable > nested SMMUv3 > > On Fri, Dec 13, 2024 at 08:46:42AM -0400, Jason Gunthorpe wrote: > > On Fri, Dec 13, 2024 at 12:00:43PM +0000, Daniel P. Berrangé wrote: > > > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote: > > > > Hi, > > > > > > > > This series adds initial support for a user-creatable "arm-smmuv3- > nested" > > > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per > machine > > > > and cannot support multiple SMMUv3s. > > > > > > > > In order to support vfio-pci dev assignment with vSMMUv3, the > physical > > > > SMMUv3 has to be configured in nested mode. Having a pluggable > > > > "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 > for Guests > > > > running on a host with multiple physical SMMUv3s. A few benefits of > doing > > > > this are, > > > > > > I'm not very familiar with arm, but from this description I'm not > > > really seeing how "nesting" is involved here. You're only talking > > > about the host and 1 L1 guest, no L2 guest. > > > > nesting is the term the iommu side is using to refer to the 2 > > dimensional paging, ie a guest page table on top of a hypervisor page > > table. > > > > Nothing to do with vm nesting. > > Ok, that naming is destined to cause confusion for many, given the > commonly understood use of 'nesting' in the context of VMs... > > > > > > Also what is the relation between the physical SMMUv3 and the guest > > > SMMUv3 that's referenced ? Is this in fact some form of host device > > > passthrough rather than nesting ? > > > > It is an acceeleration feature, the iommu HW does more work instead of > > the software emulating things. Similar to how the 2d paging option in > > KVM is an acceleration feature. > > > > All of the iommu series on vfio are creating paravirtualized iommu > > models inside the VM. They access various levels of HW acceleration to > > speed up the paravirtualization. > > ... describing it as a HW accelerated iommu makes it significantly clearer > to me what this proposal is about. Perhaps the device is better named as > "arm-smmuv3-accel" ? Agree. There were similar previous comments from reviewers that current smmuv3 already has emulated stage 1 and stage 2 support and refers to that as "nested" in code. So this will be renamed as above. Thanks, Shameer
Hi Shameer,

On 11/8/24 13:52, Shameer Kolothum wrote:
> [... cover letter and example command line snipped ...]

This kind of instantiation matches what I had in mind. It is questionable
whether the legacy SMMU shouldn't be migrated to that mode too (instead of
using a machine option setting), depending on Peter's feedback and also
comments from the libvirt guys. Adding Andrea in the loop.

Thanks
Eric
On Mon, Nov 18, 2024 at 11:50:46AM +0100, Eric Auger wrote:
> [...]
> This kind of instantiation matches what I had in mind. It is questionable
> whether the legacy SMMU shouldn't be migrated to that mode too (instead of
> using a machine option setting), depending on Peter's feedback and also
> comments from the libvirt guys. Adding Andrea in the loop.

Yeah, looking at the current config I'm pretty surprised to see it
configured with '-machine virt,iommu=smmuv3', where 'smmuv3' is a type
name. This is effectively a back-door reinvention of the '-device' arg.

I think it'd make more sense to deprecate the 'iommu' property on the
machine, and allow '-device smmuv3,pci-bus=pcie.0' to associate the IOMMU
with the PCI root bus, so we have consistent approaches for all SMMU impls.

With regards,
Daniel
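[Editor's note: for illustration only, a minimal sketch of the two styles
being contrasted here. The '-device' line reuses this RFC's 'pci-bus'
property as Daniel suggests; whether the RFC device can actually attach to
the root bus today is an open question, and the final device/property names
may differ.]

# today: one machine-wide vSMMU, selected via a machine property
-machine virt,gic-version=3,iommu=smmuv3

# suggested: instantiate the SMMU like any other device and tie it to a bus
-machine virt,gic-version=3 \
-device arm-smmuv3-nested,pci-bus=pcie.0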
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> This RFC is for initial discussion/test purposes only and includes patches
> that are only relevant for adding the "arm-smmuv3-nested" support. For the
> complete branch please find,
> https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-rfc-v1/

I guess the QEMU branch above pairs with this (vIOMMU v6)?
https://github.com/nicolinc/iommufd/commits/smmuv3_nesting-with-rmr

Thanks
Nicolin
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, November 13, 2024 9:43 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> > [...]
>
> I guess the QEMU branch above pairs with this (vIOMMU v6)?
> https://github.com/nicolinc/iommufd/commits/smmuv3_nesting-with-rmr

I actually based it on top of a kernel branch that Zhangfei is keeping for
his verification tests,
https://github.com/Linaro/linux-kernel-uadk/commits/6.12-wip-10.26/

But yes, it does indeed look to be based on the branch you mentioned above.

Thanks,
Shameer.
Hi Shameer,

On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> Hi,
>
> This series adds initial support for a user-creatable "arm-smmuv3-nested"
> device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> and cannot support multiple SMMUv3s.
> [...]

I had a quick look at the SMMUv3 files: now that SMMUv3 supports nested
translation emulation, would it make sense to rename this? AFAIU, this is
about a virt (stage-1) SMMUv3 that is emulated to the guest. Including
vSMMU or virt in the name would help distinguish the code, as new functions
such as smmu_nested_realize() otherwise look confusing.

Thanks,
Mostafa
Hi Mostafa,

> -----Original Message-----
> From: Mostafa Saleh <smostafa@google.com>
> Sent: Wednesday, November 13, 2024 4:17 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> [...]
> I had a quick look at the SMMUv3 files: now that SMMUv3 supports nested
> translation emulation, would it make sense to rename this? AFAIU, this is
> about a virt (stage-1) SMMUv3 that is emulated to the guest. Including
> vSMMU or virt in the name would help distinguish the code, as new functions
> such as smmu_nested_realize() otherwise look confusing.

Yes, I have noticed that. We need to call it something else to avoid the
confusion. I am not sure including "virt" is a good idea, as it may suggest
the virt machine. Probably "acc", as Nicolin suggested, to indicate hw
accelerated. I will think about a better one. Open to suggestions.

Thanks,
Shameer
Hi Shameer,

On Thu, Nov 14, 2024 at 08:01:28AM +0000, Shameerali Kolothum Thodi wrote:
> [...]
> Yes, I have noticed that. We need to call it something else to avoid the
> confusion. I am not sure including "virt" is a good idea, as it may suggest
> the virt machine. Probably "acc", as Nicolin suggested, to indicate hw
> accelerated. I will think about a better one. Open to suggestions.

"acc" sounds good to me. Also, if possible, we could have a smmuv3-acc.c
that holds all the accelerator-specific logic, with the main file just
calling into it.

Thanks,
Mostafa
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> Few ToDos to note,
> 1. At present default-bus-bypass-iommu=on should be set when
>    arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
>    related boot error. Requires fixing.
> 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
>    Could be a bug in IORT id mappings.

Do we have enough bus number space for each pxb bus in IORT?

The bus range is defined by min_/max_bus in iort_host_bridges(), where the
pci_bus_range() function call might not leave enough space in the range for
hotplugs IIRC.

> [... example command line and guest PCI topology snipped ...]
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.

Nathan from the NVIDIA side is working on the libvirt part, and he has
already done some prototype coding in libvirt that can generate the
required PCI topology. I think he can take these patches for a combined
test.

Thanks
Nicolin
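[Editor's note: on the bus-number question, one way to experiment is simply
to space the pxb-pcie bus_nr values further apart so that each expander's
bus range has headroom for hot-plugged root ports. The values below are
illustrative only and not part of the original series.]

-device pxb-pcie,id=pcie.1,bus_nr=32,bus=pcie.0 \
-device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
-device pxb-pcie,id=pcie.2,bus_nr=64,bus=pcie.0 \
-device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \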
Hi Shameer,

> Attempt to add the HNS VF to a different SMMUv3 will result in,
>
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
>  Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
>
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.

Would you be able to elaborate on what Qemu is validating with this error
message? I'm not seeing these errors when assigning a GPU's pcie-root-port
to different PXBs (with different associated SMMU nodes).

I launched a VM using my libvirt prototype code + your qemu branch and
noted a few small things:

1. Are there plans to support "-device addr" for arm-smmuv3-nested's
   PCIe slot and function like any other device? If not, I'll exclude it
   from my libvirt prototype.

2. Is "id" for "-device arm-smmuv3-nested" necessary for any sort of
   functionality? If so, I'll make a change to my libvirt prototype to
   support this. I was able to boot a VM and see a similar VM PCI topology
   as your example without specifying "id".

Otherwise, the VM topology looks OK with your qemu branch + my libvirt
prototype.

Also, as a heads up, I've added support for auto-inserting a PCIe switch
between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
node, per libvirt's documentation: "If you intend to plug multiple devices
into a pcie-expander-bus, you must connect a pcie-switch-upstream-port to
the pcie-root-port that is plugged into the pcie-expander-bus, and multiple
pcie-switch-downstream-ports to the pcie-switch-upstream-port". Future
unit-tests should follow this topology configuration.

Thanks,
Nathan
Hi Shameer,

Could you share the branch/version of the boot firmware file "QEMU_EFI.fd"
from your example, and where you retrieved it from? I've been encountering
PCI host bridge resource conflicts whenever assigning more than one
passthrough device to a multi-vSMMU VM, booting with the boot firmware
provided by qemu-efi-aarch64 version 2024.02-2. This prevents the VM from
booting, eventually dropping into the UEFI shell with an error message
indicating DMA mapping failed for the passthrough devices.

Thanks,
Nathan
> with an error message indicating DMA mapping failed for the
> passthrough devices.
A correction - the message indicates UEFI failed to find a mapping for
the boot partition ("map: no mapping found"), not that DMA mapping
failed. But earlier EDK debug logs still show PCI host bridge resource
conflicts for the passthrough devices that seem related to the VM boot
failure.
Hi Nathan,
> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Friday, December 13, 2024 1:02 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org; Nicolin Chen <nicolinc@nvidia.com>
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
>
> >with an error message indicating DMA mapping failed for the
> passthrough >devices.
>
> A correction - the message indicates UEFI failed to find a mapping for
> the boot partition ("map: no mapping found"), not that DMA mapping
> failed. But earlier EDK debug logs still show PCI host bridge resource
> conflicts for the passthrough devices that seem related to the VM boot
> failure.
I have tried a 2023 version of the EFI, which works. For more recent tests I am
using one built directly from,
https://github.com/tianocore/edk2.git master
Commit: 0f3867fa6ef0 ("UefiPayloadPkg/UefiPayloadEntry: Fix PT protection
in 5 level paging")
With both, I don't remember seeing any boot failure or the above UEFI-related
"map: no mapping found" error. But the Guest kernel at times
complains about pci bridge window memory assignment failures.
...
pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed to assign
pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
...
But Guest still boots and worked fine so far.
Thanks,
Shameer
>> >with an error message indicating DMA mapping failed for the
>> passthrough >devices.
>>
>> A correction - the message indicates UEFI failed to find a mapping for
>> the boot partition ("map: no mapping found"), not that DMA mapping
>> failed. But earlier EDK debug logs still show PCI host bridge resource
>> conflicts for the passthrough devices that seem related to the VM boot
>> failure.
>
> I have tried a 2023 version EFI which works. And for more recent tests I am
> using a one built directly from,
> https://github.com/tianocore/edk2.git master
>
> Commit: 0f3867fa6ef0("UefiPayloadPkg/UefiPayloadEntry: Fix PT protection
> in 5 level paging"
>
> With both, I don’t remember seeing any boot failure and the above UEFI
> related "map: no mapping found" error. But the Guest kernel at times
> complaints about pci bridge window memory assignment failures.
> ...
> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed to assign
> pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
> ...
>
> But Guest still boots and worked fine so far.
Hi Shameer,
Just letting you know I resolved this by increasing the MMIO region size
in hw/arm/virt.c to support passing through GPUs with large BAR regions
(VIRT_HIGH_PCIE_MMIO). Thanks for taking a look.
Thanks,
Nathan
> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Saturday, January 25, 2025 2:44 AM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: ddutile@redhat.com; eric.auger@redhat.com; jgg@nvidia.com;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; Linuxarm <linuxarm@huawei.com>;
> nathanc@nvidia.com; nicolinc@nvidia.com; peter.maydell@linaro.org;
> qemu-arm@nongnu.org; Wangzhou (B) <wangzhou1@hisilicon.com>;
> zhangfei.gao@linaro.org; qemu-devel@nongnu.org
> Subject: RE: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> >> >with an error message indicating DMA mapping failed for the
> >> passthrough >devices.
> >>
> >> A correction - the message indicates UEFI failed to find a mapping for
> >> the boot partition ("map: no mapping found"), not that DMA mapping
> >> failed. But earlier EDK debug logs still show PCI host bridge resource
> >> conflicts for the passthrough devices that seem related to the VM boot
> >> failure.
> >
> > I have tried a 2023 version EFI which works. And for more recent tests I
> am
> > using a one built directly from,
> > https://github.com/tianocore/edk2.git master
> >
> > Commit: 0f3867fa6ef0("UefiPayloadPkg/UefiPayloadEntry: Fix PT
> protection
> > in 5 level paging"
> >
> > With both, I don’t remember seeing any boot failure and the above UEFI
> > related "map: no mapping found" error. But the Guest kernel at times
> > complaints about pci bridge window memory assignment failures.
> > ...
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't
> assign; no space
> > pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed
> to assign
> > pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
> > ...
> >
> > But Guest still boots and worked fine so far.
>
> Hi Shameer,
>
> Just letting you know I resolved this by increasing the MMIO region size
> in hw/arm/virt.c to support passing through GPUs with large BAR regions
> (VIRT_HIGH_PCIE_MMIO). Thanks for taking a look.
>
Ok. Thanks for that. Does that mean an optional property to specify
the size of VIRT_HIGH_PCIE_MMIO may be worth adding?
And for the PCI bridge window specific errors that I mentioned above,
>>pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't assign; no space
adding ""mem-reserve=X" and "io-reserve=X" to pcie-root-port helps.
Thanks,
Shameer
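[Editor's note: as a quick illustration of the workaround above,
pcie-root-port exposes resource-reservation properties that can be set on
the command line. The sizes shown are only examples and would need tuning
to the devices actually being passed through.]

-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,io-reserve=4k,mem-reserve=2M,pref64-reserve=256M \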
>>>> >with an error message indicating DMA mapping failed for the
>>>> passthrough >devices.
>>>>
>>>> A correction - the message indicates UEFI failed to find a mapping for
>>>> the boot partition ("map: no mapping found"), not that DMA mapping
>>>> failed. But earlier EDK debug logs still show PCI host bridge resource
>>>> conflicts for the passthrough devices that seem related to the VM boot
>>>> failure.
>>>
>>> I have tried a 2023 version EFI which works. And for more recent tests I
>> am
>>> using a one built directly from,
>>> https://github.com/tianocore/edk2.git master
>>>
>>> Commit: 0f3867fa6ef0("UefiPayloadPkg/UefiPayloadEntry: Fix PT
>> protection
>>> in 5 level paging"
>>>
>>> With both, I don’t remember seeing any boot failure and the above UEFI
>>> related "map: no mapping found" error. But the Guest kernel at times
>>> complaints about pci bridge window memory assignment failures.
>>> ...
>>> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: can't
>> assign; no space
>>> pci 0000:10:01.0: bridge window [mem size 0x00200000 64bit pref]: failed
>> to assign
>>> pci 0000:10:00.0: bridge window [io size 0x1000]:can't assign; no space
>>> ...
>>>
>>> But Guest still boots and worked fine so far.
>>
>> Hi Shameer,
>>
>> Just letting you know I resolved this by increasing the MMIO region size
>> in hw/arm/virt.c to support passing through GPUs with large BAR regions
>> (VIRT_HIGH_PCIE_MMIO). Thanks for taking a look.
>>
>
> Ok. Thanks for that. Does that mean may be an optional property to specify
> the size for VIRT_HIGH_PCIE_MMIO is worth adding?
Yes, and actually we have a patch ready for the configurable highmem
region size. Matt Ochs will send it out in the next day or so and CC you
on the submission.
> adding ""mem-reserve=X" and "io-reserve=X" to pcie-root-port helps
Ok, good to know - I'll keep that in mind for future testing.
Thanks,
Nathan
Hi Nathan,

> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Wednesday, November 20, 2024 11:59 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> > Attempt to add the HNS VF to a different SMMUv3 will result in,
> > [...]
> > At present Qemu is not doing any extra validation other than the above
> > failure to make sure the user configuration is correct or not. The
> > assumption is libvirt will take care of this.
>
> Would you be able to elaborate on what Qemu is validating with this error
> message? I'm not seeing these errors when assigning a GPU's
> pcie-root-port to different PXBs (with different associated SMMU nodes).

You should see that error when two devices that belong to two different
physical SMMUv3s in the host kernel are assigned to a single PXB/SMMUv3 for
the Guest. Something like,

-device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
-device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
-device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=1 \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \ --> This device belongs to phys SMMUv3_0
-device vfio-pci,host=0000:75:02.1,bus=pcie.port2,iommufd=iommufd0 \ --> This device belongs to phys SMMUv3_1

So the assumption above is that libvirt will be able to detect which devices
belong to the same physical SMMUv3 and do the assignment for Guests
correctly.

> I launched a VM using my libvirt prototype code + your qemu branch and
> noted a few small things:

Thanks for giving this a spin with libvirt.

> 1. Are there plans to support "-device addr" for arm-smmuv3-nested's
>    PCIe slot and function like any other device? If not, I'll exclude it
>    from my libvirt prototype.

Not at the moment. arm-smmuv3-nested is not making any use of PCI slot and
func info specifically. I am not sure how that would be useful for this,
though.

> 2. Is "id" for "-device arm-smmuv3-nested" necessary for any sort of
>    functionality? If so, I'll make a change to my libvirt prototype to
>    support this. I was able to boot a VM and see a similar VM PCI topology
>    as your example without specifying "id".

Yes, "id" is not used, and it will work without it.

> Otherwise, the VM topology looks OK with your qemu branch + my libvirt
> prototype.

That is good to know.
> Also as a heads up, I've added support for auto-inserting a PCIe switch
> between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
> node, per libvirt's documentation: "If you intend to plug multiple
> devices into a pcie-expander-bus, you must connect a
> pcie-switch-upstream-port to the pcie-root-port that is plugged into the
> pcie-expander-bus, and multiple pcie-switch-downstream-ports to the
> pcie-switch-upstream-port". Future unit-tests should follow this
> topology configuration.

Ok. Could you please give me an example Qemu equivalent command option, if
possible, for the above case? I am not that familiar with libvirt and I
would also like to test the above scenario.

Thanks,
Shameer
>> Also as a heads up, I've added support for auto-inserting a PCIe switch
>> between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
>> node, per libvirt's documentation [...]
>
> Ok. Could you please give me an example Qemu equivalent command option,
> if possible, for the above case? I am not that familiar with libvirt and I
> would also like to test the above scenario.

You can use "-device x3130-upstream" for the upstream switch port, and
"-device xio3130-downstream" for the downstream port:

-device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
-device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0 \
-device x3130-upstream,id=pci.3,bus=pci.2,addr=0x0 \
-device xio3130-downstream,id=pci.4,bus=pci.3,addr=0x0,chassis=17,port=1 \
-device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
-device arm-smmuv3-nested,pci-bus=pci.1

-Nathan
> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Friday, November 22, 2024 1:42 AM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> [...]
> You can use "-device x3130-upstream" for the upstream switch port, and
> "-device xio3130-downstream" for the downstream port:
>
> -device pxb-pcie,bus_nr=250,id=pci.1,bus=pcie.0,addr=0x1 \
> -device pcie-root-port,id=pci.2,bus=pci.1,addr=0x0 \
> -device x3130-upstream,id=pci.3,bus=pci.2,addr=0x0 \
> -device xio3130-downstream,id=pci.4,bus=pci.3,addr=0x0,chassis=17,port=1 \
> -device vfio-pci,host=0009:01:00.0,id=hostdev0,bus=pci.4,addr=0x0 \
> -device arm-smmuv3-nested,pci-bus=pci.1

Thanks. Just wondering why libvirt mandates usage of a pcie-switch for
plugging multiple devices rather than just using pcie-root-ports?

Please let me know if there is any advantage in doing so that you are
aware of.

Thanks,
Shameer
On Fri, Nov 22, 2024 at 05:38:54PM +0000, Shameerali Kolothum Thodi via wrote:
> [...]
> Thanks. Just wondering why libvirt mandates usage of a pcie-switch for
> plugging multiple devices rather than just using pcie-root-ports?

Libvirt does not require use of pcie-switch. It supports them, but in the
absence of app-requested configs, libvirt will always just populate
pcie-root-port devices. Switches are something that has to be explicitly
asked for, and I don't see much need to do that.

With regards,
Daniel
On Fri, Dec 13, 2024 at 11:58:02AM +0000, Daniel P. Berrangé wrote:
> Libvirt does not require use of pcie-switch. It supports them, but in the
> absence of app-requested configs, libvirt will always just populate
> pcie-root-port devices. Switches are something that has to be explicitly
> asked for, and I don't see much need to do that.

If you are assigning all VFIO devices within a multi-device iommu group
there are good reasons to show the switch, and the switch has to reflect
certain ACS properties. We have some systems like this..

Jason
>> Also as a heads up, I've added support for auto-inserting a PCIe switch
>> between the PXB and GPUs in libvirt to attach multiple devices to a SMMU
>> node, per libvirt's documentation [...]
>
> Thanks. Just wondering why libvirt mandates usage of a pcie-switch for
> plugging multiple devices rather than just using pcie-root-ports?
>
> Please let me know if there is any advantage in doing so that you are
> aware of.

Actually it seems like that documentation I quoted is out of date. That
section of the documentation for pcie-expander-bus was written before a
patch that revised libvirt's pxb to have 32 slots instead of just 1 slot,
and it wasn't updated afterwards.

With your branch and my libvirt prototype, I was still able to attach a
passthrough device behind a PCIe switch and see it attached to a vSMMU in
the VM, so I'm not sure if you need to make additional changes to your
solution to support this. But I think we should still support/test the case
where VFIO devices are behind a switch, otherwise we're placing a
limitation on end users who have a use case for it.

-Nathan
Hi Nathan,

On 11/22/24 7:53 PM, Nathan Chen wrote:
> [...]
> Actually it seems like that documentation I quoted is out of date. That
> section of the documentation for pcie-expander-bus was written before a
> patch that revised libvirt's pxb to have 32 slots instead of just 1 slot,
> and it wasn't updated afterwards.

You mean the QEMU documentation in qemu/docs/pcie.txt (esp. the PCI Express
only hierarchy)?

Thanks

Eric
> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, November 12, 2024 11:00 PM
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> > Few ToDos to note,
> > 1. At present default-bus-bypass-iommu=on should be set when
> >    arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
> >    related boot error. Requires fixing.
> > 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
> >    Could be a bug in IORT id mappings.
>
> Do we have enough bus number space for each pxb bus in IORT?
>
> The bus range is defined by min_/max_bus in iort_host_bridges(), where
> the pci_bus_range() function call might not leave enough space in the
> range for hotplugs IIRC.

Ok. Thanks for the pointer. I will debug that.

> [...]
> Nathan from the NVIDIA side is working on the libvirt part, and he has
> already done some prototype coding in libvirt that can generate the
> required PCI topology. I think he can take these patches for a combined
> test.

Cool. That's good to know.

Thanks,
Shameer