Hi,
This RFC series adds initial support for NVIDIA Tegra241 CMDQV
(Command Queue Virtualisation), an extension to ARM SMMUv3 that
provides hardware-accelerated virtual command queues (VCMDQs) for
guests. CMDQV allows guests to issue SMMU invalidation commands
directly to hardware without VM exits, significantly reducing TLBI
overhead.
Thanks to Nicolin for the initial patches and testing on which this RFC
is based.
This is based on v6[0] of the SMMUv3 accel series, which is still under
review, though nearing convergence. It is sent as an RFC to gather early
feedback on the CMDQV design and its integration with the SMMUv3
acceleration path.
Background:
Tegra241 CMDQV extends SMMUv3 by allocating per-VM "virtual interfaces"
(VINTFs), each hosting up to 128 VCMDQs.
Each VINTF exposes two 64KB MMIO pages:
- Page0 – guest-owned control and status registers (directly mapped
  into the VM)
- Page1 – queue configuration registers (trapped/emulated by QEMU)
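
To illustrate the split, here is a minimal sketch of the Page0 direct
map, assuming the host VINTF Page0 is exposed as an mmap'able region on
the iommufd file descriptor. The variable names, the mmap offset and
the error handling below are placeholders, not the series' actual code:

    #include <sys/mman.h>
    #include "exec/memory.h"
    #include "exec/address-spaces.h"

    #define VINTF_PAGE_SIZE 0x10000 /* each VINTF page is 64KB */

    static MemoryRegion page0_mr;

    /* iommufd_fd, page0_mmap_offset, page0_gpa and owner are assumed
     * to be provided by the caller. */
    static void map_vintf_page0(int iommufd_fd, off_t page0_mmap_offset,
                                hwaddr page0_gpa, Object *owner)
    {
        void *p0 = mmap(NULL, VINTF_PAGE_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, iommufd_fd, page0_mmap_offset);

        if (p0 == MAP_FAILED) {
            return;
        }
        /* Wrap the host pointer and plug it into guest PA space, so
         * guest accesses to Page0 reach hardware without a VM exit. */
        memory_region_init_ram_device_ptr(&page0_mr, owner,
                                          "tegra241-cmdqv-vintf-page0",
                                          VINTF_PAGE_SIZE, p0);
        memory_region_add_subregion(get_system_memory(), page0_gpa,
                                    &page0_mr);
    }

Page1 accesses, in contrast, land in ordinary MMIO read/write callbacks
so QEMU can validate and forward queue configuration.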
Unlike the standard SMMU CMDQ, a guest-owned Tegra241 VCMDQ does not
support the full command set. Only a subset, primarily invalidation-
related commands, is accepted by the CMDQV hardware. For this reason,
a distinct CMDQV device must be exposed to the guest, and the guest OS
must include a Tegra241 CMDQV-aware driver to take advantage of the
hardware acceleration.
VCMDQ support is integrated via the IOMMU_HW_QUEUE_ALLOC mechanism,
allowing QEMU to attach guest-configured VCMDQ buffers to the
underlying CMDQV hardware through IOMMUFD. The Linux kernel already
supports the full CMDQV virtualisation model via IOMMUFD[1].
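
Roughly, attaching one guest VCMDQ buffer boils down to an ioctl like
the sketch below. The struct and field names follow the Linux
IOMMU_HW_QUEUE_ALLOC uAPI as merged at the time of writing; treat the
exact layout as illustrative rather than authoritative, and note that
viommu_id, q_index, q_base_gpa and q_size are assumed inputs:

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    static int alloc_vcmdq(int iommufd, uint32_t viommu_id,
                           uint32_t q_index, uint64_t q_base_gpa,
                           uint64_t q_size, uint32_t *out_hw_queue_id)
    {
        struct iommu_hw_queue_alloc alloc = {
            .size = sizeof(alloc),
            .viommu_id = viommu_id,
            .type = IOMMU_HW_QUEUE_TYPE_TEGRA241_CMDQV,
            .index = q_index,
            /* Guest PA of the queue buffer; the kernel resolves and
             * pins it, hence the host-contiguity requirement noted
             * in the summary below. */
            .nesting_parent_iova = q_base_gpa,
            .length = q_size,
        };

        if (ioctl(iommufd, IOMMU_HW_QUEUE_ALLOC, &alloc)) {
            return -errno;
        }
        *out_hw_queue_id = alloc.out_hw_queue_id;
        return 0;
    }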
Summary of QEMU changes:
- Integrated into the existing SMMUv3 accel path via a
"tegra241-cmdqv" property.
- Support for allocating vIOMMU objects of type
IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV.
- Mapping and emulation of the CMDQV MMIO register layout.
- VCMDQ/VINTF read/write handling and queue allocation using IOMMUFD
APIs.
- Reset and initialisation hooks, including checks for at least one
cold-plugged device.
- CMDQV hardware reads guest queue memory using host physical addresses
  provided through IOMMUFD, which requires the VCMDQ buffer to be
  physically contiguous not only in guest PA space but also in host
  PA space. When Tegra241 CMDQV is enabled, QEMU must therefore only
  expose a VCMDQ size that the host can reliably back with contiguous
  physical memory. Because of this constraint, backing the guest RAM
  with huge pages is suggested (see the example invocation after this
  list).
- ACPI DSDT node generation for CMDQV devices on the virt machine.
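
As an example of the huge-page backing suggested above, a hypothetical
invocation might look like the following. The memory-backend-file
options are standard QEMU; the SMMUv3 device and property spelling
follow this series and the accel series it builds on, so take them as
illustrative:

    qemu-system-aarch64 -machine virt,accel=kvm,memory-backend=mem0 -m 4G \
        -object memory-backend-file,id=mem0,size=4G,mem-path=/dev/hugepages,share=on,prealloc=on \
        -device arm-smmuv3,primary-bus=pcie.0,accel=on,tegra241-cmdqv=on \
        ...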
These patches have been sanity-tested on NVIDIA Grace platforms.
ToDo / revisit:
- Prevent hot-unplug of the last device associated with the vIOMMU, as
  removing it might allow a different host SMMU/CMDQV to be associated.
- Locking requirements around error event propagation.
Feedback and testing are very welcome.
Thanks,
Shameer
[0] https://lore.kernel.org/qemu-devel/20251120132213.56581-1-skolothumtho@nvidia.com/
[1] https://lore.kernel.org/all/cover.1752126748.git.nicolinc@nvidia.com/
Nicolin Chen (12):
backends/iommufd: Update iommufd_backend_get_device_info
backends/iommufd: Update iommufd_backend_alloc_viommu to allow user
ptr
backends/iommufd: Introduce iommufd_backend_alloc_hw_queue
backends/iommufd: Introduce iommufd_backend_viommu_mmap
hw/arm/tegra241-cmdqv: Add initial Tegra241 CMDQ-Virtualisation
support
hw/arm/tegra241-cmdqv: Map VINTF Page0 into guest
hw/arm/tegra241-cmdqv: Add read emulation support for registers
system/physmem: Add helper to check whether a guest PA maps to RAM
hw/arm/tegra241-cmdqv: Add write emulation for registers
hw/arm/tegra241-cmdqv: Add reset handler
hw/arm/tegra241-cmdqv: Limit queue size based on backend page size
hw/arm/virt-acpi: Advertise Tegra241 CMDQV nodes in DSDT
Shameer Kolothum (4):
hw/arm/tegra241-cmdqv: Allocate vEVENTQ object
hw/arm/tegra241-cmdqv: Read and propagate Tegra241 CMDQV errors
virt-acpi-build: Rename AcpiIortSMMUv3Dev to AcpiSMMUv3Dev
hw/arm/smmuv3: Add tegra241-cmdqv property for SMMUv3 device
backends/iommufd.c | 65 ++++
backends/trace-events | 2 +
hw/arm/Kconfig | 5 +
hw/arm/meson.build | 1 +
hw/arm/smmuv3-accel.c | 16 +-
hw/arm/smmuv3.c | 18 +
hw/arm/tegra241-cmdqv.c | 759 ++++++++++++++++++++++++++++++++++++++
hw/arm/tegra241-cmdqv.h | 337 +++++++++++++++++
hw/arm/trace-events | 5 +
hw/arm/virt-acpi-build.c | 110 +++++-
hw/vfio/iommufd.c | 6 +-
include/exec/cpu-common.h | 2 +
include/hw/arm/smmuv3.h | 3 +
include/hw/arm/virt.h | 2 +
include/system/iommufd.h | 16 +
system/physmem.c | 12 +
16 files changed, 1332 insertions(+), 27 deletions(-)
create mode 100644 hw/arm/tegra241-cmdqv.c
create mode 100644 hw/arm/tegra241-cmdqv.h
--
2.43.0
Hi Shameer,

On 12/10/25 2:37 PM, Shameer Kolothum wrote:
> This RFC series adds initial support for NVIDIA Tegra241 CMDQV
> (Command Queue Virtualisation), an extension to ARM SMMUv3 that
> provides hardware-accelerated virtual command queues (VCMDQs) for
> guests.
[...]

do you have a branch to share with all the bits?

Thanks

Eric
Hi Eric,

> do you have a branch to share with all the bits?

Here:
https://github.com/shamiali2008/qemu-master.git master-smmuv3-accel-v6-veventq-v2-vcmdq-rfcv1

Thanks,
Shameer
> > do you have a branch to share with all the bits?
>
> Here:
> https://github.com/shamiali2008/qemu-master.git master-smmuv3-accel-v6-veventq-v2-vcmdq-rfcv1
I just realised this needs a fix in patch #16,
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 58c35c2af3..c32b35a9a7 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -1954,7 +1954,7 @@ static bool smmu_validate_property(SMMUv3State *s, Error **errp)
         return false;
     }
 #endif
-#ifndef CONFIG_TEGRA241_CMDQ
+#ifndef CONFIG_TEGRA241_CMDQV
     if (s->tegra241_cmdqv) {
         error_setg(errp, "tegra241_cmdqv=on support not compiled in");
         return false;
--
Pushed to the above branch.
Thanks,
Shameer
Hi Shameer,

On 12/10/25 2:37 PM, Shameer Kolothum wrote:
> Tegra241 CMDQV extends SMMUv3 by allocating per-VM "virtual interfaces"
> (VINTFs), each hosting up to 128 VCMDQs.

Can you add a reference to some specification document please?

> a distinct CMDQV device must be exposed to the guest, and the guest OS
> must include a Tegra241 CMDQV-aware driver to take advantage of the
> hardware acceleration.

Do I understand correctly that this Tegra241 CMDQV-aware driver is
enabled by CONFIG_TEGRA241_CMDQV on the guest? Is it fully supported
upstream?

Eric
Hi Eric,

> Do I understand correctly that this Tegra241 CMDQV-aware driver is
> enabled by CONFIG_TEGRA241_CMDQV on the guest? Is it fully supported
> upstream?

Yes. With CONFIG_TEGRA241_CMDQV enabled, it should work.

Thanks,
Shameer
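
For reference, the guest-side option lives under the SMMUv3 driver in
upstream Linux (symbol names as in current kernels; exact dependencies
may vary by version):

    CONFIG_ARM_SMMU_V3=y
    CONFIG_TEGRA241_CMDQV=y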