[PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()

Zhenzhong Duan posted 22 patches 1 month, 3 weeks ago
Maintainers: Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, Fabiano Rosas <farosas@suse.de>, Laurent Vivier <lvivier@redhat.com>
[PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Zhenzhong Duan 1 month, 3 weeks ago
Introduce a new optional PCIIOMMUOps callback, get_viommu_flags(), which
allows retrieving the flags exposed by a vIOMMU. The first planned vIOMMU
device flag is VIOMMU_FLAG_WANT_NESTING_PARENT, which advertises support
for the HW nested-stage translation scheme and asks other subsystems,
such as VFIO, to cooperate by creating a nesting parent HWPT.

pci_device_get_viommu_flags() is a wrapper that can be called on a PCI
device potentially protected by a vIOMMU.
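As a rough illustration of the wrapper's dispatch logic, here is a minimal standalone C model. PCIBus and PCIIOMMUOps are trimmed-down, hypothetical stand-ins for the QEMU types; MockVIOMMU, mock_get_viommu_flags() and model_get_viommu_flags() are invented for this sketch, and the iommu_bus argument stands in for what pci_device_get_iommu_bus_devfn() would return. BIT_ULL() is redefined locally instead of including qemu/bitops.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for BIT_ULL() from QEMU's qemu/bitops.h. */
#define BIT_ULL(nr) (1ULL << (nr))

enum {
    VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
};

/* Hypothetical, trimmed-down stand-ins for QEMU's PCIIOMMUOps and PCIBus. */
typedef struct PCIIOMMUOps {
    uint64_t (*get_viommu_flags)(void *opaque); /* optional, may be NULL */
} PCIIOMMUOps;

typedef struct PCIBus {
    const PCIIOMMUOps *iommu_ops;
    void *iommu_opaque;             /* what pci_setup_iommu() was given */
} PCIBus;

/* Hypothetical vIOMMU state; 'nesting' models a user-set device property. */
typedef struct MockVIOMMU {
    bool nesting;
} MockVIOMMU;

/* Example callback: the result depends only on vIOMMU configuration. */
static uint64_t mock_get_viommu_flags(void *opaque)
{
    MockVIOMMU *s = opaque;

    return s->nesting ? VIOMMU_FLAG_WANT_NESTING_PARENT : 0;
}

/*
 * Model of pci_device_get_viommu_flags(). @iommu_bus plays the role of
 * the bus found by pci_device_get_iommu_bus_devfn(): NULL when the
 * device is not behind a vIOMMU.
 */
static uint64_t model_get_viommu_flags(PCIBus *iommu_bus)
{
    /* Assumed invariant: a non-NULL IOMMU bus always has iommu_ops. */
    assert(!iommu_bus || iommu_bus->iommu_ops);

    if (iommu_bus && iommu_bus->iommu_ops->get_viommu_flags) {
        return iommu_bus->iommu_ops->get_viommu_flags(iommu_bus->iommu_opaque);
    }
    return 0; /* no vIOMMU, or the optional callback is not implemented */
}
```

In this model the wrapper returns 0 both when the device is not behind a vIOMMU and when the vIOMMU leaves the optional callback unset, so a caller only learns whether a given flag is requested, not why it is absent.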

get_viommu_flags() is designed to return a 64-bit bitmap of pure vIOMMU
flags, which are determined only by the user's configuration; no host
capabilities are involved. The reasons are:

1. The host may have heterogeneous IOMMUs, each with different
   capabilities.
2. This is migration friendly: the return value is consistent between
   source and target.
3. Host IOMMU capabilities are passed to the vIOMMU through the
   set_iommu_device() interface, which has to be called after
   attach_device(). When get_viommu_flags() is called in attach_device(),
   there is no way for the vIOMMU to get host IOMMU capabilities yet, so
   only pure vIOMMU flags can be returned. See the sequence below:

     vfio_device_attach():
         iommufd_cdev_attach():
             pci_device_get_viommu_flags() for HW nesting cap
             create a nesting parent HWPT
             attach device to the HWPT
             vfio_device_hiod_create_and_realize() creating hiod
     ...
     pci_device_set_iommu_device(hiod)
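The ordering argument can be modelled the same way. The sketch below is hypothetical and heavily simplified: MockDevice, get_viommu_flags(), attach_device(), set_iommu_device() and realize() only mirror the roles of pci_device_get_viommu_flags(), iommufd_cdev_attach() and pci_device_set_iommu_device() in the sequence above. It shows that the flags come from vIOMMU configuration alone, so two hosts with different IOMMU capabilities (for example a migration source and destination) get the same answer, and that the nesting parent decision is made before host capabilities reach the vIOMMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))

enum {
    VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
};

/* Hypothetical, simplified per-device state; not QEMU's real structures. */
typedef struct MockDevice {
    bool viommu_nesting;      /* vIOMMU configuration (user property) */
    uint64_t host_caps;       /* what the host IOMMU would report */
    bool host_caps_forwarded; /* set by the set_iommu_device() step */
    bool nesting_parent_hwpt; /* whether a nesting parent HWPT was created */
} MockDevice;

/* Stand-in for pci_device_get_viommu_flags(): config only, no host caps. */
static uint64_t get_viommu_flags(const MockDevice *d)
{
    return d->viommu_nesting ? VIOMMU_FLAG_WANT_NESTING_PARENT : 0;
}

/* Stand-in for the attach step (iommufd_cdev_attach() above). */
static void attach_device(MockDevice *d)
{
    /* Host caps have not been forwarded to the vIOMMU at this point. */
    assert(!d->host_caps_forwarded);

    d->nesting_parent_hwpt =
        (get_viommu_flags(d) & VIOMMU_FLAG_WANT_NESTING_PARENT) != 0;
    /* ... attach the device to the chosen HWPT, create the hiod ... */
}

/* Stand-in for pci_device_set_iommu_device(hiod), which runs later. */
static void set_iommu_device(MockDevice *d)
{
    d->host_caps_forwarded = true;
}

/* Runs the two steps in the order shown in the sequence above. */
static void realize(MockDevice *d)
{
    attach_device(d);
    set_iommu_device(d);
}
```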

Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 MAINTAINERS          |  1 +
 include/hw/iommu.h   | 19 +++++++++++++++++++
 include/hw/pci/pci.h | 27 +++++++++++++++++++++++++++
 hw/pci/pci.c         | 11 +++++++++++
 4 files changed, 58 insertions(+)
 create mode 100644 include/hw/iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f8cd513d8b..71457e4cde 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2307,6 +2307,7 @@ F: include/system/iommufd.h
 F: backends/host_iommu_device.c
 F: include/system/host_iommu_device.h
 F: include/qemu/chardev_open.h
+F: include/hw/iommu.h
 F: util/chardev_open.c
 F: docs/devel/vfio-iommufd.rst
 
diff --git a/include/hw/iommu.h b/include/hw/iommu.h
new file mode 100644
index 0000000000..65d652950a
--- /dev/null
+++ b/include/hw/iommu.h
@@ -0,0 +1,19 @@
+/*
+ * General vIOMMU flags
+ *
+ * Copyright (C) 2025 Intel Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_IOMMU_H
+#define HW_IOMMU_H
+
+#include "qemu/bitops.h"
+
+enum {
+    /* Nesting parent HWPT will be reused by vIOMMU to create nested HWPT */
+    VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
+};
+
+#endif /* HW_IOMMU_H */
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index bde9dca8e2..c54f2b53ae 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -462,6 +462,23 @@ typedef struct PCIIOMMUOps {
      * @devfn: device and function number of the PCI device.
      */
     void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
+    /**
+     * @get_viommu_flags: get vIOMMU flags
+     *
+     * Optional callback; if not implemented, the vIOMMU doesn't support
+     * exposing flags to other subsystems, e.g., VFIO. Each flag can be an
+     * expectation of, or a request to, another subsystem, or just a pure
+     * vIOMMU capability. The vIOMMU can choose which flags to expose.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * Returns: 64-bit bitmap in which each bit represents a flag that the
+     * vIOMMU wants to expose. See VIOMMU_FLAG_* in include/hw/iommu.h for
+     * all flags currently defined. These flags are theoretical: they are
+     * determined only by vIOMMU device properties and are independent of
+     * the actual host capabilities they may depend on.
+     */
+    uint64_t (*get_viommu_flags)(void *opaque);
     /**
      * @get_iotlb_info: get properties required to initialize a device IOTLB.
      *
@@ -644,6 +661,16 @@ bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp);
 void pci_device_unset_iommu_device(PCIDevice *dev);
 
+/**
+ * pci_device_get_viommu_flags: get vIOMMU flags.
+ *
+ * Returns a 64-bit bitmap in which each bit represents a flag exposed by
+ * the vIOMMU, or 0 if there is no vIOMMU or it exposes none.
+ *
+ * @dev: PCI device pointer.
+ */
+uint64_t pci_device_get_viommu_flags(PCIDevice *dev);
+
 /**
  * pci_iommu_get_iotlb_info: get properties required to initialize a
  * device IOTLB.
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4d4b9dda4d..1315ef13ea 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -3012,6 +3012,17 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
     }
 }
 
+uint64_t pci_device_get_viommu_flags(PCIDevice *dev)
+{
+    PCIBus *iommu_bus;
+
+    pci_device_get_iommu_bus_devfn(dev, &iommu_bus, NULL, NULL);
+    if (iommu_bus && iommu_bus->iommu_ops->get_viommu_flags) {
+        return iommu_bus->iommu_ops->get_viommu_flags(iommu_bus->iommu_opaque);
+    }
+    return 0;
+}
+
 int pci_pri_request_page(PCIDevice *dev, uint32_t pasid, bool priv_req,
                          bool exec_req, hwaddr addr, bool lpig,
                          uint16_t prgi, bool is_read, bool is_write)
-- 
2.47.1
Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Yi Liu 1 month ago
On 2025/9/18 16:57, Zhenzhong Duan wrote:
> +enum {
> +    /* Nesting parent HWPT will be reused by vIOMMU to create nested HWPT */

vIOMMU needs nesting parent HWPT to create nested HWPT

> +     VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),

The patch LGTM.

Reviewed-by: Yi Liu <yi.l.liu@intel.com>
RE: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Duan, Zhenzhong 1 month ago

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
>
>On 2025/9/18 16:57, Zhenzhong Duan wrote:
>> +enum {
>> +    /* Nesting parent HWPT will be reused by vIOMMU to create nested HWPT */
>
>vIOMMU needs nesting parent HWPT to create nested HWPT

Will do.

Thanks
Zhenzhong
Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Nicolin Chen 1 month, 3 weeks ago
On Thu, Sep 18, 2025 at 04:57:44AM -0400, Zhenzhong Duan wrote:
> Introduce a new PCIIOMMUOps optional callback, get_viommu_flags() which
> allows to retrieve flags exposed by a vIOMMU. The first planned vIOMMU
> device flag is VIOMMU_FLAG_WANT_NESTING_PARENT that advertises the
> support of HW nested stage translation scheme and wants other sub-system
> like VFIO's cooperation to create nesting parent HWPT.
> 
> pci_device_get_viommu_flags() is a wrapper that can be called on a PCI
> device potentially protected by a vIOMMU.
> 
> get_viommu_flags() is designed to return 64bit bitmap of purely vIOMMU
> flags which are only determined by user's configuration, no host
> capabilities involved. Reasons are:
> 
> 1. host may has heterogeneous IOMMUs, each with different capabilities
> 2. this is migration friendly, return value is consistent between source
>    and target.
> 3. host IOMMU capabilities are passed to vIOMMU through set_iommu_device()
>    interface which have to be after attach_device(), when get_viommu_flags()
>    is called in attach_device(), there is no way for vIOMMU to get host
>    IOMMU capabilities yet, so only pure vIOMMU flags can be returned.

"no way" sounds too strong..

There is an iommufd_backend_get_device_info() call there. So, we
could have passed the host IOMMU capabilities to a vIOMMU. Just,
we chose not to (assuming for migration reason?).

>    See below sequence:
> 
>      vfio_device_attach():
>          iommufd_cdev_attach():
>              pci_device_get_viommu_flags() for HW nesting cap
>              create a nesting parent HWPT
>              attach device to the HWPT
>              vfio_device_hiod_create_and_realize() creating hiod
>      ...
>      pci_device_set_iommu_device(hiod)
> 
> Suggested-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Despite some nits, patch looks good to me:

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

> +enum {
> +    /* Nesting parent HWPT will be reused by vIOMMU to create nested HWPT */
> +     VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
> +};

How about adding a name and move the note here:

/*
 * Theoretical vIOMMU flags. Only determined by the vIOMMU device properties and
 * independent on the actual host IOMMU capabilities they may depend on.
 */
enum viommu_flags {
	...
};
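A minimal sketch of the named enum, assuming BIT_ULL() is redefined locally rather than taken from qemu/bitops.h; VIOMMU_FLAGS_KNOWN and viommu_flags_are_known() are hypothetical helpers, not part of the patch:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))

/*
 * Theoretical vIOMMU flags. Only determined by the vIOMMU device
 * properties and independent of the actual host IOMMU capabilities.
 */
enum viommu_flags {
    VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
};

/* Before C23, ISO C requires enumeration constants to fit in an int. */
static_assert(VIOMMU_FLAG_WANT_NESTING_PARENT <= INT_MAX,
              "flag must be representable as int");

/* Hypothetical mask of every flag defined so far. */
#define VIOMMU_FLAGS_KNOWN ((uint64_t)VIOMMU_FLAG_WANT_NESTING_PARENT)

/* True if @flags only contains bits defined in enum viommu_flags. */
static inline bool viommu_flags_are_known(uint64_t flags)
{
    return (flags & ~VIOMMU_FLAGS_KNOWN) == 0;
}
```

One caveat with flag enums: before C23, ISO C requires every enumeration constant to be representable as int, so a flag at bit 31 or above relies on a compiler extension such as the one GCC provides. BIT_ULL(0) is well within range.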

> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index bde9dca8e2..c54f2b53ae 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -462,6 +462,23 @@ typedef struct PCIIOMMUOps {
>       * @devfn: device and function number of the PCI device.
>       */
>      void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
> +    /**
> +     * @get_viommu_flags: get vIOMMU flags
> +     *
> +     * Optional callback, if not implemented, then vIOMMU doesn't support
> +     * exposing flags to other sub-system, e.g., VFIO. Each flag can be
> +     * an expectation or request to other sub-system or just a pure vIOMMU
> +     * capability. vIOMMU can choose which flags to expose.

The 2nd statement is somewhat redundant. Perhaps we could squash
it into the notes at enum viommu_flags above, if we really need.

> +     *
> +     * @opaque: the data passed to pci_setup_iommu().
> +     *
> +     * Returns: 64bit bitmap with each bit represents a flag that vIOMMU
> +     * wants to expose. See VIOMMU_FLAG_* in include/hw/iommu.h for all
> +     * possible flags currently used. These flags are theoretical which
> +     * are only determined by vIOMMU device properties and independent on
> +     * the actual host capabilities they may depend on.
> +     */
> +    uint64_t (*get_viommu_flags)(void *opaque);

With the notes above, we could simplify this:

     * Returns: bitmap with each representing a vIOMMU flag defined in
     * enum viommu_flags

> +/**
> + * pci_device_get_viommu_flags: get vIOMMU flags.
> + *
> + * Returns a 64bit bitmap with each bit represents a vIOMMU exposed
> + * flags, 0 if vIOMMU doesn't support that.
> + *
> + * @dev: PCI device pointer.
> + */
> +uint64_t pci_device_get_viommu_flags(PCIDevice *dev);
 
and could make this aligned too:

     * Returns: bitmap with each representing a vIOMMU flag defined in
     * enum viommu_flags. Or 0 if vIOMMU doesn't report any.

Nicolin
RE: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Duan, Zhenzhong 1 month, 3 weeks ago

>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
>
>On Thu, Sep 18, 2025 at 04:57:44AM -0400, Zhenzhong Duan wrote:
>> Introduce a new PCIIOMMUOps optional callback, get_viommu_flags() which
>> allows to retrieve flags exposed by a vIOMMU. The first planned vIOMMU
>> device flag is VIOMMU_FLAG_WANT_NESTING_PARENT that advertises the
>> support of HW nested stage translation scheme and wants other sub-system
>> like VFIO's cooperation to create nesting parent HWPT.
>>
>> pci_device_get_viommu_flags() is a wrapper that can be called on a PCI
>> device potentially protected by a vIOMMU.
>>
>> get_viommu_flags() is designed to return 64bit bitmap of purely vIOMMU
>> flags which are only determined by user's configuration, no host
>> capabilities involved. Reasons are:
>>
>> 1. host may has heterogeneous IOMMUs, each with different capabilities
>> 2. this is migration friendly, return value is consistent between source
>>    and target.
>> 3. host IOMMU capabilities are passed to vIOMMU through set_iommu_device()
>>    interface which have to be after attach_device(), when get_viommu_flags()
>>    is called in attach_device(), there is no way for vIOMMU to get host
>>    IOMMU capabilities yet, so only pure vIOMMU flags can be returned.
>
>"no way" sounds too strong..
>
>There is an iommufd_backend_get_device_info() call there. So, we
>could have passed the host IOMMU capabilities to a vIOMMU. Just,
>we chose not to (assuming for migration reason?).

What about 'it's hard for vIOMMU to get host IOMMU...'?

>
>>    See below sequence:
>>
>>      vfio_device_attach():
>>          iommufd_cdev_attach():
>>              pci_device_get_viommu_flags() for HW nesting cap
>>              create a nesting parent HWPT
>>              attach device to the HWPT
>>              vfio_device_hiod_create_and_realize() creating hiod
>>      ...
>>      pci_device_set_iommu_device(hiod)
>>
>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>
>Despite some nits, patch looks good to me:
>
>Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
>
>> +enum {
>> +    /* Nesting parent HWPT will be reused by vIOMMU to create nested HWPT */
>> +     VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
>> +};
>
>How about adding a name and move the note here:
>
>/*
> * Theoretical vIOMMU flags. Only determined by the vIOMMU device properties and
> * independent on the actual host IOMMU capabilities they may depend on.
> */
>enum viommu_flags {
>	...
>};
>
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index bde9dca8e2..c54f2b53ae 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -462,6 +462,23 @@ typedef struct PCIIOMMUOps {
>>       * @devfn: device and function number of the PCI device.
>>       */
>>      void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
>> +    /**
>> +     * @get_viommu_flags: get vIOMMU flags
>> +     *
>> +     * Optional callback, if not implemented, then vIOMMU doesn't support
>> +     * exposing flags to other sub-system, e.g., VFIO. Each flag can be
>> +     * an expectation or request to other sub-system or just a pure vIOMMU
>> +     * capability. vIOMMU can choose which flags to expose.
>
>The 2nd statement is somewhat redundant. Perhaps we could squash
>it into the notes at enum viommu_flags above, if we really need.
>
>> +     *
>> +     * @opaque: the data passed to pci_setup_iommu().
>> +     *
>> +     * Returns: 64bit bitmap with each bit represents a flag that vIOMMU
>> +     * wants to expose. See VIOMMU_FLAG_* in include/hw/iommu.h for all
>> +     * possible flags currently used. These flags are theoretical which
>> +     * are only determined by vIOMMU device properties and independent on
>> +     * the actual host capabilities they may depend on.
>> +     */
>> +    uint64_t (*get_viommu_flags)(void *opaque);
>
>With the notes above, we could simplify this:
>
>     * Returns: bitmap with each representing a vIOMMU flag defined in
>     * enum viommu_flags
>
>> +/**
>> + * pci_device_get_viommu_flags: get vIOMMU flags.
>> + *
>> + * Returns a 64bit bitmap with each bit represents a vIOMMU exposed
>> + * flags, 0 if vIOMMU doesn't support that.
>> + *
>> + * @dev: PCI device pointer.
>> + */
>> +uint64_t pci_device_get_viommu_flags(PCIDevice *dev);
>
>and could make this aligned too:
>
>     * Returns: bitmap with each representing a vIOMMU flag defined in
>     * enum viommu_flags. Or 0 if vIOMMU doesn't report any.

Will do all suggested changes above.

Thanks
Zhenzhong
Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Nicolin Chen 1 month, 3 weeks ago
On Wed, Sep 24, 2025 at 07:05:42AM +0000, Duan, Zhenzhong wrote:
> >From: Nicolin Chen <nicolinc@nvidia.com>
> >Subject: Re: [PATCH v6 05/22] hw/pci: Introduce
> >> get_viommu_flags() is designed to return 64bit bitmap of purely vIOMMU
> >> flags which are only determined by user's configuration, no host
> >> capabilities involved. Reasons are:
> >>
> >> 1. host may has heterogeneous IOMMUs, each with different capabilities
> >> 2. this is migration friendly, return value is consistent between source
> >>    and target.
> >> 3. host IOMMU capabilities are passed to vIOMMU through set_iommu_device()
> >>    interface which have to be after attach_device(), when get_viommu_flags()
> >>    is called in attach_device(), there is no way for vIOMMU to get host
> >>    IOMMU capabilities yet, so only pure vIOMMU flags can be returned.
> >
> >"no way" sounds too strong..
> >
> >There is an iommufd_backend_get_device_info() call there. So, we
> >could have passed the host IOMMU capabilities to a vIOMMU. Just,
> >we chose not to (assuming for migration reason?).
> 
> What about 'it's hard for vIOMMU to get host IOMMU...'?

vfio-iommufd core code gets all the host IOMMU caps via the vfio
device but chooses to not forward to vIOMMU. So, it's neither "no
way" nor "hard" :)

To be honest, I don't feel this very related to be the reason 3
to justify for the new op/API. 1 and 2 are quite okay?

Having said that, it's probably good to add as a side note:

"
Note that this op will be invoked at the attach_device() stage, at which
point host IOMMU capabilities are not yet forwarded to the vIOMMU through
the set_iommu_device() callback that will be after the attach_device().

See the below sequence:
"

Nicolin
RE: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Duan, Zhenzhong 1 month, 2 weeks ago

>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
>
>On Wed, Sep 24, 2025 at 07:05:42AM +0000, Duan, Zhenzhong wrote:
>> >From: Nicolin Chen <nicolinc@nvidia.com>
>> >Subject: Re: [PATCH v6 05/22] hw/pci: Introduce
>> >> get_viommu_flags() is designed to return 64bit bitmap of purely vIOMMU
>> >> flags which are only determined by user's configuration, no host
>> >> capabilities involved. Reasons are:
>> >>
>> >> 1. host may has heterogeneous IOMMUs, each with different capabilities
>> >> 2. this is migration friendly, return value is consistent between source
>> >>    and target.
>> >> 3. host IOMMU capabilities are passed to vIOMMU through set_iommu_device()
>> >>    interface which have to be after attach_device(), when get_viommu_flags()
>> >>    is called in attach_device(), there is no way for vIOMMU to get host
>> >>    IOMMU capabilities yet, so only pure vIOMMU flags can be returned.
>> >
>> >"no way" sounds too strong..
>> >
>> >There is an iommufd_backend_get_device_info() call there. So, we
>> >could have passed the host IOMMU capabilities to a vIOMMU. Just,
>> >we chose not to (assuming for migration reason?).
>>
>> What about 'it's hard for vIOMMU to get host IOMMU...'?
>
>vfio-iommufd core code gets all the host IOMMU caps via the vfio
>device but chooses to not forward to vIOMMU. So, it's neither "no
>way" nor "hard" :)

Yes, that needs to introduce another callback to forward the caps early,
unnecessarily complex.

>
>To be honest, I don't feel this very related to be the reason 3
>to justify for the new op/API. 1 and 2 are quite okay?
>
>Having said that, it's probably good to add as a side note:
>
>"
>Note that this op will be invoked at the attach_device() stage, at which
>point host IOMMU capabilities are not yet forwarded to the vIOMMU through
>the set_iommu_device() callback that will be after the attach_device().
>
>See the below sequence:
>"

OK, will drop 3 and add the side note.

Thanks
Zhenzhong
Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
Posted by Eric Auger 1 month, 2 weeks ago

On 9/26/25 4:54 AM, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Subject: Re: [PATCH v6 05/22] hw/pci: Introduce pci_device_get_viommu_flags()
>>
>> On Wed, Sep 24, 2025 at 07:05:42AM +0000, Duan, Zhenzhong wrote:
>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>> Subject: Re: [PATCH v6 05/22] hw/pci: Introduce
>>>>> get_viommu_flags() is designed to return 64bit bitmap of purely vIOMMU
>>>>> flags which are only determined by user's configuration, no host
>>>>> capabilities involved. Reasons are:
>>>>>
>>>>> 1. host may has heterogeneous IOMMUs, each with different capabilities
>>>>> 2. this is migration friendly, return value is consistent between source
>>>>>    and target.
>>>>> 3. host IOMMU capabilities are passed to vIOMMU through set_iommu_device()
>>>>>    interface which have to be after attach_device(), when get_viommu_flags()
>>>>>    is called in attach_device(), there is no way for vIOMMU to get host
>>>>>    IOMMU capabilities yet, so only pure vIOMMU flags can be returned.
>>>> "no way" sounds too strong..
>>>>
>>>> There is an iommufd_backend_get_device_info() call there. So, we
>>>> could have passed the host IOMMU capabilities to a vIOMMU. Just,
>>>> we chose not to (assuming for migration reason?).
>>> What about 'it's hard for vIOMMU to get host IOMMU...'?
>> vfio-iommufd core code gets all the host IOMMU caps via the vfio
>> device but chooses to not forward to vIOMMU. So, it's neither "no
>> way" nor "hard" :)
> Yes, that needs to introduce another callback to forward the caps early,
> unnecessarily complex.
>
>> To be honest, I don't feel this very related to be the reason 3
>> to justify for the new op/API. 1 and 2 are quite okay?
>>
>> Having said that, it's probably good to add as a side note:
>>
>> "
>> Note that this op will be invoked at the attach_device() stage, at which
>> point host IOMMU capabilities are not yet forwarded to the vIOMMU through
>> the set_iommu_device() callback that will be after the attach_device().
>>
>> See the below sequence:
>> "
> OK, will drop 3 and add the side note.

With Nicolin's suggestions:
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
>
> Thanks
> Zhenzhong
>