[PATCH v5 31/32] vfio: Synthesize vPASID capability to VM

Shameer Kolothum posted 32 patches 3 months, 1 week ago
There is a newer version of this series
Posted by Shameer Kolothum 3 months, 1 week ago
From: Yi Liu <yi.l.liu@intel.com>

If the user wants to expose the PASID capability in the vIOMMU, then VFIO
would also report the PASID cap for this device if the underlying hardware
supports it as well.

As a start, this puts the vPASID cap in the last 8 bytes of the vconfig
space, in the hope that it does not conflict with any existing cap or
hidden registers. For devices that have hidden registers, the user should
figure out a proper offset for the vPASID cap. This may require an option
for the user to configure it; we leave that as a future extension. There
is more discussion on the mechanism for finding a proper offset here:

https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/

Since we add a check to ensure the vIOMMU supports PASID, only devices
under those vIOMMUs can synthesize the vPASID capability. This gives
users control over which devices expose vPASID.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
 hw/vfio/pci.c      | 37 +++++++++++++++++++++++++++++++++++++
 include/hw/iommu.h |  1 +
 2 files changed, 38 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 06b06afc2b..2054eac897 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -24,6 +24,7 @@
 #include <sys/ioctl.h>
 
 #include "hw/hw.h"
+#include "hw/iommu.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pci_bridge.h"
@@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
 
 static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
 {
+    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
+    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
     PCIDevice *pdev = PCI_DEVICE(vdev);
+    uint64_t max_pasid_log2 = 0;
+    bool pasid_cap_added = false;
+    uint64_t hw_caps;
     uint32_t header;
     uint16_t cap_id, next, size;
     uint8_t cap_ver;
@@ -2578,12 +2584,43 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
                 pcie_add_capability(pdev, cap_id, cap_ver, next, size);
             }
             break;
+        case PCI_EXT_CAP_ID_PASID:
+             pasid_cap_added = true;
+             /* fallthrough */
         default:
             pcie_add_capability(pdev, cap_id, cap_ver, next, size);
         }
 
     }
 
+#ifdef CONFIG_IOMMUFD
+    /*
+     * Although we check for PCI_EXT_CAP_ID_PASID above, the Linux VFIO
+     * framework currently hides this capability. Try to retrieve it
+     * through alternative kernel interfaces (e.g. IOMMUFD APIs).
+     */
+    if (!pasid_cap_added && hiodc->get_cap) {
+        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
+        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
+                       &max_pasid_log2, NULL);
+    }
+
+    /*
+     * If supported, adds the PASID capability in the end of the PCIe config
+     * space. TODO: Add option for enabling pasid at a safe offset.
+     */
+    if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
+                           VIOMMU_FLAG_PASID_SUPPORTED)) {
+        bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC) ? true : false;
+        bool priv_mod = (hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV) ? true : false;
+
+        pcie_pasid_init(pdev, PCIE_CONFIG_SPACE_SIZE - PCI_EXT_CAP_PASID_SIZEOF,
+                        max_pasid_log2, exec_perm, priv_mod);
+        /* PASID capability is fully emulated by QEMU */
+        memset(vdev->emulated_config_bits + pdev->exp.pasid_cap, 0xff, 8);
+    }
+#endif
+
     /* Cleanup chain head ID if necessary */
     if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
         pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
diff --git a/include/hw/iommu.h b/include/hw/iommu.h
index 9b8bb94fc2..9635770bee 100644
--- a/include/hw/iommu.h
+++ b/include/hw/iommu.h
@@ -20,6 +20,7 @@
 enum viommu_flags {
     /* vIOMMU needs nesting parent HWPT to create nested HWPT */
     VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
+    VIOMMU_FLAG_PASID_SUPPORTED = BIT_ULL(1),
 };
 
 #endif /* HW_IOMMU_H */
-- 
2.43.0
Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Cédric Le Goater 2 months ago
Hello Shameer, Yi,

On 10/31/25 11:50, Shameer Kolothum wrote:
> From: Yi Liu <yi.l.liu@intel.com>
> 
> If user wants to expose PASID capability in vIOMMU, then VFIO would also
> report the PASID cap for this device if the underlying hardware supports
> it as well.
> 
> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> vconfig space. This is a choice in the good hope of no conflict with any
> existing cap or hidden registers. For the devices that has hidden registers,
> user should figure out a proper offset for the vPASID cap. This may require
> an option for user to config it. Here we leave it as a future extension.
> There are more discussions on the mechanism of finding the proper offset.
> 
> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
> 
> Since we add a check to ensure the vIOMMU supports PASID, only devices
> under those vIOMMUs can synthesize the vPASID capability. This gives
> users control over which devices expose vPASID.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> ---
>   hw/vfio/pci.c      | 37 +++++++++++++++++++++++++++++++++++++
>   include/hw/iommu.h |  1 +
>   2 files changed, 38 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 06b06afc2b..2054eac897 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -24,6 +24,7 @@
>   #include <sys/ioctl.h>
>   
>   #include "hw/hw.h"
> +#include "hw/iommu.h"
>   #include "hw/pci/msi.h"
>   #include "hw/pci/msix.h"
>   #include "hw/pci/pci_bridge.h"
> @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
>   
>   static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>   {
> +    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
> +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>       PCIDevice *pdev = PCI_DEVICE(vdev);
> +    uint64_t max_pasid_log2 = 0;
> +    bool pasid_cap_added = false;
> +    uint64_t hw_caps;
>       uint32_t header;
>       uint16_t cap_id, next, size;
>       uint8_t cap_ver;
> @@ -2578,12 +2584,43 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>                   pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>               }
>               break;
> +        case PCI_EXT_CAP_ID_PASID:
> +             pasid_cap_added = true;
> +             /* fallthrough */
>           default:
>               pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>           }
>   
>       }
>   
> +#ifdef CONFIG_IOMMUFD

The HostIOMMUDevice concept was introduced to abstract the use of
the host IOMMU backends in VFIO (and other parts of QEMU):

- the VFIO IOMMU type1 backend, also referred to as 'legacy',
- IOMMUFD

Adding code in VFIO under CONFIG_IOMMUFD should be avoided whenever
possible. There are exceptions, such as the definition of the
properties below in this file. This is, however, due to the
dual-bus nature of the VFIO devices and the limitations of QEMU class
inheritance.

In this case, I think we can extend HostIOMMUDevice and its associated
class to handle PASID support. Please rework this patch. I can
merge it as a prerequisite change.
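
The suggestion above can be sketched as follows. This is only a minimal,
standalone illustration of the idea, not QEMU code: the type and function
names (HostIommuDev, HostIommuDevClass, PasidInfo, get_pasid_info,
device_query_pasid) are hypothetical stand-ins, and the two "backends" are
mocks. The point is that the caller asks the class for PASID information
through one callback and needs no #ifdef on the backend type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PASID description filled in by a host IOMMU backend. */
typedef struct PasidInfo {
    uint8_t max_pasid_log2; /* 0 means PASID is not supported */
    bool exec_perm;
    bool priv_mod;
} PasidInfo;

typedef struct HostIommuDev HostIommuDev;

/* Hypothetical class: each backend installs its own callback, or none. */
typedef struct HostIommuDevClass {
    bool (*get_pasid_info)(HostIommuDev *dev, PasidInfo *out);
} HostIommuDevClass;

struct HostIommuDev {
    const HostIommuDevClass *klass;
    uint8_t hw_max_pasid_log2; /* mock hardware value for this sketch */
};

/* Mock backend that knows how to report PASID support. */
static bool mock_iommufd_get_pasid_info(HostIommuDev *dev, PasidInfo *out)
{
    if (dev->hw_max_pasid_log2 == 0) {
        return false;
    }
    out->max_pasid_log2 = dev->hw_max_pasid_log2;
    out->exec_perm = false;
    out->priv_mod = false;
    return true;
}

static const HostIommuDevClass mock_iommufd_class = {
    .get_pasid_info = mock_iommufd_get_pasid_info,
};

/* Mock backend without PASID reporting: the callback is simply absent. */
static const HostIommuDevClass mock_legacy_class = {
    .get_pasid_info = NULL,
};

/* Backend-agnostic caller: no #ifdef, no knowledge of the backend type. */
static bool device_query_pasid(HostIommuDev *dev, PasidInfo *out)
{
    assert(dev != NULL && out != NULL);
    if (!dev->klass || !dev->klass->get_pasid_info) {
        return false;
    }
    return dev->klass->get_pasid_info(dev, out);
}
```

With this shape, a device behind the mock legacy backend reports "no PASID"
because the callback is missing, rather than because the calling code was
compiled out.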


Also, the IOMMUFD backend is not supported on all platforms, so these
changes, even if correct, won't compile on some of them.

Thanks,

C.


> +    /*
> +     * Although we check for PCI_EXT_CAP_ID_PASID above, the Linux VFIO
> +     * framework currently hides this capability. Try to retrieve it
> +     * through alternative kernel interfaces (e.g. IOMMUFD APIs).
> +     */
> +    if (!pasid_cap_added && hiodc->get_cap) {
> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
> +                       &max_pasid_log2, NULL);
> +    }
> +
> +    /*
> +     * If supported, adds the PASID capability in the end of the PCIe config
> +     * space. TODO: Add option for enabling pasid at a safe offset.
> +     */
> +    if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
> +                           VIOMMU_FLAG_PASID_SUPPORTED)) {
> +        bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC) ? true : false;
> +        bool priv_mod = (hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV) ? true : false;
> +
> +        pcie_pasid_init(pdev, PCIE_CONFIG_SPACE_SIZE - PCI_EXT_CAP_PASID_SIZEOF,
> +                        max_pasid_log2, exec_perm, priv_mod);
> +        /* PASID capability is fully emulated by QEMU */
> +        memset(vdev->emulated_config_bits + pdev->exp.pasid_cap, 0xff, 8);
> +    }
> +#endif
> +
>       /* Cleanup chain head ID if necessary */
>       if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
>           pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
> diff --git a/include/hw/iommu.h b/include/hw/iommu.h
> index 9b8bb94fc2..9635770bee 100644
> --- a/include/hw/iommu.h
> +++ b/include/hw/iommu.h
> @@ -20,6 +20,7 @@
>   enum viommu_flags {
>       /* vIOMMU needs nesting parent HWPT to create nested HWPT */
>       VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
> +    VIOMMU_FLAG_PASID_SUPPORTED = BIT_ULL(1),
>   };
>   
>   #endif /* HW_IOMMU_H */
RE: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Shameer Kolothum 2 months ago
Hi Cédric,

> -----Original Message-----
> From: Cédric Le Goater <clg@redhat.com>
> Sent: 09 December 2025 10:11
> To: Shameer Kolothum <skolothumtho@nvidia.com>; qemu-
> arm@nongnu.org; qemu-devel@nongnu.org
> Cc: eric.auger@redhat.com; peter.maydell@linaro.org; Jason Gunthorpe
> <jgg@nvidia.com>; Nicolin Chen <nicolinc@nvidia.com>;
> ddutile@redhat.com; berrange@redhat.com; Nathan Chen
> <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> Krishnakant Jaju <kjaju@nvidia.com>
> Subject: Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
> 
> 
> Hello Shameer, Yi,
> 
> On 10/31/25 11:50, Shameer Kolothum wrote:
> > From: Yi Liu <yi.l.liu@intel.com>
> >
> > If user wants to expose PASID capability in vIOMMU, then VFIO would also
> > report the PASID cap for this device if the underlying hardware supports
> > it as well.
> >
> > As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> > vconfig space. This is a choice in the good hope of no conflict with any
> > existing cap or hidden registers. For the devices that has hidden registers,
> > user should figure out a proper offset for the vPASID cap. This may require
> > an option for user to config it. Here we leave it as a future extension.
> > There are more discussions on the mechanism of finding the proper offset.
> >
> > https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
> >
> > Since we add a check to ensure the vIOMMU supports PASID, only devices
> > under those vIOMMUs can synthesize the vPASID capability. This gives
> > users control over which devices expose vPASID.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> > ---
> >   hw/vfio/pci.c      | 37 +++++++++++++++++++++++++++++++++++++
> >   include/hw/iommu.h |  1 +
> >   2 files changed, 38 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 06b06afc2b..2054eac897 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -24,6 +24,7 @@
> >   #include <sys/ioctl.h>
> >
> >   #include "hw/hw.h"
> > +#include "hw/iommu.h"
> >   #include "hw/pci/msi.h"
> >   #include "hw/pci/msix.h"
> >   #include "hw/pci/pci_bridge.h"
> > @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
> >
> >   static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >   {
> > +    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
> > +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
> >       PCIDevice *pdev = PCI_DEVICE(vdev);
> > +    uint64_t max_pasid_log2 = 0;
> > +    bool pasid_cap_added = false;
> > +    uint64_t hw_caps;
> >       uint32_t header;
> >       uint16_t cap_id, next, size;
> >       uint8_t cap_ver;
> > @@ -2578,12 +2584,43 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >                   pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> >               }
> >               break;
> > +        case PCI_EXT_CAP_ID_PASID:
> > +             pasid_cap_added = true;
> > +             /* fallthrough */
> >           default:
> >               pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> >           }
> >
> >       }
> >
> > +#ifdef CONFIG_IOMMUFD
> 
> The HostIOMMUDevice concept was introduced to abstract the use of
> the Host IOMMU backends in VFIO (and other parts of QEMU):
> 
> - the VFIO IOMMU type1 backend, also referred as 'legacy',
> - IOMMUFD
> 
> Adding code in VFIO under CONFIG_IOMMUFD should be avoided always
> when possible. There are exceptions, such as for the definition
> of the properties below in this file. This is, however, due to the
> dual-bus nature of the VFIO devices and the limitation of QEMU class
> inheritance.
>
Yes, I did see the CONFIG_IOMMUFD usage in this file, but was unaware of
the reasons and constraints behind it.

> In this case, I think we can extend HostIOMMUDevice and associated
> class, to handle PASID support. Please rework this patch. I can
> merge as a prereq change.

I had a go at extending the HostIOMMUDeviceClass in v4 here:
https://lore.kernel.org/qemu-devel/20250929133643.38961-26-skolothumtho@nvidia.com/

Is that similar to what you have in mind here?

> 
> Also, IOMMUFD backend is not supported on all platforms, so these
> changes, even if correct, won't compile.

Hmm, I am not sure I follow the compile failure case mentioned. Would the
problem be with the HostIOMMUDevice usage above, or within this
#ifdef CONFIG_IOMMUFD block itself?

Thanks,
Shameer

Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Cédric Le Goater 2 months ago
Hello again,

>> In this case, I think we can extend HostIOMMUDevice and associated
>> class, to handle PASID support. Please rework this patch. I can
>> merge as a prereq change.
> 
> I had a go at extending the HostIOMMUDeviceClass in v4 here,
> https://lore.kernel.org/qemu-devel/20250929133643.38961-26-skolothumtho@nvidia.com/
> 
> Is something similar you have in mind here?

Yes, that's the spirit. vfio-pci should be host IOMMU agnostic.

Thanks,

C.
Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Cédric Le Goater 2 months ago
Hello Shameer,

>> Also, IOMMUFD backend is not supported on all platforms, so these
>> changes, even if correct, won't compile.
> 
> Hmm..I am not sure I follow the compile failure case mentioned. Is the problem
> will be with HostIOMMUDevice above or within this #ifdef CONFIG_IOMMUFD
> block itself?

Try a ppc build.

Thanks,

C.
Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Eric Auger 3 months ago
Hi Shameer,
On 10/31/25 11:50 AM, Shameer Kolothum wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> If user wants to expose PASID capability in vIOMMU, then VFIO would also
need to report?
> report the PASID cap for this device if the underlying hardware supports
> it as well.
>
> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> vconfig space. This is a choice in the good hope of no conflict with any
> existing cap or hidden registers. For the devices that has hidden registers,
> user should figure out a proper offset for the vPASID cap. This may require
> an option for user to config it. Here we leave it as a future extension.
> There are more discussions on the mechanism of finding the proper offset.
>
> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
>
> Since we add a check to ensure the vIOMMU supports PASID, only devices
> under those vIOMMUs can synthesize the vPASID capability. This gives
> users control over which devices expose vPASID.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> ---
>  hw/vfio/pci.c      | 37 +++++++++++++++++++++++++++++++++++++
>  include/hw/iommu.h |  1 +
>  2 files changed, 38 insertions(+)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 06b06afc2b..2054eac897 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -24,6 +24,7 @@
>  #include <sys/ioctl.h>
>  
>  #include "hw/hw.h"
> +#include "hw/iommu.h"
>  #include "hw/pci/msi.h"
>  #include "hw/pci/msix.h"
>  #include "hw/pci/pci_bridge.h"
> @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
>  
>  static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>  {
> +    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
> +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>      PCIDevice *pdev = PCI_DEVICE(vdev);
> +    uint64_t max_pasid_log2 = 0;
> +    bool pasid_cap_added = false;
> +    uint64_t hw_caps;
>      uint32_t header;
>      uint16_t cap_id, next, size;
>      uint8_t cap_ver;
> @@ -2578,12 +2584,43 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>                  pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>              }
>              break;
> +        case PCI_EXT_CAP_ID_PASID:
> +             pasid_cap_added = true;
> +             /* fallthrough */
>          default:
>              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>          }
>  
>      }
>  
> +#ifdef CONFIG_IOMMUFD
> +    /*
> +     * Although we check for PCI_EXT_CAP_ID_PASID above, the Linux VFIO
> +     * framework currently hides this capability. Try to retrieve it
> +     * through alternative kernel interfaces (e.g. IOMMUFD APIs).
I don't follow this sentence. When are you supposed to read the
PCI_EXT_CAP_ID_PASID cap ID above, then?
> +     */
> +    if (!pasid_cap_added && hiodc->get_cap) {
> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
> +                       &max_pasid_log2, NULL);
> +    }
> +
> +    /*
> +     * If supported, adds the PASID capability in the end of the PCIe config
> +     * space. TODO: Add option for enabling pasid at a safe offset.
> +     */
> +    if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
> +                           VIOMMU_FLAG_PASID_SUPPORTED)) {
> +        bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC) ? true : false;
Can't you directly set exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);
> +        bool priv_mod = (hw_caps & IOMMU_HW_CAP_PCI_PASID_PRIV) ? true : false;
> +
> +        pcie_pasid_init(pdev, PCIE_CONFIG_SPACE_SIZE - PCI_EXT_CAP_PASID_SIZEOF,
> +                        max_pasid_log2, exec_perm, priv_mod);
> +        /* PASID capability is fully emulated by QEMU */
> +        memset(vdev->emulated_config_bits + pdev->exp.pasid_cap, 0xff, 8);
> +    }
> +#endif
> +
>      /* Cleanup chain head ID if necessary */
>      if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
>          pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
> diff --git a/include/hw/iommu.h b/include/hw/iommu.h
> index 9b8bb94fc2..9635770bee 100644
> --- a/include/hw/iommu.h
> +++ b/include/hw/iommu.h
> @@ -20,6 +20,7 @@
>  enum viommu_flags {
>      /* vIOMMU needs nesting parent HWPT to create nested HWPT */
>      VIOMMU_FLAG_WANT_NESTING_PARENT = BIT_ULL(0),
> +    VIOMMU_FLAG_PASID_SUPPORTED = BIT_ULL(1),
>  };
>  
>  #endif /* HW_IOMMU_H */
Thanks

Eric
RE: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Shameer Kolothum 3 months ago

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: 06 November 2025 13:56
> To: Shameer Kolothum <skolothumtho@nvidia.com>; qemu-
> arm@nongnu.org; qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; Jason Gunthorpe <jgg@nvidia.com>; Nicolin
> Chen <nicolinc@nvidia.com>; ddutile@redhat.com; berrange@redhat.com;
> Nathan Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
> smostafa@google.com; wangzhou1@hisilicon.com;
> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
> Krishnakant Jaju <kjaju@nvidia.com>
> Subject: Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
> 
> 
> Hi Shameer,
> On 10/31/25 11:50 AM, Shameer Kolothum wrote:
> > From: Yi Liu <yi.l.liu@intel.com>
> >
> > If user wants to expose PASID capability in vIOMMU, then VFIO would also
> need to report?
> > report the PASID cap for this device if the underlying hardware supports
> > it as well.
> >
> > As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> > vconfig space. This is a choice in the good hope of no conflict with any
> > existing cap or hidden registers. For the devices that has hidden registers,
> > user should figure out a proper offset for the vPASID cap. This may require
> > an option for user to config it. Here we leave it as a future extension.
> > There are more discussions on the mechanism of finding the proper offset.
> >
> > https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
> >
> > Since we add a check to ensure the vIOMMU supports PASID, only devices
> > under those vIOMMUs can synthesize the vPASID capability. This gives
> > users control over which devices expose vPASID.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
> > ---
> >  hw/vfio/pci.c      | 37 +++++++++++++++++++++++++++++++++++++
> >  include/hw/iommu.h |  1 +
> >  2 files changed, 38 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 06b06afc2b..2054eac897 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -24,6 +24,7 @@
> >  #include <sys/ioctl.h>
> >
> >  #include "hw/hw.h"
> > +#include "hw/iommu.h"
> >  #include "hw/pci/msi.h"
> >  #include "hw/pci/msix.h"
> >  #include "hw/pci/pci_bridge.h"
> > @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
> >
> >  static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >  {
> > +    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
> > +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
> >      PCIDevice *pdev = PCI_DEVICE(vdev);
> > +    uint64_t max_pasid_log2 = 0;
> > +    bool pasid_cap_added = false;
> > +    uint64_t hw_caps;
> >      uint32_t header;
> >      uint16_t cap_id, next, size;
> >      uint8_t cap_ver;
> > @@ -2578,12 +2584,43 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >                  pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> >              }
> >              break;
> > +        case PCI_EXT_CAP_ID_PASID:
> > +             pasid_cap_added = true;
> > +             /* fallthrough */
> >          default:
> >              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
> >          }
> >
> >      }
> >
> > +#ifdef CONFIG_IOMMUFD
> > +    /*
> > +     * Although we check for PCI_EXT_CAP_ID_PASID above, the Linux VFIO
> > +     * framework currently hides this capability. Try to retrieve it
> > +     * through alternative kernel interfaces (e.g. IOMMUFD APIs).
> I don't catch this sentence . When are you supposed to read above
> PCI_EXT_CAP_ID_PASID cap id then?

That's to make it future-proof in case VFIO stops hiding the capability. If
that happens, the default case above will add the cap, and we may end up
with a duplicate at the offset below.

> > +     */
> > +    if (!pasid_cap_added && hiodc->get_cap) {
> > +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
> > +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
> > +                       &max_pasid_log2, NULL);
> > +    }
> > +
> > +    /*
> > +     * If supported, adds the PASID capability in the end of the PCIe config
> > +     * space. TODO: Add option for enabling pasid at a safe offset.
> > +     */
> > +    if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
> > +                           VIOMMU_FLAG_PASID_SUPPORTED)) {
> > +        bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC) ? true : false;
> can't you direct set exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);

True 😊
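
For reference, the simplification agreed on above relies on the C rule that
converting a scalar to bool yields true for any non-zero value, so the
"? true : false" is redundant. A minimal standalone sketch of that rule
follows; the DEMO_* flag names and bit positions are made up for the
illustration and are not the real IOMMU_HW_CAP_* values. It assumes bool
comes from <stdbool.h> (i.e. _Bool). The uint8_t variant is included only to
show why the same shortcut would not be safe with a narrower non-bool type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, chosen so one sits above bit 7. */
#define DEMO_CAP_EXEC (1ULL << 1)
#define DEMO_CAP_PRIV (1ULL << 33)

/* Conversion to bool: any non-zero masked value becomes true. */
static bool demo_has_cap(uint64_t hw_caps, uint64_t flag)
{
    bool set = hw_caps & flag;
    return set;
}

/*
 * Contrast: conversion to uint8_t keeps only the low 8 bits, so a flag in
 * a high bit position is silently lost.
 */
static uint8_t demo_has_cap_u8(uint64_t hw_caps, uint64_t flag)
{
    return (uint8_t)(hw_caps & flag);
}
```

So the direct assignment is equivalent to the ternary form as long as the
destination really is a bool.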

Thanks,
Shameer
Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Eric Auger 3 months ago

On 11/6/25 3:27 PM, Shameer Kolothum wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: 06 November 2025 13:56
>> To: Shameer Kolothum <skolothumtho@nvidia.com>; qemu-
>> arm@nongnu.org; qemu-devel@nongnu.org
>> Cc: peter.maydell@linaro.org; Jason Gunthorpe <jgg@nvidia.com>; Nicolin
>> Chen <nicolinc@nvidia.com>; ddutile@redhat.com; berrange@redhat.com;
>> Nathan Chen <nathanc@nvidia.com>; Matt Ochs <mochs@nvidia.com>;
>> smostafa@google.com; wangzhou1@hisilicon.com;
>> jiangkunkun@huawei.com; jonathan.cameron@huawei.com;
>> zhangfei.gao@linaro.org; zhenzhong.duan@intel.com; yi.l.liu@intel.com;
>> Krishnakant Jaju <kjaju@nvidia.com>
>> Subject: Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
>>
>>
>> Hi Shameer,
>> On 10/31/25 11:50 AM, Shameer Kolothum wrote:
>>> From: Yi Liu <yi.l.liu@intel.com>
>>>
>>> If user wants to expose PASID capability in vIOMMU, then VFIO would also
>> need to report?
>>> report the PASID cap for this device if the underlying hardware supports
>>> it as well.
>>>
>>> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
>>> vconfig space. This is a choice in the good hope of no conflict with any
>>> existing cap or hidden registers. For the devices that has hidden registers,
>>> user should figure out a proper offset for the vPASID cap. This may require
>>> an option for user to config it. Here we leave it as a future extension.
>>> There are more discussions on the mechanism of finding the proper offset.
>>>
>>> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
>>>
>>> Since we add a check to ensure the vIOMMU supports PASID, only devices
>>> under those vIOMMUs can synthesize the vPASID capability. This gives
>>> users control over which devices expose vPASID.
>>>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>>> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
>>> ---
>>>  hw/vfio/pci.c      | 37 +++++++++++++++++++++++++++++++++++++
>>>  include/hw/iommu.h |  1 +
>>>  2 files changed, 38 insertions(+)
>>>
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 06b06afc2b..2054eac897 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -24,6 +24,7 @@
>>>  #include <sys/ioctl.h>
>>>
>>>  #include "hw/hw.h"
>>> +#include "hw/iommu.h"
>>>  #include "hw/pci/msi.h"
>>>  #include "hw/pci/msix.h"
>>>  #include "hw/pci/pci_bridge.h"
>>> @@ -2500,7 +2501,12 @@ static int vfio_setup_rebar_ecap(VFIOPCIDevice *vdev, uint16_t pos)
>>>  static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>>>  {
>>> +    HostIOMMUDevice *hiod = vdev->vbasedev.hiod;
>>> +    HostIOMMUDeviceClass *hiodc = HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>      PCIDevice *pdev = PCI_DEVICE(vdev);
>>> +    uint64_t max_pasid_log2 = 0;
>>> +    bool pasid_cap_added = false;
>>> +    uint64_t hw_caps;
>>>      uint32_t header;
>>>      uint16_t cap_id, next, size;
>>>      uint8_t cap_ver;
>>> @@ -2578,12 +2584,43 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>>>                  pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>>>              }
>>>              break;
>>> +        case PCI_EXT_CAP_ID_PASID:
>>> +             pasid_cap_added = true;
>>> +             /* fallthrough */
>>>          default:
>>>              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>>>          }
>>>
>>>      }
>>>
>>> +#ifdef CONFIG_IOMMUFD
>>> +    /*
>>> +     * Although we check for PCI_EXT_CAP_ID_PASID above, the Linux VFIO
>>> +     * framework currently hides this capability. Try to retrieve it
>>> +     * through alternative kernel interfaces (e.g. IOMMUFD APIs).
>> I don't catch this sentence . When are you supposed to read above
>> PCI_EXT_CAP_ID_PASID cap id then?
> That’s to make it future proof in case VFIO relaxes that.  If that happens
> the code above by default, will add the CAP and we may end with a
> duplicate at below offset.
OK, thanks for the clarification. Then I would move the comment about the
VFIO kernel code currently hiding the extended cap next to

+             pasid_cap_added = true;

and explain that it is added to make the code future-proof in case VFIO
relaxes that.
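
To make the duplicate scenario concrete, here is a minimal standalone sketch
of the guard being discussed. It is not the QEMU code: the function and
macro names are hypothetical, the capability list is a plain array instead
of a walk over config space, and 0x1b is used as the PASID extended
capability ID. The function counts how many PASID capabilities the guest
would end up with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_EXT_CAP_ID_PASID 0x1b

/*
 * host_cap_ids: extended capability IDs visible through VFIO.
 * host_iommu_has_pasid: what the host IOMMU backend reports.
 */
static size_t demo_count_pasid_caps(const uint16_t *host_cap_ids, size_t n,
                                    bool host_iommu_has_pasid)
{
    bool pasid_cap_added = false;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (host_cap_ids[i] == DEMO_EXT_CAP_ID_PASID) {
            /*
             * Not reached while VFIO hides the PASID cap. Kept so that a
             * future kernel exposing it does not lead to a second,
             * synthesized copy below.
             */
            pasid_cap_added = true;
            count++;
        }
    }

    /* Synthesize the cap only if the walk above did not already add it. */
    if (!pasid_cap_added && host_iommu_has_pasid) {
        count++;
    }
    return count;
}
```

In both the "hidden" and the "exposed" case the result is a single PASID
capability, which is the property the flag is meant to preserve.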

Eric

>
>>> +     */
>>> +    if (!pasid_cap_added && hiodc->get_cap) {
>>> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_GENERIC_HW, &hw_caps, NULL);
>>> +        hiodc->get_cap(hiod, HOST_IOMMU_DEVICE_CAP_MAX_PASID_LOG2,
>>> +                       &max_pasid_log2, NULL);
>>> +    }
>>> +
>>> +    /*
>>> +     * If supported, adds the PASID capability in the end of the PCIe config
>>> +     * space. TODO: Add option for enabling pasid at a safe offset.
>>> +     */
>>> +    if (max_pasid_log2 && (pci_device_get_viommu_flags(pdev) &
>>> +                           VIOMMU_FLAG_PASID_SUPPORTED)) {
>>> +        bool exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC) ? true : false;
>> can't you direct set exec_perm = (hw_caps & IOMMU_HW_CAP_PCI_PASID_EXEC);
> True 😊
>
> Thanks,
> Shameer


Re: [PATCH v5 31/32] vfio: Synthesize vPASID capability to VM
Posted by Jonathan Cameron 3 months, 1 week ago
On Fri, 31 Oct 2025 10:50:04 +0000
Shameer Kolothum <skolothumtho@nvidia.com> wrote:

> From: Yi Liu <yi.l.liu@intel.com>
> 
> If user wants to expose PASID capability in vIOMMU, then VFIO would also
> report the PASID cap for this device if the underlying hardware supports
> it as well.
> 
> As a start, this chooses to put the vPASID cap in the last 8 bytes of the
> vconfig space. This is a choice in the good hope of no conflict with any
> existing cap or hidden registers. For the devices that has hidden registers,
> user should figure out a proper offset for the vPASID cap. This may require
> an option for user to config it. Here we leave it as a future extension.
> There are more discussions on the mechanism of finding the proper offset.
> 
> https://lore.kernel.org/kvm/BN9PR11MB5276318969A212AD0649C7BE8CBE2@BN9PR11MB5276.namprd11.prod.outlook.com/
> 
> Since we add a check to ensure the vIOMMU supports PASID, only devices
> under those vIOMMUs can synthesize the vPASID capability. This gives
> users control over which devices expose vPASID.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Whilst not particularly keen on this hack, I can't see a better solution.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
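
For readers following the placement described in the commit message, the
arithmetic behind "the last 8 bytes of the vconfig space" can be sketched as
below. This is a standalone illustration: the DEMO_* constants assume a
4 KiB PCIe config space and an 8-byte PASID capability structure, and the
overlap helper is hypothetical, shown only to indicate what a "safe offset"
check against an existing capability or hidden register range might look
like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PCIE_CONFIG_SPACE_SIZE 0x1000u /* assumed: 4 KiB */
#define DEMO_PASID_CAP_SIZEOF       8u      /* assumed: 8-byte capability */

/* Offset of a capability placed in the last bytes of config space. */
static uint16_t demo_vpasid_offset(void)
{
    return (uint16_t)(DEMO_PCIE_CONFIG_SPACE_SIZE - DEMO_PASID_CAP_SIZEOF);
}

/* True if [a_off, a_off + a_len) and [b_off, b_off + b_len) intersect. */
static bool demo_ranges_overlap(uint16_t a_off, uint16_t a_len,
                                uint16_t b_off, uint16_t b_len)
{
    return a_off < b_off + b_len && b_off < a_off + a_len;
}
```

Under these assumptions the synthesized capability sits at 0xFF8 and only
collides with something that itself reaches into the final 8 bytes.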