[PATCH v5 20/21] Workaround for ERRATA_772415_SPR17

Zhenzhong Duan posted 21 patches 2 months, 3 weeks ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Yi Liu <yi.l.liu@intel.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>
[PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Zhenzhong Duan 2 months, 3 weeks ago
On a system influenced by ERRATA_772415, IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17
is repored by IOMMU_DEVICE_GET_HW_INFO. Due to this errata, even the readonly
range mapped on stage-2 page table could still be written.

Reference from 4th Gen Intel Xeon Processor Scalable Family Specification
Update, Errata Details, SPR17.
https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/eagle-stream/sapphire-rapids-specification-update/

Also copied the SPR17 details from above link:
"Problem: When remapping hardware is configured by system software in
scalable mode as Nested (PGTT=011b) and with PWSNP field Set in the
PASID-table-entry, it may Set Accessed bit and Dirty bit (and Extended
Access bit if enabled) in first-stage page-table entries even when
second-stage mappings indicate that corresponding first-stage page-table
is Read-Only.

Implication: Due to this erratum, pages mapped as Read-only in second-stage
page-tables may be modified by remapping hardware Access/Dirty bit updates.

Workaround: None identified. System software enabling nested translations
for a VM should ensure that there are no read-only pages in the
corresponding second-stage mappings."

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/iommufd.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index e503c232e1..59735e878c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -324,6 +324,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
 {
     ERRP_GUARD();
     IOMMUFDBackend *iommufd = vbasedev->iommufd;
+    struct iommu_hw_info_vtd vtd;
     uint32_t type, flags = 0;
     uint64_t hw_caps;
     VFIOIOASHwpt *hwpt;
@@ -371,10 +372,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
      * instead.
      */
     if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
-                                         &type, NULL, 0, &hw_caps, errp)) {
+                                         &type, &vtd, sizeof(vtd), &hw_caps,
+                                         errp)) {
         return false;
     }
 
+    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
+        container->bcontainer.bypass_ro = true;
+    }
+
     if (hw_caps & IOMMU_HW_CAP_DIRTY_TRACKING) {
         flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
     }
-- 
2.47.1
Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Nicolin Chen 2 months, 3 weeks ago
On Fri, Aug 22, 2025 at 02:40:58AM -0400, Zhenzhong Duan wrote:
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index e503c232e1..59735e878c 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -324,6 +324,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>  {
>      ERRP_GUARD();
>      IOMMUFDBackend *iommufd = vbasedev->iommufd;
> +    struct iommu_hw_info_vtd vtd;

VendorCaps vendor_caps;

>      uint32_t type, flags = 0;
>      uint64_t hw_caps;
>      VFIOIOASHwpt *hwpt;
> @@ -371,10 +372,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>       * instead.
>       */
>      if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
> -                                         &type, NULL, 0, &hw_caps, errp)) {
> +                                         &type, &vtd, sizeof(vtd), &hw_caps,

s/vtd/vendor_caps/g

> +                                         errp)) {
>          return false;
>      }
>  
> +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
> +        container->bcontainer.bypass_ro = true;

This circled back to checking a vendor specific flag in the core..

Perhaps we could upgrade the get_viommu_cap op and its API:

enum viommu_flags {
    VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
    VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),
};

bool vfio_device_get_viommu_flags(VFIODevice *vbasedev, VendorCaps *vendor_caps,
                                  uint64_t *viommu_flags);

Then:
    if (viommu_flags & VIOMMU_FLAG_BYPASS_RO) {
        container->bcontainer.bypass_ro = true;
    }
...
    if (viommu_flags & VIOMMU_FLAG_HW_NESTED) {
        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
    }

Nicolin
Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Yi Liu 2 months, 2 weeks ago

On 2025/8/23 07:55, Nicolin Chen wrote:
> On Fri, Aug 22, 2025 at 02:40:58AM -0400, Zhenzhong Duan wrote:
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index e503c232e1..59735e878c 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -324,6 +324,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>   {
>>       ERRP_GUARD();
>>       IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> +    struct iommu_hw_info_vtd vtd;
> 
> VendorCaps vendor_caps;
> 
>>       uint32_t type, flags = 0;
>>       uint64_t hw_caps;
>>       VFIOIOASHwpt *hwpt;
>> @@ -371,10 +372,15 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>        * instead.
>>        */
>>       if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
>> -                                         &type, NULL, 0, &hw_caps, errp)) {
>> +                                         &type, &vtd, sizeof(vtd), &hw_caps,
> 
> s/vtd/vendor_caps/g
> 
>> +                                         errp)) {
>>           return false;
>>       }
>>   
>> +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
>> +        container->bcontainer.bypass_ro = true;
> 
> This circled back to checking a vendor specific flag in the core..
> 
> Perhaps we could upgrade the get_viommu_cap op and its API:
> 
> enum viommu_flags {
>      VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
>      VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),

hmmm. I'm not quite on this idea as the two flags have different sources.
One determined by vIOMMU config, one by the hardware limit. Reporting
them in one API is strange.  I think the bypass RO can be determined in
VFIO just like the patch has done. But it should check if vIOMMU has 
requested nested hwpt and also the reported hw_info::type is
IOMMU_HW_INFO_TYPE_INTEL_VTD.

	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
             type == IOMMU_HW_INFO_TYPE_INTEL_VTD &&
             vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
             container->bcontainer.bypass_ro = true;
          }

Regards,
Yi Liu
Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Nicolin Chen 2 months, 2 weeks ago
On Wed, Aug 27, 2025 at 07:56:38PM +0800, Yi Liu wrote:
> On 2025/8/23 07:55, Nicolin Chen wrote:
> > > +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
> > > +        container->bcontainer.bypass_ro = true;
> > 
> > This circled back to checking a vendor specific flag in the core..
> > 
> > Perhaps we could upgrade the get_viommu_cap op and its API:
> > 
> > enum viommu_flags {
> >      VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
> >      VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),
> 
> hmmm. I'm not quite on this idea as the two flags have different sources.
> One determined by vIOMMU config, one by the hardware limit. Reporting
> them in one API is strange.

It's fair enough that we want to make such a clear boundary between
a vIOMMU flag and a HW IOMMU flag of the same vendor..

> I think the bypass RO can be determined in
> VFIO just like the patch has done. But it should check if vIOMMU has
> requested nested hwpt and also the reported hw_info::type is
> IOMMU_HW_INFO_TYPE_INTEL_VTD.
> 
> 	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
>             type == IOMMU_HW_INFO_TYPE_INTEL_VTD &&
>             vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
>             container->bcontainer.bypass_ro = true;
>          }

Then, it feels odd to me that we don't have a clear boundary between
a generic flag and a vendor flag :-/

It's fine if we want to keep all the host-level vendor flags outside
the vIOMMU code, but at least could we please have a generic looking
function outside this iommufd_cdev_autodomains_get() to translate a
vendor flag to a generic looking flag?

We could start with a function that loads the HostIOMMUDeviceCaps (or
just VendorCaps) dealing with vendor types and outputs generic ones:

        host_iommu_flags = host_iommu_decode_vendor_caps(&vendor_caps);

        if (hwpt_flags & IOMMU_HWPT_ALLOC_NEST_PARENT &&
            host_iommu_flags & HOST_IOMMU_FLAG_BYPASS_RO) {
             container->bcontainer.bypass_ro = true;
        }

Over time, it can even grow into a separate file, if there are more
vendor specific requirement.

Nicolin
Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Yi Liu 2 months, 2 weeks ago
On 2025/8/27 23:09, Nicolin Chen wrote:
> On Wed, Aug 27, 2025 at 07:56:38PM +0800, Yi Liu wrote:
>> On 2025/8/23 07:55, Nicolin Chen wrote:
>>>> +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
>>>> +        container->bcontainer.bypass_ro = true;
>>>
>>> This circled back to checking a vendor specific flag in the core..
>>>
>>> Perhaps we could upgrade the get_viommu_cap op and its API:
>>>
>>> enum viommu_flags {
>>>       VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
>>>       VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),
>>
>> hmmm. I'm not quite on this idea as the two flags have different sources.
>> One determined by vIOMMU config, one by the hardware limit. Reporting
>> them in one API is strange.
> 
> It's fair enough that we want to make such a clear boundary between
> a vIOMMU flag and a HW IOMMU flag of the same vendor..
> 
>> I think the bypass RO can be determined in
>> VFIO just like the patch has done. But it should check if vIOMMU has
>> requested nested hwpt and also the reported hw_info::type is
>> IOMMU_HW_INFO_TYPE_INTEL_VTD.
>>
>> 	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) &&
>>              type == IOMMU_HW_INFO_TYPE_INTEL_VTD &&
>>              vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
>>              container->bcontainer.bypass_ro = true;
>>           }
> 
> Then, it feels odd to me that we don't have a clear boundary between
> a generic flag and a vendor flag :-/
> 
> It's fine if we want to keep all the host-level vendor flags outside
> the vIOMMU code, but at least could we please have a generic looking
> function outside this iommufd_cdev_autodomains_get() to translate a
> vendor flag to a generic looking flag?
> 
> We could start with a function that loads the HostIOMMUDeviceCaps (or
> just VendorCaps) dealing with vendor types and outputs generic ones:
> 
>          host_iommu_flags = host_iommu_decode_vendor_caps(&vendor_caps);
> 
>          if (hwpt_flags & IOMMU_HWPT_ALLOC_NEST_PARENT &&
>              host_iommu_flags & HOST_IOMMU_FLAG_BYPASS_RO) {
>               container->bcontainer.bypass_ro = true;
>          }
> 
> Over time, it can even grow into a separate file, if there are more
> vendor specific requirement.

you also have valid point. I've also considered to let vIOMMU to invoke
the vfio_listener_register(). This might need to change the VFIO logic a
lot. Conceptually, it does not stand very well... And it is too heavy
for WA an errata...So may we just start with a function as you proposed?

Regards,
Yi Liu
Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Nicolin Chen 2 months, 2 weeks ago
On Fri, Aug 29, 2025 at 04:16:41PM +0800, Yi Liu wrote:
> On 2025/8/27 23:09, Nicolin Chen wrote:
> > On Wed, Aug 27, 2025 at 07:56:38PM +0800, Yi Liu wrote:
> > > On 2025/8/23 07:55, Nicolin Chen wrote:
> > We could start with a function that loads the HostIOMMUDeviceCaps (or
> > just VendorCaps) dealing with vendor types and outputs generic ones:
> > 
> >          host_iommu_flags = host_iommu_decode_vendor_caps(&vendor_caps);
> > 
> >          if (hwpt_flags & IOMMU_HWPT_ALLOC_NEST_PARENT &&
> >              host_iommu_flags & HOST_IOMMU_FLAG_BYPASS_RO) {
> >               container->bcontainer.bypass_ro = true;
> >          }
> > 
> > Over time, it can even grow into a separate file, if there are more
> > vendor specific requirement.
> 
> you also have valid point. I've also considered to let vIOMMU to invoke
> the vfio_listener_register(). This might need to change the VFIO logic a
> lot. Conceptually, it does not stand very well... And it is too heavy
> for WA an errata...

I think it's fine to use a flag. Zhenzhong's bypass_ro patch looks
quite clean to me.

> So may we just start with a function as you proposed?

Yea.

I imagined that kind of decoding function in the backend/iommufd.c
or somewhere closer to HostIOMMU structure/function in this file.

And I expected that the WANTS_NESTING_PARENT flag would need some
validation from the vendor specific hw info too, by ensuring IOMMU
HW does support nesting. Then we could reject the allocation of a
nesting parent at an earlier stage.

Now, we are doing in a way of pre-allocating a nesting parent HWPT
(so long as vIOMMU wants) and letting the set_iommu_device callback
do the validation of the HW info. I think that's fine as well..

Yet one way or another, we do put iommu_hw_info_vtd (HW IOMMU caps)
in the vIOMMU code and validate that in the vIOMMU code right? So,
argubly the whole separation between vIOMMU and HW IOMMU things is
not that perfectly implemented? :-/

Thanks
Nic
RE: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Duan, Zhenzhong 2 months, 3 weeks ago

>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
>
>On Fri, Aug 22, 2025 at 02:40:58AM -0400, Zhenzhong Duan wrote:
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index e503c232e1..59735e878c 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -324,6 +324,7 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>  {
>>      ERRP_GUARD();
>>      IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> +    struct iommu_hw_info_vtd vtd;
>
>VendorCaps vendor_caps;
>
>>      uint32_t type, flags = 0;
>>      uint64_t hw_caps;
>>      VFIOIOASHwpt *hwpt;
>> @@ -371,10 +372,15 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>       * instead.
>>       */
>>      if (!iommufd_backend_get_device_info(vbasedev->iommufd,
>vbasedev->devid,
>> -                                         &type, NULL, 0,
>&hw_caps, errp)) {
>> +                                         &type, &vtd, sizeof(vtd),
>&hw_caps,
>
>s/vtd/vendor_caps/g
>
>> +                                         errp)) {
>>          return false;
>>      }
>>
>> +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
>> +        container->bcontainer.bypass_ro = true;
>
>This circled back to checking a vendor specific flag in the core..

I'm not sure if VendorCaps struct wrapper is overprogramming as this ERRARA is only VTD specific. We still need to check VendorCaps.vtd.flags bit.

>
>Perhaps we could upgrade the get_viommu_cap op and its API:
>
>enum viommu_flags {
>    VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
>    VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),
>};
>
>bool vfio_device_get_viommu_flags(VFIODevice *vbasedev, VendorCaps
>*vendor_caps,
>                                  uint64_t *viommu_flags);
>
>Then:
>    if (viommu_flags & VIOMMU_FLAG_BYPASS_RO) {
>        container->bcontainer.bypass_ro = true;
>    }
>...
>    if (viommu_flags & VIOMMU_FLAG_HW_NESTED) {
>        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>    }

IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is a VTD specific flag bit from host IOMMU, we have defined get_viommu_cap() to return pure vIOMMU capability bits, so no host IOMMU flag bit can be returned here. See patch2 commit log for the reason.

Thanks
Zhenzhong
Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Nicolin Chen 2 months, 3 weeks ago
On Mon, Aug 25, 2025 at 09:21:48AM +0000, Duan, Zhenzhong wrote:
> 
> 
> >-----Original Message-----
> >From: Nicolin Chen <nicolinc@nvidia.com>
> >Subject: Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
> >
> >On Fri, Aug 22, 2025 at 02:40:58AM -0400, Zhenzhong Duan wrote:
> >> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> >> index e503c232e1..59735e878c 100644
> >> --- a/hw/vfio/iommufd.c
> >> +++ b/hw/vfio/iommufd.c
> >> @@ -324,6 +324,7 @@ static bool
> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> >>  {
> >>      ERRP_GUARD();
> >>      IOMMUFDBackend *iommufd = vbasedev->iommufd;
> >> +    struct iommu_hw_info_vtd vtd;
> >
> >VendorCaps vendor_caps;
> >
> >>      uint32_t type, flags = 0;
> >>      uint64_t hw_caps;
> >>      VFIOIOASHwpt *hwpt;
> >> @@ -371,10 +372,15 @@ static bool
> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> >>       * instead.
> >>       */
> >>      if (!iommufd_backend_get_device_info(vbasedev->iommufd,
> >vbasedev->devid,
> >> -                                         &type, NULL, 0,
> >&hw_caps, errp)) {
> >> +                                         &type, &vtd, sizeof(vtd),
> >&hw_caps,
> >
> >s/vtd/vendor_caps/g
> >
> >> +                                         errp)) {
> >>          return false;
> >>      }
> >>
> >> +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
> >> +        container->bcontainer.bypass_ro = true;
> >
> >This circled back to checking a vendor specific flag in the core..
> 
> I'm not sure if VendorCaps struct wrapper is overprogramming as this
> ERRARA is only VTD specific. We still need to check VendorCaps.vtd.flags bit.

Look, the HW_INFO call is done by the core.

Then, the core needs:
  1 HW caps for dirty tracking and PASID
  2 IOMMU_HWPT_ALLOC_NEST_PARENT (vIOMMU cap)
  3 bcontainer.bypass_ro (vIOMMU workaround)

Both 2 and 3 need to get from vIOMMU, while 3 needs VendorCaps.
Arguably 2 could do a bit validation using the VendorCaps too.

> >Perhaps we could upgrade the get_viommu_cap op and its API:
> >
> >enum viommu_flags {
> >    VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
> >    VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),
> >};
> >
> >bool vfio_device_get_viommu_flags(VFIODevice *vbasedev, VendorCaps
> >*vendor_caps,
> >                                  uint64_t *viommu_flags);
> >
> >Then:
> >    if (viommu_flags & VIOMMU_FLAG_BYPASS_RO) {
> >        container->bcontainer.bypass_ro = true;
> >    }
> >...
> >    if (viommu_flags & VIOMMU_FLAG_HW_NESTED) {
> >        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
> >    }
> 
> IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is a VTD specific flag bit
> from host IOMMU, we have defined get_viommu_cap() to return pure
> vIOMMU capability bits, so no host IOMMU flag bit can be returned
> here. See patch2 commit log for the reason.

VIOMMU_FLAG_BYPASS_RO is a "pure" vIOMMU flag, not confined to
VTD. IOW, if some other vIOMMU has a similar issue, they can use
it as well. Since we define a "bypass_ro" in the core bcontainer
structure, it makes sense to have a core-level flag for it, v.s.
checking the vendor flag in the core.

My sample code is turning this get_viommu_cap to something like
get_viommu_flags, which could include both "cap" and "errata".

Nicolin
RE: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Duan, Zhenzhong 2 months, 2 weeks ago

>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
>
>On Mon, Aug 25, 2025 at 09:21:48AM +0000, Duan, Zhenzhong wrote:
>>
>>
>> >-----Original Message-----
>> >From: Nicolin Chen <nicolinc@nvidia.com>
>> >Subject: Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
>> >
>> >On Fri, Aug 22, 2025 at 02:40:58AM -0400, Zhenzhong Duan wrote:
>> >> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> >> index e503c232e1..59735e878c 100644
>> >> --- a/hw/vfio/iommufd.c
>> >> +++ b/hw/vfio/iommufd.c
>> >> @@ -324,6 +324,7 @@ static bool
>> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> >>  {
>> >>      ERRP_GUARD();
>> >>      IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> >> +    struct iommu_hw_info_vtd vtd;
>> >
>> >VendorCaps vendor_caps;
>> >
>> >>      uint32_t type, flags = 0;
>> >>      uint64_t hw_caps;
>> >>      VFIOIOASHwpt *hwpt;
>> >> @@ -371,10 +372,15 @@ static bool
>> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> >>       * instead.
>> >>       */
>> >>      if (!iommufd_backend_get_device_info(vbasedev->iommufd,
>> >vbasedev->devid,
>> >> -                                         &type, NULL, 0,
>> >&hw_caps, errp)) {
>> >> +                                         &type, &vtd,
>sizeof(vtd),
>> >&hw_caps,
>> >
>> >s/vtd/vendor_caps/g
>> >
>> >> +                                         errp)) {
>> >>          return false;
>> >>      }
>> >>
>> >> +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
>> >> +        container->bcontainer.bypass_ro = true;
>> >
>> >This circled back to checking a vendor specific flag in the core..
>>
>> I'm not sure if VendorCaps struct wrapper is overprogramming as this
>> ERRARA is only VTD specific. We still need to check VendorCaps.vtd.flags
>bit.
>
>Look, the HW_INFO call is done by the core.
>
>Then, the core needs:
>  1 HW caps for dirty tracking and PASID
>  2 IOMMU_HWPT_ALLOC_NEST_PARENT (vIOMMU cap)
>  3 bcontainer.bypass_ro (vIOMMU workaround)

Why vIOMMU workaround? ERRATA is from host IOMMU. In a heterogeneous environment, some host IOMMUs can have this ERRATA while other newer IOMMUs not.

IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 may only exist on old IOMMUs with nesting capability, vIOMMU doesn't support nesting emulation yet, it's also no sense to emulate an ERRATA in vIOMMU.

>
>Both 2 and 3 need to get from vIOMMU, while 3 needs VendorCaps.
>Arguably 2 could do a bit validation using the VendorCaps too.
>
>> >Perhaps we could upgrade the get_viommu_cap op and its API:
>> >
>> >enum viommu_flags {
>> >    VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
>> >    VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),
>> >};
>> >
>> >bool vfio_device_get_viommu_flags(VFIODevice *vbasedev, VendorCaps
>> >*vendor_caps,
>> >                                  uint64_t *viommu_flags);
>> >
>> >Then:
>> >    if (viommu_flags & VIOMMU_FLAG_BYPASS_RO) {
>> >        container->bcontainer.bypass_ro = true;
>> >    }
>> >...
>> >    if (viommu_flags & VIOMMU_FLAG_HW_NESTED) {
>> >        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>> >    }
>>
>> IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is a VTD specific flag bit
>> from host IOMMU, we have defined get_viommu_cap() to return pure
>> vIOMMU capability bits, so no host IOMMU flag bit can be returned
>> here. See patch2 commit log for the reason.
>
>VIOMMU_FLAG_BYPASS_RO is a "pure" vIOMMU flag, not confined to
>VTD. IOW, if some other vIOMMU has a similar issue, they can use
>it as well. Since we define a "bypass_ro" in the core bcontainer
>structure, it makes sense to have a core-level flag for it, v.s.
>checking the vendor flag in the core.

It's not a vIOMMU flag but host IOMMU flag except vIOMMU want to emulate that ERRATA.

Due to patch9, there is only one VFIO device under a container, so bypass_ro is set based on VFIO device's backend host IOMMU's flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17.

Thanks
Zhenzhong

>
>My sample code is turning this get_viommu_cap to something like
>get_viommu_flags, which could include both "cap" and "errata".
>
>Nicolin
Re: [PATCH v5 20/21] Workaround for ERRATA_772415_SPR17
Posted by Nicolin Chen 2 months, 2 weeks ago
On Wed, Aug 27, 2025 at 07:11:54AM +0000, Duan, Zhenzhong wrote:
> >> >> +    if (vtd.flags & IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17) {
> >> >> +        container->bcontainer.bypass_ro = true;
> >> >
> >> >This circled back to checking a vendor specific flag in the core..
> >>
> >> I'm not sure if VendorCaps struct wrapper is overprogramming as this
> >> ERRARA is only VTD specific. We still need to check VendorCaps.vtd.flags
> >bit.
> >
> >Look, the HW_INFO call is done by the core.
> >
> >Then, the core needs:
> >  1 HW caps for dirty tracking and PASID
> >  2 IOMMU_HWPT_ALLOC_NEST_PARENT (vIOMMU cap)
> >  3 bcontainer.bypass_ro (vIOMMU workaround)
> 
> Why vIOMMU workaround? ERRATA is from host IOMMU.
> In a heterogeneous environment, some host IOMMUs can have
> this ERRATA while other newer IOMMUs not.

To be fair, the subject of your patch is "Workaround". Though it
might be inaccurate to call it "vIOMMU Workaround", the idea was
to let vendor code decode vendor bits and flags.

Arguably, when a host IOMMU has an erratum while requiring a HW
acceleration like nesting, vIOMMU can be a partner to help apply
the workaround.

> IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 may only exist on old IOMMUs
> with nesting capability, vIOMMU doesn't support nesting emulation yet,
> it's also no sense to emulate an ERRATA in vIOMMU.

Certainly, I never suggest to "emulate an ERRATA".

> >Both 2 and 3 need to get from vIOMMU, while 3 needs VendorCaps.
> >Arguably 2 could do a bit validation using the VendorCaps too.
> >
> >> >Perhaps we could upgrade the get_viommu_cap op and its API:
> >> >
> >> >enum viommu_flags {
> >> >    VIOMMU_FLAG_HW_NESTED = BIT_ULL(0),
> >> >    VIOMMU_FLAG_BYPASS_RO = BIT_ULL(1),
> >> >};
> >> >
> >> >bool vfio_device_get_viommu_flags(VFIODevice *vbasedev, VendorCaps
> >> >*vendor_caps,
> >> >                                  uint64_t *viommu_flags);
> >> >
> >> >Then:
> >> >    if (viommu_flags & VIOMMU_FLAG_BYPASS_RO) {
> >> >        container->bcontainer.bypass_ro = true;
> >> >    }
> >> >...
> >> >    if (viommu_flags & VIOMMU_FLAG_HW_NESTED) {
> >> >        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
> >> >    }
> >>
> >> IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is a VTD specific flag bit
> >> from host IOMMU, we have defined get_viommu_cap() to return pure
> >> vIOMMU capability bits, so no host IOMMU flag bit can be returned
> >> here. See patch2 commit log for the reason.
> >
> >VIOMMU_FLAG_BYPASS_RO is a "pure" vIOMMU flag, not confined to
> >VTD. IOW, if some other vIOMMU has a similar issue, they can use
> >it as well. Since we define a "bypass_ro" in the core bcontainer
> >structure, it makes sense to have a core-level flag for it, v.s.
> >checking the vendor flag in the core.
> 
> It's not a vIOMMU flag but host IOMMU flag except vIOMMU want to emulate that ERRATA.

Again, the idea here is not to blame vIOMMU for every flag, nor
to emulate the erratum, but to use vIOMMU as a vendor specific
place to translate a vendor specific flag (either host IOMMU or
vIOMMU) to something that QEMU core code can understand.

The rule of thumb is to avoid checking vendor flags in the core,
which both Eric and I have noted for a couple of times..

It's okay that you don't like the way of grouping the host flag
with vIOMMU cap. Please find some other place in the vendor zone
to load the vendor flag? Maybe add another op/API to decode the
host IOMMU flags/caps exclusively?

Nicolin