[PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking

Zhenzhong Duan posted 10 patches 2 weeks, 2 days ago
Maintainers: Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, John Levon <john.levon@nutanix.com>, Thanos Makatos <thanos.makatos@nutanix.com>, "Cédric Le Goater" <clg@redhat.com>, Alex Williamson <alex@shazbot.org>
There is a newer version of this series
[PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking
Posted by Zhenzhong Duan 2 weeks, 2 days ago
VFIO IOMMU type1 claims to support IOMMU dirty tracking by setting
bcontainer->dirty_pages_supported, but in vfio_migration_realize()
vbasedev->iommu_dirty_tracking is checked, we should pass
bcontainer->dirty_pages_supported to vbasedev->iommu_dirty_tracking
in legacy backend so that the check is accurate.

Fixes: 30b916778517 ("vfio/common: Allow disabling device dirty page tracking")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/container-legacy.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/container-legacy.c b/hw/vfio/container-legacy.c
index dd9c4a6a5a..fa726a2733 100644
--- a/hw/vfio/container-legacy.c
+++ b/hw/vfio/container-legacy.c
@@ -845,6 +845,7 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
                             VFIODevice *vbasedev, Error **errp)
 {
     g_autofree struct vfio_device_info *info = NULL;
+    VFIOContainer *bcontainer = VFIO_IOMMU(group->container);
     int fd;
 
     fd = vfio_cpr_group_get_device_fd(group->fd, name);
@@ -883,7 +884,8 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
         }
     }
 
-    vfio_device_prepare(vbasedev, VFIO_IOMMU(group->container), info);
+    vfio_device_prepare(vbasedev, bcontainer, info);
+    vbasedev->iommu_dirty_tracking = bcontainer->dirty_pages_supported;
 
     vbasedev->fd = fd;
     vbasedev->group = group;
-- 
2.47.1
Re: [PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking
Posted by Avihai Horon 2 weeks, 1 day ago
On 29/10/2025 11:53, Zhenzhong Duan wrote:
> External email: Use caution opening links or attachments
>
>
> VFIO IOMMU type1 claims to support IOMMU dirty tracking by setting
> bcontainer->dirty_pages_supported, but in vfio_migration_realize()
> vbasedev->iommu_dirty_tracking is checked, we should pass
> bcontainer->dirty_pages_supported to vbasedev->iommu_dirty_tracking
> in legacy backend so that the check is accurate.
>
> Fixes: 30b916778517 ("vfio/common: Allow disabling device dirty page tracking")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/vfio/container-legacy.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/container-legacy.c b/hw/vfio/container-legacy.c
> index dd9c4a6a5a..fa726a2733 100644
> --- a/hw/vfio/container-legacy.c
> +++ b/hw/vfio/container-legacy.c
> @@ -845,6 +845,7 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>                               VFIODevice *vbasedev, Error **errp)
>   {
>       g_autofree struct vfio_device_info *info = NULL;
> +    VFIOContainer *bcontainer = VFIO_IOMMU(group->container);
>       int fd;
>
>       fd = vfio_cpr_group_get_device_fd(group->fd, name);
> @@ -883,7 +884,8 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>           }
>       }
>
> -    vfio_device_prepare(vbasedev, VFIO_IOMMU(group->container), info);
> +    vfio_device_prepare(vbasedev, bcontainer, info);
> +    vbasedev->iommu_dirty_tracking = bcontainer->dirty_pages_supported;

IIRC, we intentionally don't consider VFIO IOMMU type1 dirty tracking as 
a real dirty tracker, because all it does is mark the whole tracked 
address range as dirty (which is the same as not having dirty tracking 
support at all).
Thus, when vbasedev->iommu_dirty_tracking was introduced, we 
intentionally set it only if IOMMUFD dirty tracking was supported.

So, I don't think this patch is needed.

BTW, do you have a real production use case for migration with VFIO 
IOMMU type1 dirty tracking? I mean, is the scenario you described in 
patch #7 a real use case or you just fixed it for completeness?
If there is no use case, maybe patch #9 is not really needed?

Thanks.
RE: [PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking
Posted by Duan, Zhenzhong 2 weeks, 1 day ago

>-----Original Message-----
>From: Avihai Horon <avihaih@nvidia.com>
>Subject: Re: [PATCH v4 08/10] vfio/migration: Fix a check on
>vbasedev->iommu_dirty_tracking
>
>
>On 29/10/2025 11:53, Zhenzhong Duan wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> VFIO IOMMU type1 claims to support IOMMU dirty tracking by setting
>> bcontainer->dirty_pages_supported, but in vfio_migration_realize()
>> vbasedev->iommu_dirty_tracking is checked, we should pass
>> bcontainer->dirty_pages_supported to vbasedev->iommu_dirty_tracking
>> in legacy backend so that the check is accurate.
>>
>> Fixes: 30b916778517 ("vfio/common: Allow disabling device dirty page
>tracking")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/container-legacy.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/container-legacy.c b/hw/vfio/container-legacy.c
>> index dd9c4a6a5a..fa726a2733 100644
>> --- a/hw/vfio/container-legacy.c
>> +++ b/hw/vfio/container-legacy.c
>> @@ -845,6 +845,7 @@ static bool vfio_device_get(VFIOGroup *group,
>const char *name,
>>                               VFIODevice *vbasedev, Error **errp)
>>   {
>>       g_autofree struct vfio_device_info *info = NULL;
>> +    VFIOContainer *bcontainer = VFIO_IOMMU(group->container);
>>       int fd;
>>
>>       fd = vfio_cpr_group_get_device_fd(group->fd, name);
>> @@ -883,7 +884,8 @@ static bool vfio_device_get(VFIOGroup *group,
>const char *name,
>>           }
>>       }
>>
>> -    vfio_device_prepare(vbasedev, VFIO_IOMMU(group->container),
>info);
>> +    vfio_device_prepare(vbasedev, bcontainer, info);
>> +    vbasedev->iommu_dirty_tracking =
>bcontainer->dirty_pages_supported;
>
>IIRC, we intentionally don't consider VFIO IOMMU type1 dirty tracking as
>a real dirty tracker, because all it does is mark the whole tracked
>address range as dirty (which is the same as not having dirty tracking
>support at all).

Yes.

>Thus, when vbasedev->iommu_dirty_tracking was introduced, we
>intentionally set it only if IOMMUFD dirty tracking was supported.

Then we have conflict setting of vbasedev->iommu_dirty_tracking and bcontainer->dirty_pages_supported.
We had set bcontainer->dirty_pages_supported = true in vfio_get_iommu_info_migration().

bcontainer->dirty_pages_supported is checked in dirty tracking related functions of container scope,
e.g., vfio_container_set_dirty_page_tracking() and vfio_container_query_dirty_bitmap(). If we take the
legacy backend ditry tracking as not supported, we should set it to false, is that work for you?

>
>So, I don't think this patch needed.

Yes, if we set bcontainer->dirty_pages_supported = false, this patch and patch9 are both not needed.

>
>BTW, do you have a real production use case for migration with VFIO
>IOMMU type1 dirty tracking? I mean, is the scenario you described in
>patch #7 a real use case or you just fixed it for completeness?
>If there is no use case, maybe patch #9 is not really needed?

patch7 is a real use case, in guest, we switch device from IOMMU domain to block domain and see the issue.
We need to send accurate unmap notification with actual mapping during migration, for both backend.

patch9 is specifically for unmap_bitmap(), it's needed as long as we set bcontainer->dirty_pages_supported = true for legacy backend,
because vfio_legacy_dma_unmap_one checks bcontainer->dirty_pages_supported.

Thanks
Zhenzhong
Re: [PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking
Posted by Joao Martins 2 weeks, 1 day ago
On 30/10/2025 09:10, Duan, Zhenzhong wrote:
>> From: Avihai Horon <avihaih@nvidia.com>
>> On 29/10/2025 11:53, Zhenzhong Duan wrote:
>>
>> BTW, do you have a real production use case for migration with VFIO
>> IOMMU type1 dirty tracking? I mean, is the scenario you described in
>> patch #7 a real use case or you just fixed it for completeness?
>> If there is no use case, maybe patch #9 is not really needed?
> 
> patch7 is a real use case, in guest, we switch device from IOMMU domain to block domain and see the issue.
> We need to send accurate unmap notification with actual mapping during migration, for both backend.
> 

I think the real question is why you are using type1 backend overall for
purposes of dirty tracking.

type1 dirty tracking just returns everything in the bitmap as 1s. There's no
actual dirty tracking and we usually call 'perpectual' dirty tracking because
everything DMA mapped as write is always returned as dirty no matter what you
do. It doesn't look at pagetable neither for the unmap get dirty.

It's only (known) use has been for testing (in the lack of IOMMU HW + IOMMUFD)

But reading your statement in a different way: you are saying that you use *two*
backends at the same time? Why would you do that?
RE: [PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking
Posted by Duan, Zhenzhong 2 weeks, 1 day ago

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 08/10] vfio/migration: Fix a check on
>vbasedev->iommu_dirty_tracking
>
>On 30/10/2025 09:10, Duan, Zhenzhong wrote:
>>> From: Avihai Horon <avihaih@nvidia.com>
>>> On 29/10/2025 11:53, Zhenzhong Duan wrote:
>>>
>>> BTW, do you have a real production use case for migration with VFIO
>>> IOMMU type1 dirty tracking? I mean, is the scenario you described in
>>> patch #7 a real use case or you just fixed it for completeness?
>>> If there is no use case, maybe patch #9 is not really needed?
>>
>> patch7 is a real use case, in guest, we switch device from IOMMU domain
>to block domain and see the issue.
>> We need to send accurate unmap notification with actual mapping during
>migration, for both backend.
>>
>
>I think the real question is why you are using type1 backend overall for
>purposes of dirty tracking.

Just because we had set bcontainer->dirty_pages_supported = true, the dirty tracking
by query through kernel VFIO type1 interface is used during migration, even though it's 'perpectual'.

>
>type1 dirty tracking just returns everything in the bitmap as 1s. There's no
>actual dirty tracking and we usually call 'perpectual' dirty tracking because
>everything DMA mapped as write is always returned as dirty no matter what
>you
>do. It doesn't look at pagetable neither for the unmap get dirty.
>
>It's only (known) use has been for testing (in the lack of IOMMU HW +
>IOMMUFD)

You mean testing live migration or the legacy VFIO type1 dirty tracking interface?
If you mean the first, we can force enable it with 'enable-migration=on' and set
bcontainer->dirty_pages_supported = false.

>
>But reading your statement in a different way: you are saying that you use
>*two*
>backends at the same time? Why would you do that?

I mean patch7 is needed no matter legacy backend or IOMMUFD backend.

btw: we do support a qemu cmdline with mixed backend, e.g., two devices,
one backed by legacy backend and the other backed by IOMMUFD. But I'm not mean that.

Thanks
Zhenzhong
Re: [PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking
Posted by Joao Martins 2 weeks, 1 day ago
On 30/10/2025 13:06, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v4 08/10] vfio/migration: Fix a check on
>> vbasedev->iommu_dirty_tracking
>>
>> On 30/10/2025 09:10, Duan, Zhenzhong wrote:
>>>> From: Avihai Horon <avihaih@nvidia.com>
>>>> On 29/10/2025 11:53, Zhenzhong Duan wrote:
>>>>
>>>> BTW, do you have a real production use case for migration with VFIO
>>>> IOMMU type1 dirty tracking? I mean, is the scenario you described in
>>>> patch #7 a real use case or you just fixed it for completeness?
>>>> If there is no use case, maybe patch #9 is not really needed?
>>>
>>> patch7 is a real use case, in guest, we switch device from IOMMU domain
>> to block domain and see the issue.
>>> We need to send accurate unmap notification with actual mapping during
>> migration, for both backend.
>>>
>>
>> I think the real question is why you are using type1 backend overall for
>> purposes of dirty tracking.
> 
> Just because we had set bcontainer->dirty_pages_supported = true, the dirty tracking
> by query through kernel VFIO type1 interface is used during migration, even though it's 'perpectual'.
> 

Right but that's what iommu_dirty_tracking is for. iommu_dirty_tracking should
only be set for something that actually uses IOMMU for dirty tracking. And this
patch does the opposite of that.

See below on the perpectual part.

>>
>> type1 dirty tracking just returns everything in the bitmap as 1s. There's no
>> actual dirty tracking and we usually call 'perpectual' dirty tracking because
>> everything DMA mapped as write is always returned as dirty no matter what
>> you
>> do. It doesn't look at pagetable neither for the unmap get dirty.
>>
>> It's only (known) use has been for testing (in the lack of IOMMU HW +
>> IOMMUFD)
> 
> You mean testing live migration or the legacy VFIO type1 dirty tracking interface?
> If you mean the first, we can force enable it with 'enable-migration=on' and set
> bcontainer->dirty_pages_supported = false.
> 

I'm aware. I meant testing of dirty_pages_supported code path before IOMMUFD
existed, or in the lack of IOMMU hardware.

>>
>> But reading your statement in a different way: you are saying that you use
>> *two*
>> backends at the same time? Why would you do that?
> 
> I mean patch7 is needed no matter legacy backend or IOMMUFD backend.
> 
> btw: we do support a qemu cmdline with mixed backend, e.g., two devices,
> one backed by legacy backend and the other backed by IOMMUFD. But I'm not mean that.
patch 7 I don't disagree. But this path 8 doesn't make sense, and it's not
fixing anything but rather introducing wrong behavior I think (also the Fixes:
wouild be wrong if you're code is setting iommu_dirty_tracking). It is setting
iommu_dirty_tracking on something that does not anything iommu dirty tracking
related. For type1 container it should be false.

iommu dirty tracking is purposedly there to make the distinct difference that
type1 dirty tracking is *not* actual IOMMU dirty tracking. And that the
migration *only* goes/is-allowed automatically if either you have device dirty
page tracking or IOMMU dirty tracking, but not on type1 legacy dirty tracking ..
unless you force it (with enable-migration=on).

Imagine type1 dirty tracking actually getting treated like iommu_dirty_tracking:
it means you are telling users that 1TB guests migration with VFs that don't
have device dirty tracking without IOMMU HW can be migrated *by default*. And in
pratice that means they transfer 1TB of RAM during stop-copy. And have them like
1min downtime.
RE: [PATCH v4 08/10] vfio/migration: Fix a check on vbasedev->iommu_dirty_tracking
Posted by Duan, Zhenzhong 2 weeks ago

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH v4 08/10] vfio/migration: Fix a check on
>vbasedev->iommu_dirty_tracking
>
>On 30/10/2025 13:06, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: Re: [PATCH v4 08/10] vfio/migration: Fix a check on
>>> vbasedev->iommu_dirty_tracking
>>>
>>> On 30/10/2025 09:10, Duan, Zhenzhong wrote:
>>>>> From: Avihai Horon <avihaih@nvidia.com>
>>>>> On 29/10/2025 11:53, Zhenzhong Duan wrote:
>>>>>
>>>>> BTW, do you have a real production use case for migration with VFIO
>>>>> IOMMU type1 dirty tracking? I mean, is the scenario you described in
>>>>> patch #7 a real use case or you just fixed it for completeness?
>>>>> If there is no use case, maybe patch #9 is not really needed?
>>>>
>>>> patch7 is a real use case, in guest, we switch device from IOMMU domain
>>> to block domain and see the issue.
>>>> We need to send accurate unmap notification with actual mapping during
>>> migration, for both backend.
>>>>
>>>
>>> I think the real question is why you are using type1 backend overall for
>>> purposes of dirty tracking.
>>
>> Just because we had set bcontainer->dirty_pages_supported = true, the
>dirty tracking
>> by query through kernel VFIO type1 interface is used during migration, even
>though it's 'perpectual'.
>>
>
>Right but that's what iommu_dirty_tracking is for. iommu_dirty_tracking
>should
>only be set for something that actually uses IOMMU for dirty tracking. And
>this
>patch does the opposite of that.
>
>See below on the perpectual part.
>
>>>
>>> type1 dirty tracking just returns everything in the bitmap as 1s. There's no
>>> actual dirty tracking and we usually call 'perpectual' dirty tracking because
>>> everything DMA mapped as write is always returned as dirty no matter
>what
>>> you
>>> do. It doesn't look at pagetable neither for the unmap get dirty.
>>>
>>> It's only (known) use has been for testing (in the lack of IOMMU HW +
>>> IOMMUFD)
>>
>> You mean testing live migration or the legacy VFIO type1 dirty tracking
>interface?
>> If you mean the first, we can force enable it with 'enable-migration=on' and
>set
>> bcontainer->dirty_pages_supported = false.
>>
>
>I'm aware. I meant testing of dirty_pages_supported code path before
>IOMMUFD
>existed, or in the lack of IOMMU hardware.
>
>>>
>>> But reading your statement in a different way: you are saying that you use
>>> *two*
>>> backends at the same time? Why would you do that?
>>
>> I mean patch7 is needed no matter legacy backend or IOMMUFD backend.
>>
>> btw: we do support a qemu cmdline with mixed backend, e.g., two devices,
>> one backed by legacy backend and the other backed by IOMMUFD. But I'm
>not mean that.
>patch 7 I don't disagree. But this path 8 doesn't make sense, and it's not
>fixing anything but rather introducing wrong behavior I think (also the Fixes:
>wouild be wrong if you're code is setting iommu_dirty_tracking). It is setting
>iommu_dirty_tracking on something that does not anything iommu dirty
>tracking
>related. For type1 container it should be false.
>
>iommu dirty tracking is purposedly there to make the distinct difference that
>type1 dirty tracking is *not* actual IOMMU dirty tracking. And that the
>migration *only* goes/is-allowed automatically if either you have device dirty
>page tracking or IOMMU dirty tracking, but not on type1 legacy dirty
>tracking ..
>unless you force it (with enable-migration=on).
>
>Imagine type1 dirty tracking actually getting treated like
>iommu_dirty_tracking:
>it means you are telling users that 1TB guests migration with VFs that don't
>have device dirty tracking without IOMMU HW can be migrated *by default*.
>And in
>pratice that means they transfer 1TB of RAM during stop-copy. And have
>them like
>1min downtime.

Clear, so it's intentional to have dirty_pages_supported = true and
iommu dirty tracking = false for legacy backend, I'll drop this patch.
Thank you all.

BRs,
Zhenzhong