The kernel allows the user to switch the IOMMU domain, e.g., between the
DMA and identity domains. When this happens in IOMMU scalable mode, a pasid
cache invalidation request is sent. vIOMMU ignores this request, which
leaves the device bound to the wrong address space, so DMA fails.
This issue exists in scalable mode with both first stage and second
stage translations, for both emulated and passthrough devices.
Take a network device for example; the sequence below triggers the issue:
1. start a guest with iommu=pt
2. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/unbind
3. echo DMA > /sys/kernel/iommu_groups/6/type
4. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/bind
5. Ping test
Fix it by switching the address space in the invalidation handler.
Fixes: 4a4f219e8a10 ("intel_iommu: add scalable-mode option to make scalable mode work")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 07bc0a749c..aca51cbd8e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3095,15 +3095,28 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
VTDAddressSpace *vtd_as = value;
VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
VTDPASIDEntry pe;
+ IOMMUNotifier *n;
uint16_t did;
if (vtd_dev_get_pe_from_pasid(vtd_as, &pe)) {
+ if (!pc_entry->valid) {
+ return;
+ }
/*
* No valid pasid entry in guest memory. e.g. pasid entry was modified
* to be either all-zero or non-present. Either case means existing
* pasid cache should be invalidated.
*/
pc_entry->valid = false;
+
+ /*
+ * When a pasid entry isn't valid any more, we should unmap all
+ * mappings in shadow pages instantly to ensure DMA security.
+ */
+ IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
+ vtd_address_space_unmap(vtd_as, n);
+ }
+ vtd_switch_address_space(vtd_as);
return;
}
@@ -3131,6 +3144,9 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
pc_entry->pasid_entry = pe;
pc_entry->valid = true;
+
+ vtd_switch_address_space(vtd_as);
+ vtd_address_space_sync(vtd_as);
}
static void vtd_pasid_cache_sync(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
--
2.47.1
On 2025/10/16 15:45, Zhenzhong Duan wrote:
> Kernel allows user to switch IOMMU domain, e.g., switch between DMA
> and identity domain. When this happen in IOMMU scalable mode, a pasid
> cache invalidation request is sent, this request is ignored by vIOMMU
> which leads to device binding to wrong address space, then DMA fails.
>
> This issue exists in scalable mode with both first stage and second
> stage translations, both emulated and passthrough devices.
>
> Take network device for example, below sequence trigger issue:
>
> 1. start a guest with iommu=pt
> 2. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/unbind
> 3. echo DMA > /sys/kernel/iommu_groups/6/type
> 4. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/bind
> 5. Ping test
>
> Fix it by switching address space in invalidation handler.
>
> Fixes: 4a4f219e8a10 ("intel_iommu: add scalable-mode option to make scalable mode work")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 07bc0a749c..aca51cbd8e 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3095,15 +3095,28 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
> VTDAddressSpace *vtd_as = value;
> VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
> VTDPASIDEntry pe;
> + IOMMUNotifier *n;
> uint16_t did;
>
> if (vtd_dev_get_pe_from_pasid(vtd_as, &pe)) {
> + if (!pc_entry->valid) {
> + return;
> + }
> /*
> * No valid pasid entry in guest memory. e.g. pasid entry was modified
> * to be either all-zero or non-present. Either case means existing
> * pasid cache should be invalidated.
> */
> pc_entry->valid = false;
> +
> + /*
> + * When a pasid entry isn't valid any more, we should unmap all
> + * mappings in shadow pages instantly to ensure DMA security.
> + */
> + IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
> + vtd_address_space_unmap(vtd_as, n);
> + }
Will the address space switch below also unmap DMAs? Say the guest switches
a vfio device from the identity domain to the blocking domain. The guest
will tear down the pasid entry and flush the pasid cache. The switch below
will convert the address space's MR from the nodmar MR to the iommu MR. It
should trigger the vfio_listener to unmap DMA as well. Could you confirm it? :)
> + vtd_switch_address_space(vtd_as);
> return;
> }
>
> @@ -3131,6 +3144,9 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
>
> pc_entry->pasid_entry = pe;
> pc_entry->valid = true;
> +
> + vtd_switch_address_space(vtd_as);
> + vtd_address_space_sync(vtd_as);
If the guest modifies fields of the pasid entry other than PGTT or the
FS/SS page table pointer, we should not sync the address space for a vfio
device. You may compare the cached pasid entry with the one in guest
memory to detect it.
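As a rough illustration of that comparison, here is a minimal sketch. The
PasidEntry type, the PE_* macros and pasid_entry_needs_resync() are
hypothetical names, not QEMU code, and the bit positions are an assumption
about the scalable-mode pasid entry layout that should be checked against
the VT-d spec:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 512-bit scalable-mode pasid entry. */
typedef struct {
    uint64_t val[8];
} PasidEntry;

/* Assumed field layout; verify against the VT-d spec before relying on it. */
#define PE_PGTT(pe)     (((pe)->val[0] >> 6) & 0x7ULL)  /* translation type */
#define PE_SSPTPTR(pe)  ((pe)->val[0] & ~0xfffULL)      /* second-stage table pointer */
#define PE_FSPTPTR(pe)  ((pe)->val[2] & ~0xfffULL)      /* first-stage table pointer */

/*
 * Return true only when a field that affects the address space changed,
 * i.e. PGTT or one of the page table pointers. Changes to other fields
 * (for example the domain id in val[1]) return false.
 */
static bool pasid_entry_needs_resync(const PasidEntry *cached,
                                     const PasidEntry *fresh)
{
    return PE_PGTT(cached) != PE_PGTT(fresh) ||
           PE_SSPTPTR(cached) != PE_SSPTPTR(fresh) ||
           PE_FSPTPTR(cached) != PE_FSPTPTR(fresh);
}
```

With something like this, the switch and sync would only be done when the
helper returns true or when the cache entry was not valid before.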
> }
>
> static void vtd_pasid_cache_sync(IntelIOMMUState *s, VTDPASIDCacheInfo *pc_info)
Regards,
Yi Liu
>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 3/3] intel_iommu: Fix DMA failure when guest
>switches IOMMU domain
>
>On 2025/10/16 15:45, Zhenzhong Duan wrote:
>> Kernel allows user to switch IOMMU domain, e.g., switch between DMA
>> and identity domain. When this happen in IOMMU scalable mode, a pasid
>> cache invalidation request is sent, this request is ignored by vIOMMU
>> which leads to device binding to wrong address space, then DMA fails.
>>
>> This issue exists in scalable mode with both first stage and second
>> stage translations, both emulated and passthrough devices.
>>
>> Take network device for example, below sequence trigger issue:
>>
>> 1. start a guest with iommu=pt
>> 2. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/unbind
>> 3. echo DMA > /sys/kernel/iommu_groups/6/type
>> 4. echo 0000:01:00.0 > /sys/bus/pci/drivers/virtio-pci/bind
>> 5. Ping test
>>
>> Fix it by switching address space in invalidation handler.
>>
>> Fixes: 4a4f219e8a10 ("intel_iommu: add scalable-mode option to make scalable mode work")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/i386/intel_iommu.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 07bc0a749c..aca51cbd8e 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -3095,15 +3095,28 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
>> VTDAddressSpace *vtd_as = value;
>> VTDPASIDCacheEntry *pc_entry = &vtd_as->pasid_cache_entry;
>> VTDPASIDEntry pe;
>> + IOMMUNotifier *n;
>> uint16_t did;
>>
>> if (vtd_dev_get_pe_from_pasid(vtd_as, &pe)) {
>> + if (!pc_entry->valid) {
>> + return;
>> + }
>> /*
>> * No valid pasid entry in guest memory. e.g. pasid entry was modified
>> * to be either all-zero or non-present. Either case means existing
>> * pasid cache should be invalidated.
>> */
>> pc_entry->valid = false;
>> +
>> + /*
>> + * When a pasid entry isn't valid any more, we should unmap all
>> + * mappings in shadow pages instantly to ensure DMA security.
>> + */
>> + IOMMU_NOTIFIER_FOREACH(n, &vtd_as->iommu) {
>> + vtd_address_space_unmap(vtd_as, n);
>> + }
>
>will the below switch as also unmap DMAs? Say guest switches vfio device
>from identity domain to blocking domain. Guest will tear down pasid
>entry and flush pasid cache. The below switch will convert as MR from
>nodmar MR to iommu MR. It should trigger the vfio_listener to unmap DMA
>as well. could you confirm it? :)
Yes, vtd_address_space_unmap() is a no-op in this case. But we need it for other scenarios,
e.g., a switch from the DMA domain to the blocking domain.
>
>> + vtd_switch_address_space(vtd_as);
>> return;
>> }
>>
>> @@ -3131,6 +3144,9 @@ static void vtd_pasid_cache_sync_locked(gpointer key, gpointer value,
>>
>> pc_entry->pasid_entry = pe;
>> pc_entry->valid = true;
>> +
>> + vtd_switch_address_space(vtd_as);
>> + vtd_address_space_sync(vtd_as);
>
>In the case of guest modifies fields other than PGTT/FS/SS page table
>pointer of the pasid entry, we shall not sync as for vfio device. You
>may compare the cached pasid entry with the one in guest memory to
>detect it.
OK, will do.
Thanks
Zhenzhong