[PATCH v2 00/20] AMD vIOMMU: DMA remapping support for VFIO devices

Alejandro Jimenez posted 20 patches 7 months, 3 weeks ago
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>
There is a newer version of this series
hw/i386/amd_iommu.c | 1005 ++++++++++++++++++++++++++++++++++++-------
hw/i386/amd_iommu.h |   52 +++
qemu-options.hx     |   23 +
system/memory.c     |   10 +-
4 files changed, 934 insertions(+), 156 deletions(-)
Posted by Alejandro Jimenez 7 months, 3 weeks ago
This series adds support for guests using the AMD vIOMMU to enable DMA
remapping for VFIO devices. In addition to the currently supported
passthrough (PT) mode, guest kernels are now able to provide DMA
address translation and access permission checking to VFs attached to
paging domains, using the AMD v1 I/O page table format.

Please see v1[0] cover letter for additional details such as example
QEMU command line parameters used in testing.

Changes since v1[0]:
- Added documentation entry for '-device amd-iommu'
- Code movement with no functional changes to avoid use of forward
  declarations in later patches [Sairaj, mst]
- Moved addr_translation and dma-remap property to separate commits.
  The dma-remap feature is only available for users to enable after
  all required functionality is implemented [Sairaj]
- Explicit initialization of significant fields like addr_translation
  and notifier_flags [Sairaj]
- Fixed bug in decoding of invalidation size [Sairaj]
- Changed fetch_pte() to use an out parameter for pte, and be able to
  check for error conditions via negative return value [Clement]
- Removed UNMAP-only notifier optimization, leaving vhost support for
  later series [Sairaj]
- Fixed ordering between address space unmap and memory region activation
  on devtab invalidation [Sairaj]
- Fixed commit message with "V=1, TV=0" [Sairaj]
- Dropped patch removing the page_fault event. That area is better
  addressed in separate series.
- Independent testing by Sairaj (thank you!)
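
For readers unfamiliar with the out-parameter convention mentioned in the fetch_pte() item above, the following is a minimal sketch of the general pattern. It is not the code from this series: toy_fetch_pte(), toy_guest_ptes and the PTE values are hypothetical names and data made up for illustration. The point is that a PTE value of zero (not present) is reported separately from a failure to read the entry at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for guest memory: a tiny array of 64-bit PTEs. */
#define TOY_NUM_PTES 4
static const uint64_t toy_guest_ptes[TOY_NUM_PTES] = {
    0x1000 | 1, /* present */
    0,          /* not present, but readable */
    0x3000 | 1,
    0x4000 | 1,
};

/*
 * Toy lookup (illustrative only). Returns 0 and stores the entry in *pte
 * on success, or a negative errno value when the entry cannot be read.
 * On failure *pte is left untouched.
 */
static int toy_fetch_pte(size_t index, uint64_t *pte)
{
    assert(pte != NULL);

    if (index >= TOY_NUM_PTES) {
        return -EFAULT; /* read falls outside the toy "guest memory" */
    }

    *pte = toy_guest_ptes[index];
    return 0;
}
```

With this shape a caller checks the return value first and only then interprets the PTE, so a zero PTE is never mistaken for a read error.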

Thank you,
Alejandro

[0] https://lore.kernel.org/all/20250414020253.443831-1-alejandro.j.jimenez@oracle.com/

Alejandro Jimenez (20):
  memory: Adjust event ranges to fit within notifier boundaries
  amd_iommu: Document '-device amd-iommu' common options
  amd_iommu: Reorder device and page table helpers
  amd_iommu: Helper to decode size of page invalidation command
  amd_iommu: Add helper function to extract the DTE
  amd_iommu: Return an error when unable to read PTE from guest memory
  amd_iommu: Add helpers to walk AMD v1 Page Table format
  amd_iommu: Add a page walker to sync shadow page tables on
    invalidation
  amd_iommu: Add basic structure to support IOMMU notifier updates
  amd_iommu: Sync shadow page tables on page invalidation
  amd_iommu: Use iova_tree records to determine large page size on UNMAP
  amd_iommu: Unmap all address spaces under the AMD IOMMU on reset
  amd_iommu: Add replay callback
  amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL
  amd_iommu: Toggle memory regions based on address translation mode
  amd_iommu: Set all address spaces to default translation mode on reset
  amd_iommu: Add dma-remap property to AMD vIOMMU device
  amd_iommu: Toggle address translation mode on devtab entry
    invalidation
  amd_iommu: Do not assume passthrough translation when DTE[TV]=0
  amd_iommu: Refactor amdvi_page_walk() to use common code for page walk

 hw/i386/amd_iommu.c | 1005 ++++++++++++++++++++++++++++++++++++-------
 hw/i386/amd_iommu.h |   52 +++
 qemu-options.hx     |   23 +
 system/memory.c     |   10 +-
 4 files changed, 934 insertions(+), 156 deletions(-)


base-commit: 5134cf9b5d3aee4475fe7e1c1c11b093731073cf
-- 
2.43.5
Re: [PATCH v2 00/20] AMD vIOMMU: DMA remapping support for VFIO devices
Posted by Michael S. Tsirkin 6 months, 3 weeks ago
On Fri, May 02, 2025 at 02:15:45AM +0000, Alejandro Jimenez wrote:
> This series adds support for guests using the AMD vIOMMU to enable DMA
> remapping for VFIO devices. In addition to the currently supported
> passthrough (PT) mode, guest kernels are now able to provide DMA
> address translation and access permission checking to VFs attached to
> paging domains, using the AMD v1 I/O page table format.
> 
> Please see v1[0] cover letter for additional details such as example
> QEMU command line parameters used in testing.

Are you working on v3? There was a bug you wanted to fix.

Re: [PATCH v2 00/20] AMD vIOMMU: DMA remapping support for VFIO devices
Posted by Alejandro Jimenez 6 months, 3 weeks ago

On 5/30/25 7:41 AM, Michael S. Tsirkin wrote:
> On Fri, May 02, 2025 at 02:15:45AM +0000, Alejandro Jimenez wrote:
>> This series adds support for guests using the AMD vIOMMU to enable DMA
>> remapping for VFIO devices. In addition to the currently supported
>> passthrough (PT) mode, guest kernels are now able to provide DMA
>> address translation and access permission checking to VFs attached to
>> paging domains, using the AMD v1 I/O page table format.
>>
>> Please see v1[0] cover letter for additional details such as example
>> QEMU command line parameters used in testing.
> 
> are you working on v3?

Yes, there are suggestions from Sairaj that I will address in v3. I am
also planning to include two small patches from Joao Martins that add 
support for the HATDis feature (this is something that Sairaj suggested 
earlier). The Linux changes are being reviewed here:
https://lore.kernel.org/all/cover.1746613368.git.Ankit.Soni@amd.com/

I will be offline from 6/2 to 6/6, so I didn't want to send a new 
revision and disappear. In general, the changes from v2->v3 are minor 
and well contained, so any reviews I receive for v2 will be valid.
That being said, I can send v3 today if you'd prefer that. Please let me 
know.

> there was a bug you wanted to fix.
>

I assume the bug is Sairaj's report of a dmesg warning with NVMe
passthrough on a 4.15 kernel, but unfortunately I have not been able to
reproduce that problem. We agreed that given the age of the kernel (and
reports of the same warning on NVMe devices in unrelated scenarios),
this is likely a guest driver issue, and should not be a blocker.

More details:
I have tested an Ubuntu image with a 4.15 kernel, but I cannot hit any
issues when I pass through a CX-6 VF (I don't have access to an NVMe VF).
The kernel is old enough that I have to force bind the mlx5_core driver
to the VF on the guest, but once I do, the VF comes up with no errors and
I can see DMA map/unmap activity in the traces.

Sairaj: Are you passing a full NVMe device to the guest (i.e. a PF)? I
ask because the BDF in '-device vfio-pci,host=0000:44:00.0' doesn't look 
like a typical VF...

Thank you,
Alejandro

Re: [PATCH v2 00/20] AMD vIOMMU: DMA remapping support for VFIO devices
Posted by Sairaj Kodilkar 6 months, 2 weeks ago
> Sairaj: Are you passing a full NVME device to the guest (i.e. a PF)? I 
> ask because the BDF in '-device vfio-pci,host=0000:44:00.0' doesn't look 
> like a typical VF...
> 
Hey Alejandro,

I am passing the full NVMe device to the guest (not just a VF).

Thanks
Sairaj
Re: [PATCH v2 00/20] AMD vIOMMU: DMA remapping support for VFIO devices
Posted by Sairaj Kodilkar 7 months ago

On 5/2/2025 7:45 AM, Alejandro Jimenez wrote:
> This series adds support for guests using the AMD vIOMMU to enable DMA
> remapping for VFIO devices. In addition to the currently supported
> passthrough (PT) mode, guest kernels are now able to provide DMA
> address translation and access permission checking to VFs attached to
> paging domains, using the AMD v1 I/O page table format.
> 
> Please see v1[0] cover letter for additional details such as example
> QEMU command line parameters used in testing.

Hi Alejandro,

I tested v2, and everything looks good when I boot the guest with an
upstream kernel. However, I observed that the NVMe driver fails to load
with guest kernel version 4.15.0-213-generic, which is the default kernel
that comes with the Ubuntu image.

This is what I see in dmesg:

[   26.702381] nvme nvme0: pci function 0000:00:04.0
[   26.817847] nvme nvme0: missing or invalid SUBNQN field.

I am using the following QEMU command line:

-enable-kvm -m 10G -smp cpus=$NUM_VCPUS \
-device amd-iommu,dma-remap=on \
-netdev user,id=USER0,hostfwd=tcp::3333-:22 \
-device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \
-cpu EPYC-Genoa,x2apic=on,kvm-msi-ext-dest-id=on,+kvm-pv-unhalt,kvm-pv-tlb-flush,kvm-pv-ipi,kvm-pv-sched-yield \
-name guest=my-vm,debug-threads=on \
-machine q35,kernel_irqchip=split \
-global kvm-pit.lost_tick_policy=discard \
-nographic -vga none -chardev stdio,id=STDIO0,signal=off,mux=on \
-device isa-serial,id=isa-serial0,chardev=STDIO0 \
-smbios type=0,version=2.8 \
-blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=$IMG \
-device virtio-blk-pci,num-queues=8,drive=drive0 \
-chardev socket,id=SOCKET1,server=on,wait=off,path=qemu.mon.user3333 \
-mon chardev=SOCKET1,mode=control \
-device vfio-pci,host=0000:44:00.0

Do you have any idea what might trigger this?

I see the error only when I am using the emulated AMD IOMMU with a
passthrough device. Regular passthrough works fine.

Regards
Sairaj Kodilkar

P.S. I know that the guest kernel is quite old but still wanted to make 
you aware.
Re: [PATCH v2 00/20] AMD vIOMMU: DMA remapping support for VFIO devices
Posted by Alejandro Jimenez 7 months ago
Hi Sairaj

On 5/16/25 4:07 AM, Sairaj Kodilkar wrote:
> 
> 
> On 5/2/2025 7:45 AM, Alejandro Jimenez wrote:

> Hi Alejandro,
> 
> Tested the v2, everything looks good when I boot guest with upstream
> kernel. But I observed that NVME driver fails to load with guest kernel
> version 4.15.0-213-generic. This is the default kernel that comes with
> the ubuntu image.

Thank you for the additional testing and for the report. I wanted to 
investigate and if possible solve the issue before replying, but since 
it is taking me some time I wanted to ACK your message. Minor comments 
below...
> 
> This is what I see in the dmesg
> 
> [   26.702381] nvme nvme0: pci function 0000:00:04.0
> [   26.817847] nvme nvme0: missing or invalid SUBNQN field.

There are multiple reports of that warning, which would indicate that it
is not caused by an issue with the IOMMU emulation, but it is interesting
that you don't see it with "regular passthrough" (I assume that means
with the guest kernel in pt mode).

> 
> I am using following command qemu command line
> 
> -enable-kvm -m 10G -smp cpus=$NUM_VCPUS  \
> -device amd-iommu,dma-remap=on \
> -netdev user,id=USER0,hostfwd=tcp::3333-:22 \
> -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \
> -cpu EPYC-Genoa,x2apic=on,kvm-msi-ext-dest-id=on,+kvm-pv-unhalt,kvm-pv-tlb-flush,kvm-pv-ipi,kvm-pv-sched-yield \
> -name guest=my-vm,debug-threads=on \
> -machine q35,kernel_irqchip=split \
> -global kvm-pit.lost_tick_policy=discard \
> -nographic -vga none -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device isa-serial,id=isa-serial0,chardev=STDIO0 \
> -smbios type=0,version=2.8 \
> -blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=$IMG \
> -device virtio-blk-pci,num-queues=8,drive=drive0 \
> -chardev socket,id=SOCKET1,server=on,wait=off,path=qemu.mon.user3333 \
> -mon chardev=SOCKET1,mode=control \
> -device vfio-pci,host=0000:44:00.0
> 
> Do you have any idea what might trigger this.

There are some parameters above that are unnecessary and perhaps 
conflicting e.g. we don't need kvm-msi-ext-dest-id=on since the vIOMMU 
provides interrupt remapping (plus you are likely not using more than 
255 vCPUs). We also don't need kvm-pit.lost_tick_policy when using split 
irqchip, since the PIT is not emulated by KVM. But to be fair I don't 
believe those are likely to be causing the problem...

My main suspicion is the guest IOMMU driver being too old and missing 
lots of fixes, so it could be missing some essential operations that the 
emulation requires to work. e.g. if the guest driver does not comply 
with the spec and fails to issue a DEVTAB_INVALIDATE after changing the 
DTE, the vIOMMU code never gets the chance to enable the IOMMU memory 
region, and it all goes wrong from that point on.
But I need to reproduce the problem and figure out where/when the 
emulation is failing. I've tested as far back as 5.15 based kernels.

I would argue that while it is something that I am definitely going to 
address if possible, this issue should not be a blocker. I'll update as 
soon as I have more data on the cause.

Thank you,
Alejandro

> 
> I see the error only when I am using emulated AMD IOMMU with passthrough
> device. Regular passthrough works fine.
> 
> Regards
> Sairaj Kodilkar
> 
> P.S. I know that the guest kernel is quite old but still wanted to make 
> you aware.
> 


Re: [PATCH v2 00/20] AMD vIOMMU: DMA remapping support for VFIO devices
Posted by Sairaj Kodilkar 7 months ago

On 5/21/2025 8:05 AM, Alejandro Jimenez wrote:
> Hi Sairaj
> 
> On 5/16/25 4:07 AM, Sairaj Kodilkar wrote:
>>
>>
>> On 5/2/2025 7:45 AM, Alejandro Jimenez wrote:
> 
>> Hi Alejandro,
>>
>> Tested the v2, everything looks good when I boot guest with upstream
>> kernel. But I observed that NVME driver fails to load with guest kernel
>> version 4.15.0-213-generic. This is the default kernel that comes with
>> the ubuntu image.
> 
> Thank you for the additional testing and for the report. I wanted to 
> investigate and if possible solve the issue before replying, but since 
> it is taking me some time I wanted to ACK your message. Minor comments 
> below...
>>
>> This is what I see in the dmesg
>>
>> [   26.702381] nvme nvme0: pci function 0000:00:04.0
>> [   26.817847] nvme nvme0: missing or invalid SUBNQN field.
> 
> There are multiple reports of that warning which would indicate that is 
> not caused by an issue with the IOMMU emulation, but it is interesting 
> that you don't see it with "regular passthrough" (I assume that means 
> with guest kernel in pt mode).
> 

Yep, the "regular passthrough" is a guest without amd-iommu or pt=on.

>>
>> I am using following command qemu command line
>>
>> -enable-kvm -m 10G -smp cpus=$NUM_VCPUS  \
>> -device amd-iommu,dma-remap=on \
>> -netdev user,id=USER0,hostfwd=tcp::3333-:22 \
>> -device virtio-net-pci,id=vnet0,iommu_platform=on,disable-legacy=on,romfile=,netdev=USER0 \
>> -cpu EPYC-Genoa,x2apic=on,kvm-msi-ext-dest-id=on,+kvm-pv-unhalt,kvm-pv-tlb-flush,kvm-pv-ipi,kvm-pv-sched-yield \
>> -name guest=my-vm,debug-threads=on \
>> -machine q35,kernel_irqchip=split \
>> -global kvm-pit.lost_tick_policy=discard \
>> -nographic -vga none -chardev stdio,id=STDIO0,signal=off,mux=on \
>> -device isa-serial,id=isa-serial0,chardev=STDIO0 \
>> -smbios type=0,version=2.8 \
>> -blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=$IMG \
>> -device virtio-blk-pci,num-queues=8,drive=drive0 \
>> -chardev socket,id=SOCKET1,server=on,wait=off,path=qemu.mon.user3333 \
>> -mon chardev=SOCKET1,mode=control \
>> -device vfio-pci,host=0000:44:00.0
>>
>> Do you have any idea what might trigger this.
> 
> There are some parameters above that are unnecessary and perhaps 
> conflicting e.g. we don't need kvm-msi-ext-dest-id=on since the vIOMMU 
> provides interrupt remapping (plus you are likely not using more than 
> 255 vCPUs). We also don't need kvm-pit.lost_tick_policy when using split 
> irqchip, since the PIT is not emulated by KVM. But to be fair I don't 
> believe those are likely to be causing the problem...

Thanks for letting me know, I'll update the script.

> 
> My main suspicion is the guest IOMMU driver being too old and missing 
> lots of fixes, so it could be missing some essential operations that the 
> emulation requires to work. e.g. if the guest driver does not comply 
> with the spec and fails to issue a DEVTAB_INVALIDATE after changing the 
> DTE, the vIOMMU code never gets the chance to enable the IOMMU memory 
> region, and it all goes wrong from that point on.
> But I need to reproduce the problem and figure out where/when the
> emulation is failing. I've tested as far back as 5.15 based kernels.
> 
> I would argue that while it is something that I am definitely going to 
> address if possible, this issue should not be a blocker. I'll update as 
> soon as I have more data on the cause.
> 
> Thank you,
> Alejandro
> 

I also think the same. This may be some old driver issue and we should 
not block on it.

Tested-by: Sairaj Kodilkar <sarunkod@amd.com>

Regards
Sairaj Kodilkar