This series modifies current implementation to use 128-bit cmpxchg to
update DTE when needed as specified in the AMD I/O Virtualization
Techonology (IOMMU) Specification.
Please note that I have verified with the hardware designer, and they have
confirmed that the IOMMU hardware has always been implemented with 256-bit
read. The next revision of the IOMMU spec will be updated to correctly
describe this part. Therefore, I have updated the implementation to avoid
unnecessary flushing.
Also, this has been a long series. I would like to thank several folks who
have helped review and provide suggestions along the way :)
Changes in v11:
* Remove the patch to introduce __READ_ONCE() for 128-bit data type
since all 128-bit DTE access is currently done under per-DTE spin_lock.
This is to help avoid complicating __unqual_scalar_typeof() further (Per Arnd).
* Patch 4, 6:
- Replace spin_lock/unlock() with spin_lock_irqsave/spin_unlock_irqrestore()
due to the following dmesg warning:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.12.0-rc5+ #29 Not tainted
-----------------------------------------------------
cc1/145047 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff93737e255cb8 (&dev_data->dte_lock){+.+.}-{2:2}, at: update_dte256+0x5f/0x1b0
and this task is already holding:
ffff937371782150 (&domain->lock){-.-.}-{2:2}, at: alloc_pte.constprop.0+0x175/0x5e0
which would create a new lock dependency:
(&domain->lock){-.-.}-{2:2} -> (&dev_data->dte_lock){+.+.}-{2:2}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&domain->lock){-.-.}-{2:2}
... which became HARDIRQ-irq-safe at:
__lock_acquire+0x399/0xbb0
lock_acquire.part.0+0xb0/0x250
__raw_spin_lock_irqsave+0x49/0x90
amd_iommu_flush_iotlb_all+0x1b/0x50
fq_flush_iotlb+0x22/0x40
queue_iova+0x12d/0x150
__iommu_dma_unmap+0xc2/0x140
iommu_dma_unmap_page+0x44/0x90
dma_unmap_page_attrs+0x202/0x240
nvme_pci_complete_batch+0xb3/0xd0 [nvme]
nvme_irq+0x7f/0x90 [nvme]
__handle_irq_event_percpu+0x81/0x270
handle_irq_event+0x34/0x70
handle_edge_irq+0x9f/0x240
__common_interrupt+0x70/0x140
common_interrupt+0xb2/0xd0
asm_common_interrupt+0x22/0x40
cpuidle_enter_state+0x11d/0x540
cpuidle_enter+0x29/0x40
cpuidle_idle_call+0x100/0x170
do_idle+0x96/0xf0
cpu_startup_entry+0x25/0x30
start_secondary+0x11d/0x140
common_startup_64+0x13e/0x141
to a HARDIRQ-irq-unsafe lock:
(&dev_data->dte_lock){+.+.}-{2:2}
... which became HARDIRQ-irq-unsafe at:
...
__lock_acquire+0x399/0xbb0
lock_acquire.part.0+0xb0/0x250
_raw_spin_lock+0x34/0x80
update_dte256+0x5f/0x1b0
set_dte_entry+0x1d1/0x290
dev_update_dte+0x53/0x120
attach_device.isra.0+0x120/0x4f0
amd_iommu_attach_device+0x83/0xd0
__iommu_attach_device+0x1d/0xd0
__iommu_device_set_domain+0x5b/0xb0
__iommu_group_set_domain_internal+0x68/0x120
iommu_setup_default_domain+0x204/0x350
iommu_device_register+0x156/0x250
iommu_init_pci+0x18f/0x570
amd_iommu_init_pci+0xcb/0x2b0
state_next+0x7e5/0x8e0
amd_iommu_init+0x1f/0x80
pci_iommu_init+0xe/0x40
do_one_initcall+0x5f/0x2c0
do_initcalls+0xb9/0x180
kernel_init_freeable+0x149/0x230
kernel_init+0x16/0x1c0
ret_from_fork+0x2d/0x50
ret_from_fork_asm+0x1a/0x30
* Patch 4:
- Introduce helper function iommu_atomic128_set() (Per Uros)
- In write_dte_update128(), remove unnecessary do-while() loop for
try-cmpxchg and remove __READ_ONCE() since the function is called
under DTE spin_lock;
* Patch 6: In get_dte256(), remove __READ_ONCE(), since the 128-bit data read
is inside DTE spin_lock.
* Patch 8: Remove READ/WRITE_ONCE() from amd_iommu_set_dirty_tracking()
since called inside DTE spin_lock.
v10: https://lore.kernel.org/lkml/20241113120327.5239-1-suravee.suthikulpanit@amd.com/
v9: https://lore.kernel.org/lkml/20241101162304.4688-1-suravee.suthikulpanit@amd.com/
v8: https://lore.kernel.org/lkml/20241031184243.4184-1-suravee.suthikulpanit@amd.com/
v7: https://lore.kernel.org/lkml/20241031091624.4895-1-suravee.suthikulpanit@amd.com/
v6: https://lore.kernel.org/lkml/20241016051756.4317-1-suravee.suthikulpanit@amd.com/
v5: https://lore.kernel.org/lkml/20241007041353.4756-1-suravee.suthikulpanit@amd.com/
v4: https://lore.kernel.org/lkml/20240916171805.324292-1-suravee.suthikulpanit@amd.com/
v3: https://lore.kernel.org/lkml/20240906121308.5013-1-suravee.suthikulpanit@amd.com/
v2: https://lore.kernel.org/lkml/20240829180726.5022-1-suravee.suthikulpanit@amd.com/
v1: https://lore.kernel.org/lkml/20240819161839.4657-1-suravee.suthikulpanit@amd.com/
Thanks,
Suravee
Suravee Suthikulpanit (9):
iommu/amd: Misc ACPI IVRS debug info clean up
iommu/amd: Disable AMD IOMMU if CMPXCHG16B feature is not supported
iommu/amd: Introduce struct ivhd_dte_flags to store persistent DTE
flags
iommu/amd: Introduce helper function to update 256-bit DTE
iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers
iommu/amd: Introduce helper function get_dte256()
iommu/amd: Modify clear_dte_entry() to avoid in-place update
iommu/amd: Lock DTE before updating the entry with WRITE_ONCE()
iommu/amd: Remove amd_iommu_apply_erratum_63()
drivers/iommu/amd/amd_iommu.h | 4 +-
drivers/iommu/amd/amd_iommu_types.h | 41 ++-
drivers/iommu/amd/init.c | 229 +++++++++--------
drivers/iommu/amd/iommu.c | 378 +++++++++++++++++++++-------
4 files changed, 440 insertions(+), 212 deletions(-)
--
2.34.1