We are hitting the following soft lockup in production on v6.6 and
v6.12, but the bug exists in all versions:
watchdog: BUG: soft lockup - CPU#24 stuck for 31s! [tokio-runtime-w:1274919]
CPU: 24 PID: 1274919 Comm: tokio-runtime-w Not tainted 6.6.105+ #1
Hardware name: Google Google Compute Engine/Google Comput Engine, BIOS Google 10/25/2025
RIP: 0010:__raw_spin_unlock_irqrestore+0x21/0x30
Call Trace:
<TASK>
amd_iommu_attach_device+0x69/0x450
__iommu_device_set_domain+0x7b/0x190
__iommu_group_set_core_domain+0x61/0xd0
iommu_detach_group+0x27/0x40
vfio_iommu_type1_detach_group+0x157/0x780 [vfio_iommu_type1]
vfio_group_detach_container+0x59/0x160 [vfio]
vfio_group_fops_release+0x4d/0x90 [vfio]
__fput+0x95/0x2a0
task_work_run+0x93/0xc0
do_exit+0x321/0x950
do_group_exit+0x7f/0xa0
get_signal+0x77d/0x780
</TASK>
This occurs because we are running in a VM and we split up the special
size CMD_INV_IOMMU_ALL_PAGES_ADDRESS that we get from
amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes. Each
of these flushes traps into the host, all while we hold the domain lock
with IRQs disabled.
Fix this by not splitting up this special size and instead sending the
whole command in one go, so perhaps the host will decide to be gracious
and not spend seven business years doing the flush.
Cc: stable@vger.kernel.org
Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
drivers/iommu/amd/iommu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 81c4d7733872..f0d3e06734ef 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1769,7 +1769,8 @@ void amd_iommu_domain_flush_pages(struct protection_domain *domain,
 {
 	lockdep_assert_held(&domain->lock);
 
-	if (likely(!amd_iommu_np_cache)) {
+	if (likely(!amd_iommu_np_cache) ||
+	    size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS) {
 		__domain_flush_pages(domain, address, size);
 
 		/* Wait until IOMMU TLB and all device IOTLB flushes are complete */
--
2.53.0
On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> We are hitting the following soft lockup in production on v6.6 and
> v6.12, but the bug exists in all versions

Can I get this reviewed/merged? I'm hitting this softlockup hundreds of
times a day in production and I need it in stable so I can have it
backported to our kernels. Thanks,

Josef
On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> We are hitting the following soft lockup in production on v6.6 and
> v6.12, but the bug exists in all versions
>
> [trace snipped]
>
> This occurs because we're a VM and we're splitting up the size
> CMD_INV_IOMMU_ALL_PAGES_ADDRESS we get from
> amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.

This function doesn't exist in the upstream kernel anymore, and the
new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
all, AFAIK.

Your patch makes sense, but it needs to go to stable only somehow.

Jason
On Thu, Mar 12, 2026 at 9:40 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> > We are hitting the following soft lockup in production on v6.6 and
> > v6.12, but the bug exists in all versions
> >
> > [trace snipped]
> >
> > This occurs because we're a VM and we're splitting up the size
> > CMD_INV_IOMMU_ALL_PAGES_ADDRESS we get from
> > amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.
>
> This function doesn't exist in the upstream kernel anymore, and the
> new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
> all, AFAIK.

This was based on linus/master as of March 4th, and we get here via
amd_iommu_flush_tlb_all, which definitely still exists, so what
specifically are you talking about? Thanks,

Josef
On Sat, Mar 14, 2026 at 02:24:11PM -0400, Josef Bacik wrote:
> On Thu, Mar 12, 2026 at 9:40 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> > > We are hitting the following soft lockup in production on v6.6 and
> > > v6.12, but the bug exists in all versions
> > >
> > > [trace snipped]
> > >
> > > This occurs because we're a VM and we're splitting up the size
> > > CMD_INV_IOMMU_ALL_PAGES_ADDRESS we get from
> > > amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.
> >
> > This function doesn't exist in the upstream kernel anymore, and the
> > new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
> > all, AFAIK.
>
> This was based on linus/master as of March 4th, and we get here via
> amd_iommu_flush_tlb_all, which definitely still exists, so what
> specifically are you talking about? Thanks,

$ git grep amd_iommu_domain_flush_tlb_pde | wc -l
0

The entire page table logic was rewritten. The stuff that caused these
issues is gone and the new stuff doesn't appear to have this bug of
passing size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS.

If it does please explain it in terms of the new stuff without
referencing deleted functions.

I don't know how you get something like this into -stable.

Jason
> On Thu, Mar 26, 2026 19:05:12 -0300 Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Sat, Mar 14, 2026 at 02:24:11PM -0400, Josef Bacik wrote:
> > On Thu, Mar 12, 2026 at 9:40 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> > > > We are hitting the following soft lockup in production on v6.6 and
> > > > v6.12, but the bug exists in all versions
> > > >
> > > > [trace snipped]
> > > >
> > > > This occurs because we're a VM and we're splitting up the size
> > > > CMD_INV_IOMMU_ALL_PAGES_ADDRESS we get from
> > > > amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.
> > >
> > > This function doesn't exist in the upstream kernel anymore, and the
> > > new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
> > > all, AFAIK.
> >
> > This was based on linus/master as of March 4th, and we get here via
> > amd_iommu_flush_tlb_all, which definitely still exists, so what
> > specifically are you talking about? Thanks,
>
> $ git grep amd_iommu_domain_flush_tlb_pde | wc -l
> 0
>
> The entire page table logic was rewritten. The stuff that caused these
> issues is gone and the new stuff doesn't appear to have this bug of
> passing size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS.
>
> If it does please explain it in terms of the new stuff without
> referencing deleted functions.
>
> I don't know how you get something like this into -stable.
I believe the function Josef is referring to on linus/master is
amd_iommu_domain_flush_all().
https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1820
The potential call sequence appears to be:
```
blocked_domain_attach_device() or amd_iommu_attach_device()
  -> detach_device()
     -> amd_iommu_domain_flush_all()
        -> amd_iommu_domain_flush_pages(..., CMD_INV_IOMMU_ALL_PAGES_ADDRESS)
```
Based on the code in build_inv_address() [1], it doesn't make sense to
break the full range into smaller sizes and perform multiple flushes:
any chunk larger than 1 << 51 is turned into a full flush anyway.
[1] https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1289
On Thu, Apr 09, 2026 at 08:12:25AM +0000, Weinan Liu wrote:
> > On Thu, Mar 26, 2026 19:05:12 -0300 Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Sat, Mar 14, 2026 at 02:24:11PM -0400, Josef Bacik wrote:
> > > On Thu, Mar 12, 2026 at 9:40 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Wed, Mar 04, 2026 at 04:30:03PM -0500, Josef Bacik wrote:
> > > > > We are hitting the following soft lockup in production on v6.6 and
> > > > > v6.12, but the bug exists in all versions
> > > > >
> > > > > [trace snipped]
> > > > >
> > > > > This occurs because we're a VM and we're splitting up the size
> > > > > CMD_INV_IOMMU_ALL_PAGES_ADDRESS we get from
> > > > > amd_iommu_domain_flush_tlb_pde() into a bunch of smaller flushes.
> > > >
> > > > This function doesn't exist in the upstream kernel anymore, and the
> > > > new code doesn't generate CMD_INV_IOMMU_ALL_PAGES_ADDRESS flushes at
> > > > all, AFAIK.
> > >
> > > This was based on linus/master as of March 4th, and we get here via
> > > amd_iommu_flush_tlb_all, which definitely still exists, so what
> > > specifically are you talking about? Thanks,
> >
> > $ git grep amd_iommu_domain_flush_tlb_pde | wc -l
> > 0
> >
> > The entire page table logic was rewritten. The stuff that caused these
> > issues is gone and the new stuff doesn't appear to have this bug of
> > passing size == CMD_INV_IOMMU_ALL_PAGES_ADDRESS.
> >
> > If it does please explain it in terms of the new stuff without
> > referencing deleted functions.
> >
> > I don't know how you get something like this into -stable.
>
> I believe the function Josef is referring to on linux/master is amd_iommu_domain_flush_all().
> https://elixir.bootlin.com/linux/v7.0-rc7/source/drivers/iommu/amd/iommu.c#L1820
That does seem to be an issue, but it is not going to be triggered by a
VFIO trace like Josef is showing. I've already fixed this properly in
my series:
https://lore.kernel.org/all/3-v2-90ddd19c0894+13561-iommupt_inv_amd_jgg@nvidia.com/
+ if (likely(!amd_iommu_np_cache) ||
+ unlikely(address == 0 && last == U64_MAX)) {
+ __domain_flush_pages(domain, address, last);
By fully getting rid of the wrong use of
CMD_INV_IOMMU_ALL_PAGES_ADDRESS as a size in the callers.
So there is a small window when this patch could land with a commit
message to address amd_iommu_domain_flush_all() and be backported
before it all gets reworked and backporting will become hard. Respin
it quickly?
Jason