On Fri, 1 May 2026 11:09:27 -0300, Desnes Nunes wrote:
> On Thu, Apr 30, 2026 at 6:55 PM Michal Pecio <michal.pecio@gmail.com> wrote:
> > When xhci_handle_command_timeout() logs USBSTS, does it help to add:
> >
> > if (usbsts & STS_FATAL) {
> > xhci_halt(xhci);
> > xhci_hc_died(xhci);
> > goto time_out_completed;
> > }
> > It may not be perfect solution (race conditions?) but it could hint
> > that we are on the right track, if it works.
>
> This panicked the system as soon as I hit `echo c > /proc/sysrq-trigger`:
>
> [ 141.683476] sysrq: Trigger a crash
> [ 141.686970] Kernel panic - not syncing: sysrq triggered crash
Damn, that sucks. Any chance it's not a problem with my proposed change
but some sort of issue on your side?
Anyway, I think the patch below might cover it. It works for me in the
sense that the bus does get killed, without ill effect. I tested on
VL805 where HSE is easily triggered by disabling XHCI_TRB_OVERFETCH.
However, the patch isn't necessary here - VL805 doesn't clear CRCR.CRR
on HSE, so normal abort path is taken and times out, then hc_died().
Can somebody serious confirm if this issue actually exists in the first
place, and whether the patch solves it?
Hello Redhat, anyone alive there? Or only stochastic parrots?
Mathias, do you remember what's the point of the "Command timeout on
stopped ring" branch? Can it happen in any case other than dead chip?
I also wonder if it wouldn't make sense to just hc_died() on every
command timeout except Address Device. We rely on Stop Endpoint
timeouts to kill chips which go unresponsive without setting HCE/HSE,
because sooner or later somebody loses patience and unlinks an URB,
but this story (real or hallucinated, but plausible) shows that this
may not help when there are no devices created yet.
---
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e5823650850a..3041deb67b57 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1761,13 +1761,15 @@ void xhci_handle_command_timeout(struct work_struct *work)
/* mark this command to be cancelled */
xhci->current_cmd->status = COMP_COMMAND_ABORTED;
- /* Make sure command ring is running before aborting it */
+ /* check for crashed or disconnected chip */
hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
- if (hw_ring_state == ~(u64)0) {
+ if (hw_ring_state == ~(u64)0 || usbsts & (STS_FATAL | STS_HCE)) {
+ xhci_info(xhci, "kill the damn thing\n");
xhci_hc_died(xhci);
goto time_out_completed;
}
+ /* Make sure command ring is running before aborting it */
if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
(hw_ring_state & CMD_RING_RUNNING)) {
/* Prevent new doorbell, and start command abort */
Hello again Michal,
On Sat, May 2, 2026 at 6:47 AM Michal Pecio <michal.pecio@gmail.com> wrote:
>
> On Fri, 1 May 2026 11:09:27 -0300, Desnes Nunes wrote:
> > On Thu, Apr 30, 2026 at 6:55 PM Michal Pecio <michal.pecio@gmail.com> wrote:
> > > When xhci_handle_command_timeout() logs USBSTS, does it help to add:
> > >
> > > if (usbsts & STS_FATAL) {
> > > xhci_halt(xhci);
> > > xhci_hc_died(xhci);
> > > goto time_out_completed;
> > > }
> > > It may not be perfect solution (race conditions?) but it could hint
> > > that we are on the right track, if it works.
> >
> > This panicked the system as soon as I hit `echo c > /proc/sysrq-trigger`:
> >
> > [ 141.683476] sysrq: Trigger a crash
> > [ 141.686970] Kernel panic - not syncing: sysrq triggered crash
>
> Damn, that sucks. Any chance it's not a problem with my proposed change
> but some sort of issue on your side?
Indeed - bummer.
Don't think so, since I'm using the same system and procedures I used
for the wait_for_completion_timeout() patch:
https://lore.kernel.org/linux-usb/20260430014817.2006885-1-desnesn@redhat.com/T/#ma6ce987cea510349082831bbb822136e5c5c57da
> Anyway, I think the patch below might cover it. It works for me in the
> sense that the bus does get killed, without ill effect. I tested on
> VL805 where HSE is easily triggered by disabling XHCI_TRB_OVERFETCH.
> However, the patch isn't necessary here - VL805 doesn't clear CRCR.CRR
> on HSE, so normal abort path is taken and times out, then hc_died().
...
> I also wonder if it wouldn't make sense to just hc_died() on every
> command timeout except Address Device. We rely on Stop Endpoint
> timeouts to kill chips which go unresponsive without setting HCE/HSE,
> because sooner or later somebody loses patience and unlinks an URB,
> but this story (real or hallucinated, but plausible) shows that this
> may not help when there are no devices created yet.
>
> ---
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index e5823650850a..3041deb67b57 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1761,13 +1761,15 @@ void xhci_handle_command_timeout(struct work_struct *work)
> /* mark this command to be cancelled */
> xhci->current_cmd->status = COMP_COMMAND_ABORTED;
>
> - /* Make sure command ring is running before aborting it */
> + /* check for crashed or disconnected chip */
> hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> - if (hw_ring_state == ~(u64)0) {
> + if (hw_ring_state == ~(u64)0 || usbsts & (STS_FATAL | STS_HCE)) {
> + xhci_info(xhci, "kill the damn thing\n");
> xhci_hc_died(xhci);
> goto time_out_completed;
> }
>
> + /* Make sure command ring is running before aborting it */
> if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> (hw_ring_state & CMD_RING_RUNNING)) {
> /* Prevent new doorbell, and start command abort */
FYI, sorry to be the bearer of bad news, but this also panics the
system as soon as I run `echo c > /proc/sysrq-trigger`. Kdump doesn't
run and no vmcore is produced:
==========
Kernel 7.0.0-michal.pecio.v2 on x86_64
FQDN login:
[ 1063.290020] sysrq: Trigger a crash
[ 1063.293504] Kernel panic - not syncing: sysrq triggered crash
[ 1063.299348] CPU: 12 UID: 0 PID: 5483 Comm: bash Not tainted
7.0.0-michal.pecio.v2 #1 PREEMPT(full)
[ 1063.308548] Hardware name: Intel Corporation Arrow Lake Client
Platform/MTL-S UDIMM 1DPC EVCRB, BIOS MTLSFWI1.R00.5385.D80.2509230731
09/23/2025
[ 1063.321718] Call Trace:
[ 1063.324210] <TASK>
[ 1063.326347] dump_stack_lvl+0x4e/0x70
[ 1063.330084] vpanic+0x20a/0x410
[ 1063.333281] panic+0x6b/0x70
[ 1063.336210] sysrq_handle_crash+0x1a/0x20
[ 1063.340290] __handle_sysrq.cold+0x99/0xde
[ 1063.344456] write_sysrq_trigger+0x59/0xb0
[ 1063.348624] proc_reg_write+0x5a/0xa0
[ 1063.352348] vfs_write+0xcf/0x450
[ 1063.355721] ksys_write+0x6b/0xe0
[ 1063.359092] do_syscall_64+0x11b/0x6a0
[ 1063.362906] ? do_user_addr_fault+0x206/0x680
[ 1063.367337] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1063.372474] RIP: 0033:0x7fc8e8a9a544
[ 1063.376112] Code: 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00
00 00 0f 1f 40 00 f3 0f 1e fa 80 3d a5 cb 0d 00 00 74 13 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24
18 48
[ 1063.395199] RSP: 002b:00007ffcb7ccb328 EFLAGS: 00000202 ORIG_RAX:
0000000000000001
[ 1063.402898] RAX: ffffffffffffffda RBX: 00007fc8e8b705c0 RCX: 00007fc8e8a9a544
[ 1063.410153] RDX: 0000000000000002 RSI: 000055ae272da170 RDI: 0000000000000001
[ 1063.417409] RBP: 0000000000000002 R08: 0000000000000073 R09: 00000000ffffffff
[ 1063.424664] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
[ 1063.431919] R13: 000055ae272da170 R14: 0000000000000002 R15: 00007fc8e8b6df00
[ 1063.439177] </TASK>
[ 1063.441691] Kernel Offset: 0x4200000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1063.452571] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
============
Best Regards
Desnes Nunes
On Sat, 2 May 2026 08:38:34 -0300, Desnes Nunes wrote:
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index e5823650850a..3041deb67b57 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -1761,13 +1761,15 @@ void xhci_handle_command_timeout(struct work_struct *work)
> > /* mark this command to be cancelled */
> > xhci->current_cmd->status = COMP_COMMAND_ABORTED;
> >
> > - /* Make sure command ring is running before aborting it */
> > + /* check for crashed or disconnected chip */
> > hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> > - if (hw_ring_state == ~(u64)0) {
> > + if (hw_ring_state == ~(u64)0 || usbsts & (STS_FATAL | STS_HCE)) {
> > + xhci_info(xhci, "kill the damn thing\n");
> > xhci_hc_died(xhci);
> > goto time_out_completed;
> > }
> >
> > + /* Make sure command ring is running before aborting it */
> > if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> > (hw_ring_state & CMD_RING_RUNNING)) {
> > /* Prevent new doorbell, and start command abort */
>
> FYI, sorry to be the bearer of bad news, but this also panics the
> system as soon as I run `echo c > /proc/sysrq-trigger`.
Is this not what's supposed to happen?
Sorry, that complaint is so odd that I thought I'm seeing another case
of debugging being outsourced to an AI chatbot, which forgot that panic
is triggered intentionally here. Now I'm just confused.
> Kdump doesn't run and no vmcore is produced:
Is the kdump kernel not launched, or does it crash during boot?
The latter would make sense if there is some problem with the code.
But I don't understand how patching xhci-hcd could possibly have
any effect on the former. Does this new code execute at all? Does
"kill the damn thing" ever appear in dmesg?
Regards,
Michal
Hello Michal,
On Sat, May 2, 2026 at 6:55 PM Michal Pecio <michal.pecio@gmail.com> wrote:
>
> On Sat, 2 May 2026 08:38:34 -0300, Desnes Nunes wrote:
> > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > > index e5823650850a..3041deb67b57 100644
> > > --- a/drivers/usb/host/xhci-ring.c
> > > +++ b/drivers/usb/host/xhci-ring.c
> > > @@ -1761,13 +1761,15 @@ void xhci_handle_command_timeout(struct work_struct *work)
> > > /* mark this command to be cancelled */
> > > xhci->current_cmd->status = COMP_COMMAND_ABORTED;
> > >
> > > - /* Make sure command ring is running before aborting it */
> > > + /* check for crashed or disconnected chip */
> > > hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
> > > - if (hw_ring_state == ~(u64)0) {
> > > + if (hw_ring_state == ~(u64)0 || usbsts & (STS_FATAL | STS_HCE)) {
> > > + xhci_info(xhci, "kill the damn thing\n");
> > > xhci_hc_died(xhci);
> > > goto time_out_completed;
> > > }
> > >
> > > + /* Make sure command ring is running before aborting it */
> > > if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
> > > (hw_ring_state & CMD_RING_RUNNING)) {
> > > /* Prevent new doorbell, and start command abort */
> >
> > FYI, sorry to be the bearer of bad news, but this also panics the
> > system as soon as I run `echo c > /proc/sysrq-trigger`.
>
> Is this not what's supposed to happen?
>
> Sorry, that complaint is so odd that I thought I'm seeing another case
> of debugging being outsourced to an AI chatbot, which forgot that panic
> is triggered intentionally here. Now I'm just confused.
No, guess you actually saw a case of poor explanation on my end -
apologies for not explaining the outcome properly.
What I tried poorly to explain was that the system simply hanged after
I intentionally triggered the panic with a sysrq - both times.
Nothing happens after the sysrq panic stack trace.
> > Kdump doesn't run and no vmcore is produced:
> Is the kdump kernel not launched, or does it crash during boot?
> The latter would make sense if there is some problem with the code.
Kdump kernel didn't launch at all, thus no vmcore was produced.
> But I don't understand how patching xhci-hcd could possibly have
> any effect on the former. Does this new code execute at all? Does
> "kill the damn thing" ever appear in dmesg?
Both kernels booted normally: the first one checking HSE after USBSTS
was logged on xhci_handle_command_timeout(), as well as this new code
checking for ring state or the HSE and HCE bits.
Since kdump didn't start, the message "kill the damn thing" never got
a chance to appear on crashkernel's dmesg.
Best Regards,
Desnes
On Sun, 3 May 2026 00:36:27 -0300, Desnes Nunes wrote: > On Sat, May 2, 2026 at 6:55 PM Michal Pecio <michal.pecio@gmail.com> wrote: > > On Sat, 2 May 2026 08:38:34 -0300, Desnes Nunes wrote: > > > Kdump doesn't run and no vmcore is produced: > > Is the kdump kernel not launched, or does it crash during boot? > > The latter would make sense if there is some problem with the code. > > Kdump kernel didn't launch at all, thus no vmcore was produced. Well, that's weird. But it seems you have serial console enabled so I guess you should know whether it fails to start or crashes. > > But I don't understand how patching xhci-hcd could possibly have > > any effect on the former. Does this new code execute at all? Does > > "kill the damn thing" ever appear in dmesg? > > Both kernels booted normally: the first one checking HSE after USBSTS > was logged on xhci_handle_command_timeout(), as well as this new code > checking for ring state or the HSE and HCE bits. > Since kdump didn't start, the message "kill the damn thing" never got > a chance to appear on crashkernel's dmesg. It could show on the main kernel before the panic is triggered, if the main kernel was patched too. Maybe they are the same kernel binary? I'm trying to come up with any conceivable theory how patching xhci-hcd could prevent the kdump kernel from loading. Still no idea... Regards, Michal
Hi Michal, On Sun, May 3, 2026 at 2:18 AM Michal Pecio <michal.pecio@gmail.com> wrote: > Well, that's weird. But it seems you have serial console enabled so > I guess you should know whether it fails to start or crashes. Yes, I have been checking all boots and crashes through the serial console. > It could show on the main kernel before the panic is triggered, if the > main kernel was patched too. Maybe they are the same kernel binary? Yes, same patched binary on the main kernel and kdump kernel. > I'm trying to come up with any conceivable theory how patching xhci-hcd > could prevent the kdump kernel from loading. Still no idea... Just found the reason: with the installation of this last kernel, my /boot partition got filled. Thus, the initframs image was not actually getting copied to /boot. After removing a few test kernels, kdump armed normally, collected a vmcore and no hangs due to the locks of xhci_alloc_dev() or device_shutdown() appeared. So, I confirm that this patch, which checks for HSE or HCE indeed fixes the bug, without having to rely to a wait_for_completion_timeout(): # grep -i HSE -A5 kexec-dmesg.log [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Command timeout, USBSTS: 0x00000015 HCHalted HSE PCD [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: kill the damn thing [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: xHCI host controller not responding, assume dead [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: HC died; cleaning up [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Error while assigning device slot ID: Command Aborted [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Max number of devices this xHCI host supports is 64. Best Regards, Desnes
On Sun, 3 May 2026 13:20:38 -0300, Desnes Nunes wrote: > Yes, same patched binary on the main kernel and kdump kernel. That's not a great news because it seems that the same HSE could occur on any kexec, not just kdump. It's unclear why it happens, it seems that after initial boot the HC works normally (does it?) but then kexec-ing breaks it somehow. I don't think this has anything to do with the Battlemage, because in the particular case which you shared, GPU began initialization *after* HSE had already been logged. My first wild guess would be that HSE is caused by resetting IOMMU while the xHC is unaware of kexec and continuing to DMA old buffers. Attached patch checks for this and also tries to explicitly clear HSE, although resetting ought to clear it too. But HW has bugs... So it may not help, but maybe it will if we are lucky, or at least it may offer some hint about when things go wrong. > So, I confirm that this patch, which checks for HSE or HCE indeed > fixes the bug, without having to rely to a > wait_for_completion_timeout(): > > # grep -i HSE -A5 kexec-dmesg.log > [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Command timeout, > USBSTS: 0x00000015 HCHalted HSE PCD > [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: kill the damn thing > [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: xHCI host controller > not responding, assume dead > [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: HC died; cleaning up > [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Error while > assigning device slot ID: Command Aborted Thanks for testing, that's what the patch was intended to do. There is no lockup, but of course the chip doesn't work afterwards. Regards, Michal
On Sun, 3 May 2026 21:31:11 +0200, Michal Pecio wrote: > My first wild guess would be that HSE is caused by resetting IOMMU > while the xHC is unaware of kexec and continuing to DMA old buffers. > Attached patch checks for this and also tries to explicitly clear > HSE, although resetting ought to clear it too. But HW has bugs... Never mind, here's the smoking gun: [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 3 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Halt the HC [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Resetting HCD [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Reset the HC [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Wait for controller to be ready for doorbell rings [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Reset complete [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Enabling 64-bit DMA addresses. [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Calling HCD init [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Starting xhci_init [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: HCD page size set to 4K [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Device context base array address = 0x0x000000100167c000 (DMA), 00000000d042f7e3 (virt) [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocated command ring at 0000000016f013a6 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: First segment DMA is 0x0x000000100167d000 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocating primary event ring [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Allocating 34 scratchpad buffers [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Ext Cap 000000001bef6947, port offset = 1, count = 14, revision = 0x2 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:1 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:12 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:2 PSIE:1 PLT:0 PFD:0 LP:0 PSIM:1500 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:3 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:480 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI 1.0: support USB2 hardware lpm [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Ext Cap 00000000a5bcc554, port offset = 17, count = 8, revision = 0x3 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:4 PSIE:3 PLT:0 PFD:1 LP:0 PSIM:5 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:5 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:6 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: PSIV:7 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:20 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Found 14 USB 2.0 ports and 8 USB 3.0 ports. [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHC can handle at most 64 device slots [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Setting Max device slots reg = 0x40 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Setting command ring address to 0x100167d001 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Doorbell array is located at offset 0x3000 from cap regs base addr [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: // Write event ring dequeue pointer, preserving EHB bit [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_init [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Called HCD init [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Got SBRN 50 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: MWI active [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_pci_reinit [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xhci_run [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h100167e000 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Finished xhci_run for main hcd [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 4 [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Host supports USB 3.2 Enhanced SuperSpeed [Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup [Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: Enable interrupts [Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: Enable primary interrupter [Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: // Turn on HC, cmd = 0x5. [Fri May 1 09:46:41 2026] DMAR: DRHD: handling fault status reg 2 [Fri May 1 09:46:41 2026] DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0x1001680000 [fault reason 0x39] SM: Present bit in Root Entry is clear The chip IOMMU faults shortly after setting USBCMD.RUN = 1. Such fault is expected to cause HSE assertion and usually it does. You will probably find that HSE is already set while Enable Slot is being queued, even if it was clear in xhci_gen_setup(). 1001680000 is close to valid addresses like 100167e000 or 100167c000. Possible causes: - xHCI or IOMMU driver bug - HW corrupted a pointer - HW accessed something out of bounds - HW dereferenced a stale pointer from the original kernel Do you happen to have more of those logs saved, are they all like that? Any chance that 1001680000 appears somewhere in the main kernel's log? If not, I suppose we will have to log every single DMA mapping created by the driver and see if this gives any new clues. Regards, Michal
On Mon, 4 May 2026 09:31:18 +0200, Michal Pecio wrote: > Never mind, here's the smoking gun: > > [...] > [Fri May 1 09:46:41 2026] xhci_hcd 0000:80:14.0: // Turn on HC, cmd = 0x5. > [Fri May 1 09:46:41 2026] DMAR: DRHD: handling fault status reg 2 > [Fri May 1 09:46:41 2026] DMAR: [DMA Read NO_PASID] Request device > [80:14.0] fault addr 0x1001680000 [fault reason 0x39] SM: Present bit > in Root Entry is clear > > The chip IOMMU faults shortly after setting USBCMD.RUN = 1. > Such fault is expected to cause HSE assertion and usually it does. > You will probably find that HSE is already set while Enable Slot > is being queued, even if it was clear in xhci_gen_setup(). > > 1001680000 is close to valid addresses like 100167e000 or 100167c000. > > Possible causes: > - xHCI or IOMMU driver bug > - HW corrupted a pointer > - HW accessed something out of bounds > - HW dereferenced a stale pointer from the original kernel > > Do you happen to have more of those logs saved, are they all like that? > Any chance that 1001680000 appears somewhere in the main kernel's log? Hi again, I see a certain lack of interest in finding the root cause of this. I have done a simple test on my own HW: writing bogus CRCR to cause IOMMU fault when the first command is submitted. I found that not all HCs reliably set HSE in this case, but obviously none of them ever complete the command properly. It seems that unconditional hc_died() on Enable Slot timeout may not be a bad idea. Makes me wonder if the same shouldn't apply to all commands besides Address Device, they typically only timeout due to HW issues. Regards, Michal
Hello Michal, On Mon, May 18, 2026 at 3:33 AM Michal Pecio <michal.pecio@gmail.com> wrote: > > The chip IOMMU faults shortly after setting USBCMD.RUN = 1. > > Such fault is expected to cause HSE assertion and usually it does. > > You will probably find that HSE is already set while Enable Slot > > is being queued, even if it was clear in xhci_gen_setup(). I've just read HSE at these places and confirmed that HSE was already set even before queuing the enable slot trb, even though it was previously clear in xhci_gen_setup(). Also, now digging more into the IOMMU debug messages, I have found out that IOMMU also faults a Write of the network driver, prior to the xhci Read fault: ========= # lspci | grep "80:1f.6\|80:14.0" 80:14.0 USB controller: Intel Corporation 800 Series PCH USB 3.1 xHCI HC (rev 10) 80:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (19) I219-LM (rev 10) ========= ... [Tue May 19 10:06:31 2026] PCI host bridge to bus 0000:80 [Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [io 0x2000-0x8bff window] [Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [mem 0xb8000000-0xbdffffff window] [Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [mem 0x8000000000-0x9fdfffffff window] [Tue May 19 10:06:31 2026] pci_bus 0000:80: root bus resource [bus 80-df] [Tue May 19 10:06:31 2026] pci 0000:80:14.0: [8086:7f6e] type 00 class 0x0c0330 conventional PCI endpoint [Tue May 19 10:06:31 2026] pci 0000:80:14.0: BAR 0 [mem 0x8000200000-0x800020ffff 64bit] [Tue May 19 10:06:31 2026] pci 0000:80:14.0: PME# supported from D3hot D3cold ... [Tue May 19 10:06:31 2026] pci 0000:80:1f.6: [8086:550c] type 00 class 0x020000 conventional PCI endpoint [Tue May 19 10:06:31 2026] pci 0000:80:1f.6: BAR 0 [mem 0xb8100000-0xb811ffff] [Tue May 19 10:06:31 2026] pci 0000:80:1f.6: PME# supported from D0 D3hot D3cold ... [Tue May 19 10:06:32 2026] pci 0000:80:14.0: Adding to iommu group 20 ... [Tue May 19 10:06:32 2026] pci 0000:80:1f.6: Adding to iommu group 29 [Tue May 19 10:06:32 2026] pci 0000:81:00.0: Adding to iommu group 30 [Tue May 19 10:06:32 2026] DMAR: Intel(R) Virtualization Technology for Directed I/O [Tue May 19 10:06:32 2026] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [Tue May 19 10:06:32 2026] software IO TLB: mapped [mem 0x000000003b000000-0x000000003f000000] (64MB) [Tue May 19 10:06:32 2026] ACPI: bus type thunderbolt registered [Tue May 19 10:06:32 2026] DMAR: DRHD: handling fault status reg 3 => [Tue May 19 10:06:32 2026] DMAR: [DMA Write NO_PASID] Request device [80:1f.6] fault addr 0x115106000 [fault reason 0x39] SM: Present bit in Root Entry is clear ... [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 3 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Halt the HC [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Resetting HCD [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Reset the HC [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Wait for controller to be ready for doorbell rings [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Reset complete [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Enabling 64-bit DMA addresses. [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: HCD page size set to 4K [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Starting xhci_mem_init => [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Device context base array address = 0x000000107513c000 (DMA), 000000002c3aab07 (virt) [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Allocated command ring at 00000000ee0da32e [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Allocating primary event ring [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Allocating 34 scratchpad buffers [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Ext Cap 00000000a35d82fb, port offset = 1, count = 14, revision = 0x2 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:1 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:12 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:2 PSIE:1 PLT:0 PFD:0 LP:0 PSIM:1500 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:3 PSIE:2 PLT:0 PFD:0 LP:0 PSIM:480 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHCI 1.0: support USB2 hardware lpm [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Ext Cap 000000006d495f89, port offset = 17, count = 8, revision = 0x3 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:4 PSIE:3 PLT:0 PFD:1 LP:0 PSIM:5 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:5 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:6 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:10 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PSIV:7 PSIE:3 PLT:0 PFD:1 LP:1 PSIM:20 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Found 14 USB 2.0 ports and 8 USB 3.0 ports. [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_mem_init [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Starting xhci_init [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHC can handle at most 64 device slots [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Setting Max device slots reg = 0x40 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Setting command ring address to 0x107513d001 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Doorbell array is located at offset 0x3000 from cap regs base addr [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Write event ring dequeue pointer, preserving EHB bit [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_init [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810 => [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: PATCHED xhci_gen_setup: USBSTS: 0x00000001 HCHalted [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Got SBRN 50 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: MWI active [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_pci_reinit [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xhci_run => [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h107513e000 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Finished xhci_run for main hcd [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: xHCI Host Controller [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: new USB bus registered, assigned bus number 4 [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Host supports USB 3.2 Enhanced SuperSpeed [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: supports USB remote wakeup [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Enable interrupts [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Enable primary interrupter [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: // Turn on HC, cmd = 0x5. [Tue May 19 10:06:37 2026] DMAR: DRHD: handling fault status reg 2 => [Tue May 19 10:06:37 2026] DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit in Root Entry is clear [Tue May 19 10:06:38 2026] usb usb3: default language 0x0409 [Tue May 19 10:06:38 2026] usb usb3: udev 1, busnum 3, minor = 256 [Tue May 19 10:06:38 2026] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 7.01 [Tue May 19 10:06:38 2026] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [Tue May 19 10:06:38 2026] usb usb3: Product: xHCI Host Controller [Tue May 19 10:06:38 2026] usb usb3: Manufacturer: Linux 7.1.0-rc3-30e0ff6d6a83.debug xhci-hcd [Tue May 19 10:06:38 2026] usb usb3: SerialNumber: 0000:80:14.0 [Tue May 19 10:06:38 2026] usb usb3: usb_probe_device [Tue May 19 10:06:38 2026] usb usb3: configuration #1 chosen from 1 choice [Tue May 19 10:06:38 2026] xHCI xhci_add_endpoint called for root hub [Tue May 19 10:06:38 2026] xHCI xhci_check_bandwidth called for root hub [Tue May 19 10:06:38 2026] usb usb3: adding 3-0:1.0 (config #1, interface 0) [Tue May 19 10:06:38 2026] hub 3-0:1.0: usb_probe_interface [Tue May 19 10:06:38 2026] hub 3-0:1.0: usb_probe_interface - got id [Tue May 19 10:06:38 2026] hub 3-0:1.0: USB hub found [Tue May 19 10:06:38 2026] hub 3-0:1.0: 14 ports detected [Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status 2-1 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status 2-2 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000 [Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: set port remote wake mask, actual port 2-1 status = 0xe0002a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: set port remote wake mask, actual port 2-2 status = 0xe0002a0 [Tue May 19 10:06:38 2026] hub 2-0:1.0: hub_suspend [Tue May 19 10:06:38 2026] usb usb2: bus auto-suspend, wakeup 1 [Tue May 19 10:06:38 2026] usb usb2: suspend raced with wakeup event [Tue May 19 10:06:38 2026] usb usb2: usb auto-resume [Tue May 19 10:06:38 2026] hub 3-0:1.0: standalone hub [Tue May 19 10:06:38 2026] hub 3-0:1.0: no power switching (usb 1.0) [Tue May 19 10:06:38 2026] hub 3-0:1.0: individual port over-current protection [Tue May 19 10:06:38 2026] hub 3-0:1.0: Single TT [Tue May 19 10:06:38 2026] hub 3-0:1.0: TT requires at most 8 FS bit times (666 ns) [Tue May 19 10:06:38 2026] hub 3-0:1.0: power on to power good time: 20ms [Tue May 19 10:06:38 2026] hub 3-0:1.0: local power source is good [Tue May 19 10:06:38 2026] usb usb3-port14: DeviceRemovable is changed to 1 according to platform information. [Tue May 19 10:06:38 2026] hub 3-0:1.0: trying to enable port power on non-switchable hub [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-1 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-2 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-3 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-4 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-5 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-6 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-7 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-8 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-9 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-10 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-11 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-12 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-13 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 3-14 ON, portsc: 0x206e1 [Tue May 19 10:06:38 2026] usb usb4: skipped 1 descriptor after endpoint [Tue May 19 10:06:38 2026] usb usb4: default language 0x0409 [Tue May 19 10:06:38 2026] usb usb4: udev 1, busnum 4, minor = 384 [Tue May 19 10:06:38 2026] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 7.01 [Tue May 19 10:06:38 2026] hub 2-0:1.0: hub_resume [Tue May 19 10:06:38 2026] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [Tue May 19 10:06:38 2026] usb usb4: Product: xHCI Host Controller [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-1 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-2 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-3 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-4 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-5 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-6 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-7 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-8 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-9 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-10 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-11 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-12 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-13 read: 0x2a0, return 0x100 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-14 read: 0x206e1, return 0x10101 [Tue May 19 10:06:38 2026] usb usb3-port14: status 0101 change 0001 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: clear port14 connect change, portsc: 0x6e1 [Tue May 19 10:06:38 2026] usb usb4: Manufacturer: Linux 7.1.0-rc3-30e0ff6d6a83.debug xhci-hcd [Tue May 19 10:06:38 2026] usb usb4: SerialNumber: 0000:80:14.0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status 2-1 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:00:0d.0: Get port status 2-2 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] hub 2-0:1.0: state 7 ports 2 chg 0000 evt 0000 [Tue May 19 10:06:38 2026] usb usb4: usb_probe_device [Tue May 19 10:06:38 2026] usb usb4: configuration #1 chosen from 1 choice [Tue May 19 10:06:38 2026] xHCI xhci_add_endpoint called for root hub [Tue May 19 10:06:38 2026] xHCI xhci_check_bandwidth called for root hub [Tue May 19 10:06:38 2026] usb usb4: adding 4-0:1.0 (config #1, interface 0) [Tue May 19 10:06:38 2026] hub 4-0:1.0: usb_probe_interface [Tue May 19 10:06:38 2026] hub 4-0:1.0: usb_probe_interface - got id [Tue May 19 10:06:38 2026] hub 4-0:1.0: USB hub found [Tue May 19 10:06:38 2026] hub 4-0:1.0: 8 ports detected [Tue May 19 10:06:38 2026] hub 4-0:1.0: standalone hub [Tue May 19 10:06:38 2026] hub 4-0:1.0: no power switching (usb 1.0) [Tue May 19 10:06:38 2026] hub 4-0:1.0: individual port over-current protection [Tue May 19 10:06:38 2026] hub 4-0:1.0: TT requires at most 8 FS bit times (666 ns) [Tue May 19 10:06:38 2026] hub 4-0:1.0: power on to power good time: 100ms [Tue May 19 10:06:38 2026] hub 4-0:1.0: local power source is good [Tue May 19 10:06:38 2026] usb usb4-port1: peered to usb3-port9 [Tue May 19 10:06:38 2026] usb usb4-port2: peered to usb3-port12 [Tue May 19 10:06:38 2026] usb usb4-port3: peered to usb3-port8 [Tue May 19 10:06:38 2026] usb usb4-port4: peered to usb3-port7 [Tue May 19 10:06:38 2026] usb usb4-port5: peered to usb3-port10 [Tue May 19 10:06:38 2026] usb usb4-port6: peered to usb3-port3 [Tue May 19 10:06:38 2026] usb usb4-port7: peered to usb3-port4 [Tue May 19 10:06:38 2026] usb usb4-port8: peered to usb3-port5 [Tue May 19 10:06:38 2026] usb usb4: port-1 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-1 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] usb usb4: port-2 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-2 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] usb usb4: port-3 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-3 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] usb usb4: port-4 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-4 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] usb usb4: port-5 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-5 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] usb usb4: port-6 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-6 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] usb usb4: port-7 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-7 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] usb usb4: port-8 no _DSM function 5 [Tue May 19 10:06:38 2026] usb usb4: port-8 disable U1/U2 _DSM: -19 [Tue May 19 10:06:38 2026] hub 4-0:1.0: trying to enable port power on non-switchable hub [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-1 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-2 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-3 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-4 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-5 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-6 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-7 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: set port power 4-8 ON, portsc: 0x2a0 [Tue May 19 10:06:38 2026] usbcore: registered new interface driver usbserial_generic [Tue May 19 10:06:38 2026] usbserial: USB Serial support registered for generic [Tue May 19 10:06:38 2026] hub 3-0:1.0: state 7 ports 14 chg 4000 evt 0000 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 3-14 read: 0x6e1, return 0x101 [Tue May 19 10:06:38 2026] usb usb3-port14: status 0101, change 0000, 12 Mb/s => [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: PATCHED: xhci_alloc_dev: B4 TRB_ENABLE_SLOT USBSTS: 0x00000015 HCHalted HSE PCD [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: // Ding dong! [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-1 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-2 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-3 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-4 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-5 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-6 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-7 read: 0x2a0, return 0x2a0 [Tue May 19 10:06:38 2026] xhci_hcd 0000:80:14.0: Get port status 4-8 read: 0x2a0, return 0x2a0 ... ========= > > 1001680000 is close to valid addresses like 100167e000 or 100167c000. > > > > Possible causes: > > - xHCI or IOMMU driver bug Since it happens on different drivers, it is starting to feel like iommu bug that only happens in this kdump scenario. However, init shouldn't be stuck waiting for the lock that hub kworker task is holding. The system should be able to reboot automatically after capturing the vmcore. > > - HW corrupted a pointer > > - HW accessed something out of bounds > > - HW dereferenced a stale pointer from the original kernel > > > > Do you happen to have more of those logs saved, are they all like that? Since the last time we interacted, I lost access to the system and it got formatted - no more old logs. However, I've got the system back today and had some interesting developments. > > Any chance that 1001680000 appears somewhere in the main kernel's log? The fault addresses do not appear in the main log, nor anywhere other than the DMAR fault addr messages in the crashkernel's log. However, by comparing the previous log messages from the past kernel, to the ones I saw with the new kernel I built today, I noticed the same 8K displacement from the fault addr. Maybe an iommu driver bug clue? = 7.0.0-clean = => [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: Device context base array address = 0x0x000000100167c000 (DMA), 00000000d042f7e3 (virt) => [Fri May 1 09:46:40 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h100167e000 => [Fri May 1 09:46:41 2026] DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0x1001680000 [fault reason 0x39] SM: Present bit in Root Entry is clear 0x100167c000 0x100167e000 ^ 0x2000 v 0x1001680000 = 7.1.0-rc3-30e0ff6d6a83.debug = => [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: Device context base array address = 0x000000107513c000 (DMA), 000000002c3aab07 (virt) => [Tue May 19 10:06:37 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h107513e000 => [Tue May 19 10:06:37 2026] DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit in Root Entry is clear 0x107513c000 0x107513e000 ^ 0x2000 v 0x1075140000 > I see a certain lack of interest in finding the root cause of this. Actually, I've just come back from a bereavement leave. During that time I also lost access to the system which I got back today - apologies for the radio silence. > I have done a simple test on my own HW: writing bogus CRCR to cause > IOMMU fault when the first command is submitted. I found that not all > HCs reliably set HSE in this case, but obviously none of them ever > complete the command properly. Wow - good to know! I guess I would had expected to have HSE always being set in these kind of situations in the command ring register. Just out of curiosity, how did you figure out that only some HCs set HSE? Tested on a few HCs or inferred that somehow? > It seems that unconditional hc_died() on Enable Slot timeout may not be a bad idea. That was the idea of the original patch when I saw the HSE at that point of the reboot sequence after the vmcore was captured. However, different from my original patch, only one XHCI_CMD_DEFAULT_TIMEOUT is enough (even tested a few weeks ago after submission). Now if we can't trust that all HCs will reliably set HSE on scenarios like this one (iommu issues on the crashkernel?), the unconditional hc_died() starts to feel like a safer approach. > Makes me wonder if the same shouldn't apply to all commands > besides Address Device, they typically only timeout due to HW issues. In the past (commit c311e391a7efd101250c0e123286709b7e736249 "xhci: rework command timeout and cancellation,") all commands used to wait for a timeout of XHCI_CMD_DEFAULT_TIMEOUT - even Address Device. > > PS: there was a big iommu PR a few days ago - all the results from on this email were performed with a recent 7.1.0-rc3 kernel checked out at 30e0ff6d6a83. Best Regards, Desnes
On Wed, 20 May 2026 01:59:41 -0300, Desnes Nunes wrote: > On Mon, May 18, 2026 at 3:33 AM Michal Pecio <michal.pecio@gmail.com> wrote: > > > The chip IOMMU faults shortly after setting USBCMD.RUN = 1. > > > Such fault is expected to cause HSE assertion and usually it does. > > > You will probably find that HSE is already set while Enable Slot > > > is being queued, even if it was clear in xhci_gen_setup(). > > I've just read HSE at these places and confirmed that HSE was already > set even before queuing the enable slot trb, even though it was > previously clear in xhci_gen_setup(). Makes sense, thanks for checking. > The fault addresses do not appear in the main log, nor anywhere other > than the DMAR fault addr messages in the crashkernel's log. > > However, by comparing the previous log messages from the past kernel, > to the ones I saw with the new kernel I built today, I noticed the > same 8K displacement from the fault addr. Maybe an iommu driver bug > clue? If the bug is deterministic it should be fairly easy to nail it down. Attached xhci debugfs patch adds a list of almost all memory the xHC is allowed to access, including (if I havevn't missed something) all mappings it is allowed to access before any slots are enabled. Please apply, reboot and then: zip -r before.zip /sys/kernel/debug/usb/xhci/0000:80:14.0 # trigger crash kexec and the bug zip -r after.zip /sys/kernel/debug/usb/xhci/0000:80:14.0 Note: if you need to use tar instead, copy the directory and then archive the copy, because tar doesn't work on debugfs directly. And since the bug may be an out of bounds access by the HW, if you don't mind running slightly experimental patch to a critical subsystem, please also apply the DMA guard pages patch. I've been using it for a few months without issues, but YMMV. It helps determine which mapping is accessed OOB. Note: DMA guard pages may casue USB to stop working before kexec if it's a HW bug masked by memory layout, or begin to work after kexec in case of some IOMMU subsystem issues. > PS: there was a big iommu PR a few days ago - all the results from on > this email were performed with a recent 7.1.0-rc3 kernel checked out > at 30e0ff6d6a83. Well, so those IOMMU changes neither broke it nor fixed it. Knowing xHCI I frankly suspect that the problem is here. Regards, Michal
Hello Michal, On Fri, May 22, 2026 at 6:03 AM Michal Pecio <michal.pecio@gmail.com> wrote: > If the bug is deterministic it should be fairly easy to nail it down. > Attached xhci debugfs patch adds a list of almost all memory the xHC > is allowed to access, including (if I havevn't missed something) all > mappings it is allowed to access before any slots are enabled. > > Please apply, reboot and then: > > zip -r before.zip /sys/kernel/debug/usb/xhci/0000:80:14.0 > # trigger crash kexec and the bug > zip -r after.zip /sys/kernel/debug/usb/xhci/0000:80:14.0 ... > And since the bug may be an out of bounds access by the HW, if you > don't mind running slightly experimental patch to a critical subsystem, > please also apply the DMA guard pages patch. I've been using it for a > few months without issues, but YMMV. It helps determine which mapping > is accessed OOB. > > Note: DMA guard pages may casue USB to stop working before kexec if > it's a HW bug masked by memory layout, or begin to work after kexec in > case of some IOMMU subsystem issues. Please find the requested information in the attachments below. Kernel was patched with both patches. Best Regards, Desnes
On Fri, 22 May 2026 17:45:22 -0300, Desnes Nunes wrote: > Hello Michal, > > On Fri, May 22, 2026 at 6:03 AM Michal Pecio <michal.pecio@gmail.com> wrote: > > If the bug is deterministic it should be fairly easy to nail it down. > > Attached xhci debugfs patch adds a list of almost all memory the xHC > > is allowed to access, including (if I havevn't missed something) all > > mappings it is allowed to access before any slots are enabled. > > > > Please apply, reboot and then: > > > > zip -r before.zip /sys/kernel/debug/usb/xhci/0000:80:14.0 > > # trigger crash kexec and the bug > > zip -r after.zip /sys/kernel/debug/usb/xhci/0000:80:14.0 > ... > > And since the bug may be an out of bounds access by the HW, if you > > don't mind running slightly experimental patch to a critical subsystem, > > please also apply the DMA guard pages patch. I've been using it for a > > few months without issues, but YMMV. It helps determine which mapping > > is accessed OOB. > > > > Note: DMA guard pages may casue USB to stop working before kexec if > > it's a HW bug masked by memory layout, or begin to work after kexec in > > case of some IOMMU subsystem issues. > > Please find the requested information in the attachments below. > Kernel was patched with both patches. Sorry, I forgot about the most important thing: crash kernel log, or at least the IOMMU fault message showing the bad address. However, in this case it's moot because it seems that the HC didn't fault after kexec and it worked normally - debugfs shows that USBSTS=0 and some device has been successfully enabled. (It's a little odd that CRCR.CRR appears to be clear, not sure what's the deal with that, but it's same thing in before.zip as well.) So either the bug isn't as deterministic as we thought, or one of the patches "fixed" it, and that could only be DMA Guard Pages. If you can't reproduce the problem with guard pages, please remove that patch and post before/after debugfs again, plus crash kernel dmesg. Regards, Michal
Hello Michal,
On Fri, May 22, 2026 at 9:29 PM Michal Pecio <michal.pecio@gmail.com> wrote:
> Sorry, I forgot about the most important thing: crash kernel log, or at
> least the IOMMU fault message showing the bad address.
I was indeed intrigued and almost sent it without you asking for it :-)
The crashkernel's fault address is shown latter on down below, but now
I have attached the full kexec dmesg too.
PS: Note that the debugfs file 'memory' from before contains the
addresses of the main kernel, not crashkernel's addresses:
- From main dmesg:
[ 6.728105] xhci_hcd 0000:80:14.0: Device context base array
address = 0x000000010a958000 (DMA), 00000000f542e3ba (virt)
[ 6.737602] xhci_hcd 0000:80:14.0: ERST deq = 64'h10a95a000
> However, in this case it's moot because it seems that the HC didn't
> fault after kexec and it worked normally - debugfs shows that USBSTS=0
> and some device has been successfully enabled.
I did notice the USBSTS=0
> (It's a little odd that CRCR.CRR appears to be clear, not sure what's
> the deal with that, but it's same thing in before.zip as well.)
hum ...
> So either the bug isn't as deterministic as we thought, or one of the
> patches "fixed" it, and that could only be DMA Guard Pages. If you
> can't reproduce the problem with guard pages, please remove that patch
> and post before/after debugfs again, plus crash kernel dmesg.
So, the range of the fault addr in the crashkernel is always 8k away
... but the bug is not deterministic?
BTW, same crashkernel addresses from the previous 7.1.0-rc3-30e0ff6d6a83.debug:
[Fri May 22 09:35:04 2026] xhci_hcd 0000:80:14.0: Device context base
array address = 0x000000107513c000 (DMA), 000000008417e034 (virt)
[Fri May 22 09:35:04 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h107513e000
[Fri May 22 09:35:05 2026] DMAR: [DMA Read NO_PASID] Request device
[80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
= 7.1.0-rc3-michal.debug =
0x107513c000
0x107513e000
^
0x2000
v
0x1075140000
Moreover, bug is still there with the kernel that had both of the debug patches:
[ 206.189184] kdump[561]: saving vmcore complete
...
[ 206.747441] systemd-shutdown[1]: Syncing filesystems and block devices.
[ 206.754256] systemd-shutdown[1]: Rebooting.
[ 246.858501] INFO: task kworker/0:1:11 blocked for more than 122 seconds.
[ 246.865330] Not tainted 7.1.0-rc3-michal.debug #3
[ 246.870656] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 246.878618] task:kworker/0:1 state:D stack:0 pid:11
tgid:11 ppid:2 task_flags:0x4208160 flags:0x00080000
[ 246.889940] Workqueue: usb_hub_wq hub_event
[ 246.894200] Call Trace:
[ 246.896688] <TASK>
[ 246.898826] __schedule+0x2a3/0x6d0
[ 246.902379] schedule+0x27/0x80
[ 246.905575] schedule_timeout+0xbd/0x100
[ 246.909567] __wait_for_common+0x97/0x1b0
[ 246.913646] ? __pfx_schedule_timeout+0x10/0x10
[ 246.918254] xhci_alloc_dev+0x9e/0x2b0
[ 246.922068] usb_alloc_dev+0x7a/0x3b0
[ 246.925795] hub_port_connect+0x285/0x960
[ 246.929873] hub_port_connect_change+0x94/0x290
[ 246.934479] port_event+0x4bb/0x840
[ 246.938029] hub_event+0x141/0x460
[ 246.941490] process_one_work+0x196/0x390
[ 246.945571] worker_thread+0x1af/0x320
[ 246.949385] ? __pfx_worker_thread+0x10/0x10
[ 246.953727] kthread+0xe3/0x120
[ 246.956920] ? __pfx_kthread+0x10/0x10
[ 246.960734] ret_from_fork+0x199/0x260
[ 246.964550] ? __pfx_kthread+0x10/0x10
[ 246.968363] ret_from_fork_asm+0x1a/0x30
[ 246.972355] </TASK>
[ 369.738542] INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
[ 369.745628] Not tainted 7.1.0-rc3-michal.debug #3
[ 369.750941] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 369.758903] task:systemd-shutdow state:D stack:0 pid:1
tgid:1 ppid:0 task_flags:0x400100 flags:0x00080000
[ 369.770133] Call Trace:
[ 369.772622] <TASK>
[ 369.774757] __schedule+0x2a3/0x6d0
[ 369.778308] schedule+0x27/0x80
[ 369.781503] schedule_preempt_disabled+0x15/0x30
[ 369.786200] __mutex_lock.constprop.0+0x5ac/0xad0
[ 369.790983] device_shutdown+0xac/0x1b0
[ 369.794884] kernel_restart+0x3a/0x70
[ 369.798609] __do_sys_reboot+0x147/0x240
[ 369.802600] do_syscall_64+0xe3/0x670
[ 369.806324] ? do_sys_openat2+0x9d/0xe0
[ 369.810225] ? __x64_sys_openat+0x54/0xa0
[ 369.814302] ? do_syscall_64+0x11a/0x670
[ 369.818291] ? do_syscall_64+0x98/0x670
[ 369.822191] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 369.827329] RIP: 0033:0x7f19d248a917
[ 369.830967] RSP: 002b:00007ffcfa2902d8 EFLAGS: 00000206 ORIG_RAX:
00000000000000a9
[ 369.838664] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f19d248a917
[ 369.845919] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
[ 369.853176] RBP: 00007ffcfa290450 R08: 0000000000000069 R09: 00000000ffffffff
[ 369.860432] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[ 369.867688] R13: 0000000000000000 R14: 00007ffcfa290578 R15: 0000000000000000
[ 369.874946] </TASK>
[ 369.877172] INFO: task systemd-shutdow:1 is blocked on a mutex
likely owned by task kworker/0:1:11.
Thanks for all the help on the matter so far Michal,
Desnes
On Sat, 23 May 2026 00:47:28 -0300, Desnes Nunes wrote: > Hello Michal, > > On Fri, May 22, 2026 at 9:29 PM Michal Pecio <michal.pecio@gmail.com> wrote: > > Sorry, I forgot about the most important thing: crash kernel log, > > or at least the IOMMU fault message showing the bad address. > > I was indeed intrigued and almost sent it without you asking for it :-) > > The crashkernel's fault address is shown latter on down below, but now > I have attached the full kexec dmesg too. > > PS: Note that the debugfs file 'memory' from before contains the > addresses of the main kernel, not crashkernel's addresses: > - From main dmesg: > [ 6.728105] xhci_hcd 0000:80:14.0: Device context base array > address = 0x000000010a958000 (DMA), 00000000f542e3ba (virt) > [ 6.737602] xhci_hcd 0000:80:14.0: ERST deq = 64'h10a95a000 Neither debugfs dump corresponds to this dmesg, addresses don't match. And it doesn't look like the guard pages patch is working here. But maybe it doesn't matter. Your "memory" files show a clear pattern of consecutive page-sized allocations (example from after.zip): 102fb6000 DCBAA 102fb7000 CR 102fb8000 ER segmnet 0 102fb9000 ER segment 1 102fba000 ERST We can make a guess that the faulting address is the ERST, which definitely should be accessible to the host controller. This simple patch logs ERST allocation and freeing; as far as I see nothing else touches that mapping. If the ERST is somehow freed before starting the HC, that's a bug. Otherwise, it seems you were right that you have some IOMMU problem. Regards, Michal
Hello Michal,
On Sat, May 23, 2026 at 5:28 AM Michal Pecio <michal.pecio@gmail.com> wrote:
> We can make a guess that the faulting address is the ERST, which
> definitely should be accessible to the host controller.
>
> This simple patch logs ERST allocation and freeing; as far as I see
> nothing else touches that mapping.
>
> If the ERST is somehow freed before starting the HC, that's a bug.
Tested the patch and only saw the allocation messages:
# grep "alloc ERST\|free ERST\|ERST\|Device context\|fault addr"
vmcore-dmesg.txt
[ 6.582282] xhci_hcd 0000:00:0d.0: Device context base array
address = 0x000000010aca8000 (DMA), 00000000282a4aa1 (virt)
[ 6.582287] xhci_hcd 0000:00:0d.0: alloc ERST at 0x000000010acac000
[ 6.598137] xhci_hcd 0000:00:0d.0: ERST deq = 64'h10acaa000
[ 6.715052] xhci_hcd 0000:80:14.0: Device context base array
address = 0x0000000102ec9000 (DMA), 00000000d1c656e7 (virt)
[ 6.715057] xhci_hcd 0000:80:14.0: alloc ERST at 0x0000000102ecd000
[ 6.730919] xhci_hcd 0000:80:14.0: ERST deq = 64'h102ecb000
# grep "alloc ERST\|free ERST\|ERST\|Device context\|fault addr" kexec-dmesg.log
[Tue May 26 08:41:56 2026] DMAR: [DMA Write NO_PASID] Request device
[80:1f.6] fault addr 0x106f06000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
[Tue May 26 08:41:56 2026] DMAR: [DMA Write NO_PASID] Request device
[80:1f.6] fault addr 0x106f19000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
[Tue May 26 08:41:57 2026] DMAR: [DMA Write NO_PASID] Request device
[80:1f.6] fault addr 0x106f1c000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
[Tue May 26 08:42:01 2026] xhci_hcd 0000:00:0d.0: Device context base
array address = 0x00000010750bf000 (DMA), 00000000fcec19e7 (virt)
[Tue May 26 08:42:01 2026] xhci_hcd 0000:00:0d.0: alloc ERST at
0x00000010750c5000
[Tue May 26 08:42:01 2026] xhci_hcd 0000:00:0d.0: ERST deq = 64'h10750c3000
[Tue May 26 08:42:01 2026] xhci_hcd 0000:80:14.0: Device context base
array address = 0x000000107513c000 (DMA), 000000008803b985 (virt)
[Tue May 26 08:42:01 2026] xhci_hcd 0000:80:14.0: alloc ERST at
0x0000001075140000
[Tue May 26 08:42:01 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h107513e000
[Tue May 26 08:42:02 2026] DMAR: [DMA Read NO_PASID] Request device
[80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit
in Root Entry is clear
^ PS: Different address alloc on kdump though
> Otherwise, it seems you were right that you have some IOMMU problem.
Thus, I started to investigate this front now. This time I gave some
more attention to these dmar messages:
[Tue May 19 08:17:49 2026] DMAR: Intel-IOMMU force enabled due to
platform opt in
[Tue May 19 08:17:49 2026] DMAR: No RMRR found
[Tue May 19 08:17:49 2026] DMAR: No ATSR found
[Tue May 19 08:17:49 2026] DMAR: dmar0: Using Queued invalidation
=> [Tue May 19 08:17:49 2026] DMAR: Translation already enabled -
trying to copy translation structures
=> [Tue May 19 08:17:49 2026] DMAR: Copied translation tables from
previous kernel for dmar0
[Tue May 19 08:17:49 2026] DMAR: dmar1: Using Queued invalidation
=> [Tue May 19 08:17:49 2026] DMAR: Translation already enabled -
trying to copy translation structures
=> [Tue May 19 08:17:49 2026] DMAR: Copied translation tables from
previous kernel for dmar1
I started wondering if maybe on my system these translation tables
can't be fully trusted for some reason during kdump?
Maybe iommu is copying root_entries with the Present bit clear, and
thus generating the fault reason 0x39?
-> bus 0x80's? Both ethernet and xhci_hcd fault addr were on this bus
So, to test this theory out, I tried to disable translation and
allocate a clean root-entry table right away if I am running a kdump
kernel:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index e236c7ec221f..de673f34f4e1 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2135,24 +2135,31 @@ static int __init init_dmars(void)
if (translation_pre_enabled(iommu)) {
pr_info("Translation already enabled - trying
to copy translation structures\n");
- ret = copy_translation_tables(iommu);
- if (ret) {
- /*
- * We found the IOMMU with translation
- * enabled - but failed to copy over the
- * old root-entry table. Try to proceed
- * by disabling translation now and
- * allocating a clean root-entry table.
- * This might cause DMAR faults, but
- * probably the dump will still succeed.
- */
- pr_err("Failed to copy translation
tables from previous kernel for %s\n",
- iommu->name);
+ if (is_kdump_kernel()) {
+ pr_info("DESNES V2 IOMMU kdump kernel,
disabilng translation and allocating clean root-entry for %s\n",
+ iommu->name);
iommu_disable_translation(iommu);
clear_translation_pre_enabled(iommu);
} else {
- pr_info("Copied translation tables
from previous kernel for %s\n",
- iommu->name);
+ ret = copy_translation_tables(iommu);
+ if (ret) {
+ /*
+ * We found the IOMMU with translation
+ * enabled - but failed to copy over the
+ * old root-entry table. Try to proceed
+ * by disabling translation now and
+ * allocating a clean root-entry table.
+ * This might cause DMAR faults, but
+ * probably the dump will still succeed.
+ */
+ pr_err("DESNES V2 Failed to
copy translation tables from previous kernel for %s\n",
+ iommu->name);
+ iommu_disable_translation(iommu);
+ clear_translation_pre_enabled(iommu);
+ } else {
+ pr_info("DESNES V2 Copied
translation tables from previous kernel for %s\n",
+ iommu->name);
+ }
}
}
Didn't had time to check ERST or HSE yet, but with this I didn't had
any DMAR faults, vmcore was collected normally and system rebooted
smoothly afterwards:
[Tue May 26 22:52:58 2026] DMAR: Intel-IOMMU force enabled due to
platform opt in
[Tue May 26 22:52:58 2026] DMAR: No RMRR found
[Tue May 26 22:52:58 2026] DMAR: No ATSR found
[Tue May 26 22:52:58 2026] DMAR: dmar0: Using Queued invalidation
=> [Tue May 26 22:52:58 2026] DMAR: Translation already enabled -
trying to copy translation structures
=> [Tue May 26 22:52:58 2026] DMAR: DESNES V2 IOMMU kdump kernel,
disabilng translation and allocating clean root-entry for dmar0
[Tue May 26 22:52:58 2026] DMAR: dmar1: Using Queued invalidation
=> [Tue May 26 22:52:58 2026] DMAR: Translation already enabled -
trying to copy translation structures
=> [Tue May 26 22:52:58 2026] DMAR: DESNES V2 IOMMU kdump kernel,
disabilng translation and allocating clean root-entry for dmar1
Seems like a lead on this iommu front.
The funny thing is that the comment in this section literaly says that
doing this could cause faults, but here clearing it actually seemed to
solve them and made kdump succeed - commit
091d42e43d21b6ca7ec39bf5f9e17bc0bd8d4312 ("iommu/vt-d: Copy
translation tables from old kernel")
Let me do some more tests to dump and check the root-entry table
before clearing, as well as to check ERST allocations and HSE value,
and I'll get back to you Michal.
Best Regards,
Desnes
Adding Intel IOMMU people.
Context:
Desnes reported xHCI issues duing crash kernel boot after SysRq
triggered panic. Turns out, the chip gets an IOMMU fault, some other
devices also do. Faulting address is a successful dma_alloc_coherent()
allocation in xhci_alloc_erst(), no evidence that it's freed before
the fault occurs. No problems during normal boot.
On Wed, 27 May 2026 00:47:53 -0300, Desnes Nunes wrote:
> # grep "alloc ERST\|free ERST\|ERST\|Device context\|fault addr" kexec-dmesg.log
> [Tue May 26 08:41:56 2026] DMAR: [DMA Write NO_PASID] Request device
> [80:1f.6] fault addr 0x106f06000 [fault reason 0x39] SM: Present bit
> in Root Entry is clear
> [Tue May 26 08:41:56 2026] DMAR: [DMA Write NO_PASID] Request device
> [80:1f.6] fault addr 0x106f19000 [fault reason 0x39] SM: Present bit
> in Root Entry is clear
> [Tue May 26 08:41:57 2026] DMAR: [DMA Write NO_PASID] Request device
> [80:1f.6] fault addr 0x106f1c000 [fault reason 0x39] SM: Present bit
> in Root Entry is clear
> [...]
> [Tue May 26 08:42:01 2026] xhci_hcd 0000:80:14.0: alloc ERST at
> 0x0000001075140000
> [Tue May 26 08:42:01 2026] xhci_hcd 0000:80:14.0: ERST deq = 64'h107513e000
> [Tue May 26 08:42:02 2026] DMAR: [DMA Read NO_PASID] Request device
> [80:14.0] fault addr 0x1075140000 [fault reason 0x39] SM: Present bit
> in Root Entry is clear
>
> ^ PS: Different address alloc on kdump though
>
> > Otherwise, it seems you were right that you have some IOMMU problem.
>
> Thus, I started to investigate this front now. This time I gave some
> more attention to these dmar messages:
>
> [Tue May 19 08:17:49 2026] DMAR: Intel-IOMMU force enabled due to
> platform opt in
> [Tue May 19 08:17:49 2026] DMAR: No RMRR found
> [Tue May 19 08:17:49 2026] DMAR: No ATSR found
> [Tue May 19 08:17:49 2026] DMAR: dmar0: Using Queued invalidation
> => [Tue May 19 08:17:49 2026] DMAR: Translation already enabled -
> trying to copy translation structures
> => [Tue May 19 08:17:49 2026] DMAR: Copied translation tables from
> previous kernel for dmar0
> [Tue May 19 08:17:49 2026] DMAR: dmar1: Using Queued invalidation
> => [Tue May 19 08:17:49 2026] DMAR: Translation already enabled -
> trying to copy translation structures
> => [Tue May 19 08:17:49 2026] DMAR: Copied translation tables from
> previous kernel for dmar1
>
> I started wondering if maybe on my system these translation tables
> can't be fully trusted for some reason during kdump?
> Maybe iommu is copying root_entries with the Present bit clear, and
> thus generating the fault reason 0x39?
> -> bus 0x80's? Both ethernet and xhci_hcd fault addr were on this bus
>
> So, to test this theory out, I tried to disable translation and
> allocate a clean root-entry table right away if I am running a kdump
> kernel:
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index e236c7ec221f..de673f34f4e1 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2135,24 +2135,31 @@ static int __init init_dmars(void)
> if (translation_pre_enabled(iommu)) {
> pr_info("Translation already enabled - trying
> to copy translation structures\n");
>
> - ret = copy_translation_tables(iommu);
> - if (ret) {
> - /*
> - * We found the IOMMU with translation
> - * enabled - but failed to copy over the
> - * old root-entry table. Try to proceed
> - * by disabling translation now and
> - * allocating a clean root-entry table.
> - * This might cause DMAR faults, but
> - * probably the dump will still succeed.
> - */
> - pr_err("Failed to copy translation
> tables from previous kernel for %s\n",
> - iommu->name);
> + if (is_kdump_kernel()) {
> + pr_info("DESNES V2 IOMMU kdump kernel,
> disabilng translation and allocating clean root-entry for %s\n",
> + iommu->name);
> iommu_disable_translation(iommu);
> clear_translation_pre_enabled(iommu);
> } else {
> - pr_info("Copied translation tables
> from previous kernel for %s\n",
> - iommu->name);
> + ret = copy_translation_tables(iommu);
> + if (ret) {
> + /*
> + * We found the IOMMU with translation
> + * enabled - but failed to copy over the
> + * old root-entry table. Try to proceed
> + * by disabling translation now and
> + * allocating a clean root-entry table.
> + * This might cause DMAR faults, but
> + * probably the dump will still succeed.
> + */
> + pr_err("DESNES V2 Failed to
> copy translation tables from previous kernel for %s\n",
> + iommu->name);
> + iommu_disable_translation(iommu);
> + clear_translation_pre_enabled(iommu);
> + } else {
> + pr_info("DESNES V2 Copied
> translation tables from previous kernel for %s\n",
> + iommu->name);
> + }
> }
> }
>
> Didn't had time to check ERST or HSE yet, but with this I didn't had
> any DMAR faults, vmcore was collected normally and system rebooted
> smoothly afterwards:
>
> [Tue May 26 22:52:58 2026] DMAR: Intel-IOMMU force enabled due to
> platform opt in
> [Tue May 26 22:52:58 2026] DMAR: No RMRR found
> [Tue May 26 22:52:58 2026] DMAR: No ATSR found
> [Tue May 26 22:52:58 2026] DMAR: dmar0: Using Queued invalidation
> => [Tue May 26 22:52:58 2026] DMAR: Translation already enabled -
> trying to copy translation structures
> => [Tue May 26 22:52:58 2026] DMAR: DESNES V2 IOMMU kdump kernel,
> disabilng translation and allocating clean root-entry for dmar0
> [Tue May 26 22:52:58 2026] DMAR: dmar1: Using Queued invalidation
> => [Tue May 26 22:52:58 2026] DMAR: Translation already enabled -
> trying to copy translation structures
> => [Tue May 26 22:52:58 2026] DMAR: DESNES V2 IOMMU kdump kernel,
> disabilng translation and allocating clean root-entry for dmar1
>
> Seems like a lead on this iommu front.
>
> The funny thing is that the comment in this section literaly says that
> doing this could cause faults, but here clearing it actually seemed to
> solve them and made kdump succeed - commit
> 091d42e43d21b6ca7ec39bf5f9e17bc0bd8d4312 ("iommu/vt-d: Copy
> translation tables from old kernel")
>
> Let me do some more tests to dump and check the root-entry table
> before clearing, as well as to check ERST allocations and HSE value,
> and I'll get back to you Michal.
>
> Best Regards,
>
> Desnes
>
Hello Michal and IOMMU maintainers, On Wed, May 27, 2026 at 5:32 AM Michal Pecio <michal.pecio@gmail.com> wrote: > Adding Intel IOMMU people. And thanks a lot for all the help getting this bug at this stage Michal. I have just found out the solution for the bug. It has been a while, but it happens that I had to dig deep down into the iommu root and context entries after my last message. > Context: > > Desnes reported xHCI issues duing crash kernel boot after SysRq > triggered panic. Turns out, the chip gets an IOMMU fault, some other > devices also do. Faulting address is a successful dma_alloc_coherent() > allocation in xhci_alloc_erst(), no evidence that it's freed before > the fault occurs. No problems during normal boot. Recap: After I triggered the panic, the system collects a vmcore smootly, but does not reboot afterwards. The crashkernel was not completing a TRB_ENABLE_SLOT command in xhci_alloc_dev() in xhci.c, which made xhci hold the xhci->lock and block a systemd kworker that was also waiting for that lock on device_shutdown(). Basing myself on past code, I created a first patch that aborted the TRB_ENABLE_SLOT command with wait_for_completion_timeout() and killed the HC - this released the lock and enabled the kworker to finish. However, after a few messages and test patches with Michal, we weren't totaly sure if this was good solution, mostly because we noticed that HSE was already set way before ever reaching that TRB_ENABLE_SLOT command. Iommu: At that moment, I noticed that dmar faults were happening to the e1000e and xhci drivers that shared the same bus. Michal noticed that the faulting address was a successful dma_alloc_coherent() allocation in xhci_alloc_erst(). This stated to point to an IOMMU bug. Afterwards, I tested booting the crashkernel with `intel_iommu=off` and confirmed that it rebooted smootly after vmcore was captured, so the plot tickened even more for an iommu bug. Diging deaper, I started to suspect that the copy of bus 128's root entry was being changed to have the Present bit cleared somehow. This made me write a v2 patch for the bug, now in iommu, which instead of copying the root-entry table, it disabled translation and allocated a clean root-entry table immediately if running a kdump kernel: no DMAR faults and smooth reboot was caried out. Following this lead, I patched the code to dump the root entries table using DMAR_RTADDR_REG before calling iommu_alloc_root_entry() (in init_dmars() at drivers/iommu/intel/iommu.c), while also observing bus 0x80's root entry right at the end of copy_translation_tables(): [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 0]: lo=0x000000018a67f001 (P=1 LCTP=0x000000018a67f000) hi=0x00000001f855d001 (P=1 UCTP=0x00000001f855d000) [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 1]: lo=0x00000001f8562001 (P=1 LCTP=0x00000001f8562000) hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000) [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 2]: lo=0x00000001f8564001 (P=1 LCTP=0x00000001f8564000) hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000) [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 3]: lo=0x00000001f8566001 (P=1 LCTP=0x00000001f8566000) hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000) [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 4]: lo=0x00000001f8569001 (P=1 LCTP=0x00000001f8569000) hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000) [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[ 5]: lo=0x00000001f856b001 (P=1 LCTP=0x00000001f856b000) hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000) => [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[128]: lo=0x0000000000000000 (P=0 LCTP=0x0000000000000000) hi=0x00000001f856d001 (P=1 UCTP=0x00000001f856d000) [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: root[129]: lo=0x00000001f8583001 (P=1 LCTP=0x00000001f8583000) hi=0x0000000000000000 (P=0 UCTP=0x0000000000000000) [Wed Jun 3 15:40:37 2026] DMAR: dmar1: debug: 8 of 256 root entries non-zero [Wed Jun 3 15:40:37 2026] DMAR: Translation already enabled - trying to copy translation structures => [Wed Jun 3 15:40:37 2026] DMAR: dmar1: post-copy root[128]: lo=0xe89cda001 hi=0x0 [Wed Jun 3 15:40:37 2026] DMAR: Copied translation tables from previous kernel for dma ... [Wed Jun 3 15:40:37 2026] DMAR: DRHD: handling fault status reg 3 => [Wed Jun 3 15:40:37 2026] DMAR: [DMA Write NO_PASID] Request device [80:1f.6] fault addr 0x1aba8d000 [fault reason 0x39] SM: Present bit in Root Entry is clear [Wed Jun 3 15:40:37 2026] DMAR: Dump dmar1 table entries for IOVA 0x1aba8d000 => [Wed Jun 3 15:40:37 2026] DMAR: scalable mode root entry: hi 0x0000000000000000, low 0x0000000e89cda001 => [Wed Jun 3 15:40:37 2026] DMAR: context table is not present [Wed Jun 3 15:40:37 2026] DMAR: DRHD: handling fault status reg 3 [Wed Jun 3 15:40:37 2026] DMAR: [DMA Write NO_PASID] Request device [80:1f.6] fault addr 0x1aba89000 [fault reason 0x39] SM: Present bit in Root Entry is clear [Wed Jun 3 15:40:37 2026] DMAR: Dump dmar1 table entries for IOVA 0x1aba89000 => [Wed Jun 3 15:40:37 2026] DMAR: scalable mode root entry: hi 0x0000000000000000, low 0x0000000e89cda001 => [Wed Jun 3 15:40:37 2026] DMAR: context table is not present ... => [Wed Jun 3 15:40:44 2026] xhci_hcd 0000:80:14.0: alloc ERST at 0x0000000e8a356000 ... [Wed Jun 3 15:40:44 2026] DMAR: DRHD: handling fault status reg 2 => [Wed Jun 3 15:40:44 2026] DMAR: [DMA Read NO_PASID] Request device [80:14.0] fault addr 0xe8a356000 [fault reason 0x39] SM: Present bit in Root Entry is clear [Wed Jun 3 15:40:44 2026] DMAR: Dump dmar1 table entries for IOVA 0xe8a356000 => [Wed Jun 3 15:40:44 2026] DMAR: scalable mode root entry: hi 0x0000000000000000, low 0x0000000e89cda001 => [Wed Jun 3 15:40:44 2026] DMAR: context table is not present In scalable mode, a PCI bus may populate only the upper root half (UCTP) when all devices on that bus have devfn >= 0x80. On bus 0x80, I have e1000e at 80:1f.6 (devfn 0xfe) and xHCI at 80:14.0 (devfn 0xa0), so the hardware root entry correctly has lo=0 and hi=UCTP present. However, after copy_translation_tables(), I noticed that root[128].hi was zeroed-out (Present bit cleared) and another (expected) different value on root[128].lo. In short, the culprit here is having a zeroed LCTP, since at copy_context_table() the allocation of new_ce for LCTP context entries currently governs the pos variable; which is later used to save new_ce entries for UCTP at tbl[tbl + pos]. On the first iteration idx will be zero, old_ce_phys will be empty, thus this moves the loop straight to devfn=0x80. At devfn 0x80, idx wraps to 0 again ( (devfn * 2) mod 256), but since no new_ce was previouly allocated for LCTP context entries, pos will remain zero while copying UCTP context entries. After all upper context entries are saved, tbl will receive new_ce from UCTP at tbl[tbl_idx + 0], and not tbl[tbl_idx + 1]. These will be later written in copy_translation_tables() to iommu->root_entry[bus].lo and iommu->root_entry[bus].hi, which causes the bug. In summary, the hardware tables were correct, but the copy path misplaced the UCTP table for bus 0x80 when dealing with a LCTP zeroed-out during kdump. To fix this, I created a v3 patch that uses devfn to better track which half we are copying, so UCTP-only buses (lo=0, hi=P) are installed into the upper root half. I am doing some final tests now, but since this was a lot to digest, comments at this stage will be most appreciated. To IOMMU maintainers: should I send this patch to the iommu mailing list and move the discussion there? Thanks in advance for any help on the matter, Best Regards, Desnes
© 2016 - 2026 Red Hat, Inc.