[PATCH] usb: xhci: add xhci_halt() for HCE Handling

jiangdayu posted 1 patch 1 week, 4 days ago
There is a newer version of this series
drivers/usb/host/xhci-ring.c | 1 +
1 file changed, 1 insertion(+)
[PATCH] usb: xhci: add xhci_halt() for HCE Handling
Posted by jiangdayu 1 week, 4 days ago
When the xHCI controller reports a Host Controller Error (HCE) status
in the interrupt handler, the driver currently only logs a warning and
continues execution. However, a Host Controller Error indicates a
critical hardware failure that requires the controller to be halted.

Add xhci_halt(xhci) call after the HCE warning to properly halt the
controller when this error condition is detected. This ensures the
controller is in a consistent state and prevents further operations
on a failed hardware. Additionally, if there are still unhandled
interrupts at this point, it may cause interrupt storm.

The change is made in xhci_irq() function where STS_HCE status is
checked, mirroring the existing error handling pattern used for
STS_FATAL errors.

Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
---
 drivers/usb/host/xhci-ring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9315ba18310d..1cbefee3c4ca 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
 
 	if (status & STS_HCE) {
 		xhci_warn(xhci, "WARNING: Host Controller Error\n");
+		xhci_halt(xhci);
 		goto out;
 	}
 
-- 
2.34.1
Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
Posted by Mathias Nyman 1 week, 4 days ago
Hi

On 1/27/26 13:04, jiangdayu wrote:
> When the xHCI controller reports a Host Controller Error (HCE) status
> in the interrupt handler, the driver currently only logs a warning and
> continues execution. However, a Host Controller Error indicates a
> critical hardware failure that requires the controller to be halted.
> 

The host should cease all activity when it sets the HCE bit.

See xHCI spec 4.24.1 'Internal Errors':
"When the HCE flag is set to ‘1’ the xHC shall cease all activity.
  Software response to the assertion of HCE is to reset the
  xHC (HCRST = ‘1’) and reinitialize it."

Same is true for "Host system error" HSE (STS_FATAL), not sure
why we halt it manually in that case.

> Add xhci_halt(xhci) call after the HCE warning to properly halt the
> controller when this error condition is detected. This ensures the
> controller is in a consistent state and prevents further operations
> on a failed hardware. Additionally, if there are still unhandled
> interrupts at this point, it may cause interrupt storm.

Is this something that has been seen on real word hardware?
If yes, and halting the host helped ,then this fix is ok by me.
At least until a proper host reset solution is implemented.

Thanks
Mathias

Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
Posted by Dayu Jiang 1 week, 3 days ago
On Tue, Jan 27, 2026 at 02:25:19PM +0200, Mathias Nyman wrote:
> Hi
> 
> On 1/27/26 13:04, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> > 
> 
> The host should cease all activity when it sets the HCE bit.
> 
> See xHCI spec 4.24.1 'Internal Errors':
> "When the HCE flag is set to ‘1’ the xHC shall cease all activity.
>  Software response to the assertion of HCE is to reset the
>  xHC (HCRST = ‘1’) and reinitialize it."
> 
> Same is true for "Host system error" HSE (STS_FATAL), not sure
> why we halt it manually in that case.
> 
> > Add xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on a failed hardware. Additionally, if there are still unhandled
> > interrupts at this point, it may cause interrupt storm.
> 
> Is this something that has been seen on real word hardware?
> If yes, and halting the host helped ,then this fix is ok by me.
> At least until a proper host reset solution is implemented.
Yes, the HCE issue (and subsequent interrupt storm) has been consistently 
observed on production Android devices during UASP device plug/unplug 
operations. Adding xhci_halt() effectively resolves the system-level interrupt 
storm issue caused by HCE.
> 
> Thanks
> Mathias
> 
Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
Posted by Greg Kroah-Hartman 1 week, 4 days ago
On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> When the xHCI controller reports a Host Controller Error (HCE) status
> in the interrupt handler, the driver currently only logs a warning and
> continues execution. However, a Host Controller Error indicates a
> critical hardware failure that requires the controller to be halted.
> 
> Add xhci_halt(xhci) call after the HCE warning to properly halt the
> controller when this error condition is detected. This ensures the
> controller is in a consistent state and prevents further operations
> on a failed hardware. Additionally, if there are still unhandled
> interrupts at this point, it may cause interrupt storm.
> 
> The change is made in xhci_irq() function where STS_HCE status is
> checked, mirroring the existing error handling pattern used for
> STS_FATAL errors.
> 
> Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>

We need a full name, not an email alias, sorry.

And this isn't really "fixing" that commit, there's nothing wrong with
it as-is.  This is adding new functionality to the code.

> ---
>  drivers/usb/host/xhci-ring.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 9315ba18310d..1cbefee3c4ca 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
>  
>  	if (status & STS_HCE) {
>  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> +		xhci_halt(xhci);

What is going to start things back up again?  And as you are calling
this function, why is the warning message needed anymore?  The
tracepoint information will give you that message now, right?

And is this just papering over a hardware bug?  Should this really be
happening for any normal system?

thanks,

greg k-h
Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
Posted by Dayu Jiang 1 week, 3 days ago
On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> > 
> > Add xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on a failed hardware. Additionally, if there are still unhandled
> > interrupts at this point, it may cause interrupt storm.
> > 
> > The change is made in xhci_irq() function where STS_HCE status is
> > checked, mirroring the existing error handling pattern used for
> > STS_FATAL errors.
> > 
> > Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> > Signed-off-by: jiangdayu <jiangdayu@xiaomi.com>
> 
> We need a full name, not an email alias, sorry.
Sorry for the confusion, I will use my full legal name (instead of the 
email alias) in the Signed-off-by line in the revised patch.  
> 
> And this isn't really "fixing" that commit, there's nothing wrong with
> it as-is.  This is adding new functionality to the code.
I initially used the Fixes tag because the original commit only logged
a warning for HCE with no further action, this incomplete handling 
risks interrupt storms on the SoC (since the interrupt isn’t cleared). 
That’s a robustness gap I wanted to fix with this patch.
> 
> > ---
> >  drivers/usb/host/xhci-ring.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index 9315ba18310d..1cbefee3c4ca 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
> >  
> >  	if (status & STS_HCE) {
> >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > +		xhci_halt(xhci);
> 
> What is going to start things back up again?  And as you are calling
> this function, why is the warning message needed anymore?  The
> tracepoint information will give you that message now, right?
When HCE is triggered, it indicates a critical hardware failure. 
Aligning with the handling of HSE (STS_FATAL) by adding 
xhci_halt() here is more reasonable: without xhci_halt(), the 
USB controller may fall into an unpredictable and unstable state, 
which could exacerbate system issues.  

Retaining the warning message is necessary because it is directly 
visible in dmesg, whereas tracepoint information requires explicitly 
enabling xHCI tracepoints. Additionally, if xhci_halt() is called in 
xhci_irq() without the warning log, it would be impossible to 
distinguish whether the halt was triggered by HCE or HSE.
> 
> And is this just papering over a hardware bug?  Should this really be
> happening for any normal system?
Yes, this issue has been reproducible on real-world hardware: HCE is 
triggered in UAS Storage Device plug/unplug scenarios on Android 
devices, which enters this error branch and causes an interrupt storm, 
leading to severe system-level faults.
> 
> thanks,
> 
> greg k-h
Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling
Posted by Greg Kroah-Hartman 1 week, 3 days ago
On Wed, Jan 28, 2026 at 04:48:49PM +0800, Dayu Jiang wrote:
> On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> > >  	if (status & STS_HCE) {
> > >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > > +		xhci_halt(xhci);
> > 
> > What is going to start things back up again?  And as you are calling
> > this function, why is the warning message needed anymore?  The
> > tracepoint information will give you that message now, right?
> When HCE is triggered, it indicates a critical hardware failure. 
> Aligning with the handling of HSE (STS_FATAL) by adding 
> xhci_halt() here is more reasonable: without xhci_halt(), the 
> USB controller may fall into an unpredictable and unstable state, 
> which could exacerbate system issues.  
> 
> Retaining the warning message is necessary because it is directly 
> visible in dmesg, whereas tracepoint information requires explicitly 
> enabling xHCI tracepoints. Additionally, if xhci_halt() is called in 
> xhci_irq() without the warning log, it would be impossible to 
> distinguish whether the halt was triggered by HCE or HSE.
> > 
> > And is this just papering over a hardware bug?  Should this really be
> > happening for any normal system?
> Yes, this issue has been reproducible on real-world hardware: HCE is 
> triggered in UAS Storage Device plug/unplug scenarios on Android 
> devices, which enters this error branch and causes an interrupt storm, 
> leading to severe system-level faults.

Great, please provide this information in the changelog text when you
resubmit this, thanks!

greg k-h