[PATCH] usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs

Michal Pecio posted 1 patch 10 months, 1 week ago
drivers/usb/host/xhci-ring.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
[PATCH] usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs
Posted by Michal Pecio 10 months, 1 week ago
xHCI 4.9.1 requires HCs to obey the IOC flag we set on the last TRB even
after an error has been reported on an earlier TRB. This typically means
that an error mid TD is followed by a success event for the last TRB.

On SuperSpeed (and only SS) isochronous endpoints Etron hosts were found
to emit a success event also after an error on the last TRB of a TD.

Reuse the machinery for handling errors mid TD to handle these otherwise
unexpected events. Avoid printing "TRB not part of current TD" errors,
ensure proper tracking of HC's internal dequeue pointer and distinguish
this known quirk from other bogus events caused by ordinary bugs.

This patch was found to eliminate all related warnings and errors while
running for 30 minutes with a UVC camera on a flaky cable which produces
transaction errors about every second. An altsetting was chosen which
causes some TDs to be multi-TRB, dynamic debug was used to confirm that
errors both mid TD and on the last TRB are handled as expected:

[ 6028.439776] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
[ 6028.439784] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
[ 6028.440268] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
[ 6028.440270] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error
[ 6029.123683] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
[ 6029.123694] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=0
[ 6029.123697] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
[ 6029.123700] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error

Handling of Stopped events is unaffected: finish_td() is called but it
does nothing and the TD waits until it's unlinked:

[ 7081.705544] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
[ 7081.705546] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
[ 7081.705630] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
[ 7081.705633] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
[ 7081.705678] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
[ 7081.705680] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
[ 7081.705759] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2
[ 7081.705799] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2

Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Closes: https://lore.kernel.org/linux-usb/20250205053750.28251-1-ki.chiang65@gmail.com/T/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---



Hi Mathias,

This is the best I was able to do. It does add a few lines, but I don't
think it's too scary and IMO the switch looks even better this way. It
accurately predicts those events while not breaking anything else that
I can see or think of, save for the risk of firmware bugfix adding one
ESIT of latency on errors.

I tried to also test your Etron patch but it has whitespace damage all
over the place and would be hard to apply.

Regards,
Michal


 drivers/usb/host/xhci-ring.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 965bffce301e..7ff5075e5890 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2437,6 +2437,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	bool sum_trbs_for_length = false;
 	u32 remaining, requested, ep_trb_len;
 	int short_framestatus;
+	bool error_event = false, etron_quirk = false;
 
 	trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
 	urb_priv = td->urb->hcpriv;
@@ -2473,8 +2474,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		fallthrough;
 	case COMP_ISOCH_BUFFER_OVERRUN:
 		frame->status = -EOVERFLOW;
-		if (ep_trb != td->end_trb)
-			td->error_mid_td = true;
+		error_event = true;
 		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR:
@@ -2483,8 +2483,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	case COMP_USB_TRANSACTION_ERROR:
 		frame->status = -EPROTO;
 		sum_trbs_for_length = true;
-		if (ep_trb != td->end_trb)
-			td->error_mid_td = true;
+		error_event = true;
 		break;
 	case COMP_STOPPED:
 		sum_trbs_for_length = true;
@@ -2518,8 +2517,17 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	td->urb->actual_length += frame->actual_length;
 
 finish_td:
+	/* An error event mid TD will be followed by more events, xHCI 4.9.1 */
+	td->error_mid_td |= error_event && (ep_trb != td->end_trb);
+
+	/* Etron treats *all* SuperSpeed isoc errors like errors mid TD */
+	if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) {
+		td->error_mid_td |= error_event;
+		etron_quirk |= error_event;
+	}
+
 	/* Don't give back TD yet if we encountered an error mid TD */
-	if (td->error_mid_td && ep_trb != td->end_trb) {
+	if (td->error_mid_td && (ep_trb != td->end_trb || etron_quirk)) {
 		xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n");
 		td->urb_length_set = true;
 		return;
-- 
2.48.1
Re: [PATCH] usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs
Posted by Kuangyi Chiang 10 months, 1 week ago
Hi,

Michal Pecio <michal.pecio@gmail.com> 於 2025年2月11日 週二 下午8:36寫道:
>
> xHCI 4.9.1 requires HCs to obey the IOC flag we set on the last TRB even
> after an error has been reported on an earlier TRB. This typically means
> that an error mid TD is followed by a success event for the last TRB.
>
> On SuperSpeed (and only SS) isochronous endpoints Etron hosts were found
> to emit a success event also after an error on the last TRB of a TD.
>
> Reuse the machinery for handling errors mid TD to handle these otherwise
> unexpected events. Avoid printing "TRB not part of current TD" errors,
> ensure proper tracking of HC's internal dequeue pointer and distinguish
> this known quirk from other bogus events caused by ordinary bugs.
>
> This patch was found to eliminate all related warnings and errors while
> running for 30 minutes with a UVC camera on a flaky cable which produces
> transaction errors about every second. An altsetting was chosen which
> causes some TDs to be multi-TRB, dynamic debug was used to confirm that
> errors both mid TD and on the last TRB are handled as expected:
>
> [ 6028.439776] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
> [ 6028.439784] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
> [ 6028.440268] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
> [ 6028.440270] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error
> [ 6029.123683] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
> [ 6029.123694] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=0
> [ 6029.123697] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
> [ 6029.123700] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error
>
> Handling of Stopped events is unaffected: finish_td() is called but it
> does nothing and the TD waits until it's unlinked:
>
> [ 7081.705544] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
> [ 7081.705546] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
> [ 7081.705630] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
> [ 7081.705633] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
> [ 7081.705678] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
> [ 7081.705680] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
> [ 7081.705759] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2
> [ 7081.705799] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2
>
> Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
> Closes: https://lore.kernel.org/linux-usb/20250205053750.28251-1-ki.chiang65@gmail.com/T/
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
>
>
>
> Hi Mathias,
>
> This is the best I was able to do. It does add a few lines, but I don't
> think it's too scary and IMO the switch looks even better this way. It
> accurately predicts those events while not breaking anything else that
> I can see or think of, save for the risk of firmware bugfix adding one
> ESIT of latency on errors.
>
> I tried to also test your Etron patch but it has whitespace damage all
> over the place and would be hard to apply.
>
> Regards,
> Michal
>
>
>  drivers/usb/host/xhci-ring.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 965bffce301e..7ff5075e5890 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -2437,6 +2437,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>         bool sum_trbs_for_length = false;
>         u32 remaining, requested, ep_trb_len;
>         int short_framestatus;
> +       bool error_event = false, etron_quirk = false;
>
>         trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
>         urb_priv = td->urb->hcpriv;
> @@ -2473,8 +2474,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>                 fallthrough;
>         case COMP_ISOCH_BUFFER_OVERRUN:
>                 frame->status = -EOVERFLOW;
> -               if (ep_trb != td->end_trb)
> -                       td->error_mid_td = true;
> +               error_event = true;
>                 break;
>         case COMP_INCOMPATIBLE_DEVICE_ERROR:
>         case COMP_STALL_ERROR:
> @@ -2483,8 +2483,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>         case COMP_USB_TRANSACTION_ERROR:
>                 frame->status = -EPROTO;
>                 sum_trbs_for_length = true;
> -               if (ep_trb != td->end_trb)
> -                       td->error_mid_td = true;
> +               error_event = true;
>                 break;
>         case COMP_STOPPED:
>                 sum_trbs_for_length = true;
> @@ -2518,8 +2517,17 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>         td->urb->actual_length += frame->actual_length;
>
>  finish_td:
> +       /* An error event mid TD will be followed by more events, xHCI 4.9.1 */
> +       td->error_mid_td |= error_event && (ep_trb != td->end_trb);
> +
> +       /* Etron treats *all* SuperSpeed isoc errors like errors mid TD */
> +       if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) {
> +               td->error_mid_td |= error_event;
> +               etron_quirk |= error_event;

This would be the same as etron_quirk = error_event; right?

> +       }
> +
>         /* Don't give back TD yet if we encountered an error mid TD */
> -       if (td->error_mid_td && ep_trb != td->end_trb) {
> +       if (td->error_mid_td && (ep_trb != td->end_trb || etron_quirk)) {
>                 xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n");
>                 td->urb_length_set = true;
>                 return;
> --
> 2.48.1

I tested this with Etron EJ168 and EJ188 under Linux-6.13.1. It works.

Thanks,
Kuangyi Chiang
Re: [PATCH] usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs
Posted by Michał Pecio 10 months, 1 week ago
On Wed, 12 Feb 2025 13:59:49 +0800, Kuangyi Chiang wrote:
> > +       if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) {
> > +               td->error_mid_td |= error_event;
> > +               etron_quirk |= error_event;  
> 
> This would be the same as etron_quirk = error_event; right?

Yeah, same thing I guess.

> I tested this with Etron EJ168 and EJ188 under Linux-6.13.1. It works.

Well, I found one case where it doesn't work optimally. There is a
separate patch to skip "Missed Service Error" TDs immediately when the
error is reported and also to treat MSE as another 'error mid TD', so
with this Etron patch we would end up expecting spurious success after
an MSE on the last TRB.

Well, AFAIS, no such event is generated by Etron in this case so we are
back to waiting till next event and then giving back the missed TD.


Maybe I will seriously look into decoupling giveback and dequeue ptr
tracking, not only for those spurious Etron events but everywhere.

Mathias is right that HW has no sensible reason to touch DMA buffers
after an error, I will look if the spec is very explicit about it.
If so, we could give back TDs after the first event and merely keep
enough information to recognize and silently ignore further events.

Michal
Re: [PATCH] usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs
Posted by Mathias Nyman 9 months, 3 weeks ago
On 12.2.2025 10.12, Michał Pecio wrote:
> On Wed, 12 Feb 2025 13:59:49 +0800, Kuangyi Chiang wrote:
>>> +       if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) {
>>> +               td->error_mid_td |= error_event;
>>> +               etron_quirk |= error_event;
>>
>> This would be the same as etron_quirk = error_event; right?
> 
> Yeah, same thing I guess.
> 
>> I tested this with Etron EJ168 and EJ188 under Linux-6.13.1. It works.
> 
> Well, I found one case where it doesn't work optimally. There is a
> separate patch to skip "Missed Service Error" TDs immediately when the
> error is reported and also to treat MSE as another 'error mid TD', so
> with this Etron patch we would end up expecting spurious success after
> an MSE on the last TRB.
> 
> Well, AFAIS, no such event is generated by Etron in this case so we are
> back to waiting till next event and then giving back the missed TD.
> 
> 
> Maybe I will seriously look into decoupling giveback and dequeue ptr
> tracking, not only for those spurious Etron events but everywhere.
> 
> Mathias is right that HW has no sensible reason to touch DMA buffers
> after an error, I will look if the spec is very explicit about it.
> If so, we could give back TDs after the first event and merely keep
> enough information to recognize and silently ignore further events.

This issue was left hanging, I'll clean up my proposal and send it as
a proper RFT PATCH.

Thanks
Mathias

Re: [PATCH] usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs
Posted by Michał Pecio 9 months, 3 weeks ago
On Fri, 28 Feb 2025 18:13:50 +0200, Mathias Nyman wrote:
> On 12.2.2025 10.12, Michał Pecio wrote:
> > Maybe I will seriously look into decoupling giveback and dequeue ptr
> > tracking, not only for those spurious Etron events but everywhere.
> > 
> > Mathias is right that HW has no sensible reason to touch DMA buffers
> > after an error, I will look if the spec is very explicit about it.
> > If so, we could give back TDs after the first event and merely keep
> > enough information to recognize and silently ignore further events.
> >  
> 
> This issue was left hanging, I'll clean up my proposal and send it as
> a proper RFT PATCH.

I think it would be more pragmatic to have 'next_comp_code' instead of
'last_comp_code', because then you don't need this new helper function
which basically duplicates the switch statement from process_isoc_td().

And as long as Success is the only 'next_comp_code' supported, it can
be a simple boolean flag. So, basically, rename 'last_td_was_short' to
'expect_success_event', set it in process_isoc_td() and that's all.


What are your thoughts about killing error_mid_td completely and using
a similar mechanism to deal with those final events?

1. The events would be taken care of.

2. It should be OK wrt DMA, because the HC has no reason to touch data
buffers after an error. Short Packet is done this way and it works.

3. A remaining problem is that dequeue is advanced to end_trb too soon
and "tail" of the TD could be overwritten. Already a problem with Short
Packet and I think it can be solved by replacing most xhci_dequeue_td()
calls with xhci_td_cleanup() and adding to handle_tx_event():

    ep_ring->dequeue = ep_trb;
    ep_ring->deq_seg = ep_seg;


Regards,
Michal
Re: [PATCH] usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs
Posted by Michał Pecio 9 months, 3 weeks ago
On Fri, 28 Feb 2025 18:11:46 +0100, Michał Pecio wrote:
> What are your thoughts about killing error_mid_td completely and using
> a similar mechanism to deal with those final events?
> 
> 1. The events would be taken care of.
> 
> 2. It should be OK wrt DMA, because the HC has no reason to touch data
> buffers after an error. Short Packet is done this way and it works.
> 
> 3. A remaining problem is that dequeue is advanced to end_trb too soon
> and "tail" of the TD could be overwritten. Already a problem with
> Short Packet and I think it can be solved by replacing most
> xhci_dequeue_td() calls with xhci_td_cleanup() and adding to
> handle_tx_event():
> 
>     ep_ring->dequeue = ep_trb;
>     ep_ring->deq_seg = ep_seg;

Forgot to add:

4. Guaranteed low latency of error reporting.

5. Some annoying code for giving back 'error_mid_td' URBs under weird
corner cases that I recently spent a few hours writing could be thrown
out and handle_tx_event() would become a little simpler.
[RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by Mathias Nyman 9 months, 3 weeks ago
Unplugging a USB3.0 webcam from Etron hosts while streaming results
in errors like this:

[ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650
[ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670

Etron xHC generates two transfer events for the TRB if an error is
detected while processing the last TRB of an isoc TD.

The first event can be any sort of error (like USB Transaction or
Babble Detected, etc), and the final event is Success.

The xHCI driver will handle the TD after the first event and remove it
from its internal list, and then print an "Transfer event TRB DMA ptr
not part of current TD" error message after the final event.

Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a
transaction error mid TD.") is designed to address isoc transaction
errors, but unfortunately it doesn't account for this scenario.

This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success
event follows a 'short transfer' event, but the TD the event points to
is already given back.

Expand the spurious success 'short transfer' event handling to cover
the spurious success after error on Etron hosts.

Kuangyi Chiang reported this issue and submitted a different solution
based on using error_mid_td. This commit message is mostly taken
from that patch.

Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 36 +++++++++++++++++++++++++-----------
 drivers/usb/host/xhci.h      |  2 +-
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 965bffce301e..3d3e6cd69019 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2644,6 +2644,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_
 	return 0;
 }
 
+static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
+					   struct xhci_ring *ring)
+{
+	switch (ring->old_trb_comp_code) {
+	case COMP_SHORT_PACKET:
+		return xhci->quirks & XHCI_SPURIOUS_SUCCESS;
+	case COMP_USB_TRANSACTION_ERROR:
+	case COMP_BABBLE_DETECTED_ERROR:
+	case COMP_ISOCH_BUFFER_OVERRUN:
+		return xhci->quirks & XHCI_ETRON_HOST &&
+			ring->type == TYPE_ISOC;
+	default:
+		return false;
+	}
+}
+
 /*
  * If this function returns an error condition, it means it got a Transfer
  * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address.
@@ -2697,8 +2713,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	case COMP_SUCCESS:
 		if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) {
 			trb_comp_code = COMP_SHORT_PACKET;
-			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n",
-				 slot_id, ep_index, ep_ring->last_td_was_short);
+			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n",
+				 slot_id, ep_index, ep_ring->old_trb_comp_code);
 		}
 		break;
 	case COMP_SHORT_PACKET:
@@ -2846,7 +2862,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		 */
 		if (trb_comp_code != COMP_STOPPED &&
 		    trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
-		    !ep_ring->last_td_was_short) {
+		    !xhci_spurious_success_tx_event(xhci, ep_ring)) {
 			xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
 				  slot_id, ep_index);
 		}
@@ -2890,11 +2906,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 
 			/*
 			 * Some hosts give a spurious success event after a short
-			 * transfer. Ignore it.
+			 * transfer or error on last TRB. Ignore it.
 			 */
-			if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
-			    ep_ring->last_td_was_short) {
-				ep_ring->last_td_was_short = false;
+			if (xhci_spurious_success_tx_event(xhci, ep_ring)) {
+				xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
+					 &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
+				ep_ring->old_trb_comp_code = trb_comp_code;
 				return 0;
 			}
 
@@ -2922,10 +2939,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	 */
 	} while (ep->skip);
 
-	if (trb_comp_code == COMP_SHORT_PACKET)
-		ep_ring->last_td_was_short = true;
-	else
-		ep_ring->last_td_was_short = false;
+	ep_ring->old_trb_comp_code = trb_comp_code;
 
 	ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
 	trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..c75c2c12ce53 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1371,7 +1371,7 @@ struct xhci_ring {
 	unsigned int		num_trbs_free; /* used only by xhci DbC */
 	unsigned int		bounce_buf_len;
 	enum xhci_ring_type	type;
-	bool			last_td_was_short;
+	u32			last_td_comp_code;
 	struct radix_tree_root	*trb_address_map;
 };
 
-- 
2.43.0
Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by Michał Pecio 9 months, 2 weeks ago
On Fri, 28 Feb 2025 18:18:24 +0200, Mathias Nyman wrote:
> Unplugging a USB3.0 webcam from Etron hosts while streaming results
> in errors like this:
> 
> [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr
> not part of current TD ep_index 18 comp_code 13 [ 2.646446] xhci_hcd
> 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start
> 000000002fdf8640 trb-end 000000002fdf8650 [ 2.646560] xhci_hcd
> 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
> ep_index 18 comp_code 13 [ 2.646568] xhci_hcd 0000:03:00.0: Looking
> for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end
> 000000002fdf8670
> 
> Etron xHC generates two transfer events for the TRB if an error is
> detected while processing the last TRB of an isoc TD.
> 
> The first event can be any sort of error (like USB Transaction or
> Babble Detected, etc), and the final event is Success.
> 
> The xHCI driver will handle the TD after the first event and remove it
> from its internal list, and then print an "Transfer event TRB DMA ptr
> not part of current TD" error message after the final event.
> 
> Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a
> transaction error mid TD.") is designed to address isoc transaction
> errors, but unfortunately it doesn't account for this scenario.
> 
> This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a
> success event follows a 'short transfer' event, but the TD the event
> points to is already given back.
> 
> Expand the spurious success 'short transfer' event handling to cover
> the spurious success after error on Etron hosts.
> 
> Kuangyi Chiang reported this issue and submitted a different solution
> based on using error_mid_td. This commit message is mostly taken
> from that patch.
> 
> Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

Works here too, modulo the obvious build problem.

Etron with errors:
[ 1064.865311] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
[ 1064.865322] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 4
[ 1064.865326] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ffef88c0, comp_code 13 after 4

Renesas with short packets:
[ 1365.299218] xhci_hcd 0000:08:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 13
[ 1365.299223] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffbbf870, comp_code 13 after 13

BTW, it says "comp_code 13 after something" because of this crazy
TRUST_TX_LENGTH hack, which changes trb_comp_code if it's success
but the residual is nonzero. If I remove the hack,

Etron:
[ 2713.630443] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ff3b6130, comp_code 1 after 4

Renesas:
[ 4033.652300] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffcd1b80, comp_code 1 after 13


The hack could almost be removed now, but if there really are HCs
which report Success on the first event, this won't work for them:

> +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
> +					   struct xhci_ring *ring)
> +{
> +	switch (ring->old_trb_comp_code) {
> +	case COMP_SHORT_PACKET:
> +		return xhci->quirks & XHCI_SPURIOUS_SUCCESS;

Could it work without relying on fictional COMP_SHORT_PACKET events?

> +			if (xhci_spurious_success_tx_event(xhci, ep_ring)) {
> +				xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
> +					 &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
> +				ep_ring->old_trb_comp_code = trb_comp_code;

This part will (quite arbitrarily IMO) not execute if td_list is empty.

I had this idea that "empty td_list" and "no matching TD on td_list"
are practically identical cases, and their code could be merged.
Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by Mathias Nyman 9 months, 2 weeks ago
On 3.3.2025 12.34, Michał Pecio wrote:
> 
> Works here too, modulo the obvious build problem.

Thanks for testing on both Etron and Renesas hosts.

> 
> Etron with errors:
> [ 1064.865311] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
> [ 1064.865322] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 4
> [ 1064.865326] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ffef88c0, comp_code 13 after 4
> 
> Renesas with short packets:
> [ 1365.299218] xhci_hcd 0000:08:00.0: Successful completion on short TX for slot 1 ep 2 with last td comp code 13
> [ 1365.299223] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffbbf870, comp_code 13 after 13
> 
> BTW, it says "comp_code 13 after something" because of this crazy
> TRUST_TX_LENGTH hack, which changes trb_comp_code if it's success
> but the residual is nonzero. If I remove the hack,
> 
> Etron:
> [ 2713.630443] xhci_hcd 0000:06:00.0: Spurious event dma 0x00000000ff3b6130, comp_code 1 after 4
> 
> Renesas:
> [ 4033.652300] xhci_hcd 0000:08:00.0: Spurious event dma 0x00000000ffcd1b80, comp_code 1 after 13
> 
> 
> The hack could almost be removed now, but if there really are HCs
> which report Success on the first event, this won't work for them:

This looks better, and I agree that the hack/quirk is annoying, but in fear of regression I
don't want to touch that in this patch yet.

> 
>> +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
>> +					   struct xhci_ring *ring)
>> +{
>> +	switch (ring->old_trb_comp_code) {
>> +	case COMP_SHORT_PACKET:
>> +		return xhci->quirks & XHCI_SPURIOUS_SUCCESS;
> 
> Could it work without relying on fictional COMP_SHORT_PACKET events?
> 
>> +			if (xhci_spurious_success_tx_event(xhci, ep_ring)) {
>> +				xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
>> +					 &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
>> +				ep_ring->old_trb_comp_code = trb_comp_code;
> 
> This part will (quite arbitrarily IMO) not execute if td_list is empty.

Yes, if td_list is empty we don't take this path, and we don't print any
"ERROR Transfer event TRB DMA ptr not part of current TD..." message either,
just like before.

> 
> I had this idea that "empty td_list" and "no matching TD on td_list"
> are practically identical cases, and their code could be merged.'

Possibly, but not in this patch.

Thanks
Mathias
Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by Michał Pecio 9 months, 2 weeks ago
On Mon, 3 Mar 2025 17:08:39 +0200, Mathias Nyman wrote:
> > The hack could almost be removed now, but if there really are HCs
> > which report Success on the first event, this won't work for them:  
> 
> This looks better, and I agree that the hack/quirk is annoying, but
> in fear of regression I don't want to touch that in this patch yet.

For the record, I didn't mean removing support for HCs reporting
Success with nonzero residual, the problem may be real and the commit
which introduced this code describes plausible symptoms.

But handle_tx_event() part of this workaround could be done without
changing trb_comp_code, like process_xxx_td() are. You are replacing
practically all of this code already, so it's an opportunity.

And one more thing:

> -				ep_ring->last_td_was_short = false;
...
> +				ep_ring->old_trb_comp_code = trb_comp_code;

This is a behavior change, due to the aforementioned hack. You are
replacing comp_code 13 with 13 and the mechanism stays "armed". It
will continue silently ignoring arbitrary events because there is
no validation if they really came from the "old" TD's TRB.

It's a pet peeve of mine, because I have already seen cases when the
old mechanism "swallows" illegal events which should be reported, and
now this problem may only get worse.

You can preserve behavior by clearing old_trb_comp_code to 0 or -1
or otherwise marking this entry as "inactive".

Regards,
Michal
Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by Kuangyi Chiang 9 months, 2 weeks ago
Hi,

Thanks for the patch.

Mathias Nyman <mathias.nyman@linux.intel.com> 於 2025年3月1日 週六 上午12:17寫道:
>
> Unplugging a USB3.0 webcam from Etron hosts while streaming results
> in errors like this:
>
> [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> [ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650
> [ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> [ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670
>
> Etron xHC generates two transfer events for the TRB if an error is
> detected while processing the last TRB of an isoc TD.
>
> The first event can be any sort of error (like USB Transaction or
> Babble Detected, etc), and the final event is Success.
>
> The xHCI driver will handle the TD after the first event and remove it
> from its internal list, and then print an "Transfer event TRB DMA ptr
> not part of current TD" error message after the final event.
>
> Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a
> transaction error mid TD.") is designed to address isoc transaction
> errors, but unfortunately it doesn't account for this scenario.
>
> This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success
> event follows a 'short transfer' event, but the TD the event points to
> is already given back.
>
> Expand the spurious success 'short transfer' event handling to cover
> the spurious success after error on Etron hosts.
>
> Kuangyi Chiang reported this issue and submitted a different solution
> based on using error_mid_td. This commit message is mostly taken
> from that patch.
>
> Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> ---
>  drivers/usb/host/xhci-ring.c | 36 +++++++++++++++++++++++++-----------
>  drivers/usb/host/xhci.h      |  2 +-
>  2 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 965bffce301e..3d3e6cd69019 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -2644,6 +2644,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_
>         return 0;
>  }
>
> +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
> +                                          struct xhci_ring *ring)
> +{
> +       switch (ring->old_trb_comp_code) {
> +       case COMP_SHORT_PACKET:
> +               return xhci->quirks & XHCI_SPURIOUS_SUCCESS;
> +       case COMP_USB_TRANSACTION_ERROR:
> +       case COMP_BABBLE_DETECTED_ERROR:
> +       case COMP_ISOCH_BUFFER_OVERRUN:
> +               return xhci->quirks & XHCI_ETRON_HOST &&
> +                       ring->type == TYPE_ISOC;
> +       default:
> +               return false;
> +       }
> +}
> +
>  /*
>   * If this function returns an error condition, it means it got a Transfer
>   * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address.
> @@ -2697,8 +2713,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>         case COMP_SUCCESS:
>                 if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) {
>                         trb_comp_code = COMP_SHORT_PACKET;
> -                       xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n",
> -                                slot_id, ep_index, ep_ring->last_td_was_short);
> +                       xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n",
> +                                slot_id, ep_index, ep_ring->old_trb_comp_code);
>                 }
>                 break;
>         case COMP_SHORT_PACKET:
> @@ -2846,7 +2862,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>                  */
>                 if (trb_comp_code != COMP_STOPPED &&
>                     trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
> -                   !ep_ring->last_td_was_short) {
> +                   !xhci_spurious_success_tx_event(xhci, ep_ring)) {
>                         xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
>                                   slot_id, ep_index);
>                 }
> @@ -2890,11 +2906,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>
>                         /*
>                          * Some hosts give a spurious success event after a short
> -                        * transfer. Ignore it.
> +                        * transfer or error on last TRB. Ignore it.
>                          */
> -                       if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
> -                           ep_ring->last_td_was_short) {
> -                               ep_ring->last_td_was_short = false;
> +                       if (xhci_spurious_success_tx_event(xhci, ep_ring)) {
> +                               xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
> +                                        &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
> +                               ep_ring->old_trb_comp_code = trb_comp_code;
>                                 return 0;
>                         }
>
> @@ -2922,10 +2939,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>          */
>         } while (ep->skip);
>
> -       if (trb_comp_code == COMP_SHORT_PACKET)
> -               ep_ring->last_td_was_short = true;
> -       else
> -               ep_ring->last_td_was_short = false;
> +       ep_ring->old_trb_comp_code = trb_comp_code;
>
>         ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
>         trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma);
> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
> index 8c164340a2c3..c75c2c12ce53 100644
> --- a/drivers/usb/host/xhci.h
> +++ b/drivers/usb/host/xhci.h
> @@ -1371,7 +1371,7 @@ struct xhci_ring {
>         unsigned int            num_trbs_free; /* used only by xhci DbC */
>         unsigned int            bounce_buf_len;
>         enum xhci_ring_type     type;
> -       bool                    last_td_was_short;
> +       u32                     last_td_comp_code;

Should be changed to old_trb_comp_code.

>         struct radix_tree_root  *trb_address_map;
>  };
>
> --
> 2.43.0
>

I'll test this later.

Thanks,
Kuangyi Chiang
Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by Mathias Nyman 9 months, 2 weeks ago
On 1.3.2025 4.05, Kuangyi Chiang wrote:
> Hi,
> 
> Thanks for the patch.
> 
> Mathias Nyman <mathias.nyman@linux.intel.com> 於 2025年3月1日 週六 上午12:17寫道:

>> index 8c164340a2c3..c75c2c12ce53 100644
>> --- a/drivers/usb/host/xhci.h
>> +++ b/drivers/usb/host/xhci.h
>> @@ -1371,7 +1371,7 @@ struct xhci_ring {
>>          unsigned int            num_trbs_free; /* used only by xhci DbC */
>>          unsigned int            bounce_buf_len;
>>          enum xhci_ring_type     type;
>> -       bool                    last_td_was_short;
>> +       u32                     last_td_comp_code;
> 
> Should be changed to old_trb_comp_code.
> 

Thanks, forgot to add that last xhci.h change to the patch

-Mathias

Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by Kuangyi Chiang 9 months, 2 weeks ago
Hi,

Kuangyi Chiang <ki.chiang65@gmail.com> 於 2025年3月1日 週六 上午10:05寫道:
>
> Hi,
>
> Thanks for the patch.
>
> Mathias Nyman <mathias.nyman@linux.intel.com> 於 2025年3月1日 週六 上午12:17寫道:
> >
> > Unplugging a USB3.0 webcam from Etron hosts while streaming results
> > in errors like this:
> >
> > [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> > [ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650
> > [ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> > [ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670
> >
> > Etron xHC generates two transfer events for the TRB if an error is
> > detected while processing the last TRB of an isoc TD.
> >
> > The first event can be any sort of error (like USB Transaction or
> > Babble Detected, etc), and the final event is Success.
> >
> > The xHCI driver will handle the TD after the first event and remove it
> > from its internal list, and then print an "Transfer event TRB DMA ptr
> > not part of current TD" error message after the final event.
> >
> > Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a
> > transaction error mid TD.") is designed to address isoc transaction
> > errors, but unfortunately it doesn't account for this scenario.
> >
> > This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success
> > event follows a 'short transfer' event, but the TD the event points to
> > is already given back.
> >
> > Expand the spurious success 'short transfer' event handling to cover
> > the spurious success after error on Etron hosts.
> >
> > Kuangyi Chiang reported this issue and submitted a different solution
> > based on using error_mid_td. This commit message is mostly taken
> > from that patch.
> >
> > Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
> > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> > ---
> >  drivers/usb/host/xhci-ring.c | 36 +++++++++++++++++++++++++-----------
> >  drivers/usb/host/xhci.h      |  2 +-
> >  2 files changed, 26 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index 965bffce301e..3d3e6cd69019 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -2644,6 +2644,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_
> >         return 0;
> >  }
> >
> > +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
> > +                                          struct xhci_ring *ring)
> > +{
> > +       switch (ring->old_trb_comp_code) {
> > +       case COMP_SHORT_PACKET:
> > +               return xhci->quirks & XHCI_SPURIOUS_SUCCESS;
> > +       case COMP_USB_TRANSACTION_ERROR:
> > +       case COMP_BABBLE_DETECTED_ERROR:
> > +       case COMP_ISOCH_BUFFER_OVERRUN:
> > +               return xhci->quirks & XHCI_ETRON_HOST &&
> > +                       ring->type == TYPE_ISOC;
> > +       default:
> > +               return false;
> > +       }
> > +}
> > +
> >  /*
> >   * If this function returns an error condition, it means it got a Transfer
> >   * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address.
> > @@ -2697,8 +2713,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
> >         case COMP_SUCCESS:
> >                 if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) {
> >                         trb_comp_code = COMP_SHORT_PACKET;
> > -                       xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n",
> > -                                slot_id, ep_index, ep_ring->last_td_was_short);
> > +                       xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n",
> > +                                slot_id, ep_index, ep_ring->old_trb_comp_code);
> >                 }
> >                 break;
> >         case COMP_SHORT_PACKET:
> > @@ -2846,7 +2862,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
> >                  */
> >                 if (trb_comp_code != COMP_STOPPED &&
> >                     trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
> > -                   !ep_ring->last_td_was_short) {
> > +                   !xhci_spurious_success_tx_event(xhci, ep_ring)) {
> >                         xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
> >                                   slot_id, ep_index);
> >                 }
> > @@ -2890,11 +2906,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
> >
> >                         /*
> >                          * Some hosts give a spurious success event after a short
> > -                        * transfer. Ignore it.
> > +                        * transfer or error on last TRB. Ignore it.
> >                          */
> > -                       if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
> > -                           ep_ring->last_td_was_short) {
> > -                               ep_ring->last_td_was_short = false;
> > +                       if (xhci_spurious_success_tx_event(xhci, ep_ring)) {
> > +                               xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
> > +                                        &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
> > +                               ep_ring->old_trb_comp_code = trb_comp_code;
> >                                 return 0;
> >                         }
> >
> > @@ -2922,10 +2939,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
> >          */
> >         } while (ep->skip);
> >
> > -       if (trb_comp_code == COMP_SHORT_PACKET)
> > -               ep_ring->last_td_was_short = true;
> > -       else
> > -               ep_ring->last_td_was_short = false;
> > +       ep_ring->old_trb_comp_code = trb_comp_code;
> >
> >         ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
> >         trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma);
> > diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
> > index 8c164340a2c3..c75c2c12ce53 100644
> > --- a/drivers/usb/host/xhci.h
> > +++ b/drivers/usb/host/xhci.h
> > @@ -1371,7 +1371,7 @@ struct xhci_ring {
> >         unsigned int            num_trbs_free; /* used only by xhci DbC */
> >         unsigned int            bounce_buf_len;
> >         enum xhci_ring_type     type;
> > -       bool                    last_td_was_short;
> > +       u32                     last_td_comp_code;
>
> Should be changed to old_trb_comp_code.
>
> >         struct radix_tree_root  *trb_address_map;
> >  };
> >
> > --
> > 2.43.0
> >
>
> I'll test this later.

It works. See the below dynamic debug messages:

[ 1138.281772] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 13
[ 1138.281777] xhci_hcd 0000:04:00.0: Spurious event dma
0x000000010b5ba600, comp_code 13 after 13
[ 1138.282013] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 13
[ 1138.282019] xhci_hcd 0000:04:00.0: Spurious event dma
0x000000010b5ba630, comp_code 13 after 13
[ 1138.282271] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 13
[ 1138.282277] xhci_hcd 0000:04:00.0: Spurious event dma
0x000000010b5ba660, comp_code 13 after 13
[ 1138.282420] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 13
[ 1138.282425] xhci_hcd 0000:04:00.0: Spurious event dma
0x000000010b5ba680, comp_code 13 after 13
[ 1138.282653] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18
on endpoint
[ 1138.282659] xhci_hcd 0000:04:00.0: Error mid isoc TD, wait for
final completion event
[ 1138.282779] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 4
[ 1138.282785] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18
on endpoint
[ 1138.282911] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 4
[ 1138.282916] xhci_hcd 0000:04:00.0: Spurious event dma
0x000000010b5ba6c0, comp_code 13 after 4
[ 1138.282920] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18
on endpoint
[ 1138.282923] xhci_hcd 0000:04:00.0: Error mid isoc TD, wait for
final completion event
[ 1138.283039] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 4
[ 1138.283045] xhci_hcd 0000:04:00.0: Transfer error for slot 1 ep 18
on endpoint
[ 1138.283163] xhci_hcd 0000:04:00.0: Successful completion on short
TX for slot 1 ep 18 with last td comp code 4
[ 1138.283167] xhci_hcd 0000:04:00.0: Spurious event dma
0x000000010b5ba6f0, comp_code 13 after 4

>
> Thanks,
> Kuangyi Chiang

Thanks,
Kuangyi Chiang
Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by kernel test robot 9 months, 3 weeks ago
Hi Mathias,

kernel test robot noticed the following build errors:

[auto build test ERROR on usb/usb-testing]
[also build test ERROR on usb/usb-next usb/usb-linus linus/master v6.14-rc4 next-20250228]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mathias-Nyman/xhci-Handle-spurious-events-on-Etron-host-isoc-enpoints/20250301-001842
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
patch link:    https://lore.kernel.org/r/20250228161824.3164826-1-mathias.nyman%40linux.intel.com
patch subject: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
config: hexagon-randconfig-002-20250301 (https://download.01.org/0day-ci/archive/20250301/202503010302.yhUVRgse-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project 14170b16028c087ca154878f5ed93d3089a965c6)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250301/202503010302.yhUVRgse-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503010302.yhUVRgse-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/usb/host/xhci-ring.c:2650:16: error: no member named 'old_trb_comp_code' in 'struct xhci_ring'
    2650 |         switch (ring->old_trb_comp_code) {
         |                 ~~~~  ^
   drivers/usb/host/xhci-ring.c:2717:34: error: no member named 'old_trb_comp_code' in 'struct xhci_ring'
    2717 |                                  slot_id, ep_index, ep_ring->old_trb_comp_code);
         |                                                     ~~~~~~~  ^
   drivers/usb/host/xhci.h:1733:56: note: expanded from macro 'xhci_dbg'
    1733 |         dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args)
         |                                                               ^~~~
   include/linux/dev_printk.h:165:39: note: expanded from macro 'dev_dbg'
     165 |         dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__)
         |                                              ^~~~~~~~~~~
   include/linux/dynamic_debug.h:274:19: note: expanded from macro 'dynamic_dev_dbg'
     274 |                            dev, fmt, ##__VA_ARGS__)
         |                                        ^~~~~~~~~~~
   include/linux/dynamic_debug.h:250:59: note: expanded from macro '_dynamic_func_call'
     250 |         _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__)
         |                                                                  ^~~~~~~~~~~
   include/linux/dynamic_debug.h:248:65: note: expanded from macro '_dynamic_func_call_cls'
     248 |         __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__)
         |                                                                        ^~~~~~~~~~~
   include/linux/dynamic_debug.h:224:15: note: expanded from macro '__dynamic_func_call_cls'
     224 |                 func(&id, ##__VA_ARGS__);                       \
         |                             ^~~~~~~~~~~
   drivers/usb/host/xhci-ring.c:2913:44: error: no member named 'old_trb_comp_code' in 'struct xhci_ring'
    2913 |                                          &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
         |                                                                      ~~~~~~~  ^
   drivers/usb/host/xhci.h:1733:56: note: expanded from macro 'xhci_dbg'
    1733 |         dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args)
         |                                                               ^~~~
   include/linux/dev_printk.h:165:39: note: expanded from macro 'dev_dbg'
     165 |         dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__)
         |                                              ^~~~~~~~~~~
   include/linux/dynamic_debug.h:274:19: note: expanded from macro 'dynamic_dev_dbg'
     274 |                            dev, fmt, ##__VA_ARGS__)
         |                                        ^~~~~~~~~~~
   include/linux/dynamic_debug.h:250:59: note: expanded from macro '_dynamic_func_call'
     250 |         _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__)
         |                                                                  ^~~~~~~~~~~
   include/linux/dynamic_debug.h:248:65: note: expanded from macro '_dynamic_func_call_cls'
     248 |         __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__)
         |                                                                        ^~~~~~~~~~~
   include/linux/dynamic_debug.h:224:15: note: expanded from macro '__dynamic_func_call_cls'
     224 |                 func(&id, ##__VA_ARGS__);                       \
         |                             ^~~~~~~~~~~
   drivers/usb/host/xhci-ring.c:2914:14: error: no member named 'old_trb_comp_code' in 'struct xhci_ring'
    2914 |                                 ep_ring->old_trb_comp_code = trb_comp_code;
         |                                 ~~~~~~~  ^
   drivers/usb/host/xhci-ring.c:2942:11: error: no member named 'old_trb_comp_code' in 'struct xhci_ring'
    2942 |         ep_ring->old_trb_comp_code = trb_comp_code;
         |         ~~~~~~~  ^
   5 errors generated.


vim +2650 drivers/usb/host/xhci-ring.c

  2646	
  2647	static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
  2648						   struct xhci_ring *ring)
  2649	{
> 2650		switch (ring->old_trb_comp_code) {
  2651		case COMP_SHORT_PACKET:
  2652			return xhci->quirks & XHCI_SPURIOUS_SUCCESS;
  2653		case COMP_USB_TRANSACTION_ERROR:
  2654		case COMP_BABBLE_DETECTED_ERROR:
  2655		case COMP_ISOCH_BUFFER_OVERRUN:
  2656			return xhci->quirks & XHCI_ETRON_HOST &&
  2657				ring->type == TYPE_ISOC;
  2658		default:
  2659			return false;
  2660		}
  2661	}
  2662	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
Posted by kernel test robot 9 months, 3 weeks ago
Hi Mathias,

kernel test robot noticed the following build errors:

[auto build test ERROR on usb/usb-testing]
[also build test ERROR on usb/usb-next usb/usb-linus linus/master v6.14-rc4 next-20250228]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mathias-Nyman/xhci-Handle-spurious-events-on-Etron-host-isoc-enpoints/20250301-001842
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
patch link:    https://lore.kernel.org/r/20250228161824.3164826-1-mathias.nyman%40linux.intel.com
patch subject: [RFT PATCH] xhci: Handle spurious events on Etron host isoc enpoints
config: sh-randconfig-002-20250301 (https://download.01.org/0day-ci/archive/20250301/202503010346.46nbVSmT-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250301/202503010346.46nbVSmT-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202503010346.46nbVSmT-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   drivers/usb/host/xhci-ring.c: In function 'xhci_spurious_success_tx_event':
>> drivers/usb/host/xhci-ring.c:2650:21: error: 'struct xhci_ring' has no member named 'old_trb_comp_code'
    2650 |         switch (ring->old_trb_comp_code) {
         |                     ^~
   In file included from include/linux/device.h:15,
                    from include/linux/dma-mapping.h:5,
                    from drivers/usb/host/xhci-ring.c:59:
   drivers/usb/host/xhci-ring.c: In function 'handle_tx_event':
   drivers/usb/host/xhci-ring.c:2717:60: error: 'struct xhci_ring' has no member named 'old_trb_comp_code'
    2717 |                                  slot_id, ep_index, ep_ring->old_trb_comp_code);
         |                                                            ^~
   include/linux/dev_printk.h:139:56: note: in definition of macro 'dev_no_printk'
     139 |                         _dev_printk(level, dev, fmt, ##__VA_ARGS__);    \
         |                                                        ^~~~~~~~~~~
   drivers/usb/host/xhci.h:1733:9: note: in expansion of macro 'dev_dbg'
    1733 |         dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args)
         |         ^~~~~~~
   drivers/usb/host/xhci-ring.c:2716:25: note: in expansion of macro 'xhci_dbg'
    2716 |                         xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n",
         |                         ^~~~~~~~
   drivers/usb/host/xhci-ring.c:2913:77: error: 'struct xhci_ring' has no member named 'old_trb_comp_code'
    2913 |                                          &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
         |                                                                             ^~
   include/linux/dev_printk.h:139:56: note: in definition of macro 'dev_no_printk'
     139 |                         _dev_printk(level, dev, fmt, ##__VA_ARGS__);    \
         |                                                        ^~~~~~~~~~~
   drivers/usb/host/xhci.h:1733:9: note: in expansion of macro 'dev_dbg'
    1733 |         dev_dbg(xhci_to_hcd(xhci)->self.controller , fmt , ## args)
         |         ^~~~~~~
   drivers/usb/host/xhci-ring.c:2912:33: note: in expansion of macro 'xhci_dbg'
    2912 |                                 xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
         |                                 ^~~~~~~~
   drivers/usb/host/xhci-ring.c:2914:40: error: 'struct xhci_ring' has no member named 'old_trb_comp_code'
    2914 |                                 ep_ring->old_trb_comp_code = trb_comp_code;
         |                                        ^~
   drivers/usb/host/xhci-ring.c:2942:16: error: 'struct xhci_ring' has no member named 'old_trb_comp_code'
    2942 |         ep_ring->old_trb_comp_code = trb_comp_code;
         |                ^~
   drivers/usb/host/xhci-ring.c: In function 'xhci_spurious_success_tx_event':
>> drivers/usb/host/xhci-ring.c:2661:1: warning: control reaches end of non-void function [-Wreturn-type]
    2661 | }
         | ^


vim +2650 drivers/usb/host/xhci-ring.c

  2646	
  2647	static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
  2648						   struct xhci_ring *ring)
  2649	{
> 2650		switch (ring->old_trb_comp_code) {
  2651		case COMP_SHORT_PACKET:
  2652			return xhci->quirks & XHCI_SPURIOUS_SUCCESS;
  2653		case COMP_USB_TRANSACTION_ERROR:
  2654		case COMP_BABBLE_DETECTED_ERROR:
  2655		case COMP_ISOCH_BUFFER_OVERRUN:
  2656			return xhci->quirks & XHCI_ETRON_HOST &&
  2657				ring->type == TYPE_ISOC;
  2658		default:
  2659			return false;
  2660		}
> 2661	}
  2662	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki