[PATCH v2 5/5] xhci: Correct handling of one-TRB isoc TD on Etron xHCI host

Posted by Kuangyi Chiang 4 weeks ago
Unplugging a USB3.0 webcam while streaming results in errors
like this:

[ 132.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 132.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650 seg-start 000000002fdf8000 seg-end 000000002fdf8ff0
[ 132.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 132.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670 seg-start 000000002fdf8000 seg-end 000000002fdf8ff0

If an error is detected while processing a one-TRB isoc TD,
the Etron xHC generates two transfer events for the TRB that
the error was detected on. The first event is "USB Transaction
Error", and the second event is "Success".

The xHCI driver handles the TD after the first event and
removes it from its internal list, and then prints a "Transfer
event TRB DMA ptr not part of current TD" error message after
the second event.
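
The two "Looking for event-dma" lines in the log can be checked by
hand. Below is a minimal sketch, assuming the TD lies within a single
segment; sketch_event_in_td() and SKETCH_TRB_SIZE are hypothetical
names used for illustration, not the driver's lookup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One TRB is 16 bytes in xHCI. */
#define SKETCH_TRB_SIZE 16u

/*
 * Hypothetical helper, not the driver's lookup: true if event_dma lies
 * inside the TD spanning [trb_start, trb_end]. Segment wrap is ignored,
 * so the caller must pass trb_start <= trb_end.
 */
static bool sketch_event_in_td(uint64_t event_dma, uint64_t trb_start,
			       uint64_t trb_end)
{
	assert(trb_start <= trb_end);
	return event_dma >= trb_start && event_dma <= trb_end;
}
```

With the values from either log line, for example
sketch_event_in_td(0x2fdf8630, 0x2fdf8640, 0x2fdf8650), the check fails,
and the event pointer sits exactly one TRB before trb-start. That is
consistent with a second event for a TD that was already removed.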

As a solution, we can set a flag after the first error event
and skip printing the error message after the second event if
the flag is set.

Commit ad808333d820 ("Intel xhci: Ignore spurious successful
event.") implements a similar mechanism that we can reuse to
solve this problem, since a short transfer and a transfer error
do not occur concurrently. Also, rename the flag from
last_td_was_short to spurious_event to make it more meaningful.

Check if the XHCI_ETRON_HOST quirk flag is set before invoking
the workaround in process_isoc_td().

This patch doesn't affect other host controllers that have the
XHCI_SPURIOUS_SUCCESS quirk flag applied.
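
The flag handling described above can be modelled outside the kernel.
The following is a rough sketch, not driver code: every model_* and
MODEL_* name is hypothetical, the TD list is reduced to a counter, the
COMP_STOPPED cases are left out, and the caller supplies whether the
event matches the TD at the head of the list. It assumes the Etron
host has both XHCI_ETRON_HOST and XHCI_SPURIOUS_SUCCESS set.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for illustration; not the kernel definitions. */
enum model_comp {
	MODEL_COMP_SUCCESS,
	MODEL_COMP_SHORT_PACKET,
	MODEL_COMP_TRANSACTION_ERROR,
};

enum model_result {
	MODEL_TD_COMPLETED,	/* event matched a TD, TD was given back */
	MODEL_EVENT_IGNORED,	/* spurious event dropped silently */
	MODEL_WARNING_PRINTED,	/* driver would log a warning or error */
};

struct model_ring {
	int  queued_tds;	/* TDs still on the TD list */
	bool spurious_event;	/* the renamed last_td_was_short flag */
	bool etron_host;	/* models XHCI_ETRON_HOST */
	bool spurious_success;	/* models XHCI_SPURIOUS_SUCCESS */
};

/*
 * Model one transfer event. matches_td: the event's TRB pointer falls
 * inside the TD at the head of the list. one_trb_isoc: that TD is a
 * one-TRB isoc TD on a SuperSpeed (or faster) device.
 */
static enum model_result model_handle_event(struct model_ring *ring,
					    enum model_comp code,
					    bool matches_td, bool one_trb_isoc)
{
	assert(ring->queued_tds >= 0);

	/* Empty TD list: stay quiet only if a spurious event is expected. */
	if (ring->queued_tds == 0)
		return ring->spurious_event ? MODEL_EVENT_IGNORED
					    : MODEL_WARNING_PRINTED;

	/* Event does not belong to the current TD. */
	if (!matches_td) {
		if (ring->spurious_success && ring->spurious_event) {
			ring->spurious_event = false;
			return MODEL_EVENT_IGNORED;
		}
		return MODEL_WARNING_PRINTED;
	}

	/* handle_tx_event(): remember a short transfer, clear otherwise. */
	ring->spurious_event = (code == MODEL_COMP_SHORT_PACKET);

	/* process_isoc_td(): the new Etron-only case from this patch. */
	if (code == MODEL_COMP_TRANSACTION_ERROR && ring->etron_host &&
	    one_trb_isoc)
		ring->spurious_event = true;

	ring->queued_tds--;
	return MODEL_TD_COMPLETED;
}
```

In this model, feeding a transaction error followed by a non-matching
success event to a ring with etron_host set completes the TD and then
ignores the second event. With etron_host clear, the same sequence
ends in a warning, as it does today.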

Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 26 +++++++++++++++-----------
 drivers/usb/host/xhci.h      |  2 +-
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9e132b08bfde..33fa8a11c934 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2437,6 +2437,10 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		sum_trbs_for_length = true;
 		if (ep_trb != td->last_trb)
 			td->error_mid_td = true;
+		if ((xhci->quirks & XHCI_ETRON_HOST) &&
+		    td->urb->dev->speed >= USB_SPEED_SUPER &&
+		    td->first_trb == td->last_trb)
+			ep_ring->spurious_event = true;
 		break;
 	case COMP_STOPPED:
 		sum_trbs_for_length = true;
@@ -2655,8 +2659,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	case COMP_SUCCESS:
 		if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) {
 			trb_comp_code = COMP_SHORT_PACKET;
-			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n",
-				 slot_id, ep_index, ep_ring->last_td_was_short);
+			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with spurious event %d\n",
+				 slot_id, ep_index, ep_ring->spurious_event);
 		}
 		break;
 	case COMP_SHORT_PACKET:
@@ -2801,13 +2805,13 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	if (list_empty(&ep_ring->td_list)) {
 		/*
 		 * Don't print wanings if ring is empty due to a stopped endpoint generating an
-		 * extra completion event if the device was suspended. Or, a event for the last TRB
-		 * of a short TD we already got a short event for. The short TD is already removed
-		 * from the TD list.
+		 * extra completion event if the device was suspended. Or, the spurious event flag
+		 * is set at the last TD of the TD list due to a short transfer or a one-TRB isoc
+		 * TD error, and such TD is already removed from the TD list.
 		 */
 		if (trb_comp_code != COMP_STOPPED &&
 		    trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
-		    !ep_ring->last_td_was_short) {
+		    !ep_ring->spurious_event) {
 			xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
 				  slot_id, ep_index);
 		}
@@ -2851,11 +2855,11 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 
 			/*
 			 * Some hosts give a spurious success event after a short
-			 * transfer. Ignore it.
+			 * transfer or a one-TRB isoc TD error. Ignore it.
 			 */
 			if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
-			    ep_ring->last_td_was_short) {
-				ep_ring->last_td_was_short = false;
+			    ep_ring->spurious_event) {
+				ep_ring->spurious_event = false;
 				return 0;
 			}
 
@@ -2884,9 +2888,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	} while (ep->skip);
 
 	if (trb_comp_code == COMP_SHORT_PACKET)
-		ep_ring->last_td_was_short = true;
+		ep_ring->spurious_event = true;
 	else
-		ep_ring->last_td_was_short = false;
+		ep_ring->spurious_event = false;
 
 	ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
 	trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 4f5b732e8944..dca9091b8134 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1359,7 +1359,7 @@ struct xhci_ring {
 	unsigned int		num_trbs_free; /* used only by xhci DbC */
 	unsigned int		bounce_buf_len;
 	enum xhci_ring_type	type;
-	bool			last_td_was_short;
+	bool			spurious_event;
 	struct radix_tree_root	*trb_address_map;
 };
 
-- 
2.25.1
Re: [PATCH v2 5/5] xhci: Correct handling of one-TRB isoc TD on Etron xHCI host
Posted by Mathias Nyman 3 weeks, 4 days ago
On 28.10.2024 4.53, Kuangyi Chiang wrote:
> Unplugging a USB3.0 webcam while streaming results in errors
> like this:
> 
> [ 132.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> [ 132.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650 seg-start 000000002fdf8000 seg-end 000000002fdf8ff0
> [ 132.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> [ 132.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670 seg-start 000000002fdf8000 seg-end 000000002fdf8ff0
> 
> If an error is detected while processing a one-TRB isoc TD,
> the Etron xHC generates two transfer events for the TRB that
> the error was detected on. The first event is "USB Transaction
> Error", and the second event is "Success".
> 
> The xHCI driver handles the TD after the first event and
> removes it from its internal list, and then prints a "Transfer
> event TRB DMA ptr not part of current TD" error message after
> the second event.
> 
> As a solution, we can set a flag after the first error event
> and skip printing the error message after the second event if
> the flag is set.
> 
> Commit ad808333d820 ("Intel xhci: Ignore spurious successful
> event.") implements a similar mechanism that we can reuse to
> solve this problem, since a short transfer and a transfer error
> do not occur concurrently. Also, rename the flag from
> last_td_was_short to spurious_event to make it more meaningful.
> 
> Check if the XHCI_ETRON_HOST quirk flag is set before invoking
> the workaround in process_isoc_td().
> 
> This patch doesn't affect other host controllers that have the
> XHCI_SPURIOUS_SUCCESS quirk flag applied.
> 
> Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>

I'm leaving this patch out of the series, both because of the ongoing
discussion about it and because it conflicts with another series touching
handle_tx_event().

All other patches in the series have been added.

Thanks
Mathias
Re: [PATCH v2 5/5] xhci: Correct handling of one-TRB isoc TD on Etron xHCI host
Posted by Michał Pecio 4 weeks ago
Hi,

That's a bug I'm familiar with.

> Unplugging a USB3.0 webcam while streaming results in errors
> like this

Not only unplugging but also any random error due to EMI or bad cable.

> If an error is detected while processing a one-TRB isoc TD,
> the Etron xHC generates two transfer events for the TRB that
> the error was detected on. The first event is "USB Transaction
> Error", and the second event is "Success".

IIRC, it wasn't just Transaction Errors but any sort of error, like
Babble or Bandwidth Overrun. But not sure about Missed Service, etc.

And IIRC I confirmed that it was *not* the case on Short Packet.

Also, I'm 99% sure the problem is not limited to one-TRB TDs, but
it occurs every time there is an error on the last TRB of any TD.

> As a solution, we can set a flag after the first error event
> and skip printing the error message after the second event if
> the flag is set.

Yes, but I think it would be better to use error_mid_td instead of
last_td_was_short, so that the TD is only freed on the final event,
not on the first one.

The spec is clear that we should only free TRBs when the xHC is done
with them. Maybe it wouldn't be a problem in this case, and it surely
wouldn't be worse than what happens with Etron today, but IMO it could
be a real (even if rare) problem in other cases when this flag is used,
so I would rather remove the flag and handle short packets as per spec.
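
A rough model of the error_mid_td idea, not driver code: all sketch_*
and SKETCH_* names are hypothetical, short packets are not modelled,
and it assumes that every event other than a mid-TD error points at
the TD's last TRB. The point is only that the TD is given back on the
final event, with the error status recorded on the first one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sketch_comp {
	SKETCH_COMP_SUCCESS,
	SKETCH_COMP_TRANSACTION_ERROR,
};

struct sketch_td {
	bool error_mid_td;	/* an error was seen, final event pending */
	bool given_back;	/* TD has been freed / returned to the URB */
	int  status;		/* 0 or a negative errno */
};

/*
 * Returns true when the TD is given back. on_last_trb says whether the
 * event points at the TD's last TRB; expect_second_event models a host
 * (such as Etron) known to send another event after a last-TRB error.
 */
static bool sketch_handle_isoc_event(struct sketch_td *td,
				     enum sketch_comp code, bool on_last_trb,
				     bool expect_second_event)
{
	assert(!td->given_back);

	if (code == SKETCH_COMP_TRANSACTION_ERROR) {
		td->status = -EPROTO;
		if (!on_last_trb || expect_second_event) {
			/* Keep the TD until the xHC's final event arrives. */
			td->error_mid_td = true;
			return false;
		}
	} else if (!td->error_mid_td) {
		td->status = 0;
	}

	/* In this model only a last-TRB event may finish the TD. */
	assert(on_last_trb);
	td->given_back = true;
	return true;
}
```

Called with a transaction error on the last TRB and expect_second_event
set, the function keeps the TD; the following success event then gives
it back with status -EPROTO. Without expect_second_event the TD is
given back on the error event itself.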

Regards,
Michal
Re: [PATCH v2 5/5] xhci: Correct handling of one-TRB isoc TD on Etron xHCI host
Posted by Kuangyi Chiang 3 weeks, 5 days ago
Hi,

Thank you for the review.

On Mon, Oct 28, 2024 at 5:54 PM, Michał Pecio <michal.pecio@gmail.com> wrote:
>
> Hi,
>
> That's a bug I'm familiar with.
>
> > Unplugging a USB3.0 webcam while streaming results in errors
> > like this
>
> Not only unplugging but also any random error due to EMI or bad cable.
>
> > If an error is detected while processing a one-TRB isoc TD,
> > the Etron xHC generates two transfer events for the TRB that
> > the error was detected on. The first event is "USB Transaction
> > Error", and the second event is "Success".
>
> IIRC, it wasn't just Transaction Errors but any sort of error, like
> Babble or Bandwidth Overrun. But not sure about Missed Service, etc.
>
> And IIRC I confirmed that it was *not* the case on Short Packet.

Yes, it is not.

>
> Also, I'm 99% sure the problem is not limited to one-TRB TDs, but
> it occurs every time there is an error on the last TRB of any TD.

Yes, this can happen; I didn't account for this scenario.

>
> > As a solution, we can set a flag after the first error event
> > and skip printing the error message after the second event if
> > the flag is set.
>
> Yes, but I think it would be better to use error_mid_td instead of
> last_td_was_short, so that the TD is only freed on the final event,
> not on the first one.
>
> The spec is clear that we should only free TRBs when the xHC is done
> with them. Maybe it wouldn't be a problem in this case, and it surely
> wouldn't be worse than what happens with Etron today, but IMO it could
> be a real (even if rare) problem in other cases when this flag is used,
> so I would rather remove the flag and handle short packets as per spec.

Thank you for the explanation and suggestion. Maybe I should start
trying to use error_mid_td to solve this problem.

>
> Regards,
> Michal

Thanks,
Kuangyi Chiang