[PATCH] usb: xhci: Don't unchain link TRBs on quirky HCs

Michal Pecio posted 1 patch 1 month, 1 week ago
drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
[PATCH] usb: xhci: Don't unchain link TRBs on quirky HCs
Posted by Michal Pecio 1 month, 1 week ago
Some old HCs ignore transfer ring link TRBs whose chain bit is unset.
This breaks endpoint operation and sometimes makes it execute other
ring's TDs, which may corrupt their buffers or cause unwanted device
action. We avoid this by chaining all link TRBs on affected rings.

Fix an omission which allows them to be unchained by cancelling TDs.

The patch was tested by reproducing this condition on an isochronous
endpoint (non-power-of-two TDs are sometimes split not to cross 64K)
and printing link TRBs in trb_to_noop() on good and buggy HCs.

Actual hardware malfunction is rare since it requires Missed Service
Error shortly before the unchained link TRB, at least on NEC and AMD.
I have never seen it after commit bb0ba4cb1065 ("usb: xhci: Apply the
link chain quirk on NEC isoc endpoints"), but it's Russian roulette
and I can't test all affected hosts and workloads. Fairly often MSEs
happen after cancellation because the endpoint was stopped.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index a9e468ea19c5..fc0043ca85a4 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -128,11 +128,11 @@ static void inc_td_cnt(struct urb *urb)
 	urb_priv->num_tds_done++;
 }
 
-static void trb_to_noop(union xhci_trb *trb, u32 noop_type)
+static void trb_to_noop(union xhci_trb *trb, u32 noop_type, bool unchain_links)
 {
 	if (trb_is_link(trb)) {
-		/* unchain chained link TRBs */
-		trb->link.control &= cpu_to_le32(~TRB_CHAIN);
+		if (unchain_links)
+			trb->link.control &= cpu_to_le32(~TRB_CHAIN);
 	} else {
 		trb->generic.field[0] = 0;
 		trb->generic.field[1] = 0;
@@ -465,7 +465,7 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
 		xhci_dbg(xhci, "Turn aborted command %p to no-op\n",
 			 i_cmd->command_trb);
 
-		trb_to_noop(i_cmd->command_trb, TRB_CMD_NOOP);
+		trb_to_noop(i_cmd->command_trb, TRB_CMD_NOOP, false);
 
 		/*
 		 * caller waiting for completion is called when command
@@ -797,13 +797,18 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
  * (The last TRB actually points to the ring enqueue pointer, which is not part
  * of this TD.)  This is used to remove partially enqueued isoc TDs from a ring.
  */
-static void td_to_noop(struct xhci_td *td, bool flip_cycle)
+static void td_to_noop(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
+			struct xhci_td *td, bool flip_cycle)
 {
+	bool unchain_links;
 	struct xhci_segment *seg	= td->start_seg;
 	union xhci_trb *trb		= td->start_trb;
 
+	/* link TRBs should now be unchained, but some old HCs expect otherwise */
+	unchain_links = !xhci_link_chain_quirk(xhci, ep->ring ? ep->ring->type : TYPE_STREAM);
+
 	while (1) {
-		trb_to_noop(trb, TRB_TR_NOOP);
+		trb_to_noop(trb, TRB_TR_NOOP, unchain_links);
 
 		/* flip cycle if asked to */
 		if (flip_cycle && trb != td->start_trb && trb != td->end_trb)
@@ -1091,16 +1096,16 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 						  "Found multiple active URBs %p and %p in stream %u?\n",
 						  td->urb, cached_td->urb,
 						  td->urb->stream_id);
-					td_to_noop(cached_td, false);
+					td_to_noop(xhci, ep, cached_td, false);
 					cached_td->cancel_status = TD_CLEARED;
 				}
-				td_to_noop(td, false);
+				td_to_noop(xhci, ep, td, false);
 				td->cancel_status = TD_CLEARING_CACHE;
 				cached_td = td;
 				break;
 			}
 		} else {
-			td_to_noop(td, false);
+			td_to_noop(xhci, ep, td, false);
 			td->cancel_status = TD_CLEARED;
 		}
 	}
@@ -1125,7 +1130,7 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 				continue;
 			xhci_warn(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n",
 				  td->urb);
-			td_to_noop(td, false);
+			td_to_noop(xhci, ep, td, false);
 			td->cancel_status = TD_CLEARED;
 		}
 	}
@@ -4273,7 +4278,7 @@ static int xhci_queue_isoc_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
 	 */
 	urb_priv->td[0].end_trb = ep_ring->enqueue;
 	/* Every TRB except the first & last will have its cycle bit flipped. */
-	td_to_noop(&urb_priv->td[0], true);
+	td_to_noop(xhci, xep, &urb_priv->td[0], true);
 
 	/* Reset the ring enqueue back to the first TRB and its cycle bit. */
 	ep_ring->enqueue = urb_priv->td[0].start_trb;
-- 
2.48.1
Re: [PATCH] usb: xhci: Don't unchain link TRBs on quirky HCs
Posted by Mathias Nyman 1 month, 1 week ago
On 11/7/25 12:08, Michal Pecio wrote:
> Some old HCs ignore transfer ring link TRBs whose chain bit is unset.
> This breaks endpoint operation and sometimes makes it execute other
> ring's TDs, which may corrupt their buffers or cause unwanted device
> action. We avoid this by chaining all link TRBs on affected rings.
> 
> Fix an omission which allows them to be unchained by cancelling TDs.
> 
> The patch was tested by reproducing this condition on an isochronous
> endpoint (non-power-of-two TDs are sometimes split not to cross 64K)
> and printing link TRBs in trb_to_noop() on good and buggy HCs.
> 
> Actual hardware malfunction is rare since it requires Missed Service
> Error shortly before the unchained link TRB, at least on NEC and AMD.
> I have never seen it after commit bb0ba4cb1065 ("usb: xhci: Apply the
> link chain quirk on NEC isoc endpoints"), but it's Russian roulette
> and I can't test all affected hosts and workloads. Fairly often MSEs
> happen after cancellation because the endpoint was stopped.
> 
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>

Makes sense, thanks for fixing this

> ---
>   drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++-----------
>   1 file changed, 16 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index a9e468ea19c5..fc0043ca85a4 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -128,11 +128,11 @@ static void inc_td_cnt(struct urb *urb)
>   	urb_priv->num_tds_done++;
>   }
>   
> -static void trb_to_noop(union xhci_trb *trb, u32 noop_type)
> +static void trb_to_noop(union xhci_trb *trb, u32 noop_type, bool unchain_links)
>   {
>   	if (trb_is_link(trb)) {
> -		/* unchain chained link TRBs */
> -		trb->link.control &= cpu_to_le32(~TRB_CHAIN);
> +		if (unchain_links)
> +			trb->link.control &= cpu_to_le32(~TRB_CHAIN);
>   	} else {
>   		trb->generic.field[0] = 0;
>   		trb->generic.field[1] = 0;
> @@ -465,7 +465,7 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
>   		xhci_dbg(xhci, "Turn aborted command %p to no-op\n",
>   			 i_cmd->command_trb);
>   
> -		trb_to_noop(i_cmd->command_trb, TRB_CMD_NOOP);
> +		trb_to_noop(i_cmd->command_trb, TRB_CMD_NOOP, false);
>   
>   		/*
>   		 * caller waiting for completion is called when command
> @@ -797,13 +797,18 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
>    * (The last TRB actually points to the ring enqueue pointer, which is not part
>    * of this TD.)  This is used to remove partially enqueued isoc TDs from a ring.
>    */
> -static void td_to_noop(struct xhci_td *td, bool flip_cycle)
> +static void td_to_noop(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
> +			struct xhci_td *td, bool flip_cycle)

we could avoid passing xhci pointer to td_to_noop() and just grab it from
the xhci_virt_ep structure instead. i.e. ep->xhci

Otherwise this looks good to me

Thanks
Mathias
Re: [PATCH] usb: xhci: Don't unchain link TRBs on quirky HCs
Posted by Michal Pecio 1 month ago
On Tue, 11 Nov 2025 18:19:38 +0200, Mathias Nyman wrote:
> On 11/7/25 12:08, Michal Pecio wrote:
> > +static void td_to_noop(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
> > +			struct xhci_td *td, bool flip_cycle)  
> 
> we could avoid passing xhci pointer to td_to_noop() and just grab it from
> the xhci_virt_ep structure instead. i.e. ep->xhci

I can do a v2 if you want.

But OTOH, I didn't expect such pointer to exist (though I'm sure I must
have seen it many times) because it doesn't seem strictly necessary.

Maybe do the reverse and get rid of ep->xhci, or stop adding new users
and clean up existing ones along the way?

Main users are invalidate_cancelled_tds()/giveback_invalidated_tds(),
their callers all have xhci and could easily supply it to them. And we
even discussed removing the latter completely, but I got sidetracked by
issues with URB_ZERO_PACKET.

Another user is xhci_handle_cmd_set_deq() which already has xhci.
I have a compile-tested patch which removes it completely right now,
it took five minutes and zero mental effort to prepare.

Regards,
Michal
Re: [PATCH] usb: xhci: Don't unchain link TRBs on quirky HCs
Posted by Mathias Nyman 1 month ago
On 11/14/25 13:32, Michal Pecio wrote:
> On Tue, 11 Nov 2025 18:19:38 +0200, Mathias Nyman wrote:
>> On 11/7/25 12:08, Michal Pecio wrote:
>>> +static void td_to_noop(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>>> +			struct xhci_td *td, bool flip_cycle)
>>
>> we could avoid passing xhci pointer to td_to_noop() and just grab it from
>> the xhci_virt_ep structure instead. i.e. ep->xhci
> 
> I can do a v2 if you want.
> 
> But OTOH, I didn't expect such pointer to exist (though I'm sure I must
> have seen it many times) because it doesn't seem strictly necessary.
> 
> Maybe do the reverse and get rid of ep->xhci, or stop adding new users
> and clean up existing ones along the way?
> 
> Main users are invalidate_cancelled_tds()/giveback_invalidated_tds(),
> their callers all have xhci and could easily supply it to them. And we
> even discussed removing the latter completely, but I got sidetracked by
> issues with URB_ZERO_PACKET.

Ok, I'll take this v1 now as it is.
It prevents a potential issue, and it's tested by you.

We can then later figure out how we want to clean up and refactor things.

Thanks
Mathias