If skipping is deferred to events other than Missed Service Error itsef,
it means we are running on an xHCI 1.0 host and don't know how many TDs
were missed until we reach some ordinary transfer completion event.
And in case of ring xrun, we can't know where the xrun happened either.
If we skip all pending TDs, we may prematurely give back TDs added after
the xrun had occurred, risking data loss or buffer UAF by the xHC.
If we skip none, a driver may become confused and stop working when all
its URBs are missed and appear to be "in flight" forever.
Skip exactly one TD on each xrun event - the first one that was missed,
as we can now be sure that the HC has finished processing it. Provided
that one more TD is queued before any subsequent doorbell ring, it will
become safe to skip another TD by the time we get an xrun again.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 878abf5b745d..049206a1db76 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2875,6 +2875,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
skip_isoc_td(xhci, td, ep, status);
+
+ if (ring_xrun_event) {
+ /*
+ * If we are here, we are on xHCI 1.0 host with no idea how
+ * many TDs were missed and where the xrun occurred. Don't
+ * skip more TDs, they may have been queued after the xrun.
+ */
+ xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
+ slot_id, ep_index);
+ break;
+ }
+
if (!list_empty(&ep_ring->td_list))
continue;
--
2.48.1
On 10.2.2025 9.42, Michal Pecio wrote:
> If skipping is deferred to events other than Missed Service Error itsef,
> it means we are running on an xHCI 1.0 host and don't know how many TDs
> were missed until we reach some ordinary transfer completion event.
>
> And in case of ring xrun, we can't know where the xrun happened either.
>
> If we skip all pending TDs, we may prematurely give back TDs added after
> the xrun had occurred, risking data loss or buffer UAF by the xHC.
>
> If we skip none, a driver may become confused and stop working when all
> its URBs are missed and appear to be "in flight" forever.
>
> Skip exactly one TD on each xrun event - the first one that was missed,
> as we can now be sure that the HC has finished processing it. Provided
> that one more TD is queued before any subsequent doorbell ring, it will
> become safe to skip another TD by the time we get an xrun again.
>
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
> drivers/usb/host/xhci-ring.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 878abf5b745d..049206a1db76 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -2875,6 +2875,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>
> if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
> skip_isoc_td(xhci, td, ep, status);
> +
> + if (ring_xrun_event) {
> + /*
> + * If we are here, we are on xHCI 1.0 host with no idea how
> + * many TDs were missed and where the xrun occurred. Don't
> + * skip more TDs, they may have been queued after the xrun.
> + */
> + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> + slot_id, ep_index);
> + break;
This would be the same as return 0; right?
Whole series looks good, I'll add it
-Mathias
On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
> On 10.2.2025 9.42, Michal Pecio wrote:
> > + if (ring_xrun_event) {
> > + /*
> > + * If we are here, we are on xHCI 1.0 host with no idea how
> > + * many TDs were missed and where the xrun occurred. Don't
> > + * skip more TDs, they may have been queued after the xrun.
> > + */
> > + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> > + slot_id, ep_index);
> > + break;
>
> This would be the same as return 0; right?
>
> Whole series looks good, I'll add it
I hope you haven't sent it out yet because I found two minor issues.
Firstly,
[PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
increases the number of xrun events that we handle but doesn't suppress
the "Event TRB for slot %u ep %u with no TDs queued\n" warning, so the
warning started to show up sometimes for no good reason. The fix is to
add ring_xrun_event to the list of exception for this warning.
Secondly,
[PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
can be improved to clear the skip flag if skipped TD was the only one.
This eliminates any confusion and risk of skipping bugs in the future.
The change is a matter of moving that code to a different branch.
I also changed 'break' to 'return 0' because it gets hard to follow at
this level of indentation.
I'll send a v2 of those two patches. Sorry for any inconvenience.
Michal
On 21.2.2025 3.17, Michał Pecio wrote:
> On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
>> On 10.2.2025 9.42, Michal Pecio wrote:
>>> + if (ring_xrun_event) {
>>> + /*
>>> + * If we are here, we are on xHCI 1.0 host with no idea how
>>> + * many TDs were missed and where the xrun occurred. Don't
>>> + * skip more TDs, they may have been queued after the xrun.
>>> + */
>>> + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
>>> + slot_id, ep_index);
>>> + break;
>>
>> This would be the same as return 0; right?
>>
>> Whole series looks good, I'll add it
>
> I hope you haven't sent it out yet because I found two minor issues.
>
>
> Firstly,
> [PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
>
> increases the number of xrun events that we handle but doesn't suppress
> the "Event TRB for slot %u ep %u with no TDs queued\n" warning, so the
> warning started to show up sometimes for no good reason. The fix is to
> add ring_xrun_event to the list of exception for this warning.
>
>
> Secondly,
> [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
>
> can be improved to clear the skip flag if skipped TD was the only one.
> This eliminates any confusion and risk of skipping bugs in the future.
> The change is a matter of moving that code to a different branch.
>
> I also changed 'break' to 'return 0' because it gets hard to follow at
> this level of indentation.
>
>
> I'll send a v2 of those two patches. Sorry for any inconvenience.
Patches updated, they are now in my for-usb-next branch
Thanks
Mathias
The TRB pointer of these events points at enqueue at the time of error
occurrence on xHCI 1.1+ HCs or it's NULL on older ones. By the time we
are handling the event, a new TD may be queued at this ring position.
I can trigger this race by rising interrupt moderation to increase IRQ
handling delay. Similar delay may occur naturally due to system load.
If this ever happens after a Missed Service Error, missed TDs will be
skipped and the new TD processed as if it matched the event. It could
be given back prematurely, risking data loss or buffer UAF by the xHC.
Don't complete TDs on xrun events and don't warn if queued TDs don't
match the event's TRB pointer, which can be NULL or a link/no-op TRB.
Now that it's safe, also handle xrun events if the skip flag is clear.
This ensures completion of any TD stuck in 'error mid TD' state right
before the xrun event, which could happen if a driver submits a finite
number of URBs to a buggy HC and then an error occurs on the last TD.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index a278f284f4c0..4cf17801a233 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2632,6 +2632,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
int status = -EINPROGRESS;
struct xhci_ep_ctx *ep_ctx;
u32 trb_comp_code;
+ bool ring_xrun_event = false;
slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags));
ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1;
@@ -2738,14 +2739,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
* Underrun Event for OUT Isoch endpoint.
*/
xhci_dbg(xhci, "Underrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ ring_xrun_event = true;
+ break;
case COMP_RING_OVERRUN:
xhci_dbg(xhci, "Overrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ ring_xrun_event = true;
+ break;
case COMP_MISSED_SERVICE_ERROR:
/*
* When encounter missed service error, one or more isoc tds
@@ -2818,6 +2817,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
if (trb_comp_code != COMP_STOPPED &&
trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
+ !ring_xrun_event &&
!ep_ring->last_td_was_short) {
xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
slot_id, ep_index);
@@ -2862,6 +2862,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
} while (ep->skip);
+ if (ring_xrun_event)
+ return 0; /* don't warn or complete any TDs */
+
if (!ep_seg) {
/*
* Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
--
2.48.1
If skipping is deferred to events other than Missed Service Error itsef,
it means we are running on an xHCI 1.0 host and don't know how many TDs
were missed until we reach some ordinary transfer completion event.
And in case of ring xrun, we can't know where the xrun happened either.
If we skip all pending TDs, we may prematurely give back TDs added after
the xrun had occurred, risking data loss or buffer UAF by the xHC.
If we skip none, a driver may become confused and stop working when all
its URBs are missed and appear to be "in flight" forever.
Skip exactly one TD on each xrun event - the first one that was missed,
as we can now be sure that the HC has finished processing it. Provided
that one more TD is queued before any subsequent doorbell ring, it will
become safe to skip another TD by the time we get an xrun again.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 167ae7dfca47..d78b7877f65f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2844,8 +2844,21 @@ static int handle_tx_event(struct xhci_hcd *xhci,
if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
skip_isoc_td(xhci, td, ep, status);
- if (!list_empty(&ep_ring->td_list))
+
+ if (!list_empty(&ep_ring->td_list)) {
+ if (ring_xrun_event) {
+ /*
+ * If we are here, we are on xHCI 1.0 host with no
+ * idea how many TDs were missed or where the xrun
+ * occurred. New TDs may have been added after the
+ * xrun, so skip only one TD to be safe.
+ */
+ xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
+ slot_id, ep_index);
+ return 0;
+ }
continue;
+ }
xhci_dbg(xhci, "All TDs skipped for slot %u ep %u. Clear skip flag.\n",
slot_id, ep_index);
--
2.48.1
On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
> > + if (ring_xrun_event) {
> > + /*
> > + * If we are here, we are on xHCI 1.0 host with no idea how
> > + * many TDs were missed and where the xrun occurred. Don't
> > + * skip more TDs, they may have been queued after the xrun.
> > + */
> > + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> > + slot_id, ep_index);
> > + break;
>
> This would be the same as return 0; right?
Currently, yes. I know it looks silly, but I thought it would be more
future proof than hardcoding 'return 0' into the loop. The point it to
simply stop iteration, what happens next is none of the loop's business.
I hope gcc is clever enough to do the right thing here.
Regards,
Michal
© 2016 - 2026 Red Hat, Inc.