[PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD

Michal Pecio posted 1 patch 3 weeks, 3 days ago
[PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Michal Pecio 3 weeks, 3 days ago
Hi Mathias,

I wanted to finish and send v2 of "Simplify TD cancellation and drop
some states" over the weekend, but I encountered an annoying roadblock
and I need your input.

Basically, there is a bug: URB_ZERO_PACKET is queued as two TDs, and
when the first TD halts, the driver simply advances to the second one.

I found that extending the event handler to cover this case requires
multiple changes:

1. obviously, all TDs must be cancelled, not just the current one
2. they may be given back in weird order (waiting for Set Deq), so
   we need to store the status on urb_priv and override td->status
3. xhci_invalidate_cancelled_tds() would need to recognize multiple
   halted TDs on the same URB as valid

This is doable, and I have already implemented most of it in that
series, but there is an alternative solution: simply stop worrying
about halted multi-TD URBs, because this is the only case and it
can be converted to pretend that it's just one TD per URB as usual.

If you are OK with this patch, cancellation logic will be simpler,
because this time there really are no remaining cases of multi-TD
URBs except isochronous. This is clear in xhci_urb_enqueue():

        if (usb_endpoint_xfer_isoc(&urb->ep->desc))
                num_tds = urb->number_of_packets;
        else         
                num_tds = 1;

Regards,
Michal
Re: [PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Mathias Nyman 3 weeks, 2 days ago
On 8.9.2025 14.01, Michal Pecio wrote:
> Hi Mathias,
> 
> I wanted to finish and send v2 of "Simplify TD cancellation and drop
> some states" over the weekend, but I encountered an annoying roadblock
> and I need your input.
> 
> Basically, there is a bug: URB_ZERO_PACKET is queued as two TDs, and
> when the first TD halts, the driver simply advances to the second one.
> 
> I found that extending the event handler to cover this case requires
> multiple changes:
> 
> 1. obviously, all TDs must be cancelled, not just the current one
> 2. they may be given back in weird order (waiting for Set Deq), so
>     we need to store the status on urb_priv and override td->status
> 3. xhci_invalidate_cancelled_tds() would need to recognize multiple
>     halted TDs on the same URB as valid
> 
> This is doable, and I have already implemented most of it in that
> series, but there is an alternative solution: simply stop worrying
> about halted multi-TD URBs, because this is the only case and it
> can be converted to pretend that it's just one TD per URB as usual.
> 
> If you are OK with this patch, cancellation logic will be simpler,
> because this time there really are no remaining cases of multi-TD
> URBs except isochronous. This is clear in xhci_urb_enqueue():
> 

Adding the zero-length TRB to the original TD when we need to send a
zero-length packet would simplify things, and I would otherwise fully
support this, but the xHCI spec is pretty clear that it requires a
dedicated TD for zero-length transactions.

See xhci spec section 4.9.1:

"To generate a “zero-length” USB transaction, software shall explicitly
define a TD with a single Transfer TRB, and its TRB Transfer Length
field shall equal ‘0’. Note that this TD may include non-Transfer TRBs,
e.g. an Event Data or Link TRB."

Thanks
Mathias
Re: [PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Michal Pecio 3 weeks, 2 days ago
On Tue, 9 Sep 2025 16:04:33 +0300, Mathias Nyman wrote:
> Adding the zero-length TRB to the original TD when we need to send a
> zero-length packet would simplify things, and I would otherwise fully
> support this, but the xHCI spec is pretty clear that it requires a
> dedicated TD for zero-length transactions.

You are right of course, an empty TRB in a TD would simply send no
data, or maybe it's a TRB Error, I'm not sure.

But this is not what this patch is about - the trick is to use an
*unchained* TRB, which is a separate TD from HW's perspective, and
to count it as part of the same TD from the driver's perspective.

Control URBs are like that and they work fine. They can halt on any
TRB (which are all unchained, per spec) and the whole URB goes out.

This bug is (probably?) low impact, but it bothers me because it's
a design flaw: either non-isoc multi-TD URBs are supported, or they
are not. One or another part of the driver needs to adapt.
Re: [PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Mathias Nyman 3 weeks, 2 days ago
On 9.9.2025 20.38, Michal Pecio wrote:
> On Tue, 9 Sep 2025 16:04:33 +0300, Mathias Nyman wrote:
>> Adding the zero-length TRB to the original TD when we need to send a
>> zero-length packet would simplify things, and I would otherwise fully
>> support this, but the xHCI spec is pretty clear that it requires a
>> dedicated TD for zero-length transactions.
> 
> You are right of course, an empty TRB in a TD would simply send no
> data, or maybe it's a TRB Error, I'm not sure.
> 
> But this is not what this patch is about - the trick is to use an
> *unchained* TRB, which is a separate TD from HW's perspective, and
> to count it as part of the same TD from the driver's perspective.

Ok, I see.
The whole TD without completion flag does worry me a bit.

We need to make sure stop/stald mid TD cases work, and  urb length is
set correctly.

> 
> Control URBs are like that and they work fine. They can halt on any
> TRB (which are all unchained, per spec) and the whole URB goes out.
> 
> This bug is (probably?) low impact, but it bothers me because it's
> a design flaw: either non-isoc multi-TD URBs are supported, or they
> are not. One or another part of the driver needs to adapt.

There is a risk that this is one of those "cure is worse than the
disease" cases.

Thanks
Mathias
Re: [PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Michal Pecio 1 week, 3 days ago
On Wed, 10 Sep 2025 01:57:39 +0300, Mathias Nyman wrote:
> On 9.9.2025 20.38, Michal Pecio wrote:
> > But this is not what this patch is about - the trick is to use an
> > *unchained* TRB, which is a separate TD from HW's perspective, and
> > to count it as part of the same TD from the driver's perspective.  
> 
> Ok, I see.
> The whole TD without completion flag does worry me a bit.
> 
> We need to make sure stop/stald mid TD cases work, and  urb length is
> set correctly.

I came up with a potential problem case for clearing IOC:

1. all data of the first TD are sent out sucessfully
2. no completion is generated because no IOC
3. ring stops before advancing to the zero-length TD
4. we only get FSE (Stopped - Length Invalid)

See xHCI 4.6.9:
     Table 4-2: Stop Endpoint Command TRB Handling
       2nd row: Stopped on TD boundary

Current event handler doesn't expect this to happen and actual length
will be reported incorrectly. This would be easy to fix.

But there is also the 0.96 spec where FSE was optional (xHCI G.2), so
on some HCs (like NEC uPD720200) we won't get any event whatsoever and
the almost fully completed URB will seem to have transferred no data.

(This assumes that any HC would stop in this manner rather than advance
to the zero-length TD atomically after previous TD completion and stop
normally in the zero-length TD. So not sure if it's a real problem and
the condition seems hard to trigger for testing purposes.)


Control URBs have the same problem - FSE isn't handled very well and
old HCs would seem to need IOC on the data stage to ensure correct
actual length of cancelled URBs, if anyone cares.

Regards,
Michal
Re: [PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Michal Pecio 3 weeks, 2 days ago
On Wed, 10 Sep 2025 01:57:39 +0300, Mathias Nyman wrote:
> On 9.9.2025 20.38, Michal Pecio wrote:
> > On Tue, 9 Sep 2025 16:04:33 +0300, Mathias Nyman wrote:  
> >> Adding the zero-length TRB to the original TD when we need to send a
> >> zero-length packet would simplify things, and I would otherwise fully
> >> support this, but the xHCI spec is pretty clear that it requires a
> >> dedicated TD for zero-length transactions.  
> > 
> > You are right of course, an empty TRB in a TD would simply send no
> > data, or maybe it's a TRB Error, I'm not sure.
> > 
> > But this is not what this patch is about - the trick is to use an
> > *unchained* TRB, which is a separate TD from HW's perspective, and
> > to count it as part of the same TD from the driver's perspective.  
> 
> Ok, I see.
> The whole TD without completion flag does worry me a bit.
>
> We need to make sure stop/stald mid TD cases work, and  urb length is
> set correctly.

It looks odd, but I can't find anything wrong.

4.10.4 discusses what happens when IOC is clear on the last TRB of
a TD so it looks like this is allowed.

If the first TD halts or stops before completion then it doesn't
matter that we cleared its IOC. Everything works as before, except
that Set TR Deq will skip both TDs and the URB will be given back.

If the first TD completes, the xHC silently moves to the second TD.

No matter what happens to the second TD, URB length is calculated:

       if (ep_trb == td->end_trb)	/* end_trb is second TD */
                td->urb->actual_length = requested - remaining;

where 'requested' is full URB and 'remaining' is TRB transfer residue
which should be zero because buffer length is zero. Looks OK.

And curiously, this behavior too is no different from what exists now.
I see nothing stopping the second TD from overwriting whatever was set
when the first TD generated its event (success or otherwise).

Regards,
Michal
Re: [PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Michal Pecio 3 weeks, 2 days ago
On Wed, 10 Sep 2025 02:03:06 +0200, Michal Pecio wrote:
> On Wed, 10 Sep 2025 01:57:39 +0300, Mathias Nyman wrote:
> > On 9.9.2025 20.38, Michal Pecio wrote:  
> > > On Tue, 9 Sep 2025 16:04:33 +0300, Mathias Nyman wrote:    
> > >> Adding the zero-length TRB to the original TD when we need to send a
> > >> zero-length packet would simplify things, and I would otherwise fully
> > >> support this, but the xHCI spec is pretty clear that it requires a
> > >> dedicated TD for zero-length transactions.    
> > > 
> > > You are right of course, an empty TRB in a TD would simply send no
> > > data, or maybe it's a TRB Error, I'm not sure.
> > > 
> > > But this is not what this patch is about - the trick is to use an
> > > *unchained* TRB, which is a separate TD from HW's perspective, and
> > > to count it as part of the same TD from the driver's perspective.    
> > 
> > Ok, I see.
> > The whole TD without completion flag does worry me a bit.
> >
> > We need to make sure stop/stald mid TD cases work, and  urb length is
> > set correctly.  
> 
> It looks odd, but I can't find anything wrong.
> 
> 4.10.4 discusses what happens when IOC is clear on the last TRB of
> a TD so it looks like this is allowed.
> 
> If the first TD halts or stops before completion then it doesn't
> matter that we cleared its IOC. Everything works as before, except
> that Set TR Deq will skip both TDs and the URB will be given back.

Well, there is one difference, but so far I found no ill effects.

All those (ep_trb == td->end_trb) comparisons will be false in case
of an event on the last TRB of the first TD, currently they are true.

But it should be harmless:
* COMP_SUCCESS case in process_bulk_intr_td() is impossible (no IOC)
* on errors, we may use sum_trb_lengths() unnecessarily, should be OK.

These are the only such checks I see. Nothing in handle_tx_event()
and finish_td(), and from there we go to handle_halted_endpoint().


Generally, I tried running this with wMaxPacket=64, TRB length reduced
to 64B (patched xhci_hcd) to force multiple TRBs in the first TD and
with transfer lengths of 32, 64, 96, 128, 192, 256. It worked.

I can run it again tomorrow and send event-ring/trbs and epXX/trbs.
Re: [PATCH 0/1] usb: xhci: Queue URB_ZERO_PACKET as one TD
Posted by Michal Pecio 3 weeks, 1 day ago
On Wed, 10 Sep 2025 02:15:19 +0200, Michal Pecio wrote:
> Generally, I tried running this with wMaxPacket=64, TRB length reduced
> to 64B (patched xhci_hcd) to force multiple TRBs in the first TD and
> with transfer lengths of 32, 64, 96, 128, 192, 256. It worked.
> 
> I can run it again tomorrow and send event-ring/trbs and epXX/trbs.

I found that my rtw88 WiFi dongle actually uses URB_ZERO_PACKET.
And it needs it - if the driver is patched to ignore the flag, ping at
certain packet size simply stops working, unless there is additional
traffic with random packet sizes to produce short transfers on the USB.

This is a high speed device. I also patched xhci to split TRBs on 512
rather than 65536 byte boundaries to get more TRBs per TD.

Result is OK I think, completion events as expected, the device works.
Note: the class driver submits URBs with misaligned data buffers.

 1 0x00000000ffed8f20: Buffer 00000000ffe9a80e length 498 TD size 2 intr 0 type 'Normal' flags b:i:i:C:s:i:e:C
 1 0x00000000ffed8f30: Buffer 00000000ffe9aa00 length 512 TD size 1 intr 0 type 'Normal' flags b:i:i:C:s:i:e:C
 1 0x00000000ffed8f40: Buffer 00000000ffe9ac00 length 14 TD size 0 intr 0 type 'Normal' flags b:i:i:c:s:i:e:C
 1 0x00000000ffed8f50: Buffer 0000000000000000 length 0 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:i:e:C
=>  0 0x00000000ff988c60: TRB 00000000ffed8f50 status 'Success' len 0 slot 2 ep 16 type 'Transfer Event' flags e:c 
 1 0x00000000ffed8f60: Buffer 00000000fff0e80e length 498 TD size 2 intr 0 type 'Normal' flags b:i:i:C:s:i:e:C
 1 0x00000000ffed8f70: Buffer 00000000fff0ea00 length 512 TD size 1 intr 0 type 'Normal' flags b:i:i:C:s:i:e:C
 1 0x00000000ffed8f80: Buffer 00000000fff0ec00 length 14 TD size 0 intr 0 type 'Normal' flags b:i:i:c:s:i:e:C
 1 0x00000000ffed8f90: Buffer 0000000000000000 length 0 TD size 0 intr 0 type 'Normal' flags b:i:I:c:s:i:e:C
=>  0 0x00000000ff988e20: TRB 00000000ffed8f90 status 'Success' len 0 slot 2 ep 16 type 'Transfer Event' flags e:c 

This was tested on some AMD, ASMedia, Renesas, NEC, VIA and Fresco.

Ånd then there is Etron. This old junk doesn't support zero-length TDs
at all, neither my version nor mainline works. No events and apparently
no USB packets either, because even if xhci is patched to "skip missed
TDs" to keep the class driver happy, ping is still getting stuck.

Maybe something to keep in mind if there is ever a bug about it.
I hope not too many people are still using this horrible HW.

Regards,
Michal