usb: xhci: Skip configure EP for disabled slots during teardown

[RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown

Posted by Udipto Goswami 1 month ago

Consider a scenario in which a HS headset fails to resume and the hub
performs a logical disconnect. The USB core tears down the endpoints and
calls hcd->check_bandwidth() on the way out, which for xHCI translates
into a drop-only Configure Endpoint command (add_flags == SLOT_FLAG,
drop_flags != 0). If the slot is already disabled (slot_id == 0) or the
virtual device has already been freed, issuing this Configure Endpoint
command is pointless: it may appear stuck until event handling catches
up, causing unnecessary delays during disconnect teardown.
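A drop-only Configure Endpoint can be recognized from the Input Control Context flags alone. The following is a minimal standalone C sketch, not kernel code. It assumes SLOT_FLAG is add-flag bit 0, mirroring the definition in drivers/usb/host/xhci.h. is_drop_only() and its self-test are hypothetical helpers written for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors SLOT_FLAG in drivers/usb/host/xhci.h (add-flag bit 0 = slot context). */
#define SLOT_FLAG (1u << 0)

/*
 * Hypothetical helper: the input context only drops endpoints when the
 * sole add flag is the slot context and at least one drop flag is set.
 */
static bool is_drop_only(uint32_t add_flags, uint32_t drop_flags)
{
	return add_flags == SLOT_FLAG && drop_flags != 0;
}

/* Hypothetical self-test of the classification. */
static void is_drop_only_selftest(void)
{
	assert(is_drop_only(SLOT_FLAG, 1u << 3));		/* drop EP1 IN only */
	assert(!is_drop_only(SLOT_FLAG | (1u << 3), 1u << 3));	/* adds an endpoint too */
	assert(!is_drop_only(SLOT_FLAG, 0));			/* nothing dropped */
}
```

Take dropping endpoint 1 IN as an example. Under the xHCI Device Context Index numbering, this sets drop bit 3 while add_flags stays at SLOT_FLAG, so the command counts as drop-only.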

Fix this by adding a check in xhci_check_bandwidth() that returns
success immediately if slot_id == 0 or the vdev is missing, so the
Configure Endpoint command is never queued. Additionally, make
xhci_configure_endpoint() return success early for drop-only Configure
Endpoint operations when slot_id == 0 or the vdev has already been
freed, avoiding spurious command waits.
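The order of the new early returns in xhci_check_bandwidth() can be modeled in isolation. This sketch makes the following simplifications, all of them assumptions:

- It uses hypothetical stand-in structs (xhci_model, udev_model) instead of the real xhci_hcd and usb_device.
- It assumes a 256-entry devs[] table.
- It leaves out the DYING/REMOVING check and the command submission.

BW_SKIP corresponds to the patch's "return 0".

```c
#include <assert.h>

/* Assumption: sized like the kernel's xhci->devs[] (slot IDs 1..255, index 0 unused). */
#define MAX_SLOTS 256

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct virt_dev_model { int unused; };
struct xhci_model { struct virt_dev_model *devs[MAX_SLOTS]; };
struct udev_model { int slot_id; };

enum bw_action { BW_SKIP, BW_QUEUE_CONFIGURE_EP };

/*
 * Models only the early returns the patch adds to xhci_check_bandwidth().
 * Both skip cases map to "return 0" in the patch.
 */
static enum bw_action check_bandwidth_model(const struct xhci_model *xhci,
					    const struct udev_model *udev)
{
	if (!udev->slot_id)
		return BW_SKIP;			/* slot already disabled */
	assert(udev->slot_id > 0 && udev->slot_id < MAX_SLOTS);
	if (!xhci->devs[udev->slot_id])
		return BW_SKIP;			/* virt_dev already freed */
	return BW_QUEUE_CONFIGURE_EP;		/* normal path */
}
```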

Signed-off-by: Udipto Goswami <udipto.goswami@oss.qualcomm.com>
---
 drivers/usb/host/xhci.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 02c9bfe21ae2..bc92edbad468 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2958,6 +2958,7 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci,
 	struct xhci_input_control_ctx *ctrl_ctx;
 	struct xhci_virt_device *virt_dev;
 	struct xhci_slot_ctx *slot_ctx;
+	u32 add_flags, drop_flags;
 
 	if (!command)
 		return -EINVAL;
@@ -2979,6 +2980,19 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci,
 		return -ENOMEM;
 	}
 
+	/*
+	 * For drop-only Configure Endpoint (add_flags == SLOT_FLAG
+	 * and drop_flags != 0), if vdev is already gone, there is no hardware
+	 * to configure. Return success early to avoid issuing pointless commands.
+	 */
+	add_flags = le32_to_cpu(ctrl_ctx->add_flags);
+	drop_flags = le32_to_cpu(ctrl_ctx->drop_flags);
+	if (!ctx_change && add_flags == SLOT_FLAG && drop_flags != 0 && !virt_dev) {
+		spin_unlock_irqrestore(&xhci->lock, flags);
+		xhci_dbg(xhci, "skip drop-only Configure EP; vdev already freed\n");
+		return 0;
+	}
+
 	if ((xhci->quirks & XHCI_EP_LIMIT_QUIRK) &&
 			xhci_reserve_host_resources(xhci, ctrl_ctx)) {
 		spin_unlock_irqrestore(&xhci->lock, flags);
@@ -3082,12 +3096,27 @@ int xhci_check_bandwidth(struct usb_hcd *hcd, struct usb_device *udev)
 	if (ret <= 0)
 		return ret;
 	xhci = hcd_to_xhci(hcd);
+
+	/*
+	 * If the slot is already disabled (slot_id == 0) or the vdev is gone,
+	 * we're in teardown. There's nothing to update in HW. Treat as success
+	 * and skip issuing Configure Endpoint command.
+	 */
+	if (!udev->slot_id) {
+		xhci_dbg(xhci, "Slot already disabled for udev %p\n", udev);
+		return 0;
+	}
+
+	virt_dev = xhci->devs[udev->slot_id];
+	if (!virt_dev) {
+		xhci_dbg(xhci, "virt_dev already freed for slot %d\n", udev->slot_id);
+		return 0;
+	}
 	if ((xhci->xhc_state & XHCI_STATE_DYING) ||
 		(xhci->xhc_state & XHCI_STATE_REMOVING))
 		return -ENODEV;
 
 	xhci_dbg(xhci, "%s called for udev %p\n", __func__, udev);
-	virt_dev = xhci->devs[udev->slot_id];
 
 	command = xhci_alloc_command(xhci, true, GFP_KERNEL);
 	if (!command)
-- 
2.34.1

Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown

Posted by Mathias Nyman 1 month ago

Hi

On 1/5/26 10:48, Udipto Goswami wrote:
> [...]

Makes sense to prevent unnecessary 'configure endpoint' commands

Could you share more details how we end up tearing down endpoints and
calling xhci_check_bandwidth() after vdev is freed and slot_id set to zero?

Did the whole xHC controller fail to resume and was reinitialized in
xhci_resume() power_lost path?

Or is this related to audio offload and xhci sideband usage?

If we end up in this situation in normal headset resume failure then there
might be something else wrong.

Thanks
Mathias

Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown

Posted by Udipto Goswami 1 month ago

On Mon, Jan 5, 2026 at 4:32 PM Mathias Nyman
<mathias.nyman@linux.intel.com> wrote:
>
> Hi
>
> On 1/5/26 10:48, Udipto Goswami wrote:
> > [...]
>
> Makes sense to prevent unnecessary 'configure endpoint' commands
>
> Could you share more details how we end up tearing down endpoints and
> calling xhci_check_bandwidth() after vdev is freed and slot_id set to zero?
>
> Did the whole xHC controller fail to resume and was reinitialized in
> xhci_resume() power_lost path?
>
> Or is this related to audio offload and xhci sideband usage?
>
> If we end up in this situation in normal headset resume failure then there
> might be something else wrong.
>

Apologies! My mailbox was configured with HTML.
Re-sending in plain text.

Hi Mathias,

Yes, we are using offloaded audio in this case and xhci-sideband is involved.

Scenario:
The headset is connected to the platform with no active playback, so
it suspends. No physical disconnect occurs.

1. The audio DSP sends a playback request while the USB headset (device
1-1) is suspended.
2. A resume chain is triggered:
   handle_uaudio_stream_req
   → enable_audio_stream
   → snd_usb_autoresume
   → dwc3-parent_wrapper (Qualcomm) → xhci → roothub → USB headset (1-1)
3. Resume fails at device 1-1: the headset fails to resume from
suspend. Note that the xHCI controller itself resumes successfully;
only the headset device fails.
4. The hub performs a logical disconnect as a recovery mechanism.
5. A race occurs: the USB core begins teardown (calling
'check_bandwidth()'), but the xHCI driver may have already started
freeing the slot due to the failed resume.

Two parallel paths:
PATH1 (slower: USB core teardown):

hub_port_connect_change()
└─ Device resume fails
   └─ hub_port_logical_disconnect()
      └─ usb_disconnect()
         └─ usb_disable_device()
            ├─ usb_disable_endpoint() [for each endpoint]
            │  └─ usb_hcd_disable_endpoint()
            └─ usb_hcd_alloc_bandwidth()
               └─ usb_hcd_check_bandwidth()
                  └─ xhci_check_bandwidth() ← POINT OF FAILURE
                     └─ Tries to issue Configure Endpoint
                        └─ But slot_id == 0 or virt_dev == NULL!

PATH2 (faster: xHCI slot cleanup):
hub_port_logical_disconnect()
└─ usb_disconnect()
   └─ usb_release_dev()
      └─ usb_hcd_free_dev()
         └─ xhci_free_dev()
            └─ xhci_disable_slot()
               ├─ Issues TRB_DISABLE_SLOT command
               ├─ Waits for completion
               └─ xhci_free_virt_device()
                  ├─ Sets udev->slot_id = 0
                  ├─ Frees virt_dev
                  └─ Sets xhci->devs[slot_id] = NULL

RACE TIMELINE:

Path 2 (fast)                           Path 1 (slow)
──────────────────────────────────────────────────────────────────────
T1: xhci_free_dev() starts
T2: xhci_disable_slot() issued
T3: slot_id = 0
T4: virt_dev freed                      usb_disable_endpoint()
T5: xhci->devs[slot_id] = NULL          (still processing...)
T6:                                     xhci_check_bandwidth() ← RACE!
T7:                                     Tries Configure Endpoint
T8:                                     But slot is already freed!

Path 1 is slower because it must iterate through all endpoints,
calling usb_disable_endpoint() for each one before reaching
check_bandwidth().
Path 2 completes faster with a single Disable Slot command. So if
T3-T5 have already executed, meaning the slot has already been freed,
the Configure Endpoint commands (T6-T8) can be skipped.
Please let me know if this makes sense.
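The timeline can be replayed sequentially to show why skipping is safe once T3-T5 have run. This is a hypothetical single-threaded model with two parts:

- path2_free_slot() applies the three state changes listed under xhci_free_virt_device() in PATH2.
- path1_would_configure() evaluates the same two conditions the patch checks at T6.

All struct names are illustrative stand-ins, and the model does not capture real concurrency or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SLOTS 256	/* assumption: sized like xhci->devs[] */

/* Hypothetical stand-ins; not the kernel's structures. */
struct vdev_model { int unused; };
struct host_model { struct vdev_model *devs[MAX_SLOTS]; };
struct usbdev_model { int slot_id; };

/* Path 2, T3-T5: the state changes listed under xhci_free_virt_device(). */
static void path2_free_slot(struct host_model *xhci, struct usbdev_model *udev)
{
	int slot = udev->slot_id;

	assert(slot > 0 && slot < MAX_SLOTS);
	udev->slot_id = 0;		/* T3 */
	free(xhci->devs[slot]);		/* T4 */
	xhci->devs[slot] = NULL;	/* T5 */
}

/* Path 1, T6: true if a Configure Endpoint command would still be issued. */
static bool path1_would_configure(const struct host_model *xhci,
				  const struct usbdev_model *udev)
{
	return udev->slot_id != 0 && xhci->devs[udev->slot_id] != NULL;
}
```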

Thanks,
-Udipto

Re: [RFC PATCH] usb: xhci: Skip configure EP for disabled slots during teardown

Posted by Mathias Nyman 1 month ago

On 1/6/26 12:22, Udipto Goswami wrote:
> [...]

Thanks, well explained and nicely laid out.

Something is still odd in this scenario.

There shouldn't be two racing paths, as both cases should be handled by
the hub work 'thread', which only has one active work item.

If resume fails, hub_port_logical_disconnect() is called; it marks the device
as USB_STATE_NOTATTACHED and sets a change_bit for the port.
Hub work should take over from there.

Hub work should then do:
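The hand-off to hub work can be pictured as a per-port change bitmap drained by a single worker. The sketch below is a hypothetical model, not the hub driver:

- hub_model, logical_disconnect_model() and hub_event_model() are illustrative names.
- Mapping ports 1..31 to bits 1..31 is an assumption.

What the model shows is that a flagged port is handled inside one hub_event() pass, one port at a time.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 31	/* assumption: ports 1..31 map to bits 1..31 */

/* Hypothetical model; the real hub driver keeps this state in struct usb_hub. */
enum dev_state_model { DEV_CONFIGURED, DEV_NOTATTACHED };

struct hub_model {
	uint32_t change_bits;			/* bit N set: port N needs handling */
	enum dev_state_model port_dev[MAX_PORTS + 1];
	int ports_handled;
};

/* Models hub_port_logical_disconnect(): mark the device gone, flag the port. */
static void logical_disconnect_model(struct hub_model *hub, int port)
{
	assert(port >= 1 && port <= MAX_PORTS);
	hub->port_dev[port] = DEV_NOTATTACHED;
	hub->change_bits |= UINT32_C(1) << port;
}

/* Models one hub_event() pass: each flagged port is handled in turn, never in parallel. */
static void hub_event_model(struct hub_model *hub)
{
	for (int port = 1; port <= MAX_PORTS; port++) {
		uint32_t bit = UINT32_C(1) << port;

		if (!(hub->change_bits & bit))
			continue;
		hub->change_bits &= ~bit;
		hub->ports_handled++;	/* port_event() would run here */
	}
}
```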
hub_event()
   port_event(hub, i);    // because hub->change_bit is set for this port
     hub_port_connect_change()
       hub_port_connect()
         if (udev)
           usb_disconnect()
             usb_disable_device()  //children first
               usb_disable_device_endpoints()  // for each endpoint
                 usb_hcd_alloc_bandwidth(dev, NULL, NULL, NULL);
                   hcd->driver->check_bandwidth()  // does all the configure endpoint commands
             device_del(&udev->dev);
             hub_free_dev(udev)
               hcd->driver->free_dev(hcd, udev);  // clears virt_dev and slot_id here
             put_device(&udev->dev);

To me, this looks like driver->check_bandwidth() is called before driver->free_dev().
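The ordering argument can be checked with a small call-log model of the tree above. Everything here is hypothetical: usb_disconnect_model(), the log helpers, and the two stub callbacks. The stubs only stand in for hcd->driver->check_bandwidth() and hcd->driver->free_dev(). Because the whole sequence runs inside one work item, the log records check_bandwidth before free_dev.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical call log recording the order of host-controller callbacks. */
#define LOG_MAX 8
static const char *call_log[LOG_MAX];
static int log_len;

static void log_call(const char *name)
{
	assert(log_len < LOG_MAX);
	call_log[log_len++] = name;
}

/* Stubs standing in for the hcd->driver callbacks named in the call tree. */
static void check_bandwidth(void) { log_call("check_bandwidth"); }
static void free_dev(void)        { log_call("free_dev"); }

/* Models usb_disconnect() as run from the single hub_event() work item. */
static void usb_disconnect_model(void)
{
	check_bandwidth();	/* via usb_disable_device() -> usb_hcd_alloc_bandwidth() */
	log_call("device_del");
	free_dev();		/* via hub_free_dev(): clears virt_dev and slot_id */
}

/* Returns the log position of @name, or -1 if it was never called. */
static int log_index(const char *name)
{
	for (int i = 0; i < log_len; i++)
		if (strcmp(call_log[i], name) == 0)
			return i;
	return -1;
}
```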
  
Thanks
Mathias