[PATCH v2] xhci: Fix Link TRB DMA in command ring stopped completion event

Faisal Hassan posted 1 patch 1 month ago
There is a newer version of this series
drivers/usb/host/xhci-ring.c | 43 +++++++++++++++++++++++++++++++-----
1 file changed, 38 insertions(+), 5 deletions(-)
[PATCH v2] xhci: Fix Link TRB DMA in command ring stopped completion event
Posted by Faisal Hassan 1 month ago
During the aborting of a command, the software receives a command
completion event for the command ring stopped, with the TRB pointing
to the next TRB after the aborted command.

If the command we abort is located just before the Link TRB in the
command ring, then during the 'command ring stopped' completion event,
the xHC gives the Link TRB in the event's cmd DMA, which causes a
mismatch in handling command completion event.

To handle this situation, an additional check has been added to ignore
the mismatch error and continue the operation.

Fixes: 7f84eef0dafb ("USB: xhci: No-op command queueing and irq handler.")
Cc: stable@vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh@quicinc.com>
---
Changes in v2:
- Removed traversing of TRBs with in_range() API.
- Simplified the if condition check.

v1 link:
https://lore.kernel.org/all/20241018195953.12315-1-quic_faisalh@quicinc.com

 drivers/usb/host/xhci-ring.c | 43 +++++++++++++++++++++++++++++++-----
 1 file changed, 38 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index b2950c35c740..de375c9f08ca 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -126,6 +126,29 @@ static void inc_td_cnt(struct urb *urb)
 	urb_priv->num_tds_done++;
 }
 
+/*
+ * Return true if the DMA is pointing to a Link TRB in the ring;
+ * otherwise, return false.
+ */
+static bool is_dma_link_trb(struct xhci_ring *ring, dma_addr_t dma)
+{
+	struct xhci_segment *seg;
+	union xhci_trb *trb;
+
+	seg = ring->first_seg;
+	do {
+		if (in_range(dma, seg->dma, TRB_SEGMENT_SIZE)) {
+			/* found the TRB, check if it's link */
+			trb = &seg->trbs[(dma - seg->dma) / sizeof(*trb)];
+			return trb_is_link(trb);
+		}
+
+		seg = seg->next;
+	} while (seg != ring->first_seg);
+
+	return false;
+}
+
 static void trb_to_noop(union xhci_trb *trb, u32 noop_type)
 {
 	if (trb_is_link(trb)) {
@@ -1718,6 +1741,7 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
 
 	trace_xhci_handle_command(xhci->cmd_ring, &cmd_trb->generic);
 
+	cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
 	cmd_dequeue_dma = xhci_trb_virt_to_dma(xhci->cmd_ring->deq_seg,
 			cmd_trb);
 	/*
@@ -1725,17 +1749,26 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
 	 * command.
 	 */
 	if (!cmd_dequeue_dma || cmd_dma != (u64)cmd_dequeue_dma) {
-		xhci_warn(xhci,
-			  "ERROR mismatched command completion event\n");
-		return;
+		/*
+		 * For the 'command ring stopped' completion event, there
+		 * is a risk of a mismatch in dequeue pointers if we abort
+		 * the command just before the link TRB in the command ring.
+		 * In this scenario, the cmd_dma in the event would point
+		 * to a link TRB, while the software dequeue pointer circles
+		 * back to the start.
+		 */
+		if (!(cmd_comp_code == COMP_COMMAND_RING_STOPPED &&
+		      is_dma_link_trb(xhci->cmd_ring, cmd_dma))) {
+			xhci_warn(xhci,
+				  "ERROR mismatched command completion event\n");
+			return;
+		}
 	}
 
 	cmd = list_first_entry(&xhci->cmd_list, struct xhci_command, cmd_list);
 
 	cancel_delayed_work(&xhci->cmd_timer);
 
-	cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
-
 	/* If CMD ring stopped we own the trbs between enqueue and dequeue */
 	if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
 		complete_all(&xhci->cmd_ring_stop_completion);
-- 
2.17.1
Re: [PATCH v2] xhci: Fix Link TRB DMA in command ring stopped completion event
Posted by Mathias Nyman 1 month ago
On 21.10.2024 16.19, Faisal Hassan wrote:
> During the aborting of a command, the software receives a command
> completion event for the command ring stopped, with the TRB pointing
> to the next TRB after the aborted command.
> 
> If the command we abort is located just before the Link TRB in the
> command ring, then during the 'command ring stopped' completion event,
> the xHC gives the Link TRB in the event's cmd DMA, which causes a
> mismatch in handling command completion event.
> 
> To handle this situation, an additional check has been added to ignore
> the mismatch error and continue the operation.
> 
> Fixes: 7f84eef0dafb ("USB: xhci: No-op command queueing and irq handler.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Faisal Hassan <quic_faisalh@quicinc.com>
> ---
> Changes in v2:
> - Removed traversing of TRBs with in_range() API.
> - Simplified the if condition check.
> 
> v1 link:
> https://lore.kernel.org/all/20241018195953.12315-1-quic_faisalh@quicinc.com
> 
>   drivers/usb/host/xhci-ring.c | 43 +++++++++++++++++++++++++++++++-----
>   1 file changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index b2950c35c740..de375c9f08ca 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -126,6 +126,29 @@ static void inc_td_cnt(struct urb *urb)
>   	urb_priv->num_tds_done++;
>   }
>   
> +/*
> + * Return true if the DMA is pointing to a Link TRB in the ring;
> + * otherwise, return false.
> + */
> +static bool is_dma_link_trb(struct xhci_ring *ring, dma_addr_t dma)
> +{
> +	struct xhci_segment *seg;
> +	union xhci_trb *trb;
> +
> +	seg = ring->first_seg;
> +	do {
> +		if (in_range(dma, seg->dma, TRB_SEGMENT_SIZE)) {
> +			/* found the TRB, check if it's link */
> +			trb = &seg->trbs[(dma - seg->dma) / sizeof(*trb)];
> +			return trb_is_link(trb);
> +		}
> +
> +		seg = seg->next;
> +	} while (seg != ring->first_seg);
> +
> +	return false;
> +}
> +
>   static void trb_to_noop(union xhci_trb *trb, u32 noop_type)
>   {
>   	if (trb_is_link(trb)) {
> @@ -1718,6 +1741,7 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
>   
>   	trace_xhci_handle_command(xhci->cmd_ring, &cmd_trb->generic);
>   
> +	cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
>   	cmd_dequeue_dma = xhci_trb_virt_to_dma(xhci->cmd_ring->deq_seg,
>   			cmd_trb);
>   	/*
> @@ -1725,17 +1749,26 @@ static void handle_cmd_completion(struct xhci_hcd *xhci,
>   	 * command.
>   	 */
>   	if (!cmd_dequeue_dma || cmd_dma != (u64)cmd_dequeue_dma) {
> -		xhci_warn(xhci,
> -			  "ERROR mismatched command completion event\n");
> -		return;
> +		/*
> +		 * For the 'command ring stopped' completion event, there
> +		 * is a risk of a mismatch in dequeue pointers if we abort
> +		 * the command just before the link TRB in the command ring.
> +		 * In this scenario, the cmd_dma in the event would point
> +		 * to a link TRB, while the software dequeue pointer circles
> +		 * back to the start.
> +		 */
> +		if (!(cmd_comp_code == COMP_COMMAND_RING_STOPPED &&
> +		      is_dma_link_trb(xhci->cmd_ring, cmd_dma))) {


Do we in this COMP_COMMAND_RING_STOPPED case even need to check if
cmd_dma != (u64)cmd_dequeue_dma, or if command ring stopped on a link TRB?

Could we just move the COMP_COMMAND_RING_STOPPED handling a bit earlier?

if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
	complete_all(&xhci->cmd_ring_stop_completion);
         return;
}

If I remember correctly it should just turn aborted command TRBs into no-ops,
and restart the command ring

Thanks
Mathias
Re: [PATCH v2] xhci: Fix Link TRB DMA in command ring stopped completion event
Posted by Faisal Hassan 1 month ago
Hi Mathias,

On 10/21/2024 9:09 PM, Mathias Nyman wrote:
> On 21.10.2024 16.19, Faisal Hassan wrote:
>> During the aborting of a command, the software receives a command
>> completion event for the command ring stopped, with the TRB pointing
>> to the next TRB after the aborted command.
>>
>> If the command we abort is located just before the Link TRB in the
>> command ring, then during the 'command ring stopped' completion event,
>> the xHC gives the Link TRB in the event's cmd DMA, which causes a
>> mismatch in handling command completion event.
>>
>> To handle this situation, an additional check has been added to ignore
>> the mismatch error and continue the operation.
>>
>> Fixes: 7f84eef0dafb ("USB: xhci: No-op command queueing and irq
>> handler.")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Faisal Hassan <quic_faisalh@quicinc.com>
>> ---
>> Changes in v2:
>> - Removed traversing of TRBs with in_range() API.
>> - Simplified the if condition check.
>>
>> v1 link:
>> https://lore.kernel.org/all/20241018195953.12315-1-
>> quic_faisalh@quicinc.com
>>
>>   drivers/usb/host/xhci-ring.c | 43 +++++++++++++++++++++++++++++++-----
>>   1 file changed, 38 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>> index b2950c35c740..de375c9f08ca 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -126,6 +126,29 @@ static void inc_td_cnt(struct urb *urb)
>>       urb_priv->num_tds_done++;
>>   }
>>   +/*
>> + * Return true if the DMA is pointing to a Link TRB in the ring;
>> + * otherwise, return false.
>> + */
>> +static bool is_dma_link_trb(struct xhci_ring *ring, dma_addr_t dma)
>> +{
>> +    struct xhci_segment *seg;
>> +    union xhci_trb *trb;
>> +
>> +    seg = ring->first_seg;
>> +    do {
>> +        if (in_range(dma, seg->dma, TRB_SEGMENT_SIZE)) {
>> +            /* found the TRB, check if it's link */
>> +            trb = &seg->trbs[(dma - seg->dma) / sizeof(*trb)];
>> +            return trb_is_link(trb);
>> +        }
>> +
>> +        seg = seg->next;
>> +    } while (seg != ring->first_seg);
>> +
>> +    return false;
>> +}
>> +
>>   static void trb_to_noop(union xhci_trb *trb, u32 noop_type)
>>   {
>>       if (trb_is_link(trb)) {
>> @@ -1718,6 +1741,7 @@ static void handle_cmd_completion(struct
>> xhci_hcd *xhci,
>>         trace_xhci_handle_command(xhci->cmd_ring, &cmd_trb->generic);
>>   +    cmd_comp_code = GET_COMP_CODE(le32_to_cpu(event->status));
>>       cmd_dequeue_dma = xhci_trb_virt_to_dma(xhci->cmd_ring->deq_seg,
>>               cmd_trb);
>>       /*
>> @@ -1725,17 +1749,26 @@ static void handle_cmd_completion(struct
>> xhci_hcd *xhci,
>>        * command.
>>        */
>>       if (!cmd_dequeue_dma || cmd_dma != (u64)cmd_dequeue_dma) {
>> -        xhci_warn(xhci,
>> -              "ERROR mismatched command completion event\n");
>> -        return;
>> +        /*
>> +         * For the 'command ring stopped' completion event, there
>> +         * is a risk of a mismatch in dequeue pointers if we abort
>> +         * the command just before the link TRB in the command ring.
>> +         * In this scenario, the cmd_dma in the event would point
>> +         * to a link TRB, while the software dequeue pointer circles
>> +         * back to the start.
>> +         */
>> +        if (!(cmd_comp_code == COMP_COMMAND_RING_STOPPED &&
>> +              is_dma_link_trb(xhci->cmd_ring, cmd_dma))) {
> 
> 
> Do we in this COMP_COMMAND_RING_STOPPED case even need to check if
> cmd_dma != (u64)cmd_dequeue_dma, or if command ring stopped on a link TRB?
> 
> Could we just move the COMP_COMMAND_RING_STOPPED handling a bit earlier?
> 
> if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
>     complete_all(&xhci->cmd_ring_stop_completion);
>         return;
> }
> 
> If I remember correctly it should just turn aborted command TRBs into
> no-ops,
> and restart the command ring
> 

Thanks for reviewing the changes!

Yes, you’re right. As part of restarting the command ring, we just ring
the doorbell.

If we move the event handling without validating the dequeue pointer,
wouldn’t it be a risk if we don’t check what the xHC is holding in its
dequeue pointer? If we are not setting it, it starts from wherever it
stopped. What if the dequeue pointer got corrupted or is not pointing to
any of the TRBs in the command ring?

> Thanks
> Mathias
> 

Thanks,
Faisal
Re: [PATCH v2] xhci: Fix Link TRB DMA in command ring stopped completion event
Posted by Mathias Nyman 1 month ago
On 22.10.2024 15.34, Faisal Hassan wrote:
> Hi Mathias,
> 
>> Do we in this COMP_COMMAND_RING_STOPPED case even need to check if
>> cmd_dma != (u64)cmd_dequeue_dma, or if command ring stopped on a link TRB?
>>
>> Could we just move the COMP_COMMAND_RING_STOPPED handling a bit earlier?
>>
>> if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
>>      complete_all(&xhci->cmd_ring_stop_completion);
>>          return;
>> }
>>
>> If I remember correctly it should just turn aborted command TRBs into
>> no-ops,
>> and restart the command ring
>>
> 
> Thanks for reviewing the changes!
> 
> Yes, you’re right. As part of restarting the command ring, we just ring
> the doorbell.
> 
> If we move the event handling without validating the dequeue pointer,
> wouldn’t it be a risk if we don’t check what the xHC is holding in its
> dequeue pointer? If we are not setting it, it starts from wherever it
> stopped. What if the dequeue pointer got corrupted or is not pointing to
> any of the TRBs in the command ring?

For that to happen the xHC host would have to corrupt its internal command
ring dequeue pointer. Not impossible, but an unlikely HW flaw, and a separate
issue. A case like that could be solved by writing the address of the next valid
(non-aborted) command to the CRCR register in xhci_handle_stopped_cmd_ring() before
ringing the doorbell.

The case you found where a command abort is not handled properly due to stopping
on a link TRB is a real xhci driver issue that would be nice to get solved.

For the COMP_COMMAND_RING_STOPPED case we don't really care that much
on which command it stopped, for other commands we do.

Thanks
Mathias

Re: [PATCH v2] xhci: Fix Link TRB DMA in command ring stopped completion event
Posted by Faisal Hassan 1 month ago

On 10/22/2024 8:03 PM, Mathias Nyman wrote:
> On 22.10.2024 15.34, Faisal Hassan wrote:
>> Hi Mathias,
>>
>>> Do we in this COMP_COMMAND_RING_STOPPED case even need to check if
>>> cmd_dma != (u64)cmd_dequeue_dma, or if command ring stopped on a link
>>> TRB?
>>>
>>> Could we just move the COMP_COMMAND_RING_STOPPED handling a bit earlier?
>>>
>>> if (cmd_comp_code == COMP_COMMAND_RING_STOPPED) {
>>>      complete_all(&xhci->cmd_ring_stop_completion);
>>>          return;
>>> }
>>>
>>> If I remember correctly it should just turn aborted command TRBs into
>>> no-ops,
>>> and restart the command ring
>>>
>>
>> Thanks for reviewing the changes!
>>
>> Yes, you’re right. As part of restarting the command ring, we just ring
>> the doorbell.
>>
>> If we move the event handling without validating the dequeue pointer,
>> wouldn’t it be a risk if we don’t check what the xHC is holding in its
>> dequeue pointer? If we are not setting it, it starts from wherever it
>> stopped. What if the dequeue pointer got corrupted or is not pointing to
>> any of the TRBs in the command ring?
> 
> For that to happen the xHC host would have to corrupt its internal command
> ring dequeue pointer. Not impossible, but an unlikely HW flaw, and a
> separate
> issue. A case like that could be solved by writing the address of the
> next valid
> (non-aborted) command to the CRCR register in
> xhci_handle_stopped_cmd_ring() before
> ringing the doorbell.
> 
> The case you found where a command abort is not handled properly due to
> stopping
> on a link TRB is a real xhci driver issue that would be nice to get solved.
> 
> For the COMP_COMMAND_RING_STOPPED case we don't really care that much
> on which command it stopped, for other commands we do.
> 
> Thanks
> Mathias
> 

Sure, I will submit a v3 with the command ring stopped check moved a bit
earlier.

Thanks,
Faisal