[PATCH] net: mctp: Fix tx queue stall

Jinliang Wang posted 1 patch 3 months, 2 weeks ago
There is a newer version of this series
drivers/net/mctp/mctp-usb.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
[PATCH] net: mctp: Fix tx queue stall
Posted by Jinliang Wang 3 months, 2 weeks ago
The tx queue can become permanently stuck in a stopped state due to a
race condition between the URB submission path and its completion
callback.

The URB completion callback can run immediately after usb_submit_urb()
returns, before the submitting function calls netif_stop_queue(). If
this occurs, the queue state management becomes desynchronized, leading
to a stall where the queue is never woken.

Fix this by moving the netif_stop_queue() call to before submitting the
URB. This closes the race window by ensuring the network stack is aware
the queue is stopped before the URB completion can possibly run.

We found one error case: the tx queue is hold on stopped state forever.
The root cause is that URB complete callback may be executed right
after usb_submit_urb and cause tx queue being stopped forever.

Signed-off-by: Jinliang Wang <jinliangw@google.com>
---
 drivers/net/mctp/mctp-usb.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mctp/mctp-usb.c b/drivers/net/mctp/mctp-usb.c
index 36ccc53b1797..ef860cfc629f 100644
--- a/drivers/net/mctp/mctp-usb.c
+++ b/drivers/net/mctp/mctp-usb.c
@@ -96,11 +96,13 @@ static netdev_tx_t mctp_usb_start_xmit(struct sk_buff *skb,
 			  skb->data, skb->len,
 			  mctp_usb_out_complete, skb);
 
+	/* Stops TX queue first to prevent race condition with URB complete */
+	netif_stop_queue(dev);
 	rc = usb_submit_urb(urb, GFP_ATOMIC);
-	if (rc)
+	if (rc) {
+		netif_wake_queue(dev);
 		goto err_drop;
-	else
-		netif_stop_queue(dev);
+	}
 
 	return NETDEV_TX_OK;
 
-- 
2.51.1.821.gb6fe4d2222-goog
[PATCH v2] net: mctp: Fix tx queue stall
Posted by Jinliang Wang 3 months, 2 weeks ago
The tx queue can become permanently stuck in a stopped state due to a
race condition between the URB submission path and its completion
callback.

The URB completion callback can run immediately after usb_submit_urb()
returns, before the submitting function calls netif_stop_queue(). If
this occurs, the queue state management becomes desynchronized, leading
to a stall where the queue is never woken.

Fix this by moving the netif_stop_queue() call to before submitting the
URB. This closes the race window by ensuring the network stack is aware
the queue is stopped before the URB completion can possibly run.

Signed-off-by: Jinliang Wang <jinliangw@google.com>
---
 drivers/net/mctp/mctp-usb.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mctp/mctp-usb.c b/drivers/net/mctp/mctp-usb.c
index 36ccc53b1797..ef860cfc629f 100644
--- a/drivers/net/mctp/mctp-usb.c
+++ b/drivers/net/mctp/mctp-usb.c
@@ -96,11 +96,13 @@ static netdev_tx_t mctp_usb_start_xmit(struct sk_buff *skb,
 			  skb->data, skb->len,
 			  mctp_usb_out_complete, skb);
 
+	/* Stops TX queue first to prevent race condition with URB complete */
+	netif_stop_queue(dev);
 	rc = usb_submit_urb(urb, GFP_ATOMIC);
-	if (rc)
+	if (rc) {
+		netif_wake_queue(dev);
 		goto err_drop;
-	else
-		netif_stop_queue(dev);
+	}
 
 	return NETDEV_TX_OK;
 
-- 
2.51.1.821.gb6fe4d2222-goog
Re: [PATCH v2] net: mctp: Fix tx queue stall
Posted by Jeremy Kerr 3 months, 1 week ago
Hi Jinliang,

Thanks for the fix! A couple of comments on the metadata:

> Subject: [PATCH v2] net: mctp: Fix tx queue stall

You'll want to indicate that this is for net rather than net-next in
the subject prefix, so:

> Subject: [PATCH net v2] net: mctp: Fix tx queue stall

Then, since we're targeting net, we'll want a fixes tag too. I would
suggest:

Fixes: 0791c0327a6e ("net: mctp: Add MCTP USB transport driver")

- so we get appropriate backports. No need to reply to the original
message for subsequent versions either, a new thread is best.

Also, it's helpful to indicate changes between submitted versions,
under the '---' marker, which doesn't end up in the git commit.
Something like:

---
v3:
 - target net tree, add fixes tag

v2:
 - remove duplicate comment in commit message

---
 drivers/net/mctp/mctp-usb.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

> The tx queue can become permanently stuck in a stopped state due to a
> race condition between the URB submission path and its completion
> callback.
> 
> The URB completion callback can run immediately after usb_submit_urb()
> returns, before the submitting function calls netif_stop_queue(). If
> this occurs, the queue state management becomes desynchronized, leading
> to a stall where the queue is never woken.
> 
> Fix this by moving the netif_stop_queue() call to before submitting the
> URB. This closes the race window by ensuring the network stack is aware
> the queue is stopped before the URB completion can possibly run.

LGTM. With the changes above:

Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>

Cheers,


Jeremy