[PATCH net] xen-netback: properly sync TX responses

Jan Beulich posted 1 patch 2 months, 4 weeks ago
Failed in applying to current master (apply log)
[PATCH net] xen-netback: properly sync TX responses
Posted by Jan Beulich 2 months, 4 weeks ago
Invoking the make_tx_response() / push_tx_responses() pair with no lock
held would be acceptable only if all such invocations happened from the
same context (NAPI instance or dealloc thread). Since this isn't the
case, and since the interface "spec" also doesn't demand that multicast
operations may only be performed with no in-flight transmits,
MCAST_{ADD,DEL} processing also needs to acquire the response lock
around the invocations.

To prevent similar mistakes going forward, "downgrade" the present
functions to private helpers of just the two remaining ones using them
directly, with no forward declarations anymore. This involves renaming
what so far was make_tx_response(), for the new function of that name
to serve the new (wrapper) purpose.

While there,
- constify the txp parameters,
- correct xenvif_idx_release()'s status parameter's type,
- rename {,_}make_tx_response()'s status parameters for consistency with
  xenvif_idx_release()'s.

Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control")
Cc: stable@vger.kernel.org
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Of course this could be split into two or even more separate changes,
but I think these transformations are best done all in one go.

It remains questionable whether push_tx_responses() really needs
invoking after every single _make_tx_response().

MCAST_{ADD,DEL} are odd also from another perspective: They're supposed
to come with "dummy requests", with the comment in the public header
leaving open what that means. Netback doesn't check dummy-ness (e.g.
size being zero). Furthermore the description in the public header
doesn't really make clear that there's a restriction of one such "extra"
per dummy request. Yet the way xenvif_get_extras() works precludes
multiple ADDs or multiple DELs in a single dummy request (only the last
one would be honored afaict). While the way xenvif_tx_build_gops() works
precludes an ADD and a DEL coming together in a single dummy request
(the DEL would be ignored).

--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -104,13 +104,12 @@ bool provides_xdp_headroom = true;
 module_param(provides_xdp_headroom, bool, 0644);
 
 static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
-			       u8 status);
+			       s8 status);
 
 static void make_tx_response(struct xenvif_queue *queue,
-			     struct xen_netif_tx_request *txp,
+			     const struct xen_netif_tx_request *txp,
 			     unsigned int extra_count,
-			     s8       st);
-static void push_tx_responses(struct xenvif_queue *queue);
+			     s8 status);
 
 static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
 
@@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_
 			  unsigned int extra_count, RING_IDX end)
 {
 	RING_IDX cons = queue->tx.req_cons;
-	unsigned long flags;
 
 	do {
-		spin_lock_irqsave(&queue->response_lock, flags);
 		make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR);
-		push_tx_responses(queue);
-		spin_unlock_irqrestore(&queue->response_lock, flags);
 		if (cons == end)
 			break;
 		RING_COPY_REQUEST(&queue->tx, cons++, txp);
@@ -465,12 +460,7 @@ static void xenvif_get_requests(struct x
 	for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < MAX_SKB_FRAGS;
 	     nr_slots--) {
 		if (unlikely(!txp->size)) {
-			unsigned long flags;
-
-			spin_lock_irqsave(&queue->response_lock, flags);
 			make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY);
-			push_tx_responses(queue);
-			spin_unlock_irqrestore(&queue->response_lock, flags);
 			++txp;
 			continue;
 		}
@@ -496,14 +486,8 @@ static void xenvif_get_requests(struct x
 
 		for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) {
 			if (unlikely(!txp->size)) {
-				unsigned long flags;
-
-				spin_lock_irqsave(&queue->response_lock, flags);
 				make_tx_response(queue, txp, 0,
 						 XEN_NETIF_RSP_OKAY);
-				push_tx_responses(queue);
-				spin_unlock_irqrestore(&queue->response_lock,
-						       flags);
 				continue;
 			}
 
@@ -995,7 +979,6 @@ static void xenvif_tx_build_gops(struct
 					 (ret == 0) ?
 					 XEN_NETIF_RSP_OKAY :
 					 XEN_NETIF_RSP_ERROR);
-			push_tx_responses(queue);
 			continue;
 		}
 
@@ -1007,7 +990,6 @@ static void xenvif_tx_build_gops(struct
 
 			make_tx_response(queue, &txreq, extra_count,
 					 XEN_NETIF_RSP_OKAY);
-			push_tx_responses(queue);
 			continue;
 		}
 
@@ -1433,8 +1415,35 @@ int xenvif_tx_action(struct xenvif_queue
 	return work_done;
 }
 
+static void _make_tx_response(struct xenvif_queue *queue,
+			     const struct xen_netif_tx_request *txp,
+			     unsigned int extra_count,
+			     s8 status)
+{
+	RING_IDX i = queue->tx.rsp_prod_pvt;
+	struct xen_netif_tx_response *resp;
+
+	resp = RING_GET_RESPONSE(&queue->tx, i);
+	resp->id     = txp->id;
+	resp->status = status;
+
+	while (extra_count-- != 0)
+		RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
+
+	queue->tx.rsp_prod_pvt = ++i;
+}
+
+static void push_tx_responses(struct xenvif_queue *queue)
+{
+	int notify;
+
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
+	if (notify)
+		notify_remote_via_irq(queue->tx_irq);
+}
+
 static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
-			       u8 status)
+			       s8 status)
 {
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
@@ -1444,8 +1453,8 @@ static void xenvif_idx_release(struct xe
 
 	spin_lock_irqsave(&queue->response_lock, flags);
 
-	make_tx_response(queue, &pending_tx_info->req,
-			 pending_tx_info->extra_count, status);
+	_make_tx_response(queue, &pending_tx_info->req,
+			  pending_tx_info->extra_count, status);
 
 	/* Release the pending index before pusing the Tx response so
 	 * its available before a new Tx request is pushed by the
@@ -1459,32 +1468,19 @@ static void xenvif_idx_release(struct xe
 	spin_unlock_irqrestore(&queue->response_lock, flags);
 }
 
-
 static void make_tx_response(struct xenvif_queue *queue,
-			     struct xen_netif_tx_request *txp,
+			     const struct xen_netif_tx_request *txp,
 			     unsigned int extra_count,
-			     s8       st)
+			     s8 status)
 {
-	RING_IDX i = queue->tx.rsp_prod_pvt;
-	struct xen_netif_tx_response *resp;
-
-	resp = RING_GET_RESPONSE(&queue->tx, i);
-	resp->id     = txp->id;
-	resp->status = st;
-
-	while (extra_count-- != 0)
-		RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
+	unsigned long flags;
 
-	queue->tx.rsp_prod_pvt = ++i;
-}
+	spin_lock_irqsave(&queue->response_lock, flags);
 
-static void push_tx_responses(struct xenvif_queue *queue)
-{
-	int notify;
+	_make_tx_response(queue, txp, extra_count, status);
+	push_tx_responses(queue);
 
-	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
-	if (notify)
-		notify_remote_via_irq(queue->tx_irq);
+	spin_unlock_irqrestore(&queue->response_lock, flags);
 }
 
 static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
Re: [PATCH net] xen-netback: properly sync TX responses
Posted by patchwork-bot+netdevbpf@kernel.org 2 months, 3 weeks ago
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 29 Jan 2024 14:03:08 +0100 you wrote:
> Invoking the make_tx_response() / push_tx_responses() pair with no lock
> held would be acceptable only if all such invocations happened from the
> same context (NAPI instance or dealloc thread). Since this isn't the
> case, and since the interface "spec" also doesn't demand that multicast
> operations may only be performed with no in-flight transmits,
> MCAST_{ADD,DEL} processing also needs to acquire the response lock
> around the invocations.
> 
> [...]

Here is the summary with links:
  - [net] xen-netback: properly sync TX responses
    https://git.kernel.org/netdev/net/c/7b55984c96ff

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH net] xen-netback: properly sync TX responses
Posted by Paul Durrant 2 months, 3 weeks ago
On 29/01/2024 13:03, Jan Beulich wrote:
> Invoking the make_tx_response() / push_tx_responses() pair with no lock
> held would be acceptable only if all such invocations happened from the
> same context (NAPI instance or dealloc thread). Since this isn't the
> case, and since the interface "spec" also doesn't demand that multicast
> operations may only be performed with no in-flight transmits,
> MCAST_{ADD,DEL} processing also needs to acquire the response lock
> around the invocations.
> 
> To prevent similar mistakes going forward, "downgrade" the present
> functions to private helpers of just the two remaining ones using them
> directly, with no forward declarations anymore. This involves renaming
> what so far was make_tx_response(), for the new function of that name
> to serve the new (wrapper) purpose.
> 
> While there,
> - constify the txp parameters,
> - correct xenvif_idx_release()'s status parameter's type,
> - rename {,_}make_tx_response()'s status parameters for consistency with
>    xenvif_idx_release()'s.
> 
> Fixes: 210c34dcd8d9 ("xen-netback: add support for multicast control")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Of course this could be split into two or even more separate changes,
> but I think these transformations are best done all in one go.
> 
> It remains questionable whether push_tx_responses() really needs
> invoking after every single _make_tx_response().
> 
> MCAST_{ADD,DEL} are odd also from another perspective: They're supposed
> to come with "dummy requests", with the comment in the public header
> leaving open what that means.

IIRC the only reference I had at the time was Solaris code... I don't 
really there being any documentation of the feature. I think that 
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/xen/io/xnb.c 
is probably similar to the code I looked at to determine what the 
Solaris frontend expected. So I think 'dummy' means 'ignored'.

> Netback doesn't check dummy-ness (e.g.
> size being zero). Furthermore the description in the public header
> doesn't really make clear that there's a restriction of one such "extra"
> per dummy request. Yet the way xenvif_get_extras() works precludes
> multiple ADDs or multiple DELs in a single dummy request (only the last
> one would be honored afaict). While the way xenvif_tx_build_gops() works
> precludes an ADD and a DEL coming together in a single dummy request
> (the DEL would be ignored).

It appears the Solaris backend never coped with multiple extra_info so 
what the 'correct' semantic would be is unclear.

Anyway...

Reviewed-by: Paul Durrant <paul@xen.org>

> 
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -104,13 +104,12 @@ bool provides_xdp_headroom = true;
>   module_param(provides_xdp_headroom, bool, 0644);
>   
>   static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
> -			       u8 status);
> +			       s8 status);
>   
>   static void make_tx_response(struct xenvif_queue *queue,
> -			     struct xen_netif_tx_request *txp,
> +			     const struct xen_netif_tx_request *txp,
>   			     unsigned int extra_count,
> -			     s8       st);
> -static void push_tx_responses(struct xenvif_queue *queue);
> +			     s8 status);
>   
>   static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
>   
> @@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_
>   			  unsigned int extra_count, RING_IDX end)
>   {
>   	RING_IDX cons = queue->tx.req_cons;
> -	unsigned long flags;
>   
>   	do {
> -		spin_lock_irqsave(&queue->response_lock, flags);
>   		make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR);
> -		push_tx_responses(queue);
> -		spin_unlock_irqrestore(&queue->response_lock, flags);
>   		if (cons == end)
>   			break;
>   		RING_COPY_REQUEST(&queue->tx, cons++, txp);
> @@ -465,12 +460,7 @@ static void xenvif_get_requests(struct x
>   	for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < MAX_SKB_FRAGS;
>   	     nr_slots--) {
>   		if (unlikely(!txp->size)) {
> -			unsigned long flags;
> -
> -			spin_lock_irqsave(&queue->response_lock, flags);
>   			make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY);
> -			push_tx_responses(queue);
> -			spin_unlock_irqrestore(&queue->response_lock, flags);
>   			++txp;
>   			continue;
>   		}
> @@ -496,14 +486,8 @@ static void xenvif_get_requests(struct x
>   
>   		for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) {
>   			if (unlikely(!txp->size)) {
> -				unsigned long flags;
> -
> -				spin_lock_irqsave(&queue->response_lock, flags);
>   				make_tx_response(queue, txp, 0,
>   						 XEN_NETIF_RSP_OKAY);
> -				push_tx_responses(queue);
> -				spin_unlock_irqrestore(&queue->response_lock,
> -						       flags);
>   				continue;
>   			}
>   
> @@ -995,7 +979,6 @@ static void xenvif_tx_build_gops(struct
>   					 (ret == 0) ?
>   					 XEN_NETIF_RSP_OKAY :
>   					 XEN_NETIF_RSP_ERROR);
> -			push_tx_responses(queue);
>   			continue;
>   		}
>   
> @@ -1007,7 +990,6 @@ static void xenvif_tx_build_gops(struct
>   
>   			make_tx_response(queue, &txreq, extra_count,
>   					 XEN_NETIF_RSP_OKAY);
> -			push_tx_responses(queue);
>   			continue;
>   		}
>   
> @@ -1433,8 +1415,35 @@ int xenvif_tx_action(struct xenvif_queue
>   	return work_done;
>   }
>   
> +static void _make_tx_response(struct xenvif_queue *queue,
> +			     const struct xen_netif_tx_request *txp,
> +			     unsigned int extra_count,
> +			     s8 status)
> +{
> +	RING_IDX i = queue->tx.rsp_prod_pvt;
> +	struct xen_netif_tx_response *resp;
> +
> +	resp = RING_GET_RESPONSE(&queue->tx, i);
> +	resp->id     = txp->id;
> +	resp->status = status;
> +
> +	while (extra_count-- != 0)
> +		RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
> +
> +	queue->tx.rsp_prod_pvt = ++i;
> +}
> +
> +static void push_tx_responses(struct xenvif_queue *queue)
> +{
> +	int notify;
> +
> +	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
> +	if (notify)
> +		notify_remote_via_irq(queue->tx_irq);
> +}
> +
>   static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
> -			       u8 status)
> +			       s8 status)
>   {
>   	struct pending_tx_info *pending_tx_info;
>   	pending_ring_idx_t index;
> @@ -1444,8 +1453,8 @@ static void xenvif_idx_release(struct xe
>   
>   	spin_lock_irqsave(&queue->response_lock, flags);
>   
> -	make_tx_response(queue, &pending_tx_info->req,
> -			 pending_tx_info->extra_count, status);
> +	_make_tx_response(queue, &pending_tx_info->req,
> +			  pending_tx_info->extra_count, status);
>   
>   	/* Release the pending index before pusing the Tx response so
>   	 * its available before a new Tx request is pushed by the
> @@ -1459,32 +1468,19 @@ static void xenvif_idx_release(struct xe
>   	spin_unlock_irqrestore(&queue->response_lock, flags);
>   }
>   
> -
>   static void make_tx_response(struct xenvif_queue *queue,
> -			     struct xen_netif_tx_request *txp,
> +			     const struct xen_netif_tx_request *txp,
>   			     unsigned int extra_count,
> -			     s8       st)
> +			     s8 status)
>   {
> -	RING_IDX i = queue->tx.rsp_prod_pvt;
> -	struct xen_netif_tx_response *resp;
> -
> -	resp = RING_GET_RESPONSE(&queue->tx, i);
> -	resp->id     = txp->id;
> -	resp->status = st;
> -
> -	while (extra_count-- != 0)
> -		RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
> +	unsigned long flags;
>   
> -	queue->tx.rsp_prod_pvt = ++i;
> -}
> +	spin_lock_irqsave(&queue->response_lock, flags);
>   
> -static void push_tx_responses(struct xenvif_queue *queue)
> -{
> -	int notify;
> +	_make_tx_response(queue, txp, extra_count, status);
> +	push_tx_responses(queue);
>   
> -	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
> -	if (notify)
> -		notify_remote_via_irq(queue->tx_irq);
> +	spin_unlock_irqrestore(&queue->response_lock, flags);
>   }
>   
>   static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
Re: [PATCH net] xen-netback: properly sync TX responses
Posted by Jakub Kicinski 2 months, 3 weeks ago
On Mon, 29 Jan 2024 14:03:08 +0100 Jan Beulich wrote:
> Invoking the make_tx_response() / push_tx_responses() pair with no lock
> held would be acceptable only if all such invocations happened from the
> same context (NAPI instance or dealloc thread). Since this isn't the
> case, and since the interface "spec" also doesn't demand that multicast
> operations may only be performed with no in-flight transmits,
> MCAST_{ADD,DEL} processing also needs to acquire the response lock
> around the invocations.
> 
> To prevent similar mistakes going forward, "downgrade" the present
> functions to private helpers of just the two remaining ones using them
> directly, with no forward declarations anymore. This involves renaming
> what so far was make_tx_response(), for the new function of that name
> to serve the new (wrapper) purpose.
> 
> While there,
> - constify the txp parameters,
> - correct xenvif_idx_release()'s status parameter's type,
> - rename {,_}make_tx_response()'s status parameters for consistency with
>   xenvif_idx_release()'s.

Hi Paul, is this one on your TODO list to review or should 
we do our best? :)
-- 
pw-bot: needs-ack
Re: [PATCH net] xen-netback: properly sync TX responses
Posted by Paul Durrant 2 months, 3 weeks ago
On 01/02/2024 00:23, Jakub Kicinski wrote:
> On Mon, 29 Jan 2024 14:03:08 +0100 Jan Beulich wrote:
>> Invoking the make_tx_response() / push_tx_responses() pair with no lock
>> held would be acceptable only if all such invocations happened from the
>> same context (NAPI instance or dealloc thread). Since this isn't the
>> case, and since the interface "spec" also doesn't demand that multicast
>> operations may only be performed with no in-flight transmits,
>> MCAST_{ADD,DEL} processing also needs to acquire the response lock
>> around the invocations.
>>
>> To prevent similar mistakes going forward, "downgrade" the present
>> functions to private helpers of just the two remaining ones using them
>> directly, with no forward declarations anymore. This involves renaming
>> what so far was make_tx_response(), for the new function of that name
>> to serve the new (wrapper) purpose.
>>
>> While there,
>> - constify the txp parameters,
>> - correct xenvif_idx_release()'s status parameter's type,
>> - rename {,_}make_tx_response()'s status parameters for consistency with
>>    xenvif_idx_release()'s.
> 
> Hi Paul, is this one on your TODO list to review or should
> we do our best? :)

Sorry for the delay. I'll take a look at this now.

   Paul