[PATCH net-next] virtio_net: Fix napi_skb_cache_put warning

Breno Leitao posted 1 patch 1 year, 5 months ago
There is a newer version of this series
drivers/net/virtio_net.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
[PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Breno Leitao 1 year, 5 months ago
After the commit bdacf3e34945 ("net: Use nested-BH locking for
napi_alloc_cache.") was merged, the following warning began to appear:

	 WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0

	  __warn+0x12f/0x340
	  napi_skb_cache_put+0x82/0x4b0
	  napi_skb_cache_put+0x82/0x4b0
	  report_bug+0x165/0x370
	  handle_bug+0x3d/0x80
	  exc_invalid_op+0x1a/0x50
	  asm_exc_invalid_op+0x1a/0x20
	  __free_old_xmit+0x1c8/0x510
	  napi_skb_cache_put+0x82/0x4b0
	  __free_old_xmit+0x1c8/0x510
	  __free_old_xmit+0x1c8/0x510
	  __pfx___free_old_xmit+0x10/0x10

The issue arises because virtio is assuming it's running in NAPI context
even when it's not, such as in the netpoll case.

To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
is available. Same for virtnet_poll_cleantx(), which always assumed that
it was in a NAPI context.

Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/virtio_net.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0b4747e81464..fb1331827308 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2341,7 +2341,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return packets;
 }
 
-static void virtnet_poll_cleantx(struct receive_queue *rq)
+static void virtnet_poll_cleantx(struct receive_queue *rq, int budget)
 {
 	struct virtnet_info *vi = rq->vq->vdev->priv;
 	unsigned int index = vq2rxq(rq->vq);
@@ -2359,7 +2359,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 
 		do {
 			virtqueue_disable_cb(sq->vq);
-			free_old_xmit(sq, txq, true);
+			free_old_xmit(sq, txq, !!budget);
 		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
 		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
@@ -2404,7 +2404,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	unsigned int xdp_xmit = 0;
 	bool napi_complete;
 
-	virtnet_poll_cleantx(rq);
+	virtnet_poll_cleantx(rq, budget);
 
 	received = virtnet_receive(rq, budget, &xdp_xmit);
 	rq->packets_in_napi += received;
@@ -2526,7 +2526,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	txq = netdev_get_tx_queue(vi->dev, index);
 	__netif_tx_lock(txq, raw_smp_processor_id());
 	virtqueue_disable_cb(sq->vq);
-	free_old_xmit(sq, txq, true);
+	free_old_xmit(sq, txq, !!budget);
 
 	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
 		if (netif_tx_queue_stopped(txq)) {
-- 
2.43.0
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Heng Qi 1 year, 5 months ago
在 2024/7/12 下午7:53, Breno Leitao 写道:
> After the commit bdacf3e34945 ("net: Use nested-BH locking for
> napi_alloc_cache.") was merged, the following warning began to appear:
>
> 	 WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0
>
> 	  __warn+0x12f/0x340
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  report_bug+0x165/0x370
> 	  handle_bug+0x3d/0x80
> 	  exc_invalid_op+0x1a/0x50
> 	  asm_exc_invalid_op+0x1a/0x20
> 	  __free_old_xmit+0x1c8/0x510
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  __free_old_xmit+0x1c8/0x510
> 	  __free_old_xmit+0x1c8/0x510
> 	  __pfx___free_old_xmit+0x10/0x10
>
> The issue arises because virtio is assuming it's running in NAPI context
> even when it's not, such as in the netpoll case.
>
> To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
> is available. Same for virtnet_poll_cleantx(), which always assumed that
> it was in a NAPI context.
>
> Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>   drivers/net/virtio_net.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>

>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0b4747e81464..fb1331827308 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2341,7 +2341,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>   	return packets;
>   }
>   
> -static void virtnet_poll_cleantx(struct receive_queue *rq)
> +static void virtnet_poll_cleantx(struct receive_queue *rq, int budget)
>   {
>   	struct virtnet_info *vi = rq->vq->vdev->priv;
>   	unsigned int index = vq2rxq(rq->vq);
> @@ -2359,7 +2359,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>   
>   		do {
>   			virtqueue_disable_cb(sq->vq);
> -			free_old_xmit(sq, txq, true);
> +			free_old_xmit(sq, txq, !!budget);
>   		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>   
>   		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
> @@ -2404,7 +2404,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>   	unsigned int xdp_xmit = 0;
>   	bool napi_complete;
>   
> -	virtnet_poll_cleantx(rq);
> +	virtnet_poll_cleantx(rq, budget);
>   
>   	received = virtnet_receive(rq, budget, &xdp_xmit);
>   	rq->packets_in_napi += received;
> @@ -2526,7 +2526,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   	txq = netdev_get_tx_queue(vi->dev, index);
>   	__netif_tx_lock(txq, raw_smp_processor_id());
>   	virtqueue_disable_cb(sq->vq);
> -	free_old_xmit(sq, txq, true);
> +	free_old_xmit(sq, txq, !!budget);
>   
>   	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
>   		if (netif_tx_queue_stopped(txq)) {
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Jason Wang 1 year, 5 months ago
On Fri, Jul 12, 2024 at 7:54 PM Breno Leitao <leitao@debian.org> wrote:
>
> After the commit bdacf3e34945 ("net: Use nested-BH locking for
> napi_alloc_cache.") was merged, the following warning began to appear:
>
>          WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0
>
>           __warn+0x12f/0x340
>           napi_skb_cache_put+0x82/0x4b0
>           napi_skb_cache_put+0x82/0x4b0
>           report_bug+0x165/0x370
>           handle_bug+0x3d/0x80
>           exc_invalid_op+0x1a/0x50
>           asm_exc_invalid_op+0x1a/0x20
>           __free_old_xmit+0x1c8/0x510
>           napi_skb_cache_put+0x82/0x4b0
>           __free_old_xmit+0x1c8/0x510
>           __free_old_xmit+0x1c8/0x510
>           __pfx___free_old_xmit+0x10/0x10
>
> The issue arises because virtio is assuming it's running in NAPI context
> even when it's not, such as in the netpoll case.
>
> To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
> is available. Same for virtnet_poll_cleantx(), which always assumed that
> it was in a NAPI context.
>
> Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Michael S. Tsirkin 1 year, 5 months ago
On Fri, Jul 12, 2024 at 04:53:25AM -0700, Breno Leitao wrote:
> After the commit bdacf3e34945 ("net: Use nested-BH locking for
> napi_alloc_cache.") was merged, the following warning began to appear:
> 
> 	 WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0
> 
> 	  __warn+0x12f/0x340
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  report_bug+0x165/0x370
> 	  handle_bug+0x3d/0x80
> 	  exc_invalid_op+0x1a/0x50
> 	  asm_exc_invalid_op+0x1a/0x20
> 	  __free_old_xmit+0x1c8/0x510
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  __free_old_xmit+0x1c8/0x510
> 	  __free_old_xmit+0x1c8/0x510
> 	  __pfx___free_old_xmit+0x10/0x10
> 
> The issue arises because virtio is assuming it's running in NAPI context
> even when it's not, such as in the netpoll case.
> 
> To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
> is available. Same for virtnet_poll_cleantx(), which always assumed that
> it was in a NAPI context.
> 
> Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

though I'm not sure I understand the connection with bdacf3e34945.

> ---
>  drivers/net/virtio_net.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0b4747e81464..fb1331827308 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2341,7 +2341,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  	return packets;
>  }
>  
> -static void virtnet_poll_cleantx(struct receive_queue *rq)
> +static void virtnet_poll_cleantx(struct receive_queue *rq, int budget)
>  {
>  	struct virtnet_info *vi = rq->vq->vdev->priv;
>  	unsigned int index = vq2rxq(rq->vq);
> @@ -2359,7 +2359,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>  
>  		do {
>  			virtqueue_disable_cb(sq->vq);
> -			free_old_xmit(sq, txq, true);
> +			free_old_xmit(sq, txq, !!budget);
>  		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>  
>  		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
> @@ -2404,7 +2404,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>  	unsigned int xdp_xmit = 0;
>  	bool napi_complete;
>  
> -	virtnet_poll_cleantx(rq);
> +	virtnet_poll_cleantx(rq, budget);
>  
>  	received = virtnet_receive(rq, budget, &xdp_xmit);
>  	rq->packets_in_napi += received;
> @@ -2526,7 +2526,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>  	txq = netdev_get_tx_queue(vi->dev, index);
>  	__netif_tx_lock(txq, raw_smp_processor_id());
>  	virtqueue_disable_cb(sq->vq);
> -	free_old_xmit(sq, txq, true);
> +	free_old_xmit(sq, txq, !!budget);
>  
>  	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
>  		if (netif_tx_queue_stopped(txq)) {
> -- 
> 2.43.0
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Breno Leitao 1 year, 5 months ago
Hello Michael,

On Sun, Jul 14, 2024 at 03:38:42AM -0400, Michael S. Tsirkin wrote:
> On Fri, Jul 12, 2024 at 04:53:25AM -0700, Breno Leitao wrote:
> > After the commit bdacf3e34945 ("net: Use nested-BH locking for
> > napi_alloc_cache.") was merged, the following warning began to appear:
> > 
> > 	 WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0
> > 
> > 	  __warn+0x12f/0x340
> > 	  napi_skb_cache_put+0x82/0x4b0
> > 	  napi_skb_cache_put+0x82/0x4b0
> > 	  report_bug+0x165/0x370
> > 	  handle_bug+0x3d/0x80
> > 	  exc_invalid_op+0x1a/0x50
> > 	  asm_exc_invalid_op+0x1a/0x20
> > 	  __free_old_xmit+0x1c8/0x510
> > 	  napi_skb_cache_put+0x82/0x4b0
> > 	  __free_old_xmit+0x1c8/0x510
> > 	  __free_old_xmit+0x1c8/0x510
> > 	  __pfx___free_old_xmit+0x10/0x10
> > 
> > The issue arises because virtio is assuming it's running in NAPI context
> > even when it's not, such as in the netpoll case.
> > 
> > To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
> > is available. Same for virtnet_poll_cleantx(), which always assumed that
> > it was in a NAPI context.
> > 
> > Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
> > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> though I'm not sure I understand the connection with bdacf3e34945.

The warning above appeared after bdacf3e34945 landed.
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Leonardo Bras 1 year, 3 months ago
On Mon, Jul 15, 2024 at 04:25:06AM -0700, Breno Leitao wrote:
> Hello Michael,
> 
> On Sun, Jul 14, 2024 at 03:38:42AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Jul 12, 2024 at 04:53:25AM -0700, Breno Leitao wrote:
> > > After the commit bdacf3e34945 ("net: Use nested-BH locking for
> > > napi_alloc_cache.") was merged, the following warning began to appear:
> > > 
> > > 	 WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0
> > > 
> > > 	  __warn+0x12f/0x340
> > > 	  napi_skb_cache_put+0x82/0x4b0
> > > 	  napi_skb_cache_put+0x82/0x4b0
> > > 	  report_bug+0x165/0x370
> > > 	  handle_bug+0x3d/0x80
> > > 	  exc_invalid_op+0x1a/0x50
> > > 	  asm_exc_invalid_op+0x1a/0x20
> > > 	  __free_old_xmit+0x1c8/0x510
> > > 	  napi_skb_cache_put+0x82/0x4b0
> > > 	  __free_old_xmit+0x1c8/0x510
> > > 	  __free_old_xmit+0x1c8/0x510
> > > 	  __pfx___free_old_xmit+0x10/0x10
> > > 
> > > The issue arises because virtio is assuming it's running in NAPI context
> > > even when it's not, such as in the netpoll case.
> > > 
> > > To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
> > > is available. Same for virtnet_poll_cleantx(), which always assumed that
> > > it was in a NAPI context.
> > > 
> > > Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
> > > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > though I'm not sure I understand the connection with bdacf3e34945.
> 
> The warning above appeared after bdacf3e34945 landed.

Hi Breno,
Thanks for fixing this!

I think the confusion is around the fact that the commit on Fixes 
(df133f3f9625) tag is different from the commit in the commit message
(bdacf3e34945).

Please help me check if the following is correct:
###
Any tree which includes df133f3f9625 ("virtio_net: bulk free tx skbs") 
should also include your patch, since it fixes stuff in there.

The fact that the warning was only made visible in 
bdacf3e34945 ("net: Use nested-BH locking for napi_alloc_cache.")
does not change the fact that it was already present before.

Also, having bdacf3e34945 is not necessary for the backport, since
it only made the bug visible.
###

Are above statements right?

It's important to make it clear since this helps the backporting process.

Thanks!
Leo
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Breno Leitao 1 year, 3 months ago
Hello Leonardo, good to see you here,

On Tue, Sep 03, 2024 at 01:28:50PM -0300, Leonardo Bras wrote:
> Please help me check if the following is correct:
> ###
> Any tree which includes df133f3f9625 ("virtio_net: bulk free tx skbs") 
> should also include your patch, since it fixes stuff in there.
> 
> The fact that the warning was only made visible in 
> bdacf3e34945 ("net: Use nested-BH locking for napi_alloc_cache.")
> does not change the fact that it was already present before.
> 
> Also, having bdacf3e34945 is not necessary for the backport, since
> it only made the bug visible.
> ###
> 
> Are above statements right?

That is exactly correct.

The bug was introduced by df133f3f9625 ("virtio_net: bulk free tx
skbs"), but it was not visible until bdacf3e34945 ("net: Use nested-BH
locking for napi_alloc_cache.") landed.

You don't need bdacf3e34945 ("net: Use nested-BH locking for
napi_alloc_cache.") patch backported if you don't want to.

I hope it helps,
--breno
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Jakub Kicinski 1 year, 5 months ago
On Fri, 12 Jul 2024 04:53:25 -0700 Breno Leitao wrote:
> After the commit bdacf3e34945 ("net: Use nested-BH locking for
> napi_alloc_cache.") was merged, the following warning began to appear:
> 
> 	 WARNING: CPU: 5 PID: 1 at net/core/skbuff.c:1451 napi_skb_cache_put+0x82/0x4b0
> 
> 	  __warn+0x12f/0x340
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  report_bug+0x165/0x370
> 	  handle_bug+0x3d/0x80
> 	  exc_invalid_op+0x1a/0x50
> 	  asm_exc_invalid_op+0x1a/0x20
> 	  __free_old_xmit+0x1c8/0x510
> 	  napi_skb_cache_put+0x82/0x4b0
> 	  __free_old_xmit+0x1c8/0x510
> 	  __free_old_xmit+0x1c8/0x510
> 	  __pfx___free_old_xmit+0x10/0x10
> 
> The issue arises because virtio is assuming it's running in NAPI context
> even when it's not, such as in the netpoll case.
> 
> To resolve this, modify virtnet_poll_tx() to only set NAPI when budget
> is available. Same for virtnet_poll_cleantx(), which always assumed that
> it was in a NAPI context.
> 
> Fixes: df133f3f9625 ("virtio_net: bulk free tx skbs")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Jakub Kicinski 1 year, 5 months ago
On Fri, 12 Jul 2024 04:53:25 -0700 Breno Leitao wrote:
> Subject: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning

[PATCH net] for fixes so that the bot knows what to test against :)
No need to repost (this time).
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Breno Leitao 1 year, 5 months ago
Hello Jakub,

On Fri, Jul 12, 2024 at 07:54:32AM -0700, Jakub Kicinski wrote:
> On Fri, 12 Jul 2024 04:53:25 -0700 Breno Leitao wrote:
> > Subject: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
> 
> [PATCH net] for fixes so that the bot knows what to test against :)
> No need to repost (this time).

I didn't send to `net` since this WARNING is only "showing" in net-next,
due to commit bdacf3e34945 ("net: Use nested-BH locking for
napi_alloc_cache.") being only in net-next.

But you have a good point, this is a fix and it should go through `net`.
sorry about it.

--breno
Re: [PATCH net-next] virtio_net: Fix napi_skb_cache_put warning
Posted by Jakub Kicinski 1 year, 5 months ago
On Fri, 12 Jul 2024 07:58:49 -0700 Breno Leitao wrote:
> I didn't send to `net` since this WARNING is only "showing" in net-next,
> due to commit bdacf3e34945 ("net: Use nested-BH locking for
> napi_alloc_cache.") being only in net-next.
> 
> But you have a good point, this is a fix and it should go through `net`.
> sorry about it.

Hah, but it doesn't seem to apply to net. Let's wait and see if Linus
cuts final on Sunday. If he does I'll apply to net-next and you'll have
to send the net version for stable to Greg. Less merge conflicts for me
that way ;) If there's -rc8 please rebase.