[PATCH net] virtio-net: correctly enable callback during start_xmit

Jason Wang posted 1 patch 2 years, 9 months ago
There is a newer version of this series
drivers/net/virtio_net.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[PATCH net] virtio-net: correctly enable callback during start_xmit
Posted by Jason Wang 2 years, 9 months ago
Commit a7766ef18b33 ("virtio_net: disable cb aggressively") enables
the virtqueue callback via the following statement:

        do {
                ......
        } while (use_napi && kick &&
                 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));

As a result, virtqueue_enable_cb_delayed() is never called when kick
is false. Fix this by removing the kick check from the condition so
that the callback is enabled correctly.

Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
The patch is needed for -stable.
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 86e52454b5b5..44d7daf0267b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		free_old_xmit_skbs(sq, false);
 
-	} while (use_napi && kick &&
-	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
+	} while (use_napi &&
+		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
-- 
2.25.1
Re: [PATCH net] virtio-net: correctly enable callback during start_xmit
Posted by Michael S. Tsirkin 2 years, 9 months ago
On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> virtqueue callback via the following statement:
> 
>         do {
>            ......
> 	} while (use_napi && kick &&
>                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> 
> This will cause a missing call to virtqueue_enable_cb_delayed() when
> kick is false. Fixing this by removing the checking of the kick from
> the condition to make sure callback is enabled correctly.
> 
> Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> The patch is needed for -stable.

stable rules don't allow for theoretical fixes. Was a problem observed?

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 86e52454b5b5..44d7daf0267b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  		free_old_xmit_skbs(sq, false);
>  
> -	} while (use_napi && kick &&
> -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> +	} while (use_napi &&
> +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>

A bit more explanation pls.  kick simply means !netdev_xmit_more -
if it's false we know there will be another packet, then transmitting
that packet will invoke virtqueue_enable_cb_delayed. No?




  
>  	/* timestamp packet in software */
>  	skb_tx_timestamp(skb);
> -- 
> 2.25.1
Re: [PATCH net] virtio-net: correctly enable callback during start_xmit
Posted by Xuan Zhuo 2 years, 9 months ago
On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > virtqueue callback via the following statement:
> >
> >         do {
> >            ......
> > 	} while (use_napi && kick &&
> >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > kick is false. Fixing this by removing the checking of the kick from
> > the condition to make sure callback is enabled correctly.
> >
> > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> > The patch is needed for -stable.
>
> stable rules don't allow for theoretical fixes. Was a problem observed?
>
> > ---
> >  drivers/net/virtio_net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 86e52454b5b5..44d7daf0267b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >
> >  		free_old_xmit_skbs(sq, false);
> >
> > -	} while (use_napi && kick &&
> > -	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > +	} while (use_napi &&
> > +		 unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
>
> A bit more explanation pls.  kick simply means !netdev_xmit_more -
> if it's false we know there will be another packet, then transmitting
> that packet will invoke virtqueue_enable_cb_delayed. No?

xmit_more only says that there may be a next packet; in fact there may
not be one. For example, the vq is full, and the driver stops the queue.
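
To illustrate the point, here is a minimal user-space sketch of the loop
condition before and after the patch. It is not driver code:
fake_enable_cb_delayed() is a hypothetical stand-in for
virtqueue_enable_cb_delayed() that only counts calls and always reports
success, and the loop body is left empty.

```c
#include <stdbool.h>

/* Number of times the stand-in enable function was called. */
static int enable_calls;

/* Hypothetical stand-in for virtqueue_enable_cb_delayed(); always succeeds. */
static bool fake_enable_cb_delayed(void)
{
	enable_calls++;
	return true;
}

/* Loop condition before the patch: kick is tested before the enable call. */
static int old_loop_enable_calls(bool use_napi, bool kick)
{
	enable_calls = 0;
	do {
		/* free_old_xmit_skbs() would run here */
	} while (use_napi && kick && !fake_enable_cb_delayed());
	return enable_calls;
}

/* Loop condition after the patch: kick no longer gates the enable call. */
static int new_loop_enable_calls(bool use_napi, bool kick)
{
	(void)kick; /* unused on purpose */
	enable_calls = 0;
	do {
		/* free_old_xmit_skbs() would run here */
	} while (use_napi && !fake_enable_cb_delayed());
	return enable_calls;
}
```

With use_napi set and kick false, old_loop_enable_calls() returns 0
because && short-circuits before the enable call, while
new_loop_enable_calls() returns 1.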

Thanks.

>
>
>
>
>
> >  	/* timestamp packet in software */
> >  	skb_tx_timestamp(skb);
> > --
> > 2.25.1
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: correctly enable callback during start_xmit
Posted by Jason Wang 2 years, 9 months ago
On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > virtqueue callback via the following statement:
> > >
> > >         do {
> > >            ......
> > >     } while (use_napi && kick &&
> > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > >
> > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > kick is false. Fixing this by removing the checking of the kick from
> > > the condition to make sure callback is enabled correctly.
> > >
> > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > > The patch is needed for -stable.
> >
> > stable rules don't allow for theoretical fixes. Was a problem observed?

Yes, running a pktgen sample script can lead to a tx timeout.

> >
> > > ---
> > >  drivers/net/virtio_net.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 86e52454b5b5..44d7daf0267b 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >
> > >             free_old_xmit_skbs(sq, false);
> > >
> > > -   } while (use_napi && kick &&
> > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > +   } while (use_napi &&
> > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > >
> >
> > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > if it's false we know there will be another packet, then transmitting
> > that packet will invoke virtqueue_enable_cb_delayed. No?
>
> It's just that there may be a next packet, but in fact there may not be.
> For example, the vq is full, and the driver stops the queue.

Exactly, when the queue is about to be full we disable tx and wait for
the next tx interrupt to re-enable tx.
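
A rough model of that sequence, as a sketch only. It assumes that
callbacks are disabled at the top of the loop (the commit title suggests
this, but the hunk does not show it). All names here (struct sim_sq,
sim_xmit, sim_stalled, SIM_STOP_THRESHOLD) are made up for illustration,
and the threshold is a small constant rather than 2+MAX_SKB_FRAGS.

```c
#include <stdbool.h>

/* Hypothetical threshold; the driver compares against 2 + MAX_SKB_FRAGS. */
#define SIM_STOP_THRESHOLD 2

/* Hypothetical model of a send queue, not the driver's struct. */
struct sim_sq {
	int num_free;     /* free descriptors left in the vq */
	bool cb_enabled;  /* is the tx callback (interrupt) enabled? */
	bool stopped;     /* has the tx queue been stopped? */
};

/*
 * Model one start_xmit() call in NAPI mode.
 * check_kick = true models the condition before the patch,
 * check_kick = false models it after the patch.
 */
static void sim_xmit(struct sim_sq *sq, bool xmit_more, bool check_kick)
{
	bool kick = !xmit_more;

	/* Assumption: callbacks are disabled before freeing old buffers. */
	sq->cb_enabled = false;

	/* The enable call is reached only if the condition gets that far. */
	if (!check_kick || kick)
		sq->cb_enabled = true;

	/* Queue the packet; stop tx when the ring is nearly full. */
	sq->num_free--;
	if (sq->num_free < SIM_STOP_THRESHOLD)
		sq->stopped = true;
}

/* A stopped queue with callbacks off has nothing left to wake it. */
static bool sim_stalled(const struct sim_sq *sq)
{
	return sq->stopped && !sq->cb_enabled;
}
```

In this model, a packet sent with xmit_more set that also fills the ring
leaves the queue stopped with callbacks off under the old condition, so
no tx interrupt arrives to restart it. With the kick check removed the
callback is enabled before the queue is stopped.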

Thanks

>
> Thanks.
>
> >
> >
> >
> >
> >
> > >     /* timestamp packet in software */
> > >     skb_tx_timestamp(skb);
> > > --
> > > 2.25.1
Re: [PATCH net] virtio-net: correctly enable callback during start_xmit
Posted by Michael S. Tsirkin 2 years, 9 months ago
On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > virtqueue callback via the following statement:
> > > >
> > > >         do {
> > > >            ......
> > > >     } while (use_napi && kick &&
> > > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > >
> > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > kick is false. Fixing this by removing the checking of the kick from
> > > > the condition to make sure callback is enabled correctly.
> > > >
> > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > > The patch is needed for -stable.
> > >
> > > stable rules don't allow for theoretical fixes. Was a problem observed?
> 
> Yes, running a pktgen sample script can lead to a tx timeout.

Since April 2021 and we only noticed now? Are you sure it's the
right Fixes tag?

> > >
> > > > ---
> > > >  drivers/net/virtio_net.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > >
> > > >             free_old_xmit_skbs(sq, false);
> > > >
> > > > -   } while (use_napi && kick &&
> > > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > +   } while (use_napi &&
> > > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > >
> > >
> > > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > > if it's false we know there will be another packet, then transmitting
> > > that packet will invoke virtqueue_enable_cb_delayed. No?
> >
> > It's just that there may be a next packet, but in fact there may not be.
> > For example, the vq is full, and the driver stops the queue.
> 
> Exactly, when the queue is about to be full we disable tx and wait for
> the next tx interrupt to re-enable tx.
> 
> Thanks

OK, it's a good idea to document that.
And we should enable callbacks at that point, not here on data path.


> >
> > Thanks.
> >
> > >
> > >
> > >
> > >
> > >
> > > >     /* timestamp packet in software */
> > > >     skb_tx_timestamp(skb);
> > > > --
> > > > 2.25.1
> > >
> > > _______________________________________________
> > > Virtualization mailing list
> > > Virtualization@lists.linux-foundation.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> >
Re: [PATCH net] virtio-net: correctly enable callback during start_xmit
Posted by Jason Wang 2 years, 9 months ago
On Tue, Dec 13, 2022 at 2:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> > On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > > virtqueue callback via the following statement:
> > > > >
> > > > >         do {
> > > > >            ......
> > > > >     } while (use_napi && kick &&
> > > > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > >
> > > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > > kick is false. Fixing this by removing the checking of the kick from
> > > > > the condition to make sure callback is enabled correctly.
> > > > >
> > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > ---
> > > > > The patch is needed for -stable.
> > > >
> > > > stable rules don't allow for theoretical fixes. Was a problem observed?
> >
> > Yes, running a pktgen sample script can lead to a tx timeout.
>
> Since April 2021 and we only noticed now? Are you sure it's the
> right Fixes tag?

Well, reverting a7766ef18b33 makes pktgen work again.

The reason we didn't notice is probably that:

1) We don't support BQL, so there is no bulk dequeuing (skb list) in normal traffic
2) When burst is enabled for pktgen, it can do bulk xmit via skb list on its own
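
As a small sketch of why only bursts are affected: kick is
!netdev_xmit_more(), so in a burst only the last packet has kick set.
The helper below is hypothetical (enable_attempts is not a kernel
function) and just counts how many packets in a burst would reach the
enable call under each condition.

```c
#include <stdbool.h>

/*
 * Count how many packets of a burst reach the callback-enable call.
 * check_kick = true models the condition before the patch,
 * check_kick = false models it after the patch.
 */
static int enable_attempts(int burst_len, bool check_kick)
{
	int attempts = 0;

	for (int i = 0; i < burst_len; i++) {
		/* Every packet except the last is sent with xmit_more set. */
		bool xmit_more = (i < burst_len - 1);
		bool kick = !xmit_more;

		if (!check_kick || kick)
			attempts++;
	}
	return attempts;
}
```

For single-packet sends (burst_len of 1) both conditions behave the
same, which matches normal traffic without bulk dequeuing. For a longer
burst, every packet but the last skips the enable call under the old
condition.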

>
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >
> > > > >             free_old_xmit_skbs(sq, false);
> > > > >
> > > > > -   } while (use_napi && kick &&
> > > > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > +   } while (use_napi &&
> > > > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > >
> > > >
> > > > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > > > if it's false we know there will be another packet, then transmitting
> > > > that packet will invoke virtqueue_enable_cb_delayed. No?
> > >
> > > It's just that there may be a next packet, but in fact there may not be.
> > > For example, the vq is full, and the driver stops the queue.
> >
> > Exactly, when the queue is about to be full we disable tx and wait for
> > the next tx interrupt to re-enable tx.
> >
> > Thanks
>
> OK, it's a good idea to document that.

Will do.

> And we should enable callbacks at that point, not here on data path.

I'm not sure I understand here. Are you suggesting removing the
!use_napi check here?

                if (!use_napi &&
                    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
                        /* More just got used, free them then recheck. */
                        free_old_xmit_skbs(sq, false);
                        if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
                                netif_start_subqueue(dev, qnum);
                                virtqueue_disable_cb(sq->vq);
                        }
                }

Btw, it doesn't differ too much as kick is always true without pktgen
and that may even need more comments or make the code even harder to
read. We need a patch for -stable at least so I prefer to let this
patch go first and do optimization on top.

Thanks

>
>
> > >
> > > Thanks.
> > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >     /* timestamp packet in software */
> > > > >     skb_tx_timestamp(skb);
> > > > > --
> > > > > 2.25.1
> > > >
> > > > _______________________________________________
> > > > Virtualization mailing list
> > > > Virtualization@lists.linux-foundation.org
> > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > >
>
Re: [PATCH net] virtio-net: correctly enable callback during start_xmit
Posted by Michael S. Tsirkin 2 years, 9 months ago
On Tue, Dec 13, 2022 at 02:57:54PM +0800, Jason Wang wrote:
> On Tue, Dec 13, 2022 at 2:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Dec 13, 2022 at 11:43:36AM +0800, Jason Wang wrote:
> > > On Tue, Dec 13, 2022 at 11:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 12 Dec 2022 04:25:22 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Mon, Dec 12, 2022 at 05:10:29PM +0800, Jason Wang wrote:
> > > > > > Commit a7766ef18b33("virtio_net: disable cb aggressively") enables
> > > > > > virtqueue callback via the following statement:
> > > > > >
> > > > > >         do {
> > > > > >            ......
> > > > > >     } while (use_napi && kick &&
> > > > > >                unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > > > This will cause a missing call to virtqueue_enable_cb_delayed() when
> > > > > > kick is false. Fixing this by removing the checking of the kick from
> > > > > > the condition to make sure callback is enabled correctly.
> > > > > >
> > > > > > Fixes: a7766ef18b33 ("virtio_net: disable cb aggressively")
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > > The patch is needed for -stable.
> > > > >
> > > > > stable rules don't allow for theoretical fixes. Was a problem observed?
> > >
> > > Yes, running a pktgen sample script can lead to a tx timeout.
> >
> > Since April 2021 and we only noticed now? Are you sure it's the
> > right Fixes tag?
> 
> Well, reverting a7766ef18b33 makes pktgen work again.
> 
> The reason we didn't notice is probably that:
> 
> 1) We don't support BQL, so there is no bulk dequeuing (skb list) in normal traffic
> 2) When burst is enabled for pktgen, it can do bulk xmit via skb list on its own
> 
> >
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 86e52454b5b5..44d7daf0267b 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -1834,8 +1834,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >
> > > > > >             free_old_xmit_skbs(sq, false);
> > > > > >
> > > > > > -   } while (use_napi && kick &&
> > > > > > -          unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > > +   } while (use_napi &&
> > > > > > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> > > > > >
> > > > >
> > > > > A bit more explanation pls.  kick simply means !netdev_xmit_more -
> > > > > if it's false we know there will be another packet, then transmitting
> > > > > that packet will invoke virtqueue_enable_cb_delayed. No?
> > > >
> > > > It's just that there may be a next packet, but in fact there may not be.
> > > > For example, the vq is full, and the driver stops the queue.
> > >
> > > Exactly, when the queue is about to be full we disable tx and wait for
> > > the next tx interrupt to re-enable tx.
> > >
> > > Thanks
> >
> > OK, it's a good idea to document that.
> 
> Will do.
> 
> > And we should enable callbacks at that point, not here on data path.
> 
> I'm not sure I understand here. Are you suggesting removing the
> !use_napi check here?
> 
>                 if (!use_napi &&
>                     unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>                         /* More just got used, free them then recheck. */
>                         free_old_xmit_skbs(sq, false);
>                         if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>                                 netif_start_subqueue(dev, qnum);
>                                 virtqueue_disable_cb(sq->vq);
>                         }
>                 }


At least, I suggest calling virtqueue_enable_cb_delayed around
this area of code. I have not really thought all this path through
and how all the corner cases interact.



> Btw, it doesn't differ too much as kick is always true without pktgen
> and that may even need more comments or make the code even harder to
> read. We need a patch for -stable at least so I prefer to let this
> patch go first and do optimization on top.
> 
> Thanks

There's a chance of perf regression here too.  Let's write the full
patch first of all. If you want to make it a 2 patch series that is fine,
but this has been here since 2021, so I don't see why we should rush a
fix. Worry about backporting later.

> >
> >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >     /* timestamp packet in software */
> > > > > >     skb_tx_timestamp(skb);
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > > > > _______________________________________________
> > > > > Virtualization mailing list
> > > > > Virtualization@lists.linux-foundation.org
> > > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > > >
> >