[PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring

Simon Schippers posted 8 patches 1 week, 2 days ago
There is a newer version of this series
[PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring
Posted by Simon Schippers 1 week, 2 days ago
Stop the netdev queue ahead of __ptr_ring_produce when
__ptr_ring_full_next signals the ring is about to fill. Due to the
smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to
notice the stopped netdev queue after seeing the new ptr_ring entry. As
both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock,
the lock is held during the execution of both methods.

dev->lltx is disabled to ensure that tun_net_xmit is not called even
though the netdev queue is stopped (which happened in my testing,
resulting in rare packet drops). Consequently, the update of trans_start
in tun_net_xmit is also removed.

Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
---
 drivers/net/tun.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 86a9e927d0ff..c6b22af9bae8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev)
 	dev->vlan_features = dev->features &
 			     ~(NETIF_F_HW_VLAN_CTAG_TX |
 			       NETIF_F_HW_VLAN_STAG_TX);
-	dev->lltx = true;
+	dev->lltx = false;
 
 	tun->flags = (tun->flags & ~TUN_FEATURES) |
 		      (ifr->ifr_flags & TUN_FEATURES);
@@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	nf_reset_ct(skb);
 
-	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+	queue = netdev_get_tx_queue(dev, txq);
+
+	spin_lock(&tfile->tx_ring.producer_lock);
+	if (__ptr_ring_full_next(&tfile->tx_ring))
+		netif_tx_stop_queue(queue);
+
+	if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) {
+		spin_unlock(&tfile->tx_ring.producer_lock);
 		drop_reason = SKB_DROP_REASON_FULL_RING;
 		goto drop;
 	}
-
-	/* dev->lltx requires to do our own update of trans_start */
-	queue = netdev_get_tx_queue(dev, txq);
-	txq_trans_cond_update(queue);
+	spin_unlock(&tfile->tx_ring.producer_lock);
 
 	/* Notify and wake up reader process */
 	if (tfile->flags & TUN_FASYNC)
-- 
2.43.0
Re: [PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring
Posted by Michael S. Tsirkin 1 week, 1 day ago
On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote:
> Stop the netdev queue ahead of __ptr_ring_produce when
> __ptr_ring_full_next signals the ring is about to fill. Due to the
> smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to
> notice the stopped netdev queue after seeing the new ptr_ring entry. As
> both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock,
> the lock is held during the execution of both methods.
> 
> dev->lltx is disabled to ensure that tun_net_xmit is not called even
> though the netdev queue is stopped (which happened in my testing,
> resulting in rare packet drops). Consequently, the update of trans_start
> in tun_net_xmit is also removed.
> 
> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> ---
>  drivers/net/tun.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 86a9e927d0ff..c6b22af9bae8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev)
>  	dev->vlan_features = dev->features &
>  			     ~(NETIF_F_HW_VLAN_CTAG_TX |
>  			       NETIF_F_HW_VLAN_STAG_TX);
> -	dev->lltx = true;
> +	dev->lltx = false;
>  
>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
>  		      (ifr->ifr_flags & TUN_FEATURES);
> @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	nf_reset_ct(skb);
>  
> -	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> +	queue = netdev_get_tx_queue(dev, txq);
> +
> +	spin_lock(&tfile->tx_ring.producer_lock);
> +	if (__ptr_ring_full_next(&tfile->tx_ring))
> +		netif_tx_stop_queue(queue);
> +
> +	if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) {
> +		spin_unlock(&tfile->tx_ring.producer_lock);
>  		drop_reason = SKB_DROP_REASON_FULL_RING;
>  		goto drop;
>  	}

The comment makes it sound like you always keep one slot free
in the queue but that is not the case - you just
check before calling __ptr_ring_produce.


But it is racy isn't it? So first of all I suspect you
are missing an mb before netif_tx_stop_queue.

Second it's racy because more entries can get freed
afterwards. Which maybe is ok in this instance?
But it really should be explained in more detail, if so.



Now - why not just check ring full *after* __ptr_ring_produce?
Why do we need all these new APIs, and we can
use existing ones which at least are not so hard to understand.




> -
> -	/* dev->lltx requires to do our own update of trans_start */
> -	queue = netdev_get_tx_queue(dev, txq);
> -	txq_trans_cond_update(queue);
> +	spin_unlock(&tfile->tx_ring.producer_lock);
>  
>  	/* Notify and wake up reader process */
>  	if (tfile->flags & TUN_FASYNC)
> -- 
> 2.43.0
[PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring
Posted by Simon Schippers 1 week ago
Hi,
first of all thank you very much for your detailed replies! :)

On 23.09.25 16:47, Michael S. Tsirkin wrote:
> On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote:
>> Stop the netdev queue ahead of __ptr_ring_produce when
>> __ptr_ring_full_next signals the ring is about to fill. Due to the
>> smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to
>> notice the stopped netdev queue after seeing the new ptr_ring entry. As
>> both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock,
>> the lock is held during the execution of both methods.
>>
>> dev->lltx is disabled to ensure that tun_net_xmit is not called even
>> though the netdev queue is stopped (which happened in my testing,
>> resulting in rare packet drops). Consequently, the update of trans_start
>> in tun_net_xmit is also removed.
>>
>> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
>> ---
>>  drivers/net/tun.c | 16 ++++++++++------
>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 86a9e927d0ff..c6b22af9bae8 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev)
>>  	dev->vlan_features = dev->features &
>>  			     ~(NETIF_F_HW_VLAN_CTAG_TX |
>>  			       NETIF_F_HW_VLAN_STAG_TX);
>> -	dev->lltx = true;
>> +	dev->lltx = false;
>>  
>>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
>>  		      (ifr->ifr_flags & TUN_FEATURES);
>> @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>  
>>  	nf_reset_ct(skb);
>>  
>> -	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
>> +	queue = netdev_get_tx_queue(dev, txq);
>> +
>> +	spin_lock(&tfile->tx_ring.producer_lock);
>> +	if (__ptr_ring_full_next(&tfile->tx_ring))
>> +		netif_tx_stop_queue(queue);
>> +
>> +	if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) {
>> +		spin_unlock(&tfile->tx_ring.producer_lock);
>>  		drop_reason = SKB_DROP_REASON_FULL_RING;
>>  		goto drop;
>>  	}
> 
> The comment makes it sound like you always keep one slot free
> in the queue but that is not the case - you just
> check before calling __ptr_ring_produce.
> 

I agree.

> 
> But it is racy isn't it? So first of all I suspect you
> are missing an mb before netif_tx_stop_queue.
> 

I don’t really get this point right now.

> Second it's racy because more entries can get freed
> afterwards. Which maybe is ok in this instance?
> But it really should be explained in more detail, if so.
> 

Will be covered in the next mail.

> 
> 
> Now - why not just check ring full *after* __ptr_ring_produce?
> Why do we need all these new APIs, and we can
> use existing ones which at least are not so hard to understand.
> 
> 

You convinced me about changing my implementation anyway but here my (old) 
idea:
I did this in V1-V4. The problem is that vhost_net is only called on 
EPOLLIN triggered by tun_net_xmit. Then, after consuming a batch from the 
ptr_ring, it must be able to see if the netdev queue stopped or not. If 
this is not the case the ptr_ring might get empty and vhost_net is not 
able to wake the queue again (because it is not stopped from its POV), 
which happened in my testing in my V4.

This is the reason why, now in the V5, in tun_net_xmit I stop the netdev 
queue before producing. With that I exploit the smp_wmb() in 
__ptr_ring_produce which is paired with the READ_ONCE in __ptr_ring_peek 
to ensure that the consumer in vhost_net sees that the netdev queue 
stopped after consuming a batch.

> 
> 
>> -
>> -	/* dev->lltx requires to do our own update of trans_start */
>> -	queue = netdev_get_tx_queue(dev, txq);
>> -	txq_trans_cond_update(queue);
>> +	spin_unlock(&tfile->tx_ring.producer_lock);
>>  
>>  	/* Notify and wake up reader process */
>>  	if (tfile->flags & TUN_FASYNC)
>> -- 
>> 2.43.0
> 
Re: [PATCH net-next v5 3/8] TUN, TAP & vhost_net: Stop netdev queue before reaching a full ptr_ring
Posted by Michael S. Tsirkin 1 week ago
On Wed, Sep 24, 2025 at 07:41:28AM +0200, Simon Schippers wrote:
> Hi,
> first of all thank you very much for your detailed replies! :)
> 
> On 23.09.25 16:47, Michael S. Tsirkin wrote:
> > On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote:
> >> Stop the netdev queue ahead of __ptr_ring_produce when
> >> __ptr_ring_full_next signals the ring is about to fill. Due to the
> >> smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to
> >> notice the stopped netdev queue after seeing the new ptr_ring entry. As
> >> both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock,
> >> the lock is held during the execution of both methods.
> >>
> >> dev->lltx is disabled to ensure that tun_net_xmit is not called even
> >> though the netdev queue is stopped (which happened in my testing,
> >> resulting in rare packet drops). Consequently, the update of trans_start
> >> in tun_net_xmit is also removed.
> >>
> >> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> >> ---
> >>  drivers/net/tun.c | 16 ++++++++++------
> >>  1 file changed, 10 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> index 86a9e927d0ff..c6b22af9bae8 100644
> >> --- a/drivers/net/tun.c
> >> +++ b/drivers/net/tun.c
> >> @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev)
> >>  	dev->vlan_features = dev->features &
> >>  			     ~(NETIF_F_HW_VLAN_CTAG_TX |
> >>  			       NETIF_F_HW_VLAN_STAG_TX);
> >> -	dev->lltx = true;
> >> +	dev->lltx = false;
> >>  
> >>  	tun->flags = (tun->flags & ~TUN_FEATURES) |
> >>  		      (ifr->ifr_flags & TUN_FEATURES);
> >> @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>  
> >>  	nf_reset_ct(skb);
> >>  
> >> -	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> >> +	queue = netdev_get_tx_queue(dev, txq);
> >> +
> >> +	spin_lock(&tfile->tx_ring.producer_lock);
> >> +	if (__ptr_ring_full_next(&tfile->tx_ring))
> >> +		netif_tx_stop_queue(queue);
> >> +
> >> +	if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) {
> >> +		spin_unlock(&tfile->tx_ring.producer_lock);
> >>  		drop_reason = SKB_DROP_REASON_FULL_RING;
> >>  		goto drop;
> >>  	}
> > 
> > The comment makes it sound like you always keep one slot free
> > in the queue but that is not the case - you just
> > check before calling __ptr_ring_produce.
> > 
> 
> I agree.
> 
> > 
> > But it is racy isn't it? So first of all I suspect you
> > are missing an mb before netif_tx_stop_queue.
> > 
> 
> I don’t really get this point right now.

ring full next is a read. stop queue is a write. if you are
relying on ordering them in some way you need a full mb generally.




> > Second it's racy because more entries can get freed
> > afterwards. Which maybe is ok in this instance?
> > But it really should be explained in more detail, if so.
> > 
> 
> Will be covered in the next mail.
> 
> > 
> > 
> > Now - why not just check ring full *after* __ptr_ring_produce?
> > Why do we need all these new APIs, and we can
> > use existing ones which at least are not so hard to understand.
> > 
> > 
> 
> You convinced me about changing my implementation anyway but here my (old) 
> idea:
> I did this in V1-V4. The problem is that vhost_net is only called on 
> EPOLLIN triggered by tun_net_xmit. Then, after consuming a batch from the 
> ptr_ring, it must be able to see if the netdev queue stopped or not. If 
> this is not the case the ptr_ring might get empty and vhost_net is not 
> able to wake the queue again (because it is not stopped from its POV), 
> which happened in my testing in my V4.
> 
> This is the reason why, now in the V5, in tun_net_xmit I stop the netdev 
> queue before producing. With that I exploit the smp_wmb() in 
> __ptr_ring_produce which is paired with the READ_ONCE in __ptr_ring_peek 
> to ensure that the consumer in vhost_net sees that the netdev queue 
> stopped after consuming a batch.

yea you said it somewhere in code, too, and I am not sure I understand it all, but
wmb isn't paired with READ_ONCE generally. barrier pairing
is described in memory-barriers.txt, READ_ONCE is not a barrier
at all.

> > 
> > 
> >> -
> >> -	/* dev->lltx requires to do our own update of trans_start */
> >> -	queue = netdev_get_tx_queue(dev, txq);
> >> -	txq_trans_cond_update(queue);
> >> +	spin_unlock(&tfile->tx_ring.producer_lock);
> >>  
> >>  	/* Notify and wake up reader process */
> >>  	if (tfile->flags & TUN_FASYNC)
> >> -- 
> >> 2.43.0
> >