Stop the netdev queue ahead of __ptr_ring_produce when
__ptr_ring_full_next signals the ring is about to fill. Due to the
smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to
notice the stopped netdev queue after seeing the new ptr_ring entry. As
both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock,
the lock is held during the execution of both methods.
dev->lltx is disabled to ensure that tun_net_xmit is not called even
though the netdev queue is stopped (which happened in my testing,
resulting in rare packet drops). Consequently, the update of trans_start
in tun_net_xmit is also removed.
Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
---
drivers/net/tun.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 86a9e927d0ff..c6b22af9bae8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev)
dev->vlan_features = dev->features &
~(NETIF_F_HW_VLAN_CTAG_TX |
NETIF_F_HW_VLAN_STAG_TX);
- dev->lltx = true;
+ dev->lltx = false;
tun->flags = (tun->flags & ~TUN_FEATURES) |
(ifr->ifr_flags & TUN_FEATURES);
@@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
nf_reset_ct(skb);
- if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+ queue = netdev_get_tx_queue(dev, txq);
+
+ spin_lock(&tfile->tx_ring.producer_lock);
+ if (__ptr_ring_full_next(&tfile->tx_ring))
+ netif_tx_stop_queue(queue);
+
+ if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) {
+ spin_unlock(&tfile->tx_ring.producer_lock);
drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
}
-
- /* dev->lltx requires to do our own update of trans_start */
- queue = netdev_get_tx_queue(dev, txq);
- txq_trans_cond_update(queue);
+ spin_unlock(&tfile->tx_ring.producer_lock);
/* Notify and wake up reader process */
if (tfile->flags & TUN_FASYNC)
--
2.43.0
On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote: > Stop the netdev queue ahead of __ptr_ring_produce when > __ptr_ring_full_next signals the ring is about to fill. Due to the > smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to > notice the stopped netdev queue after seeing the new ptr_ring entry. As > both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock, > the lock is held during the execution of both methods. > > dev->lltx is disabled to ensure that tun_net_xmit is not called even > though the netdev queue is stopped (which happened in my testing, > resulting in rare packet drops). Consequently, the update of trans_start > in tun_net_xmit is also removed. > > Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de> > Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de> > Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de> > --- > drivers/net/tun.c | 16 ++++++++++------ > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 86a9e927d0ff..c6b22af9bae8 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev) > dev->vlan_features = dev->features & > ~(NETIF_F_HW_VLAN_CTAG_TX | > NETIF_F_HW_VLAN_STAG_TX); > - dev->lltx = true; > + dev->lltx = false; > > tun->flags = (tun->flags & ~TUN_FEATURES) | > (ifr->ifr_flags & TUN_FEATURES); > @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) > > nf_reset_ct(skb); > > - if (ptr_ring_produce(&tfile->tx_ring, skb)) { > + queue = netdev_get_tx_queue(dev, txq); > + > + spin_lock(&tfile->tx_ring.producer_lock); > + if (__ptr_ring_full_next(&tfile->tx_ring)) > + netif_tx_stop_queue(queue); > + > + if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) { > + spin_unlock(&tfile->tx_ring.producer_lock); > drop_reason = SKB_DROP_REASON_FULL_RING; > goto drop; > } The comment makes it sound like you always keep one slot free in the queue but that is not the case - you just check before calling __ptr_ring_produce. But it is racy isn't it? So first of all I suspect you are missing an mb before netif_tx_stop_queue. Second it's racy because more entries can get freed afterwards. Which maybe is ok in this instance? But it really should be explained in more detail, if so. Now - why not just check ring full *after* __ptr_ring_produce? Why do we need all these new APIs, and we can use existing ones which at least are not so hard to understand. > - > - /* dev->lltx requires to do our own update of trans_start */ > - queue = netdev_get_tx_queue(dev, txq); > - txq_trans_cond_update(queue); > + spin_unlock(&tfile->tx_ring.producer_lock); > > /* Notify and wake up reader process */ > if (tfile->flags & TUN_FASYNC) > -- > 2.43.0
Hi, first of all thank you very much for your detailed replies! :) On 23.09.25 16:47, Michael S. Tsirkin wrote: > On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote: >> Stop the netdev queue ahead of __ptr_ring_produce when >> __ptr_ring_full_next signals the ring is about to fill. Due to the >> smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to >> notice the stopped netdev queue after seeing the new ptr_ring entry. As >> both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock, >> the lock is held during the execution of both methods. >> >> dev->lltx is disabled to ensure that tun_net_xmit is not called even >> though the netdev queue is stopped (which happened in my testing, >> resulting in rare packet drops). Consequently, the update of trans_start >> in tun_net_xmit is also removed. >> >> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de> >> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de> >> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de> >> --- >> drivers/net/tun.c | 16 ++++++++++------ >> 1 file changed, 10 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >> index 86a9e927d0ff..c6b22af9bae8 100644 >> --- a/drivers/net/tun.c >> +++ b/drivers/net/tun.c >> @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev) >> dev->vlan_features = dev->features & >> ~(NETIF_F_HW_VLAN_CTAG_TX | >> NETIF_F_HW_VLAN_STAG_TX); >> - dev->lltx = true; >> + dev->lltx = false; >> >> tun->flags = (tun->flags & ~TUN_FEATURES) | >> (ifr->ifr_flags & TUN_FEATURES); >> @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) >> >> nf_reset_ct(skb); >> >> - if (ptr_ring_produce(&tfile->tx_ring, skb)) { >> + queue = netdev_get_tx_queue(dev, txq); >> + >> + spin_lock(&tfile->tx_ring.producer_lock); >> + if (__ptr_ring_full_next(&tfile->tx_ring)) >> + netif_tx_stop_queue(queue); >> + >> + if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) { >> + spin_unlock(&tfile->tx_ring.producer_lock); >> drop_reason = SKB_DROP_REASON_FULL_RING; >> goto drop; >> } > > The comment makes it sound like you always keep one slot free > in the queue but that is not the case - you just > check before calling __ptr_ring_produce. > I agree. > > But it is racy isn't it? So first of all I suspect you > are missing an mb before netif_tx_stop_queue. > I don’t really get this point right now. > Second it's racy because more entries can get freed > afterwards. Which maybe is ok in this instance? > But it really should be explained in more detail, if so. > Will be covered in the next mail. > > > Now - why not just check ring full *after* __ptr_ring_produce? > Why do we need all these new APIs, and we can > use existing ones which at least are not so hard to understand. > > You convinced me about changing my implementation anyway but here my (old) idea: I did this in V1-V4. The problem is that vhost_net is only called on EPOLLIN triggered by tun_net_xmit. Then, after consuming a batch from the ptr_ring, it must be able to see if the netdev queue stopped or not. If this is not the case the ptr_ring might get empty and vhost_net is not able to wake the queue again (because it is not stopped from its POV), which happened in my testing in my V4. This is the reason why, now in the V5, in tun_net_xmit I stop the netdev queue before producing. With that I exploit the smp_wmb() in __ptr_ring_produce which is paired with the READ_ONCE in __ptr_ring_peek to ensure that the consumer in vhost_net sees that the netdev queue stopped after consuming a batch. > > >> - >> - /* dev->lltx requires to do our own update of trans_start */ >> - queue = netdev_get_tx_queue(dev, txq); >> - txq_trans_cond_update(queue); >> + spin_unlock(&tfile->tx_ring.producer_lock); >> >> /* Notify and wake up reader process */ >> if (tfile->flags & TUN_FASYNC) >> -- >> 2.43.0 >
On Wed, Sep 24, 2025 at 07:41:28AM +0200, Simon Schippers wrote: > Hi, > first of all thank you very much for your detailed replies! :) > > On 23.09.25 16:47, Michael S. Tsirkin wrote: > > On Tue, Sep 23, 2025 at 12:15:48AM +0200, Simon Schippers wrote: > >> Stop the netdev queue ahead of __ptr_ring_produce when > >> __ptr_ring_full_next signals the ring is about to fill. Due to the > >> smp_wmb() of __ptr_ring_produce the consumer is guaranteed to be able to > >> notice the stopped netdev queue after seeing the new ptr_ring entry. As > >> both __ptr_ring_full_next and __ptr_ring_produce need the producer_lock, > >> the lock is held during the execution of both methods. > >> > >> dev->lltx is disabled to ensure that tun_net_xmit is not called even > >> though the netdev queue is stopped (which happened in my testing, > >> resulting in rare packet drops). Consequently, the update of trans_start > >> in tun_net_xmit is also removed. > >> > >> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de> > >> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de> > >> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de> > >> --- > >> drivers/net/tun.c | 16 ++++++++++------ > >> 1 file changed, 10 insertions(+), 6 deletions(-) > >> > >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >> index 86a9e927d0ff..c6b22af9bae8 100644 > >> --- a/drivers/net/tun.c > >> +++ b/drivers/net/tun.c > >> @@ -931,7 +931,7 @@ static int tun_net_init(struct net_device *dev) > >> dev->vlan_features = dev->features & > >> ~(NETIF_F_HW_VLAN_CTAG_TX | > >> NETIF_F_HW_VLAN_STAG_TX); > >> - dev->lltx = true; > >> + dev->lltx = false; > >> > >> tun->flags = (tun->flags & ~TUN_FEATURES) | > >> (ifr->ifr_flags & TUN_FEATURES); > >> @@ -1060,14 +1060,18 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev) > >> > >> nf_reset_ct(skb); > >> > >> - if (ptr_ring_produce(&tfile->tx_ring, skb)) { > >> + queue = netdev_get_tx_queue(dev, txq); > >> + > >> + spin_lock(&tfile->tx_ring.producer_lock); > >> + if (__ptr_ring_full_next(&tfile->tx_ring)) > >> + netif_tx_stop_queue(queue); > >> + > >> + if (unlikely(__ptr_ring_produce(&tfile->tx_ring, skb))) { > >> + spin_unlock(&tfile->tx_ring.producer_lock); > >> drop_reason = SKB_DROP_REASON_FULL_RING; > >> goto drop; > >> } > > > > The comment makes it sound like you always keep one slot free > > in the queue but that is not the case - you just > > check before calling __ptr_ring_produce. > > > > I agree. > > > > > But it is racy isn't it? So first of all I suspect you > > are missing an mb before netif_tx_stop_queue. > > > > I don’t really get this point right now. ring full next is a read. stop queue is a write. if you are relying on ordering them in some way you need a full mb generally. > > Second it's racy because more entries can get freed > > afterwards. Which maybe is ok in this instance? > > But it really should be explained in more detail, if so. > > > > Will be covered in the next mail. > > > > > > > Now - why not just check ring full *after* __ptr_ring_produce? > > Why do we need all these new APIs, and we can > > use existing ones which at least are not so hard to understand. > > > > > > You convinced me about changing my implementation anyway but here my (old) > idea: > I did this in V1-V4. The problem is that vhost_net is only called on > EPOLLIN triggered by tun_net_xmit. Then, after consuming a batch from the > ptr_ring, it must be able to see if the netdev queue stopped or not. If > this is not the case the ptr_ring might get empty and vhost_net is not > able to wake the queue again (because it is not stopped from its POV), > which happened in my testing in my V4. > > This is the reason why, now in the V5, in tun_net_xmit I stop the netdev > queue before producing. With that I exploit the smp_wmb() in > __ptr_ring_produce which is paired with the READ_ONCE in __ptr_ring_peek > to ensure that the consumer in vhost_net sees that the netdev queue > stopped after consuming a batch. yea you said it somewhere in code, too, and I am not sure I understand it all, but wmb isn't paired with READ_ONCE generally. barrier pairing is described in memory-barriers.txt, READ_ONCE is not a barrier at all. > > > > > >> - > >> - /* dev->lltx requires to do our own update of trans_start */ > >> - queue = netdev_get_tx_queue(dev, txq); > >> - txq_trans_cond_update(queue); > >> + spin_unlock(&tfile->tx_ring.producer_lock); > >> > >> /* Notify and wake up reader process */ > >> if (tfile->flags & TUN_FASYNC) > >> -- > >> 2.43.0 > >
© 2016 - 2025 Red Hat, Inc.